December 29, 2012
 

Installing Hadoop on Mac OSX Mountain Lion (Step by Step Instructions)

Written by: arunenigma

Setting up a single-node Apache Hadoop instance on OS X is fairly simple and much the same as on any other Linux/Unix machine, with a small amount of custom configuration. See the official Apache Hadoop documentation for the full instructions. This tutorial provides a quick way of getting your OS X Hadoop instance up and running.

Java

Java is installed by default on OS X. Hadoop 1.0.3 still recommends Java 6, which can be installed from the command line if it is not already present. You can check the Java version with the following command:

$ java -version
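If you would rather have the shell tell you whether the installed Java is a Java 6 release, a small sketch along these lines may help. The helper names java_version_string and is_java6 are made up for this example; the only assumption about Java itself is that `java -version` prints a quoted version number such as 1.6.x on stderr.

```shell
# Print the quoted version number reported by `java -version` (which writes to stderr).
java_version_string() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only when the given version string belongs to the Java 6 (1.6.x) line.
is_java6() {
  case "$1" in
    1.6.*) return 0 ;;
    *)     return 1 ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  found="$(java_version_string)"
  if is_java6 "$found"; then
    echo "Java 6 detected: $found"
  else
    echo "Found Java version '$found', which is not Java 6"
  fi
else
  echo "java was not found on the PATH"
fi
```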

Configure SSH

Hadoop uses SSH to manage its nodes. For our single-node setup, we need to configure SSH access to the local machine for our Hadoop user. On OS X, this requires that the Remote Login option is enabled in the Sharing pane of System Preferences.

First, we have to generate an SSH key for our user. From the command line, run the following command:

$ ssh-keygen -t rsa -P ""

Save the public and private keys to the default location. When the keys are generated, run the following command:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
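Before testing the login, it can be worth tightening the permissions on the key files, since sshd in its default strict mode may ignore an authorized_keys file that other users can write to. The following is a minimal sketch; the function name tighten_ssh_perms is invented for this example.

```shell
# tighten_ssh_perms DIR: restrict DIR and DIR/authorized_keys to their owner.
tighten_ssh_perms() {
  chmod 700 "$1" && chmod 600 "$1/authorized_keys"
}

if [ -f "$HOME/.ssh/authorized_keys" ]; then
  tighten_ssh_perms "$HOME/.ssh"
  # Show the resulting permissions for a quick visual check.
  ls -ld "$HOME/.ssh" "$HOME/.ssh/authorized_keys"
else
  echo "No authorized_keys file yet; generate and authorize a key first"
fi
```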

The last step is to verify that SSH is working correctly. To do this, run the following:

$ ssh localhost

This should result in a successful login. Repeat with your actual host name.

Downloading and Installing Hadoop

Hadoop may be downloaded from http://www.apache.org/dyn/closer.cgi/hadoop/common/. Select the 1.0.3 release; it will be named hadoop-1.0.3.tar.gz. I like to install Hadoop under /opt/hadoop, but this is a matter of preference; adjust the following to fit your preferences. If /opt doesn’t exist, create it:

$ sudo mkdir /opt

Now change to /opt and untar the Hadoop tar file there:

$ cd /opt
$ sudo tar xvf ~/Downloads/hadoop-1.0.3.tar.gz

Now create a symbolic link named hadoop that points to the versioned directory:

$ sudo ln -s /opt/hadoop-1.0.3 hadoop

Finally, change the ownership to your user:

$ sudo chown -R {your username}:staff hadoop hadoop-1.0.3

Configuring and Testing Hadoop

Once Hadoop is installed, four configuration files need to be updated. See the Hadoop documentation for details of what each file controls.

Configuring: hadoop-env.sh

With your favorite editor, open the file /opt/hadoop/conf/hadoop-env.sh to make some environment updates. Uncomment the JAVA_HOME line and set it to the output of the java_home command, so that the location of your Java installation is resolved dynamically:

# The java implementation to use. Required.

export JAVA_HOME=$(/usr/libexec/java_home)

Next, uncomment HADOOP_HEAPSIZE and set it to 2000. This is optional, but recommended.

# The maximum amount of heap to use, in MB. Default is 1000.

export HADOOP_HEAPSIZE=2000

Starting with OS X Lion, a bug was introduced that caused issues when working with the name node. The error typically shows up as:

“Unable to load realm info from SCDynamicStore”

To fix this issue, add the following to your hadoop-env.sh file:

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

To recap, the hadoop-env.sh should contain the following.

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HEAPSIZE=2000
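If you prefer not to edit the file by hand, the same three lines can be appended from the shell. This is only a sketch: the helper append_once is invented for this example, and the script falls back to a scratch file when /opt/hadoop/conf/hadoop-env.sh is not writable, so it can be tried without touching a real installation.

```shell
ENV_FILE="${ENV_FILE:-/opt/hadoop/conf/hadoop-env.sh}"
# Fall back to a scratch file when the real hadoop-env.sh is missing or read-only.
[ -w "$ENV_FILE" ] || ENV_FILE="$(mktemp "${TMPDIR:-/tmp}/hadoop-env.XXXXXX")"

# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
append_once() {
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Single quotes keep $(...) literal, so it is evaluated when Hadoop sources the file.
append_once "$ENV_FILE" 'export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"'
append_once "$ENV_FILE" 'export JAVA_HOME=$(/usr/libexec/java_home)'
append_once "$ENV_FILE" 'export HADOOP_HEAPSIZE=2000'

echo "Updated $ENV_FILE"
```

Because append_once skips lines that are already present, running the script twice does not duplicate the settings.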

Configuring: core-site.xml

This file controls the default file system and where temporary files are stored. Remember that the directories for the temp files must be writable by the Hadoop user. Note: wherever the configuration uses the host name vader.local, replace it with your system's host name or with localhost.
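As an illustration, a minimal single-node core-site.xml could be written with a shell here-document like the one below. fs.default.name and hadoop.tmp.dir are standard Hadoop 1.x property names, but the port 9000 and the path /opt/HDFS/tmp are assumptions made for this sketch, and localhost stands in for the host name. Note that the command overwrites any existing core-site.xml; it falls back to a scratch directory when /opt/hadoop/conf is not writable.

```shell
CONF_DIR="${HADOOP_CONF_DIR:-/opt/hadoop/conf}"
# Fall back to a scratch directory when Hadoop is not installed at that path.
[ -w "$CONF_DIR" ] || CONF_DIR="$(mktemp -d "${TMPDIR:-/tmp}/hadoop-conf.XXXXXX")"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Default file system; port 9000 is an assumed value. -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <!-- Base directory for temporary files; the path is an assumed value. -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/HDFS/tmp</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/core-site.xml"
```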

Configuring: hdfs-site.xml

Next, we need to configure HDFS. The hdfs-site.xml file is used to configure HDFS itself. In this case, HDFS is told to store only one copy (replica) of each file and where to store its data.
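A sketch of a matching hdfs-site.xml follows. dfs.replication, dfs.name.dir and dfs.data.dir are standard Hadoop 1.x property names; /opt/HDFS/name matches the storage directory shown in the format output later in this post, while /opt/HDFS/data is an assumption. As with the other files, the command overwrites an existing hdfs-site.xml and uses a scratch directory when /opt/hadoop/conf is not writable.

```shell
CONF_DIR="${HADOOP_CONF_DIR:-/opt/hadoop/conf}"
# Fall back to a scratch directory when Hadoop is not installed at that path.
[ -w "$CONF_DIR" ] || CONF_DIR="$(mktemp -d "${TMPDIR:-/tmp}/hadoop-conf.XXXXXX")"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Keep a single copy of each block on this one-node setup. -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- Where the name node keeps its metadata. -->
    <name>dfs.name.dir</name>
    <value>/opt/HDFS/name</value>
  </property>
  <property>
    <!-- Where the data node keeps its blocks; the path is an assumed value. -->
    <name>dfs.data.dir</name>
    <value>/opt/HDFS/data</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/hdfs-site.xml"
```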

Note: the DFS-related directories must be writable by the Hadoop user.
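One way to confirm this before formatting HDFS is a quick writability check. The function check_writable is invented for this example, and the name, data and tmp directories under /opt/HDFS are assumptions; adjust them to whatever paths your configuration files use.

```shell
HDFS_ROOT="${HDFS_ROOT:-/opt/HDFS}"

# check_writable DIR...: report each directory and fail if any is missing or read-only.
check_writable() {
  status=0
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "ok: $dir"
    else
      echo "missing or not writable: $dir"
      status=1
    fi
  done
  return $status
}

check_writable "$HDFS_ROOT/name" "$HDFS_ROOT/data" "$HDFS_ROOT/tmp" ||
  echo "Create the directories above and make your user their owner before formatting HDFS"
```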

Configuring: mapred-site.xml

Specify the job tracker location and set the maximum number of map and reduce tasks that may run at the same time. In this example, each maximum is limited to 2, but this can be changed depending on your system.
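A sketch of a matching mapred-site.xml is shown below. mapred.job.tracker and the two tasktracker maximum properties are standard Hadoop 1.x names; the job tracker address localhost:9001 is an assumption. The command overwrites an existing mapred-site.xml and uses a scratch directory when /opt/hadoop/conf is not writable.

```shell
CONF_DIR="${HADOOP_CONF_DIR:-/opt/hadoop/conf}"
# Fall back to a scratch directory when Hadoop is not installed at that path.
[ -w "$CONF_DIR" ] || CONF_DIR="$(mktemp -d "${TMPDIR:-/tmp}/hadoop-conf.XXXXXX")"

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Job tracker host and port; port 9001 is an assumed value. -->
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <!-- At most two map tasks at a time on this machine. -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <!-- At most two reduce tasks at a time on this machine. -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/mapred-site.xml"
```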

Initializing HDFS

We need to initialize HDFS before we can use it. This also verifies that the Hadoop user can access the directories. From /opt/hadoop, run the following command:

$ bin/hadoop namenode -format

You should see output like the following:

arun:hadoop prasath$ bin/hadoop namenode -format
12/10/03 21:03:59 INFO namenode.NameNode: STARTUP_MSG:

21:03:59 INFO common.Storage: Storage directory /opt/HDFS/name has been successfully formatted.

Shutting down NameNode at arun.local/192.168.1.13 ************************************************************/

This completes the setup. Now it is time to start Hadoop and verify that it all works.

Starting Hadoop

From /opt/hadoop, run the following command to start all the Hadoop services.

$ bin/start-all.sh

You will see each service start; if there are no errors, continue on to run an example test job.
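To double-check that everything came up, you can compare the output of jps (a tool that ships with the JDK) against the five daemons that a Hadoop 1.x single-node setup normally runs. The function missing_daemons below is a sketch written for this post, not part of Hadoop.

```shell
# Read jps-style lines ("<pid> <ClassName>") on stdin and print every
# expected Hadoop 1.x daemon that does not appear in them.
missing_daemons() {
  running="$(awk '{ print $2 }')"
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    printf '%s\n' "$running" | grep -qx "$daemon" || echo "$daemon"
  done
}

if command -v jps >/dev/null 2>&1; then
  missing="$(jps | missing_daemons)"
  if [ -z "$missing" ]; then
    echo "All five Hadoop daemons are running"
  else
    echo "Not running:"
    echo "$missing"
  fi
else
  echo "jps was not found on the PATH"
fi
```

If a daemon is listed as not running, its log file under /opt/hadoop/logs is usually the first place to look.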

Find Pi to verify Hadoop

To test the installation, run the example Pi calculation job. Again, from /opt/hadoop, run the following command:

$ bin/hadoop jar /opt/hadoop/hadoop-examples-*.jar pi 10 100
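The two numbers passed to the pi example are the number of map tasks (10) and the number of sample points per map (100). To give a feel for what such a job computes, here is a stand-alone sketch in shell and awk that estimates Pi from the same total number of points. It uses a Halton sequence to spread points over the unit square and counts how many land inside the inscribed circle; as far as I know the bundled example follows a similar quasi-Monte Carlo idea, but this is an illustration only and not the Hadoop code. The function name estimate_pi is invented for this example.

```shell
# estimate_pi MAPS SAMPLES: estimate Pi from MAPS * SAMPLES quasi-random points.
estimate_pi() {
  awk -v maps="$1" -v samples="$2" '
    # Radical inverse of i in the given base: one coordinate of a Halton point.
    function halton(i, base,    f, r) {
      f = 1; r = 0
      while (i > 0) {
        f = f / base
        r = r + f * (i % base)
        i = int(i / base)
      }
      return r
    }
    BEGIN {
      total = maps * samples
      inside = 0
      for (i = 1; i <= total; i++) {
        # Shift the point so that the circle is centred on the origin.
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if (x * x + y * y <= 0.25) inside++
      }
      # Circle area / square area = Pi / 4, so Pi is about 4 * inside / total.
      printf "%.4f\n", 4 * inside / total
    }'
}

estimate_pi 10 100
```

More maps or more samples per map give a closer estimate, at the cost of a longer run.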

 

When the job completes, the final lines of output report how long the job took and the estimated value of Pi.

 



About the Author

arunenigma
Computer Science Graduate Student @ Case Western Reserve University, Cleveland, USA



 
 

 