Installing Hadoop on Mac OSX Mountain Lion (Step by Step Instructions)

Written by: arunenigma · December 29, 2012

Setting up a single-node Apache Hadoop instance on OS X is pretty simple and much the same as on any other Linux/Unix machine, aside from a small bit of custom configuration; see the official Apache documentation for the general instructions. This tutorial provides a quick way of getting your OS X Hadoop instance up and running.

Java

Java is installed by default on OS X. Hadoop 1.0.3 still recommends Java 6, which can be installed from the command line if desired. You can check the Java version with the following command:

$ java -version

Configure SSH

Hadoop uses SSH access to manage its nodes. For our single-node setup, we need to configure SSH access to the local machine for our Hadoop user. On OS X, this requires that the Remote Login option is enabled in the Sharing preference pane.
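If you prefer to stay in the terminal, Remote Login can also be switched on with Apple's systemsetup tool (a convenience, not part of the official steps):

$ sudo systemsetup -setremotelogin on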

First, we have to generate an SSH key for our user. From the command line, run the following command:

$ ssh-keygen -t rsa -P ""

Save the public and private keys to the default location. When the keys are generated, run the following command:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

The last step is to verify that SSH is working correctly. To do this, run the following:

$ ssh localhost

This should result in a successful login without a password prompt. Repeat with your actual host name.
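If the login still prompts for a password, the usual culprit is permissions on the key files; OpenSSH requires them to be locked down (standard OpenSSH behavior, not specific to this tutorial):

$ chmod 700 $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys
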
Downloading and Installing Hadoop

Hadoop may be downloaded from http://www.apache.org/dyn/closer.cgi/hadoop/common/. Select the 1.0.3 release; the file will be named hadoop-1.0.3.tar.gz. I like to install Hadoop under /opt/hadoop, but this is purely a matter of preference; adjust the paths below to taste. If /opt doesn’t exist, create it:

$ sudo mkdir /opt
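If you prefer the command line to a browser, older releases are also kept on the Apache archive; something like the following should work (the URL follows the archive's usual layout, so verify it first):

$ cd ~/Downloads
$ curl -O http://archive.apache.org/dist/hadoop/common/hadoop-1.0.3/hadoop-1.0.3.tar.gz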

Now untar the Hadoop tarball into /opt:

$ sudo tar xzf ~/Downloads/hadoop-1.0.3.tar.gz -C /opt

Now create a symbolic link so /opt/hadoop points at the versioned directory:

$ sudo ln -s /opt/hadoop-1.0.3 /opt/hadoop

Finally, change the ownership to your user (run from /opt, substituting your own username):

$ sudo chown -R {your username}:staff hadoop hadoop-1.0.3

Configuring and Testing Hadoop

Once installed, there are four configuration files that need to be updated; see the Hadoop documentation for the full range of settings these files control.

Configure: hadoop-env.sh

With your favorite editor, open the file /opt/hadoop/conf/hadoop-env.sh to make some environment updates. Uncomment the JAVA_HOME line and point it at /usr/libexec/java_home so the correct Java is located dynamically:

# The java implementation to use. Required.

export JAVA_HOME=$(/usr/libexec/java_home)

Next, uncomment HADOOP_HEAPSIZE and set it to 2000. Technically this is optional, but recommended.

# The maximum amount of heap to use, in MB. Default is 1000.

export HADOOP_HEAPSIZE=2000


Starting with OS X Lion, a bug was introduced that caused issues when working with the name node. The error typically shows up as:

"Unable to load realm info from SCDynamicStore"

To fix this issue, add the following to your hadoop-env.sh file:

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

To recap, hadoop-env.sh should now contain the following.

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
export JAVA_HOME=$(/usr/libexec/java_home)
export HADOOP_HEAPSIZE=2000

Configuring: core-site.xml

This file controls the default file system and where temporary files are stored. Remember, the temp directories must be writable by the Hadoop user. Note: replace vader.local in the snippet below with your system’s host name or with localhost.
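A minimal sketch of what core-site.xml can look like for this setup; port 9000 and the /opt/HDFS/tmp path are my assumptions:

<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Default file system; use your host name or localhost -->
    <name>fs.default.name</name>
    <value>hdfs://vader.local:9000</value>
  </property>
  <property>
    <!-- Scratch space for HDFS and MapReduce; must be writable by the Hadoop user (assumed path) -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/HDFS/tmp</value>
  </property>
</configuration>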

Configuring: hdfs-site.xml

Next, we need to configure HDFS itself, which is done in hdfs-site.xml. In this case, I tell HDFS to store only one copy of each file and where to keep its data.

Note: the dfs-related directories must be writable by the Hadoop user; the commands after the snippet below take care of that.
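A minimal sketch of a matching hdfs-site.xml; the /opt/HDFS/name and /opt/HDFS/data paths are my assumptions, though the name directory does line up with the format output shown later:

<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Single-node setup: keep only one copy of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- Where the name node keeps its metadata (assumed path) -->
    <name>dfs.name.dir</name>
    <value>/opt/HDFS/name</value>
  </property>
  <property>
    <!-- Where the data node keeps its blocks (assumed path) -->
    <name>dfs.data.dir</name>
    <value>/opt/HDFS/data</value>
  </property>
</configuration>

Create the directories and hand them to your user:

$ sudo mkdir -p /opt/HDFS/name /opt/HDFS/data
$ sudo chown -R {your username}:staff /opt/HDFS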

Configuring: mapred-site.xml

Specify the job tracker location and set the maximum number of concurrent map and reduce tasks. In the example below, the maximum is limited to 2, but it can be raised depending on your system.
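A minimal sketch of a matching mapred-site.xml; vader.local:9001 is an assumed job tracker address (9001 is the conventional Hadoop 1.x port), so substitute your own host name or localhost:

<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Job tracker address; replace vader.local with your host name or localhost -->
    <name>mapred.job.tracker</name>
    <value>vader.local:9001</value>
  </property>
  <property>
    <!-- Cap concurrent map tasks at 2; raise on beefier machines -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <!-- Cap concurrent reduce tasks at 2 -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>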

Initializing HDFS

We need to initialize HDFS before we can use it. This also verifies that the Hadoop user can access the directories. From /opt/hadoop, run the following command:

$ bin/hadoop namenode -format

You should see output like the following:

arun:hadoop prasath$ bin/hadoop namenode -format
12/10/03 21:03:59 INFO namenode.NameNode: STARTUP_MSG:
...
12/10/03 21:03:59 INFO common.Storage: Storage directory /opt/HDFS/name has been successfully formatted.
...
SHUTDOWN_MSG: Shutting down NameNode at arun.local/192.168.1.13
************************************************************/

This completes the setup. Now it is time to start up Hadoop and verify that it all works.

Starting Hadoop

From /opt/hadoop, run the following command to start all the Hadoop services:

$ bin/start-all.sh

You will see each service start; if there are no errors, confirm the daemons are up as shown below, then continue on to run an example test job.
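A quick sanity check is jps, which ships with the JDK and lists running Java processes; on a healthy single-node setup all five Hadoop daemons should appear (the process IDs below are illustrative):

$ jps
11628 NameNode
11705 DataNode
11796 SecondaryNameNode
11861 JobTracker
11949 TaskTracker
12021 Jps
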
Find Pi to Verify Hadoop

To test the installation, run the example Pi calculation job. Again, from /opt/hadoop, run the following command:

$ bin/hadoop jar /opt/hadoop/hadoop-examples-*.jar pi 10 100

 

You should see the ten map tasks run, followed by a single reduce, ending with an estimate of Pi.
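A sketch of the tail of that output (timings are illustrative; 3.148 is what the pi example typically reports for 10 maps and 100 samples):

Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
...
Job Finished in 1.685 seconds
Estimated value of Pi is 3.14800000000000000000

If the job finishes and prints an estimate along these lines, your single-node Hadoop installation is working.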


