Philip D. Coyne

Hadoop

What I am trying to do:   Experiment with using Hadoop to analyze large quantities of data for scientific and text mining applications.  I will start by installing and configuring single-node Hadoop, then load the data needed, using Pig and Hive to transform and load data onto HDFS.

Single Node

Pre-requisite software tools

  • OS:  Ubuntu 14.04.1 on a 64-bit machine with 16GB RAM
  • JVM 1.7 (bundled with Ubuntu 14.04.1)
  • Hadoop 2.6.0 download
  • WinSCP to copy files from Windows to the Ubuntu environment
  • PuTTY if on Windows; or TotalTerminal on Mac OS

Prep the Environment

  • Assume that when Ubuntu was installed, the hostname was set to ubuntuboninc and a user boninc was created.   From Windows, use PuTTY to ssh into ubuntuboninc; from Mac OS use the TotalTerminal application:
    • ssh boninc@ubuntuboninc
  • Add hadoop user, named hdUser
    • sudo useradd -m hdUser
  • Add password for hdUser
    • sudo passwd hdUser (enter the password when the system asks)
  • Change hdUser shell to use bash shell
    •  sudo chsh -s /bin/bash hdUser
  • Allow hdUser to use sudo by adding it to the sudo group
    • sudo usermod -aG sudo hdUser
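
The account steps above can be collected into one script. This is only a sketch under some assumptions: the run helper and the DRY_RUN switch are hypothetical names introduced here, and membership in the sudo group is assumed to grant sudo rights, as on a stock Ubuntu install. With DRY_RUN left at its default of 1 the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: create the hdUser account in one pass.
# DRY_RUN (hypothetical switch) defaults to 1, so commands are only printed.
DRY_RUN="${DRY_RUN:-1}"
HD_USER="hdUser"

run() {
  # Print the command in dry-run mode; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prep_user() {
  # Create the account only if it does not exist yet.
  if id "$HD_USER" >/dev/null 2>&1; then
    echo "$HD_USER already exists, skipping useradd"
  else
    run sudo useradd -m "$HD_USER"
  fi
  run sudo passwd "$HD_USER"              # prompts for the new password
  run sudo chsh -s /bin/bash "$HD_USER"   # make bash the login shell
  # Assumption: the sudo group is the one listed in /etc/sudoers.
  run sudo usermod -aG sudo "$HD_USER"
}

prep_user
```

Setting DRY_RUN=0 before running the script executes the commands for real; hdUser has to log out and back in before the new group membership takes effect.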

Hadoop Installation (Installation directory: /usr/local/hadoop-2.6.0)

  • On the ubuntuboninc machine, in /home/hdUser, create an install directory
    • mkdir install
  • Use WinSCP to copy hadoop 2.6.0 tar file to ubuntuboninc machine, directory /home/hdUser/install
  • Uncompress the hadoop tar file into /usr/local (this creates /usr/local/hadoop-2.6.0)
    • cd /usr/local
    • sudo tar -xzf /home/hdUser/install/hadoop-2.6.0.tar.gz -C /usr/local
  • Create a group named hadoop
    • sudo groupadd hadoop
  • Change ownership of the hadoop files to user hdUser and group hadoop
    • sudo chown -R hdUser:hadoop /usr/local/hadoop-2.6.0
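
The extraction step can also be wrapped in a small function that checks for the tar file first. This is a sketch, not part of Hadoop: extract_hadoop is a hypothetical helper name, and it assumes the tar file unpacks into a single top-level hadoop-2.6.0 directory.

```shell
#!/usr/bin/env bash
# Sketch: unpack a Hadoop tar file into a target directory.
# extract_hadoop is a hypothetical helper name, not part of Hadoop.
extract_hadoop() {
  local tarball="$1" dest="$2"
  # Stop early with a clear message if the tar file is missing.
  [ -f "$tarball" ] || { echo "missing tar file: $tarball" >&2; return 1; }
  mkdir -p "$dest" || return 1
  # -C makes tar change into $dest before extracting, so no cd is needed.
  tar -xzf "$tarball" -C "$dest"
}

# Example call (needs write access to /usr/local, so run it as root):
#   extract_hadoop /home/hdUser/install/hadoop-2.6.0.tar.gz /usr/local
#   chown -R hdUser:hadoop /usr/local/hadoop-2.6.0
```

Passing the destination with -C avoids depending on the current directory, which is an easy thing to get wrong when the commands are pasted one at a time.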

Prepare Hadoop Environment Variables

  • Use an editor to edit the file:
    • hadoop-env.sh
    • in directory: /usr/local/hadoop-2.6.0/etc/hadoop
  • Ensure that the following two environment variables are set (no spaces around the = sign):
    • export JAVA_HOME=${JAVA_HOME}  # this is set in /home/hdUser/.bashrc
    • export HADOOP_PREFIX=/usr/local/hadoop-2.6.0
Test:
  • chmod +x /usr/local/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
  • . /usr/local/hadoop-2.6.0/etc/hadoop/hadoop-env.sh ## source the environment script so the variables apply to the current shell
  • Try the following command from /usr/local/hadoop-2.6.0
    • $ bin/hadoop
If the environment variables above are set up correctly, the command displays the usage documentation for the hadoop script.
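
Before running bin/hadoop it can help to confirm that both variables are visible in the current shell. The check below is a sketch; check_hadoop_env is a hypothetical helper name. Note that running a script as a separate process cannot change the variables of the shell that launched it, so hadoop-env.sh has to be sourced for this check to see its values.

```shell
#!/usr/bin/env bash
# Sketch: verify the variables from hadoop-env.sh in the current shell.
# check_hadoop_env is a hypothetical helper name.
check_hadoop_env() {
  local status=0
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    status=1
  fi
  if [ -z "${HADOOP_PREFIX:-}" ]; then
    echo "HADOOP_PREFIX is not set"
    status=1
  elif [ ! -x "$HADOOP_PREFIX/bin/hadoop" ]; then
    echo "no executable at $HADOOP_PREFIX/bin/hadoop"
    status=1
  fi
  return "$status"
}

if check_hadoop_env; then
  echo "environment looks usable"
fi
```

The function returns a non-zero status when something is missing, so it can also guard later steps in a longer setup script.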

HDFS Configuration

<TBA>

Multi Nodes

<To be Added>