All posts filed under: Hadoop

Install Zookeeper for Cloudera Hadoop (CDH4) in Pseudo mode on Ubuntu 12.04 LTS

3 comments
Hadoop

Introduction Zookeeper provides cluster management for Hadoop. In this post, I’m going to install Zookeeper in Pseudo mode, so please use these instructions for setting up a developer’s workstation, not for a production cluster. Installation The zookeeper package should already be installed, but we’ll double check. sudo apt-get install zookeeper Next, we’ll install the Zookeeper Server. sudo apt-get install zookeeper-server The following files are now installed: /etc/zookeeper/conf/zoo.cfg: Zookeeper configuration file sudo service zookeeper-server stop sudo […]
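
The full post walks through each step; as a rough sketch of the pseudo-mode setup (assuming the CDH4 zookeeper-server package and its default zoo.cfg), the commands look roughly like this:

    # Install the ZooKeeper base package and the server
    sudo apt-get install zookeeper
    sudo apt-get install zookeeper-server

    # Initialize the data directory, then start the server
    sudo service zookeeper-server init
    sudo service zookeeper-server start

    # Quick health check: the server answers "imok" if it is running
    echo ruok | nc localhost 2181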

HBase 0.92.1 Files and Directories (CDH4)

Leave a comment
Hadoop

Introduction You will need to know the location of binaries, configuration files, and libraries when working with HBase. Directories Configuration /etc/hbase/conf is the location for all of HBase’s configuration files. HBase uses Debian Alternatives, so there are a number of symlinks to the configuration files. /etc/hbase/conf is a symlink to /etc/alternatives/hbase-conf. /etc/alternatives/hbase-conf is a symlink to /etc/hbase/conf.dist Logs /var/log/hbase contains all of the HBase log files. Files Configuration Files The following configuration files are located […]
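
The symlink chain is easy to confirm from a shell; a quick check along these lines (the alternative name hbase-conf is taken from the path above):

    # /etc/hbase/conf -> /etc/alternatives/hbase-conf -> /etc/hbase/conf.dist
    ls -l /etc/hbase/conf
    ls -l /etc/alternatives/hbase-conf

    # Or ask the Debian Alternatives system directly
    update-alternatives --display hbase-conf

    # Log files live here
    ls /var/log/hbase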

Zookeeper 3.4.3 Files and Directories (CDH4)

Leave a comment
Hadoop

Introduction You will need to know the location of binaries, configuration files, and libraries when working with Zookeeper. Zookeeper 3.4.3 is a part of Cloudera Distribution Hadoop (CDH4). Directories /etc/zookeeper/conf /etc/zookeeper/conf is the location for all of Zookeeper’s configuration files. Zookeeper uses Debian Alternatives, so there are a number of symlinks to the configuration files. /etc/zookeeper/conf is a symlink to /etc/alternatives/zookeeper-conf. /etc/alternatives/zookeeper-conf is a symlink to /etc/zookeeper/conf.dist Files Configuration Files The following configuration files are […]
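
The same kind of check works for Zookeeper; a minimal sketch, assuming the alternative is registered as zookeeper-conf as the symlink suggests:

    # Resolve the full chain in one step: should end at /etc/zookeeper/conf.dist
    readlink -f /etc/zookeeper/conf
    update-alternatives --display zookeeper-conf

    # The main server configuration file
    cat /etc/zookeeper/conf/zoo.cfg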

Find and Replace Text with sed

1 comment
Hadoop / Linux

Introduction sed provides a quick and easy way to find and replace text via its search command (‘s’). Sample File Copy and paste the following text into a file named practice01.txt. Author: Akbar S. Ahmed Date: July 1, 2012 Subject: Sed sed is an extremely useful Unix/Linux/*nix utility that allows you to manipulate a text stream. It is useful when working with Hadoop, as sed is often used to manipulate text prior to MapReduce. sed […]
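
To give a taste of the search command before reading the full post, here are two substitutions run against practice01.txt; the replacement strings are only illustrations, not taken from the post:

    # Replace the first occurrence of "Unix" on each line and print the result to stdout
    sed 's/Unix/UNIX/' practice01.txt

    # The g flag replaces every occurrence on a line; -i.bak edits the file in place and keeps a backup
    sed -i.bak 's/extremely useful/very useful/g' practice01.txt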

What is sed?

Leave a comment
Hadoop / Linux

Introduction sed is short for Stream EDitor, which is a utility that allows you to parse and transform text one line at a time. sed is a useful tool, along with grep and awk, when manipulating text files. It is also often overlooked when working with Hadoop, although the use of sed, awk and grep can help speed up processing times by preprocessing text before sending it to a MapReduce job.
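
A small pipeline shows the kind of preprocessing meant here; the file names and pattern below are hypothetical:

    # Drop comment lines, squeeze repeated whitespace, then push the cleaned file into HDFS for MapReduce
    grep -v '^#' raw_events.log | sed 's/[[:space:]]\+/ /g' > cleaned_events.log
    hadoop fs -put cleaned_events.log /user/$USER/input/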

Change the Hadoop MapReduce v2 (YARN) ShuffleHandler Port

1 comment
Hadoop

Introduction If you are running Hadoop on a development machine, then it’s likely that you’ll run into a situation where multiple services require port 8080. I recently ran into this issue where both the Pentaho User Console and the Hadoop MapReduce ShuffleHandler were trying to use port 8080. One solution is to change the port used by the Hadoop MapReduce ShuffleHandler, which is what I’m going to configure below. Configuration sudo vi /etc/hadoop/conf/mapred-site.xml Add the […]
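
In MRv2 of that era the shuffle port is typically controlled by the mapreduce.shuffle.port property; a rough sketch of the kind of block that gets added to mapred-site.xml (8081 here is just an example value, not necessarily the one chosen in the post):

    <!-- Move the ShuffleHandler off port 8080 so the other service can keep it -->
    <property>
      <name>mapreduce.shuffle.port</name>
      <value>8081</value>
    </property>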

Install Sqoop 1.4.1 for Cloudera Hadoop (CDH4) on Ubuntu 12.04 LTS

1 comment
Hadoop

Introduction Sqoop is a tool to import data from an SQL database into Hadoop and/or export data from Hadoop into an SQL database. Sqoop can import/export from HDFS, HBase and Hive. It’s extremely common to use SQL databases as part of the setup for Hadoop. Often, an SQL database will serve as an upstream datasource, such as a persistence layer for an MQ server, and as a downstream repository, such as a datamart in […]
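
A typical import gives a feel for the tool; the MySQL host, database, table, and username below are placeholders, not values from the post:

    # Pull one table from MySQL into HDFS with 4 parallel map tasks (-P prompts for the password)
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username reporting -P \
      --table orders \
      --target-dir /user/$USER/sqoop/orders \
      -m 4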

Install Cloudera Hadoop (CDH4) with YARN (MRv2) in Pseudo mode on Ubuntu 12.04 LTS

25 comments
Hadoop

Introduction These instructions cover a manual installation of the Cloudera CDH4 packages on Ubuntu 12.04 LTS and follow the Cloudera CDH4 Quick Start Guide (CDH4_Quick_Start_Guide_4.0.0.pdf). Installation prerequisites sudo apt-get install curl Verify that Java is installed correctly First, check that Java is set up correctly for your account. echo $JAVA_HOME The output should be: "/usr/lib/jvm/jdk1.6.0_31" Next, check that the JAVA_HOME environment variable is set up correctly for the sudo user. sudo env | […]
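
The sudo check is the part that usually surprises people, since sudo often strips JAVA_HOME from the environment. A minimal sketch of the check plus one common workaround (defining the variable system-wide); the JDK path matches the one quoted above, and this is only one approach, not necessarily the one the post uses:

    # Check JAVA_HOME for your own shell, then for the sudo environment
    echo $JAVA_HOME
    sudo env | grep JAVA_HOME

    # One common workaround: define JAVA_HOME system-wide, then log out and back in
    echo 'JAVA_HOME="/usr/lib/jvm/jdk1.6.0_31"' | sudo tee -a /etc/environment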

Create a .bash_aliases file

3 comments
Hadoop / Linux / Pentaho

Introduction This is my personal .bash_aliases file that is mainly used for Cloudera CDH4 (Hadoop) and Pentaho. As a result, many of my aliases are specific to these software packages. I plan to update this post as my .bash_aliases file expands. I will also push my .bash_aliases file into Git to make it easier to keep up with changes to the file. How to create a .bash_aliases file vi ~/.bash_aliases Paste the following into the […]
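
For anyone who has not used one before, here is a minimal sketch of what a Hadoop-oriented .bash_aliases can look like; the aliases below are illustrative stand-ins, not the actual list from the post:

    # ~/.bash_aliases -- sourced by Ubuntu's default ~/.bashrc if it exists
    alias hfs='hadoop fs'            # shorten HDFS commands, e.g. hfs -ls /
    alias hlogs='cd /var/log/hadoop' # jump to the Hadoop log directory (path may vary by package)
    alias zkcli='zookeeper-client'   # open the ZooKeeper command-line shell (CDH package name assumed)

    # Reload the shell configuration after editing
    source ~/.bashrc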