All posts filed under: Hadoop

Hadoop Distributions

Hadoop

This is a repost of my answer to a question on LinkedIn, which I thought might prove useful to people evaluating Hadoop distributions. The following is a substantially oversimplified set of choices (in alphabetical order): Amazon: Apache Hadoop provided as a web service. A good solution if your data is collected on Amazon, since it saves you the trouble of uploading gigs and gigs of data. Apache: Apache Hadoop is the core code based upon which […]

Understanding the Hadoop High Availability (HA) Options

Hadoop

Once you start to use Hadoop in your day-to-day business operations, you’ll quickly find that uptime is an important consideration. No one wants to explain to the CEO why a report is not delivered. While most of Hadoop’s architecture is designed to work in the face of node failure (such as the DataNodes), other components such as the NameNode must be configured with an HA option. The following is a quick and dirty list of […]

HBase Command Line Tutorial

comments 8
Hadoop / HBase

Introduction Start the HBase Shell All subsequent commands in this post assume that you are in the HBase shell, which is started via the command listed below. hbase shell You should see output similar to: 12/08/12 12:30:52 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available HBase Shell; enter 'help<RETURN>' for list of supported commands. Type "exit<RETURN>" to leave the HBase Shell Version 0.92.1-cdh4.0.1, rUnknown, Thu Jun 28 18:13:01 PDT 2012 Create a Table We will […]

Debugging HBase: org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT

comments 2
Hadoop

Introduction I ran into an annoying error in HBase due to the localhost loopback. The solution was simple, but took some trial and error. Error I was following the HBase logs with the following command: tail -1000f /var/log/hbase/hbase-hbase-master-freshstart.log The following error kept popping up in the log file. org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT Solution sudo vi /etc/hosts I changed: 127.0.0.1       localhost 127.0.1.1       freshstart to: #127.0.0.1      localhost #127.0.1.1      freshstart 192.168.2.15   freshstart 127.0.0.1      localhost 192.168.2.15 is my […]
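A quick way to spot this kind of problem is to check whether the machine's hostname is mapped to a loopback address in the hosts file. The following is a minimal sketch, not part of the original post: the function name `loopback_entries` is hypothetical, and it only inspects a hosts-format file rather than querying the system resolver.

```shell
#!/bin/sh
# Hypothetical helper: print hosts-file lines that map a hostname to a
# loopback address (127.x.x.x). Commented-out lines are ignored.
# Usage: loopback_entries HOSTNAME [HOSTS_FILE]
loopback_entries() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == name) { print; break }
            }
        }
    ' "$file"
}
```

Running `loopback_entries "$(hostname)"` prints any offending lines. No output means the hostname is not mapped to a loopback address in that file.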

How to add numbers with Pig

Hadoop

Introduction We’re going to start with a very simple Pig script that reads a file that contains 2 numbers per line separated by a comma. The Pig script will first read the line, store each of the 2 numbers in separate variables, and will then add the numbers together. Create the Sample Input File cd vi pig-practice01.txt Paste the following into pig-practice01.txt. 5 1 6 4 3 2 1 1 9 2 3 8 Create […]
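To sanity-check what the Pig script should produce, the same per-line addition can be sketched in plain shell with awk. This is not the Pig script from the post: the function name `add_pairs` and the sample values in the usage note are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read lines of the form "a,b" and print a+b per line.
# This mirrors the logic described above, but uses awk rather than Pig.
# Usage: add_pairs FILE
add_pairs() {
    # Only lines with exactly two comma-separated fields are processed.
    awk -F',' 'NF == 2 { print $1 + $2 }' "$1"
}
```

For example, a file containing the two lines `5,1` and `6,4` yields `6` and `10`, which can then be compared against the output of the Pig job.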

Install Pig 0.9.2 for CDH4 on Ubuntu 12.04 LTS x64

Hadoop

Introduction Installing Pig is drop-dead simple. Installation sudo apt-get install pig Check the Pig version. pig --version Set Up the Environment We’re going to set the environment variables system-wide for Pig programming. sudo vi /etc/environment Paste the following environment variables into the environment file. HADOOP_MAPRED_HOME="/usr/lib/hadoop-mapreduce" PIG_CONF_DIR="/etc/pig/conf" source /etc/environment That’s it. You can now start to write and run Pig jobs.

How to view files in HDFS (hadoop fs -ls)

Hadoop

Introduction The hadoop fs -ls command allows you to view the files and directories in your HDFS filesystem, much as the ls command works on Linux / OS X / *nix. Default Home Directory in HDFS A user’s home directory in HDFS is located at /user/userName. For example, my home directory is /user/akbar. List the Files in Your Home Directory hadoop fs -ls defaults to /user/userName, so you can leave the path blank to view […]
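As a small illustration of the default-path behaviour described above, the sketch below builds the HDFS home directory for a user and lists it only if the `hadoop` command is available. The function names `hdfs_home` and `ls_hdfs_home` are hypothetical. `hadoop fs -ls` is the command from the post.

```shell
#!/bin/sh
# Hypothetical helper: print the default HDFS home directory for a user.
# Falls back to the current user when no name is given.
hdfs_home() {
    printf '/user/%s\n' "${1:-$(id -un)}"
}

# List that directory, but only when the hadoop CLI is installed.
ls_hdfs_home() {
    if command -v hadoop >/dev/null 2>&1; then
        hadoop fs -ls "$(hdfs_home "$1")"
    else
        echo "hadoop command not found" >&2
        return 1
    fi
}
```

Calling `hdfs_home akbar` prints `/user/akbar`, the same path that `hadoop fs -ls` uses when the path argument is left blank for that user.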

hdfs dfsadmin -metasave

Hadoop

Introduction hdfs dfsadmin -metasave provides additional information compared to hdfs dfsadmin -report. It provides information about blocks, including: blocks waiting for replication, blocks currently being replicated, and the total number of blocks. hdfs dfsadmin -metasave filename.txt Run the command with sudo -u hdfs prefixed to ensure you don’t get a permission denied error. CDH4 runs the namenode as the hdfs user by default. However if you have changed the sudo -u hdfs hdfs dfsadmin […]

hdfs dfsadmin -report

comment 1
Hadoop

Introduction hdfs dfsadmin -report outputs a brief report on the overall HDFS filesystem. It’s a useful command to quickly view how much disk is available, how many datanodes are running, and so on. Command Run the command with sudo -u hdfs prefixed to ensure you don’t get a permission denied error. CDH4 runs the namenode as the hdfs user by default. However if you have changed the sudo -u hdfs hdfs dfsadmin -report You will […]

Install HBase 0.92.1 for Cloudera Hadoop (CDH4) in Pseudo mode on Ubuntu 12.04 LTS

Hadoop

Introduction HBase is a column-oriented database that runs on top of HDFS. It is modeled on Google’s Bigtable. In this post, I’m going to install HBase in Pseudo mode, so please use these instructions for setting up a developer’s workstation, not for a production cluster. When should you use HBase HBase should be used when you need random read/write access to the data in Hadoop. While HBase gives you random seeks, it does so at […]