AkbarAhmed.com

Engineering Leadership

Introduction

Sqoop is a tool for importing data from a SQL database into Hadoop and for exporting data from Hadoop back into a SQL database.

On the Hadoop side, Sqoop works with HDFS, HBase and Hive.

It’s extremely common to use SQL databases as part of a Hadoop setup. Often, a SQL database serves as an upstream data source, such as the persistence layer for an MQ server, or as a downstream repository, such as a data mart in a BI reporting layer.

Installation

First, we’re going to install MapReduce 1 (MRv1) and the Hadoop Client, as these are dependencies for Sqoop.

After these two packages are installed, we will need to verify that Sqoop is set up to use MRv2 (YARN), and not MRv1.

sudo apt-get install hadoop-client hadoop-0.20-mapreduce
sudo apt-get install sqoop

The Sqoop configuration files are installed into /etc/sqoop/conf, which is a symlink to /etc/sqoop/conf.dist.
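
To confirm where the configuration directory really points on a given machine, a small check along these lines may help. This is only a sketch: the helper name resolve_dir is made up for illustration, and it assumes GNU readlink, which is standard on Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sqoop): print the real directory a path
# resolves to, following every symlink in the chain.
resolve_dir() {
    local path="$1"
    if [ ! -e "$path" ]; then
        echo "missing: $path" >&2
        return 1
    fi
    readlink -f "$path"
}

# Example:
# resolve_dir /etc/sqoop/conf
```

If the printed path is not the directory you expected, check which package or alternative owns the link before editing any files in it.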

To use Sqoop with YARN (MRv2), we need to verify that the HADOOP_MAPRED_HOME environment variable is set to the correct path.

There are three places where we should verify this variable.

grep HADOOP_MAPRED_HOME /etc/default/hadoop

The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

grep HADOOP_MAPRED_HOME /etc/default/hadoop-mapreduce-historyserver

The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

grep HADOOP_MAPRED_HOME /etc/hadoop/conf/hadoop-env.sh

The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
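
The three greps above can also be rolled into one check. The following is a minimal sketch, assuming the expected line is exactly the one shown in the output above; the function name check_mapred_home is hypothetical. It matches whole lines only, so a file with trailing whitespace or a differently quoted value is reported as a mismatch and should be inspected by hand.

```shell
#!/usr/bin/env bash
# Expected line, taken from the grep output shown above.
EXPECTED='export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce'

# Hypothetical helper: for each file given, report whether it contains the
# expected export line as a whole line. Returns non-zero if any file fails
# (including files that do not exist or cannot be read).
check_mapred_home() {
    local rc=0 file
    for file in "$@"; do
        if grep -qxF "$EXPECTED" "$file" 2>/dev/null; then
            echo "OK       $file"
        else
            echo "MISMATCH $file"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
# check_mapred_home /etc/default/hadoop \
#     /etc/default/hadoop-mapreduce-historyserver \
#     /etc/hadoop/conf/hadoop-env.sh
```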

Lastly, I recommend setting HADOOP_MAPRED_HOME as a system-wide environment variable to make development easier for the software engineers. This assumes you’re only going to use YARN; if you’re using MRv1, don’t set this variable.

sudo bash -c 'echo HADOOP_MAPRED_HOME=\"/usr/lib/hadoop-mapreduce\" >> /etc/environment'
source /etc/environment
echo $HADOOP_MAPRED_HOME
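
Note that running the append command a second time adds a duplicate line to /etc/environment. If you expect to re-run these steps, something like the following sketch may be safer. The helper name ensure_env_line is hypothetical, and it only checks whether a line starting with the variable name already exists; it does not compare or update the value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append NAME="VALUE" to a file only when no line
# starting with NAME= is already present in it.
ensure_env_line() {
    local file="$1" name="$2" value="$3"
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        echo "already set in $file"
    else
        printf '%s="%s"\n' "$name" "$value" >> "$file"
        echo "added to $file"
    fi
}

# Example, from a root shell (writing /etc/environment needs root):
# ensure_env_line /etc/environment HADOOP_MAPRED_HOME /usr/lib/hadoop-mapreduce
```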

Finally, we’ll verify that the environment variable is also set correctly for commands run with sudo.

sudo env | grep HADOOP_MAPRED_HOME

Verify the Sqoop installation

sqoop version

The output should include:
Sqoop 1.4.1-cdh4.0.0
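
If you want to script this check, for example in a provisioning script, a sketch like the one below could work. The function name has_sqoop_version is hypothetical; it simply looks for the version string in whatever text it is given on standard input, so it assumes the output format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `sqoop version` output on stdin and succeed
# only if some line contains "Sqoop " followed by the wanted version prefix.
has_sqoop_version() {
    local wanted="$1"
    grep -qF "Sqoop ${wanted}"
}

# Example:
# if sqoop version 2>/dev/null | has_sqoop_version 1.4.1-cdh4; then
#     echo "Sqoop version looks right"
# else
#     echo "Unexpected Sqoop version, or sqoop is not installed" >&2
# fi
```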

Additional Reading

http://archive.cloudera.com/cdh4/cdh/4/sqoop/SqoopUserGuide.html

One thought on “Install Sqoop 1.4.1 for Cloudera Hadoop (CDH4) on Ubuntu 12.04 LTS”

  1. Andy says:

    hello,

    I already have hadoop 1.0.3 on my ubuntu but couldn’t install sqoop.
    When I tried it, it displays cannot find package sqoop.
    Please help me in this
