Introduction
Sqoop is a tool to import data from an SQL database into Hadoop and/or export data from Hadoop into an SQL database.
Sqoop can import/export from HDFS, HBase and Hive.
It’s extremely common to use SQL databases as part of a Hadoop setup. Often, a SQL database will serve as an upstream data source, such as a persistence layer for an MQ server, and as a downstream repository, such as a data mart in a BI reporting layer.
Installation
First, we’re going to install the MapReduce 1 (MRv1) package and the Hadoop client, as these are dependencies for Sqoop.
After these two packages are installed, we will need to verify that MRv2 (YARN) is the framework in use, and not MRv1.
sudo apt-get install hadoop-client hadoop-0.20-mapreduce
sudo apt-get install sqoop
The sqoop configuration files are installed into the following directory:
/etc/sqoop/conf
which is a symlink to /etc/sqoop/conf.dist
To use Sqoop with YARN (MRv2) we need to verify that the HADOOP_MAPRED_HOME environment variable is set to the correct path.
There are 3 places where we should verify this variable.
grep HADOOP_MAPRED_HOME /etc/default/hadoop
The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
grep HADOOP_MAPRED_HOME /etc/default/hadoop-mapreduce-historyserver
The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
grep HADOOP_MAPRED_HOME /etc/hadoop/conf/hadoop-env.sh
The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
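The three checks above can also be run in one pass. The following is a minimal sketch, not part of the Sqoop or CDH tooling: the function name check_mapred_home is hypothetical, and it assumes each file contains the export line exactly as shown above, with no leading whitespace or trailing comment.

```shell
#!/usr/bin/env bash
# Expected line, copied from the steps above.
EXPECTED='export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce'

# check_mapred_home FILE...  (hypothetical helper)
# Prints one status line per file and returns non-zero
# if any file does not contain the expected line.
check_mapred_home() {
  local status=0 file
  for file in "$@"; do
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if grep -qxF -- "$EXPECTED" "$file" 2>/dev/null; then
      echo "OK      $file"
    else
      echo "MISSING $file"
      status=1
    fi
  done
  return "$status"
}
```

To check the three files from this post, call it as: check_mapred_home /etc/default/hadoop /etc/default/hadoop-mapreduce-historyserver /etc/hadoop/conf/hadoop-env.sh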
Lastly, I recommend setting HADOOP_MAPRED_HOME as a system-wide environment variable to ease development for the software engineers. This assumes you’re only going to use YARN; if you’re using MRv1, don’t set this variable.
sudo bash -c 'echo HADOOP_MAPRED_HOME=\"/usr/lib/hadoop-mapreduce\" >> /etc/environment'
source /etc/environment
echo $HADOOP_MAPRED_HOME
Finally, we’ll verify that the environment variable is correctly set for the sudo user.
sudo env | grep HADOOP_MAPRED_HOME
Verify the sqoop installation
sqoop version
The output should include:
Sqoop 1.4.1-cdh4.0.0
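If you want to use the version check in a provisioning script, the version string can be pulled out of that output. This is a rough sketch: extract_sqoop_version is a hypothetical helper, and it assumes the version line starts with "Sqoop " as in the output shown above.

```shell
#!/usr/bin/env bash
# extract_sqoop_version  (hypothetical helper)
# Reads `sqoop version` output on stdin and prints the version
# string from the first line that starts with "Sqoop ".
extract_sqoop_version() {
  sed -n 's/^Sqoop \([^ ]*\).*$/\1/p' | head -n 1
}
```

Usage: sqoop version 2>/dev/null | extract_sqoop_version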
Additional Reading
http://archive.cloudera.com/cdh4/cdh/4/sqoop/SqoopUserGuide.html
Hello,
I already have Hadoop 1.0.3 on my Ubuntu machine, but I couldn’t install Sqoop.
When I tried, it reported that it cannot find the package sqoop.
Please help me with this.