All posts tagged: CDH4

Cloudera Distribution Hadoop

HBase 0.92.1 Files and Directories (CDH4)

Hadoop

Introduction
You will need to know the location of binaries, configuration files, and libraries when working with HBase.

Directories

Configuration
/etc/hbase/conf is the location for all of HBase’s configuration files. The HBase packages use the Debian alternatives system, so the configuration directory is reached through a chain of symlinks: /etc/hbase/conf is a symlink to /etc/alternatives/hbase-conf, which in turn is a symlink to /etc/hbase/conf.dist.

Logs
/var/log/hbase contains all of the HBase log files.

Files

Configuration Files
The following configuration files are located […]
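To see the alternatives chain for the HBase configuration directory on a live node, you can walk it one hop at a time. This is a minimal sketch that assumes only a POSIX shell plus the widely available readlink utility; show_link_chain is a hypothetical helper name, not something shipped with CDH4.

```shell
# Walk a symlink chain one hop at a time and print each hop,
# then the final target that is not itself a symlink.
# show_link_chain is a hypothetical helper, not part of CDH4.
show_link_chain() {
  p=$1
  hops=0
  while [ -L "$p" ]; do
    hops=$((hops + 1))
    # Guard against symlink loops.
    if [ "$hops" -gt 40 ]; then
      echo "too many levels of symlinks at $p" >&2
      return 1
    fi
    t=$(readlink "$p")
    # readlink prints the target exactly as stored; treat relative
    # targets as relative to the directory that contains the link.
    case $t in
      /*) ;;
      *) t=$(dirname "$p")/$t ;;
    esac
    echo "$p -> $t"
    p=$t
  done
  echo "final: $p"
}
```

For example, running show_link_chain /etc/hbase/conf on a package install laid out as described should list /etc/alternatives/hbase-conf and then end at /etc/hbase/conf.dist.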

Zookeeper 3.4.3 Files and Directories (CDH4)

Hadoop

Introduction
You will need to know the location of binaries, configuration files, and libraries when working with Zookeeper. Zookeeper 3.4.3 is part of Cloudera Distribution Hadoop (CDH4).

Directories

/etc/zookeeper/conf
/etc/zookeeper/conf is the location for all of Zookeeper’s configuration files. The Zookeeper packages use the Debian alternatives system, so the configuration directory is reached through a chain of symlinks: /etc/zookeeper/conf is a symlink to /etc/alternatives/zookeeper-conf, which in turn is a symlink to /etc/zookeeper/conf.dist.

Files

Configuration Files
The following configuration files are […]
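If you want to confirm that each link in the Zookeeper chain points exactly where this layout says it should, a small check per link is enough. This is a sketch assuming a POSIX shell and readlink; expect_link is a hypothetical helper name.

```shell
# Check that a symlink's stored target equals the expected path.
# Prints "ok: ..." and returns 0 on a match; reports to stderr and
# returns 1 if the path is not a symlink or points elsewhere.
# expect_link is a hypothetical helper, not part of CDH4.
expect_link() {
  link=$1
  want=$2
  if [ ! -L "$link" ]; then
    echo "NOT A SYMLINK: $link" >&2
    return 1
  fi
  got=$(readlink "$link")
  if [ "$got" != "$want" ]; then
    echo "MISMATCH: $link -> $got (expected $want)" >&2
    return 1
  fi
  echo "ok: $link -> $got"
}
```

On a node with the layout described, the two checks would be expect_link /etc/zookeeper/conf /etc/alternatives/zookeeper-conf followed by expect_link /etc/alternatives/zookeeper-conf /etc/zookeeper/conf.dist.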

Change the Hadoop MapReduce v2 (YARN) ShuffleHandler Port

Hadoop

Introduction
If you are running Hadoop on a development machine, you are likely to run into a situation where multiple services require port 8080. I recently ran into this issue when both the Pentaho User Console and the Hadoop MapReduce ShuffleHandler were trying to use port 8080. One solution is to change the port used by the Hadoop MapReduce ShuffleHandler, which is what I’m going to configure below.

Configuration
sudo vi /etc/hadoop/conf/mapred-site.xml
Add the […]
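After editing mapred-site.xml, it helps to confirm which value the file actually contains. The sketch that follows assumes the ShuffleHandler port is controlled by the mapreduce.shuffle.port property (the name Hadoop 2.x's ShuffleHandler reads) and that the file uses the usual Hadoop layout, with each value appearing after its name. It is a line-based awk lookup, not a real XML parser, and get_hadoop_prop is a hypothetical helper name.

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Line-based sketch, not an XML parser: it assumes <value> follows the
# matching <name> on the same line or a later line.
# get_hadoop_prop is a hypothetical helper, not part of Hadoop.
get_hadoop_prop() {
  awk -v prop="$1" '
    index($0, "<name>" prop "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # 7 = length of "<value>"; 15 = length of "<value>" plus "</value>"
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}
```

For example, get_hadoop_prop mapreduce.shuffle.port /etc/hadoop/conf/mapred-site.xml prints the configured port, or nothing if the property is not set in that file.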

Install Sqoop 1.4.1 for Cloudera Hadoop (CDH4) on Ubuntu 12.04 LTS

Hadoop

Introduction
Sqoop is a tool to import data from an SQL database into Hadoop and/or export data from Hadoop into an SQL database. Sqoop can import into HDFS, HBase, and Hive, and can export from HDFS. It’s extremely common to use SQL databases as part of a Hadoop setup. Often, an SQL database will serve both as an upstream data source, such as a persistence layer for an MQ server, and as a downstream repository, such as a datamart in […]
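To make the import direction concrete, here is a minimal sketch of a Sqoop 1 table import into an HDFS directory. It uses the standard sqoop import options --connect, --username, -P (prompt for the password), --table, --target-dir, and -m; the JDBC URL, user, table, and directory in the usage example are hypothetical, and the matching JDBC driver must be available to Sqoop. The SQOOP variable and the sqoop_import_table name are illustrative assumptions that let you substitute echo for a dry run.

```shell
# Import one SQL table into an HDFS directory with Sqoop 1.
# SQOOP defaults to the real "sqoop" command; set SQOOP=echo to
# print the command line instead of running it.
# sqoop_import_table is a hypothetical helper name.
SQOOP=${SQOOP:-sqoop}

sqoop_import_table() {
  # $1 = JDBC connect string, $2 = database user,
  # $3 = source table, $4 = HDFS target directory
  # -m 1 runs a single map task, so no split column is needed.
  "$SQOOP" import \
    --connect "$1" \
    --username "$2" \
    -P \
    --table "$3" \
    --target-dir "$4" \
    -m 1
}
```

For a dry run with hypothetical values, set SQOOP=echo and then call sqoop_import_table jdbc:mysql://localhost/shop etl orders /user/etl/orders to see the exact command that would be executed.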

Install Cloudera Hadoop (CDH4) with YARN (MRv2) in Pseudo mode on Ubuntu 12.04 LTS

Hadoop

Introduction
These instructions cover a manual installation of the Cloudera CDH4 packages on Ubuntu 12.04 LTS and are based on the Cloudera CDH4 Quick Start Guide (CDH4_Quick_Start_Guide_4.0.0.pdf), which I followed.

Installation prerequisites
sudo apt-get install curl

Verify that Java is installed correctly
First, check that Java is set up correctly for your account.
echo $JAVA_HOME
The output should be: "/usr/lib/jvm/jdk1.6.0_31"
Next, check that the JAVA_HOME environment variable is set up correctly for the sudo user.
sudo env | […]
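Beyond eyeballing the echo output, you can check that a JAVA_HOME value actually points at a usable JDK. This is a sketch assuming only a POSIX shell; check_java_home is a hypothetical helper name, and it tests only that the variable is non-empty and that bin/java under it is executable, not which Java version is installed.

```shell
# Sanity-check a JAVA_HOME value: it must be non-empty and contain an
# executable bin/java. Prints "ok: <path>" on success, else returns 1.
# check_java_home is a hypothetical helper name.
check_java_home() {
  jh=$1
  if [ -z "$jh" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$jh/bin/java" ]; then
    echo "no executable java under $jh/bin" >&2
    return 1
  fi
  echo "ok: $jh"
}
```

Run check_java_home "$JAVA_HOME" for your own account, and check_java_home "$(sudo sh -c 'echo "$JAVA_HOME"')" to test the value seen by the sudo user; with the setup above, both should report /usr/lib/jvm/jdk1.6.0_31.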