Create a database in MySQL


Creating a database in MySQL is exceptionally easy, as you only have to execute one SQL statement after logging in.


Open a terminal and run the following command.

mysql -u root -p

This will open the mysql> prompt.
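From the mysql> prompt, the single statement mentioned above creates the database (the name myapp below is just a placeholder; use your own):

```sql
-- "myapp" is a placeholder database name.
CREATE DATABASE myapp;

-- Confirm that it was created:
SHOW DATABASES;
```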


Google Wallet on Android

What is the Google Wallet Android app?

I have finally had an occasion to use the Google Wallet app on my Galaxy Nexus. So what is the Google Wallet Android app? It’s basically an Android app that acts like a digital credit card that you can use to purchase coffee at Peet’s, medicine at CVS and so on.

There is a special reader in the store that you tap your phone against when checking out (kinda like a credit card swiper, except that you tap your phone instead of swiping your card). After tapping, you are prompted to enter a PIN (just like an ATM card), everything is automatically paid for, and you're ready to pick up your bag and walk out the door.

To use Google Wallet, you will need a phone with an NFC chip. As of June 2012, the only phone that supports Google Wallet (to my knowledge) is the Galaxy Nexus.

A cool app

I have now used Google Wallet to make a few purchases and I have to say that it’s a nice upgrade to a plastic credit card. I have found that if I enter my PIN before getting to the checkout counter, I can just Tap and Go.

The coolness factor of the technology also helps. I got a free coffee at Peet’s the other day because the barista loved that I paid with my phone.


I’m not too worried about security at this point, as I don’t have my credit cards linked to Google Wallet. Currently, I’m using the Google Prepaid Card, so if I lose my phone it’ll be like losing a gift card.

However, for this form of payment to go mainstream, security will be very important, which is why I would like to see apps from the banks and credit card companies.

A Multi-App Future?

Much as we have multiple debit and credit cards in our wallet, I think the future may be to have multiple digital payment apps, such as a Visa app, an American Express app, and so on. It would be nice if during payment a list of payment apps is displayed so that we can select one, much as we select a credit card when we open our wallet.

Another benefit of multiple apps would be security. One issue I see with a single app controlling every credit card is that it forms a single point of failure (in terms of a security breach). Personally, I also think the banks and financial institutions have a much longer history of writing software that secures our payments, so an app from Visa may have tighter security than one from Google, Microsoft, Apple or another non-financial institution.


Google Wallet is a cool and useful app. Unfortunately, the number of phones that support the app is limited, and the NFC readers are only in a few big name stores. But, it’s nice to use the future today (unless you’re from Japan, then paying with your phone is old hat).

Install JDK 7 u5 on Ubuntu 12.04 LTS (as a secondary JDK)


I had installed JDK 6.0 update 31 in an earlier post. However, I now need to write a Java application that requires the features available in JDK 7.

In this post, I will install JDK 7 update 5 as a secondary JDK, while JDK 6.0 u31 will be the primary JDK. It’s perfectly normal to have multiple JDKs on a single machine to support the requirements of different applications. Fortunately, it’s easy to use a different JDK on a per application basis.


I have a 64 bit version of Ubuntu 12.04 LTS installed, so the instructions below only apply to this OS.

  1. Download the Java JDK from
  2. Click Accept License Agreement
  3. Click jdk-7u5-linux-x64.tar.gz
  4. Login to with your Oracle account
  5. Download the JDK to your ~/Downloads directory


Open a terminal, then enter the following commands:

cd ~/Downloads
tar -xzf jdk-7u5-linux-x64.tar.gz

The jvm directory is used to organize all JDK/JVM versions in a single parent directory. As this is our 2nd JDK, we’ll assume that the jvm directory already exists (if it doesn’t, create it first with sudo mkdir -p /usr/lib/jvm).

sudo mv jdk1.7.0_05 /usr/lib/jvm

The next 3 commands are split across 2 lines per command due to width limits in the blog’s theme.

sudo update-alternatives --install "/usr/bin/java" "java"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/java" 2
sudo update-alternatives --install "/usr/bin/javac" "javac"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/javac" 2
sudo update-alternatives --install "/usr/bin/javaws" "javaws"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/javaws" 2
sudo update-alternatives --config java

You will see output similar to the following (the exact list will differ on your system). Read through the list and find the number for the Oracle JDK installation (/usr/lib/jvm/jdk1.7.0_05/bin/java).

There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                               Priority   Status
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/java   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/java   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/java   2         manual mode

Press enter to keep the current choice[*], or type selection number:

On my system, I entered 1 and pressed Enter to keep JDK 1.6.0 u31 as my primary JDK (choose whichever number is appropriate for your system).

sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                Priority   Status
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/javac   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/javac   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/javac   2         manual mode

Press enter to keep the current choice[*], or type selection number:

I entered 1 then pressed Enter to keep JDK 1.6.0 u31 as my primary javac command.

sudo update-alternatives --config javaws
There are 2 choices for the alternative javaws (providing /usr/bin/javaws).

  Selection    Path                                 Priority   Status
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/javaws   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/javaws   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/javaws   2         manual mode

Press enter to keep the current choice[*], or type selection number:

I entered 1 then pressed Enter to keep JDK 1.6.0 u31 as my primary javaws command.

As a final step, let’s test each of the commands to ensure everything is set up correctly.

java -version

The output should be:
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

javac -version

The output should be:
javac 1.6.0_31

javaws -version

The output should be:
Java(TM) Web Start 1.6.0_31, followed by a long usage message.
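As mentioned earlier, it’s easy to use a different JDK on a per-application basis. One common approach (a sketch, not the only way) is to override JAVA_HOME and PATH in the shell session that launches the application, leaving the update-alternatives default untouched:

```shell
# Use JDK 7 for this shell session only; the system-wide
# update-alternatives default (JDK 6) stays as-is.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_05
export PATH="$JAVA_HOME/bin:$PATH"

# In this shell, java/javac now resolve to JDK 7 first (if it is installed there).
command -v java >/dev/null 2>&1 && java -version || true
```

Putting those two export lines in an application’s launch script keeps the choice of JDK local to that application.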

That’s it: JDK 7 u5 is installed.

Create a DB connection to MySQL in Kettle


Creating a DB connection from Kettle to MySQL involves creating a MySQL user who can access the DB in question, installing the JDBC driver, and creating the connection.

Install the MySQL JDBC driver

Download the MySQL JDBC driver from

Login to, then click Download.

cd ~/Downloads
tar -xzf mysql-connector-java-5.1.20.tar.gz
cd mysql-connector-java-5.1.20
cp mysql-connector-java-5.1.20-bin.jar ~/bin/data-integration/libext/

Create a MySQL user

In this post, I am going to create a connection to the Sakila DB.

mysql -u root -p

At the MySQL command prompt, enter the following (replace ‘password’ with your password):

mysql> GRANT ALL ON sakila.* TO akbar@localhost IDENTIFIED BY 'password';

Create the DB connection in Kettle

cd ~/bin/data-integration
  1. Click New in the PDI toolbar.
  2. Click Database connection.
  3. Enter information similar to what’s shown below:
     (Screenshot: PDI Database Connection dialog)

  4. Click Test.
  5. Click OK.
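The screenshot isn’t reproduced here, so as a rough guide, the values I’d expect in the Database Connection dialog for this setup are along these lines (localhost and 3306 are MySQL defaults; the user and database come from the earlier steps):

```
Connection Name: Sakila
Connection Type: MySQL
Host Name:       localhost
Database Name:   sakila
Port Number:     3306
User Name:       akbar
Password:        (the password from the GRANT statement)
```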

In the Explorer pane on the left of PDI, right-click the Sakila database connection, then click Explore.

You should now be able to view the tables in the Sakila database.

Create a Kettle repository


Open Kettle

cd ~/bin/data-integration

To run Spoon:


Create a new repository

Run the following in a terminal.

mkdir ~/kettle

The steps below are performed within the PDI UI.

    (Screenshot: Repository Connection dialog)

  1. In the Repository Connection dialog box, click the small green plus symbol.
     (Screenshot: Repository Type dialog)

  2. In the Select the repository type dialog box, select Kettle file repository.
  3. Click OK.
  4. In the File repository settings dialog box, enter the following information:
    • Base directory: /home/akbar/kettle
    • Read-only repository?: Leave unchecked
    • Hide hidden folders and files: Leave unchecked
    • ID: kettle-repo
    • Name: Kettle Repository
  5. Click OK.
  6. Click OK.

Install MySQL 5.5 on Ubuntu 12.04 LTS


Installing the MySQL package on Ubuntu is extremely simple.


Open a terminal and enter the following commands.

sudo apt-get install mysql-client mysql-navigator mysql-server

Type Y to accept the additional packages, then press Enter.

After downloading and during installation, the MySQL configuration dialogs will display in the terminal.

In the first dialog, press Enter.

Enter a password for the MySQL root user. Press Enter.

Reenter the root password. Press Enter.
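Once the dialogs complete, a quick optional sanity check confirms the client installed; the guard just makes the snippet harmless on a machine without MySQL:

```shell
# Print the client version if the mysql client is on the PATH.
if command -v mysql >/dev/null 2>&1; then
  mysql --version
else
  echo "mysql client not found"
fi
```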

That’s it, MySQL is now installed and ready for use.

Change the Hadoop MapReduce v2 (YARN) ShuffleHandler Port


If you are running Hadoop on a development machine, then it’s likely that you’ll run into a situation where multiple services require port 8080. I recently ran into this issue where both the Pentaho User Console and the Hadoop MapReduce ShuffleHandler were trying to use port 8080.

One solution is to change the port used by the Hadoop MapReduce ShuffleHandler, which is what I’m going to configure below.


sudo vi /etc/hadoop/conf/mapred-site.xml

Add the following property just before the closing </configuration> element.

    <description>Default port that the ShuffleHandler will run on. ShuffleHandler is a service run at the NodeManager to facilitate transfers of intermediate Map outputs to requesting Reducers.</description>
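For reference, the complete property would look something like the sketch below. mapreduce.shuffle.port is the standard MRv2 name for this setting, but the value 8081 is only an example; choose any free port on your machine:

```xml
<property>
  <name>mapreduce.shuffle.port</name>
  <value>8081</value>
  <description>Default port that the ShuffleHandler will run on.
  ShuffleHandler is a service run at the NodeManager to facilitate
  transfers of intermediate Map outputs to requesting Reducers.</description>
</property>
```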

Then restart the YARN daemons.

Install Sqoop 1.4.1 for Cloudera Hadoop (CDH4) on Ubuntu 12.04 LTS


Sqoop is a tool to import data from an SQL database into Hadoop and/or export data from Hadoop into an SQL database.

Sqoop can import/export from HDFS, HBase and Hive.

It’s extremely common to use SQL databases as part of a Hadoop setup. Often, a SQL database will serve both as an upstream datasource, such as a persistence layer for an MQ server, and as a downstream repository, such as a datamart in a BI reporting layer.


First, we’re going to install MapReduce 1 (MRv1) and the Hadoop Client as these are dependencies for sqoop.

After these two packages are installed, we will need to verify that MRv2 is running, and not MRv1.

sudo apt-get install hadoop-client hadoop-0.20-mapreduce
sudo apt-get install sqoop

The sqoop configuration files are installed into the following directory:
/etc/sqoop/conf which is a symlink to /etc/sqoop/conf.dist

To use Sqoop with YARN (MRv2) we need to verify that the HADOOP_MAPRED_HOME environment variable is set to the correct path.

There are 3 places where we should verify this variable.

grep HADOOP_MAPRED_HOME /etc/default/hadoop

The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

grep HADOOP_MAPRED_HOME /etc/default/hadoop-mapreduce-historyserver

The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

grep HADOOP_MAPRED_HOME /etc/hadoop/conf/

The output should be:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Lastly, I recommend setting HADOOP_MAPRED_HOME as a system-wide environment variable to help ease development for the software engineers (assuming you’re only going to use YARN; if you’re using MRv1, don’t set this variable).

sudo bash -c 'echo HADOOP_MAPRED_HOME=\"/usr/lib/hadoop-mapreduce\" >> /etc/environment'
source /etc/environment

Finally, we’ll verify that the environment variable is correctly set for the sudo user.

sudo env | grep HADOOP_MAPRED_HOME

Verify the sqoop installation

sqoop version

The output should include:
Sqoop 1.4.1-cdh4.0.0
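With the installation verified, a first import might look like the following sketch. The flags (--connect, --username, -P, --table, --target-dir) are standard Sqoop 1 options, while the sakila database and akbar user come from my earlier MySQL posts, so treat them as placeholders for your own setup:

```shell
# Build the argument list once so the command is easy to inspect.
SQOOP_ARGS="import --connect jdbc:mysql://localhost/sakila \
  --username akbar -P --table actor --target-dir /user/akbar/actor"

if command -v sqoop >/dev/null 2>&1; then
  # -P prompts for the MySQL password interactively.
  sqoop $SQOOP_ARGS
else
  echo "sqoop not on PATH; the command would be: sqoop $SQOOP_ARGS"
fi
```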

Additional Reading

Install Cloudera Hadoop (CDH4) with YARN (MRv2) in Pseudo mode on Ubuntu 12.04 LTS


These instructions cover a manual installation of the Cloudera CDH4 packages on Ubuntu 12.04 LTS and are based on my following the Cloudera CDH4 Quick Start Guide (CDH4_Quick_Start_Guide_4.0.0.pdf).

Installation prerequisites

sudo apt-get install curl

Verify that Java is installed correctly

First, check that Java is set up correctly for your account.


The output should be:

Next, check that the JAVA_HOME environment variable is set up correctly for the sudo user.

sudo env | grep JAVA_HOME

The output should be:

Download the CDH4 package

cd ~/Downloads
mkdir cloudera
cd cloudera

There was no good way of wrapping the link below, so I added an HTML link to the post. This way you can right-click the link and click Copy Link Location, which you can then use to paste into the terminal.


Install the CDH4 package

sudo dpkg -i cdh4-repository_1.0_all.deb

Install the Cloudera Public GPG Key

curl -s \ \
| sudo apt-key add -
sudo apt-get update

Install CDH4 with YARN in Pseudo mode

sudo apt-get install hadoop-conf-pseudo

This one command will install a large number of packages:

  • bigtop-jsvc
  • bigtop-utils
  • hadoop
  • hadoop-conf-pseudo
  • hadoop-hdfs
  • hadoop-hdfs-datanode
  • hadoop-hdfs-namenode
  • hadoop-hdfs-secondarynamenode
  • hadoop-mapreduce
  • hadoop-mapreduce-historyserver
  • hadoop-yarn
  • hadoop-yarn-nodemanager
  • hadoop-yarn-resourcemanager
  • zookeeper

View the installed files

It’s good practice to view the list of files installed by each package. Specifically, this is a good method to learn about all of the available configuration files.

dpkg -L hadoop-conf-pseudo

Included in the list of files displayed by dpkg are the configuration files (and some other files):


Format the HDFS filesystem

sudo -u hdfs hdfs namenode -format

I received one warning message when I formatted the HDFS filesystem:
WARN common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.

Start HDFS

mkdir bin
cd bin
vi hadoop-hdfs-start

Paste the following code in hadoop-hdfs-start:

#!/bin/bash
for service in /etc/init.d/hadoop-hdfs-*
do
  sudo "$service" start
done

chmod +x hadoop-hdfs-start

Open the NameNode web console at http://localhost:50070.

About the HDFS filesystem

The commands in the sections below are for creating directories in the HDFS filesystem. Importantly, the HDFS directory structure is not the same as the directory structure in ext4 (i.e. your main Linux directory structure).

To work with the HDFS directory structure, you basically prefix standard Linux commands with sudo -u hdfs hadoop fs -. Therefore, you will likely find it useful to create a .bash_aliases file that provides an easier way to type these commands.

I have created a sample .bash_aliases file in the following post: Create a .bash_aliases file

I have used my aliases to setup Hadoop as it’s easier to type. For example, I use hls instead of sudo -u hdfs hadoop fs -ls.
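For reference, here is a sketch of what such a .bash_aliases file might contain (the h/sh naming is just my convention):

```shell
# "h"  prefix: run hadoop fs as the current user.
# "sh" prefix: run hadoop fs as the hdfs superuser via sudo.
alias hls='hadoop fs -ls'
alias hcat='hadoop fs -cat'
alias hmkdir='hadoop fs -mkdir'
alias shls='sudo -u hdfs hadoop fs -ls'
alias shmkdir='sudo -u hdfs hadoop fs -mkdir'
alias shchmod='sudo -u hdfs hadoop fs -chmod'
alias shchown='sudo -u hdfs hadoop fs -chown'
```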

Create the HDFS /tmp directory

You don’t need to remove an old /tmp directory if this is the first time you’re installing Hadoop, but I’ll include the command here for completeness.

sudo -u hdfs hadoop fs -rm -r /tmp

Let’s create a new /tmp directory in HDFS.

shmkdir /tmp

Next, we’ll update the permissions on /tmp in HDFS.

shchmod -R 1777 /tmp

Create a user directory

Since this is a setup for development, we will only create one user directory. However, for a cluster or multi-user environment, you should create one user directory per MapReduce user.

Change akbar below to your username.

shmkdir /user/akbar
shchown akbar:akbar /user/akbar

Create the /var/log/hadoop-yarn directory

shmkdir /var/log/hadoop-yarn
shchown yarn:mapred /var/log/hadoop-yarn

Create the staging directory

The Hadoop -mkdir command defaults to -p (create parent directory).

shmkdir /tmp/hadoop-yarn/staging
shchmod -R 1777 /tmp/hadoop-yarn/staging

Create the done_intermediate directory

shmkdir /tmp/hadoop-yarn/staging/history/done_intermediate
shchmod -R 1777 /tmp/hadoop-yarn/staging/history/done_intermediate
shchown -R mapred:mapred /tmp/hadoop-yarn/staging

Verify that the directory structure is set up correctly

shls -R /

The output should look like:

drwxrwxrwt - hdfs supergroup 0 2012-06-25 15:11 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-06-25 15:11 /tmp/hadoop-yarn
drwxrwxrwt - mapred mapred 0 2012-06-25 15:51 /tmp/hadoop-yarn/staging
drwxr-xr-x - mapred mapred 0 2012-06-25 15:51 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - mapred mapred 0 2012-06-25 15:51 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x - hdfs supergroup 0 2012-06-25 15:09 /user
drwxr-xr-x - akbar akbar 0 2012-06-25 15:09 /user/akbar
drwxr-xr-x - hdfs supergroup 0 2012-06-25 13:42 /var
drwxr-xr-x - hdfs supergroup 0 2012-06-25 13:42 /var/log
drwxr-xr-x - yarn mapred 0 2012-06-25 13:42 /var/log/hadoop-yarn

Start YARN

sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
sudo service hadoop-mapreduce-historyserver start

I get the following error message when I start the MR History Server:
chown: changing ownership of `/var/log/hadoop-mapreduce': Operation not permitted

However, this error is not significant and can be ignored. It’ll likely be fixed in an update to CDH4/Hadoop.

Run an example application with YARN

We are going to run the sample YARN app as our regular user, so we’ll use the Hadoop aliases, such as hls, and not the sudo Hadoop aliases, such as shls.

In the first command, we’ll enter just a directory name of input. However, you’ll notice that the directory is automatically created under our user directory, as /user/akbar/input.

hmkdir input

Let’s view our new directory in HDFS.


Or, you can optionally specify the complete path.

hls /user/akbar

Next, we’ll put some files into the HDFS /user/akbar/input directory.

hadoop fs -put /etc/hadoop/conf/*.xml input

Set the HADOOP_MAPRED_HOME environment variable for the current user in our current session.

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Next, we’ll run the sample job, which is a simple grep of the files in the input directory that outputs the results to the output directory. It’s worth noting that the .jar file is located on the physical filesystem, while the input and output directories are in the HDFS filesystem.

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
grep input output 'dfs[a-z.]+'
hls output
hcat output/part-r-00000

If you have a longer file to view, piping into less will prove helpful, such as:

hcat output/part-r-00000 | less

How to start the various Hadoop services

The following are some additional notes I took on starting the various Hadoop services (in the correct order).

Start the Hadoop namenode

sudo service hadoop-hdfs-namenode start

Start the Hadoop datanode service

sudo service hadoop-hdfs-datanode start

Start the Hadoop secondarynamenode service

sudo service hadoop-hdfs-secondarynamenode start

Start the Hadoop resourcemanager service

sudo service hadoop-yarn-resourcemanager start

Start the Hadoop nodemanager service

sudo service hadoop-yarn-nodemanager start

Start the Hadoop historyserver service

sudo service hadoop-mapreduce-historyserver start