AkbarAhmed.com

Engineering Leadership

Introduction

Pentaho Analysis Services is called Mondrian, which is the name I’ll use for the remainder of this post. Mondrian is Pentaho’s OLAP server.

In this post I’ll provide step-by-step instructions on how to install Mondrian 3.4.1 Ubuntu Linux 12.04 LTS x64. We’ll use MySQL as the database.

Download

Let’s first create a directory to download Mondrian into.

mkdir -p ~/Downloads/pentaho/analysis
cd ~/Downloads/pentaho/analysis

To download the Mondrian either run the following command, or follow the 3 bulleted steps below.

wget http://hivelocity.dl.sourceforge.net/project/mondrian/mondrian/mondrian-3.4.1/mondrian-3.4.1.zip

Or follow the steps below if you don’t want to use the wget command shown above.

Unzip

cd ~/Downloads/pentaho/analysis
unzip mondrian-3.4.1.zip

Install the Mondrian Server

cd ~/Downloads/pentaho/analysis/mondrian-3.4.1/lib
sudo unzip mondrian.war -d /opt/pentaho/biserver-ce/tomcat/webapps/mondrian
sudo chown -R pentaho:pentaho /opt/pentaho/biserver-ce/tomcat/webapps/mondrian

Create the FoodMart Database

FoodMart is a sample database that can be used to learn Mondrian.

mysql -u root -p

Enter the following commands at the MySQL command prompt.

mysql> CREATE DATABASE foodmart;

Please use a real password by changing pass.word below.

mysql> GRANT ALL ON foodmart.* TO foodmart@localhost IDENTIFIED BY 'pass.word';
mysql> quit;

Import the Foodmart Schema and Data

cd /opt/pentaho/biserver-ce/tomcat/webapps/mondrian

Important
Change akbar to your user name in the inputFile option in the command below.

java -cp "\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/mondrian.jar:\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/log4j-1.2.8.jar:\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/commons-collections-3.1.jar:\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/olap4j.jar:\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/commons-logging-1.0.4.jar:\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/eigenbase-xom.jar:\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/eigenbase-resgen.jar:\
/opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/lib/eigenbase-properties.jar:\
/opt/pentaho/biserver-ce/tomcat/lib/mysql-connector-java-5.1.17.jar" \
mondrian.test.loader.MondrianFoodMartLoader -verbose -tables -data -indexes \
-jdbcDrivers=com.mysql.jdbc.Driver \
-inputFile=/home/akbar/Downloads/pentaho/analysis/mondrian-3.4.1/demo/FoodMartCreateData.sql \
-outputJdbcURL="jdbc:mysql://localhost/foodmart?user=foodmart&password=pass.word"

If you get an error, there is a 99% probability that the jar files cannot be found on the paths specified in the command above. So, the first thing you should do is check if the jar files can be found in the class path paths that are listed above, then update the paths as necessary.

Update the Sample MDX Query Files

cd /opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF/queries

Edit each of the following files and make the change shown in the From / To instructions below.

Change each of the JSP files from:


<jp:mondrianQuery id="query01" jdbcDriver="org.apache.derby.jdbc.EmbeddedDriver" jdbcUrl="jdbc:derby:classpath:/foodmart" catalogUri="/WEB-INF/queries/FoodMart.xml"
   jdbcUser="sa" jdbcPassword="sa" connectionPooling="false">

To…


<jp:mondrianQuery id="query01" jdbcDriver="com.mysql.jdbc.Driver" jdbcUrl="jdbc:mysql://localhost/foodmart?user=foodmart&password=pass.word" catalogUri="/WEB-INF/queries/FoodMart.xml">

sudo vi fourhier.jsp
sudo vi mondrian.jsp
sudo vi arrows.jsp
sudo vi colors.jsp
cd /opt/pentaho/biserver-ce/tomcat/bin
sudo ./shutdown.sh
sudo ./startup.sh

I have changed the default Pentaho BI Server port from 8080 to 8585.

Open a web browser to http://localhost:8585/mondrian.

Click each of the links below to view the same cube.

The cube will be populated for the first time when you click the first link below. In other words, the first page will take a long time to load while the cube is populating. On my system, I actually had to reboot and restart Tomcat before the pages displayed.

Mondrian Configuration Files

Nothing needs to be done to setup the FoodMart database, however you will need to edit the following configuration files when you create your own database.

mondrian.properties

cd /opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF
sudo cp mondrian.properties mondrian.properties.org
sudo vi mondrian.properties

datasources.xml

cd /opt/pentaho/biserver-ce/tomcat/webapps/mondrian/WEB-INF
sudo cp datasources.xml datasources.xml.org
sudo vi datasources.xml

Debugging

I was initially using the instructions at http://mondrian.pentaho.com/documentation/installation.php, however, I got the following error when I ran the jar command.

Exception in thread "main" java.lang.NoClassDefFoundError: org/olap4j/mdx/IdentifierSegment
	at mondrian.test.loader.MondrianFoodMartLoader.(MondrianFoodMartLoader.java:98)
Caused by: java.lang.ClassNotFoundException: org.olap4j.mdx.IdentifierSegment
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	... 1 more
Could not find the main class: mondrian.test.loader.MondrianFoodMartLoader.  Program will exit.

The solution was to add the following missing .jar files to the class path:

  • commons-collections-3.1.jar
  • olap4j.jar

Introduction

The hadoop fs -ls command allows you to view the files and directories in your HDFS filesystem, much as the ls command works on Linux / OS X / *nix.

Default Home Directory in HDFS
A user’s home directory in HDFS is located at /user/userName. For example, my home directory is /user/akbar.

List the Files in Your Home Directory

hadoop fs -ls defaults to /user/userName, so you can leave the path blank to view the contents of your home directory.

hadoop fs -ls

Recursively List Files

The following command will recursively list all files in the /tmp/hadoop-yarn directory.

hadoop fs -ls -R /tmp/hadoop-yarn

Show List Output in Human Readable Format

Human readable format will show each file’s size, such as 1461, as 1.4k.

hadoop fs -ls -h /user/akbar/input

You will see output similar to:
-rw-r–r–   1 akbar akbar       1.4k 2012-06-25 16:45 /user/akbar/input/core-site.xml
-rw-r–r–   1 akbar akbar       1.8k 2012-06-25 16:45 /user/akbar/input/hdfs-site.xml
-rw-r–r–   1 akbar akbar       1.3k 2012-06-25 16:45 /user/akbar/input/mapred-site.xml
-rw-r–r–   1 akbar akbar       2.2k 2012-06-25 16:45 /user/akbar/input/yarn-site.xml

List Information About a Directory

By default, hadoop fs -ls shows the contents of a directory. But what if you want to view information about the directory, not the directory’s contents?

To show information about a directory, use the -d option.

hadoop fs -ls -d /user/akbar

Output:
drwxr-xr-x   – akbar akbar          0 2012-07-07 02:28 /user/akbar/

Compare the output above to the output without the -d option:
drwxr-xr-x   – akbar akbar          0 2012-06-25 16:45 /user/akbar/input
drwxr-xr-x   – akbar akbar          0 2012-06-25 17:09 /user/akbar/output
-rw-r–r–   1 akbar akbar          3 2012-07-07 02:28 /user/akbar/text.hdfs

Show the Usage Statement

hadoop fs -usage ls

The output will be:

Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [ ...]

Introduction

hdfs dfsadmin -metasave provides additional information compared to hdfs dfsadmin -report. With hdfs dfsadmin -metasave provides information about blocks, including>

  • blocks waiting for replication
  • blocks currently being replication
  • total number of blocks

hdfs dfsadmin -metasave filename.txt

Run the command with sudo -u hdfs prefixed to ensure you don’t get a permission denied error. CDH4 runs the namenode as the hdfs user by default. However if you have changed the

ssudo -u hdfs hdfs dfsadmin -metasave metasave-report.txt

You will see output similar to:


Created file metasave-report.txt on server hdfs://localhost:8020

The output above initially confused me as I thought the metasave report was saved to the HDFS filesystem. However, it’s stating the the metasave report is saved into the /var/log/hadoop-hdfs directory on localhost.

cd /var/log/hadoop-hdfs
cat metasave-report.txt

You will see output similar to:


58 files and directories, 17 blocks = 75 total
Live Datanodes: 1
Dead Datanodes: 0
Metasave: Blocks waiting for replication: 0
Mis-replicated blocks that have been postponed:
Metasave: Blocks being replicated: 0
Metasave: Blocks 0 waiting deletion from 0 datanodes.
Metasave: Number of datanodes: 1
127.0.0.1:50010 IN 247241674752(230.26 GB) 323584(316 KB) 0% 220983930880(205.81 GB) Sat Jul 14 18:52:49 PDT 2012

Introduction

hdfs dfsadmin -report outputs a brief report on the overall HDFS filesystem. It’s a userful command to quickly view how much disk is available, how many datanodes are running, and so on.

Command

Run the command with sudo -u hdfs prefixed to ensure you don’t get a permission denied error. CDH4 runs the namenode as the hdfs user by default. However if you have changed the

sudo -u hdfs hdfs dfsadmin -report

You will see output similar to:


Configured Capacity: 247241674752 (230.26 GB)
Present Capacity: 221027041280 (205.85 GB)
DFS Remaining: 221026717696 (205.85 GB)
DFS Used: 323584 (316 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: freshstart
Decommission Status : Normal
Configured Capacity: 247241674752 (230.26 GB)
DFS Used: 323584 (316 KB)
Non DFS Used: 26214633472 (24.41 GB)
DFS Remaining: 221026717696 (205.85 GB)
DFS Used%: 0%
DFS Remaining%: 89.4%
Last contact: Sat Jul 14 18:07:18 PDT 2012

Depricated Command

hadoop dfsadmin -report is a deprecated command. If you enter hadoop dfsadmin -report, you will see the report with the following note at the top of the output.


DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

I’ve been “trying” to use Google Wallet for a couple of weeks now, and I’ve pretty much given up. When it works it’s great…but that’s the problem. You never know when it’ll actually work.

First, it’s anybody’s guess if an in-store reader will even work. The same reader may work in the morning, then completely fail to work in the evening. If you go to 2 different Peet’s locations then you may be able to buy a coffee with Google Wallet at one, and then you’ll have to pull out your credit card at the other.

Second, the Google Wallet app fails to open on the Galaxy Nexus from time to time, which is always fun when you have a long line behind you (solution: put the phone away and pull out a credit card).

Google is in the habit of releasing buggy software then iterating quickly. But Wallet is different, this is money they are working with and errors are not acceptable. Wallet is a great idea that’s poorly implemented.

With that said, I bid adieu to Wallet and start my wait for someone else to release a better digital credit card app.

Introduction

HBase is a tabular-oriented database that runs on top of HDFS. It is modeled on Google’s BigTable.

In this post, I’m going to install HBase in Pseudo mode, so please use these instructions for setting up a developer’s workstation, not for a production cluster.

When should you use HBase

HBase should be used when you need random read/write access to the data in Hadoop. While HBase gives you random seeks, it does so at the expense of performance vs. HDFS. Therefore, it is important to look at your workload and pick the correct solution for your specific requirements.

Install Zookeeper

Install Zookeeper before installing HBase.

Install Prerequisites

sudo apt-get install ntp libopts25

Installation

sudo apt-get install hbase

Let’s see what files were installed. I have written an HBase Files and Directories post that contains more information about what’s installed with the hbase package.

dpkg -L hbase | less
sudo apt-get install hbase-master

Next, we’ll stop the HBase Master.

sudo service hbase-master stop

Configure HBase to run in pseudo mode

Let’s check the hostname and port used by the HDFS Name Node.

grep -A 1 fs.default.name /etc/hadoop/conf.pseudo/core-site.xml | grep value

You should see output of:
<value>hdfs://localhost:8020</value>

cd /etc/hbase/conf; ls -l
sudo vi hbase-site.xml

Paste the following into hbase-site.xml, between <configuration> and </configuration>.


  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>

Add the /hbase directory to HDFS

Important
The following commands assume that you’ve followed the instructions in my post on how to Create a .bash_aliases file.

shmkdir /hbase
shchown hbase /hbase

Let’s check that the /hbase directory was created correctly in HDFS.

hls /

You should see output that includes a line for the /hbase directory.

Start the HBase Master

sudo service hbase-master start

Install an HBase Region Server

The HBase Region Server is started automatically when you install it in Ubuntu.

sudo apt-get install hbase-regionserver

Check that HBase is Setup Correctly

sudo /usr/lib/jvm/jdk1.6.0_31/bin/jps

You should see output similar to the following (look for QuorumPeerMain, NameNode, DataNode, HRegionServer, and HMaster):


1942   SecondaryNameNode
12783  QuorumPeerMain
1747   NameNode
1171   DataNode
15034  HRegionServer
14755  HMaster
2396   NodeManager
2497   ResourceManager
2152   JobHistoryServer
15441  Jps

Open http://localhost:60010 in a web browser to verify that the HBase Master was installed correctly.

If everything installed correctly then you should see the following:

  • In the Region Servers section, there should be one line for localhost.
  • In the Attributes section, you should see HBase Version = 0.92.1-cdh4.0.0.

Add the JDK 1.6.0 u31 Path to BigTop

This update is required as BigTop uses a fixed array approach to finding JAVA_HOME.

sudo vi /usr/lib/bigtop-utils/bigtop-detect-javahome

Add the following line just below the for candidate in \ line:

/usr/lib/jvm/jdk1.6.0_31 \

Update the hosts file

It’s likely that you’ll get an error due to the localhost loopback.

Update the /etc/hosts file (note: The page that contains these instructions was originally written during HBase debugging).

That’s it. You now have HBase installed and ready for use on a developer’s workstation/laptop.

Additional Reading

There are some additional configuration options for HBase, including:

Introduction

Zookeeper provides cluster management for Hadoop.

In this post, I’m going to install Zookeeper in Pseudo mode, so please use these instructions for setting up a developer’s workstation, not for a production cluster.

Installation

The zookeeper package should already be installed, but we’ll double check.

sudo apt-get install zookeeper

Next, we’ll install the Zookeeper Server.

sudo apt-get install zookeeper-server

The following files are now installed:
/etc/zookeeper/conf/zoo.cfg: Zookeeper configuration file

sudo service zookeeper-server stop
sudo service zookeeper-server start

If you have installed Zookeeper before installing HBase, you will see the following error message:

Using config: /etc/zookeeper/conf/zoo.cfg
ZooKeeper data directory is missing at /var/lib/zookeeper fix the path or run initialize
invoke-rc.d: initscript zookeeper-server, action "start" failed.

You need to initialize Zookeeper when it’s installed before HBase.

sudo service zookeeper-server init

Now you can start Zookeeper.

sudo service zookeeper-server start

Additional Reading

Introduction

You will need to know the location of binaries, configuration files, and libraries when working with HBase.

Directories

Configuration

/etc/hbase/conf is the location for all of HBase’s configuration files.

HBase uses Debian Alternatives, so there are a number of symlinks to the configuration files.

/etc/hbase/conf is a symlink to /etc/alternatives/hbase-conf.
/etc/alternatives/hbase-conf is a symlink to /etc/hbase/conf.dist

Logs

/var/log/hbase contains all of the HBase log files.

Files

Configuration Files

The following configuration files are located in /etc/hbase/conf

hadoop-metrics.properties

TODO

hbase-env.sh

TODO

hbase-policy.xml

TODO

hbase-site.xml

TODO

log4j.properties

TODO

regionservers

TODO

Introduction

You will need to know the location of binaries, configuration files, and libraries when working with Zookeeper.

Zookeeper 3.4.3 is a part of Cloudera Distribution Hadoop (CDH4).

Directories

/etc/zookeeper/conf

/etc/zookeeper/conf is the location for all of Zookeeper’s configuration files.

Zookeeper uses Debian Alternatives, so there are a number of symlinks to the configuration files.

/etc/zookeeper/conf is a symlink to /etc/alternatives/zookeeper-conf.
/etc/alternatives/zookeeper-conf is a symlink to /etc/zookeeper/conf.dist

Files

Configuration Files

The following configuration files are located in /etc/zookeeper/conf

configuration.xsl

log4j.properties

zoo.cfg

zoo.cfg is the main Zookeeper configuration file.

dataDir

dataDir specifies the directory where znode snapshot files and transaction logs are stored. These files are important as you will need them to recover data.

The files located in dataDir should be backed up regularly.

zoo_sample.cfg

A sample configuration file. One of the more interesting notes is about the autopurge.snapRetainCount configuration variable (http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance).

Init Files

zookeeper-server

Use the init script to start, stop, restart, check the status of zookeeper, and initial zookeeper.

Binaries and Scripts

/usr/lib/zookeeper/bin/zkCleanup.sh

Script that cleans up the files created in dataDir. This script should be modified per installation and should be added to cron for periodic cleanup.

%d bloggers like this: