Install Pyramid on Ubuntu 12.04 LTS in the Rackspace Cloud

Check the Installed Python Version

python --version

You should see the following output:

Python 2.7.3

Install Prerequisites

apt-get install python-setuptools python-pip python-virtualenv virtualenvwrapper

Install Prerequisites for Pyramid Speedups

apt-get install gcc cpp libc6-dev python2.7-dev

Install nginx

apt-get install nginx nginx-full nginx-common

Create a user named wwwuser for waitress (the web server) to run as

useradd wwwuser -d /home/wwwuser -k /etc/skel -m -s /bin/bash -U

Set Up the Virtual Environment

mkdir -p /var/www/delixus.com
mkdir /var/www/environments
cd /var
chown -R wwwuser:wwwuser www

Now switch to the wwwuser user.

su - wwwuser
cd /var/www/environments
virtualenv env_delixus

Install Pyramid

You must perform the following steps as the wwwuser user.

cd /var/www/environments/env_delixus
source bin/activate

You should see the environment name as the prefix in the command prompt, such as:

(env_delixus)wwwuser@ws2:
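Before running the install commands, you may want a quick check that the virtualenv really is active, so that packages don’t land in the system Python. This is a minimal sketch that relies on bin/activate exporting the VIRTUAL_ENV variable. The helper name require_venv is hypothetical:

```shell
# Hypothetical helper: succeed only when a virtualenv is active.
# bin/activate exports VIRTUAL_ENV with the environment's path.
require_venv() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "Using virtualenv: $VIRTUAL_ENV"
        return 0
    fi
    echo "No virtualenv active; run: source bin/activate" >&2
    return 1
}

# Usage: only install when the environment is active.
# require_venv && easy_install Pyramid
```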

easy_install Pyramid
pip install waitress

Check Out the Pyramid Project

cd /var/www/delixus.com

Change the SVN checkout command below to match your repository. If you use Git, substitute the equivalent git clone command.

svn checkout https://repo.company.com/source/delixus/tags/1.0 .

Install the delixus.com Pyramid project

cd /var/www/delixus.com/delixus
vi production.ini

Below the [app:main] section, add a [server:main] section as follows. If the file already has a [server:main] section, replace it with this one.


# http://docs.pylonsproject.org/projects/waitress/en/latest/arguments.html
[server:main]
use = egg:waitress#main
host = 127.0.0.1
port = %(http_port)s
# Default number of threads is 4
threads = 8
url_scheme = http

You may not strictly need to install the development version of the site, but it’s the only way I’ve been able to get everything working while debugging…go figure.

python setup.py develop
pserve development.ini

Because pserve keeps running in the foreground, open a second terminal and view the site in a text-based web browser:

links http://localhost:6543

You should be able to view your site at this point.

Press Ctrl+C to stop the development server. Now, let’s install the production version of the site.

python setup.py install

Start Waitress

First we’ll start waitress in the foreground to test it, and then we’ll start it as a daemon.

pserve production.ini http_port=5000
links http://localhost:5000

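Again, you should be able to view your site. Press Ctrl+C to stop this foreground server, then start two daemonized instances, one on port 5000 and one on port 5001: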

pserve production.ini start --daemon --pid-file=/var/www/5000.pid \
--log-file=/var/www/5000.log --monitor-restart http_port=5000
pserve production.ini start --daemon --pid-file=/var/www/5001.pid \
--log-file=/var/www/5001.log --monitor-restart http_port=5001
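If you later add more instances, a loop avoids retyping the command for each port. This is a minimal sketch that mirrors the two commands above exactly. The helper name start_instances is hypothetical:

```shell
# Hypothetical helper: start one daemonized pserve/waitress instance per
# port, using the same pid/log file layout in /var/www as above.
start_instances() {
    for port in "$@"; do
        pserve production.ini start --daemon \
            --pid-file="/var/www/$port.pid" \
            --log-file="/var/www/$port.log" \
            --monitor-restart http_port="$port"
    done
}

# Usage (as wwwuser, from the directory containing production.ini):
# start_instances 5000 5001
```

Any port you add here also needs a matching server line in the nginx upstream block.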

Check the waitress processes.

ps -ef | grep pserve

You should see pserve processes for both ports (http_port=5000 and http_port=5001).
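Because each instance writes its PID to a file, you can also check the instances individually. This is a sketch that uses kill -0, which sends no signal and only tests whether the process exists and can be signalled. For that reason, run it as wwwuser or root. The helper name pid_alive is hypothetical:

```shell
# Hypothetical helper: report whether the process recorded in a pid file
# (as written by pserve --pid-file) is still running.
pid_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "$pidfile: not found"; return 1; }
    pid=$(cat "$pidfile")
    # kill -0 checks that the process exists without signalling it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "$pidfile: running (pid $pid)"
    else
        echo "$pidfile: not running"
        return 1
    fi
}

# Usage:
# pid_alive /var/www/5000.pid
# pid_alive /var/www/5001.pid
```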

Configure nginx as a Proxy for Waitress

The following steps must be performed as root.

cd /etc/nginx/sites-available
vi delixus

Paste the following into the delixus file.


upstream delixus-site {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}

server {
    listen 80;
    server_name  localhost www.delixus.com delixus.com;

    access_log  /var/log/nginx/delixus.com-access.log;

    location / {
        proxy_set_header        Host $host;
        proxy_set_header        X-Real-IP $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header        X-Forwarded-Proto $scheme;

        client_max_body_size    10m;
        client_body_buffer_size 128k;
        proxy_connect_timeout   60s;
        proxy_send_timeout      90s;
        proxy_read_timeout      90s;
        proxy_buffering         off;
        proxy_temp_file_write_size 64k;
        proxy_pass http://delixus-site;
        proxy_redirect          off;
    }

    location /static {
        root            /var/www/delixus.com/delixus/delixus;
        expires         30d;
        add_header      Cache-Control public;
        access_log      off;
    }
}

rm /etc/nginx/sites-enabled/default
ln -s /etc/nginx/sites-available/delixus /etc/nginx/sites-enabled/delixus
service nginx stop
service nginx start

A good next step at this point is to set up Supervisor to control pserve/waitress.

Find and Replace Text with sed

Introduction

sed provides a quick and easy way to find and replace text via its substitute command (s).

Sample File

Copy and paste the following text into a file named practice01.txt.


Author: Akbar S. Ahmed
Date: July 1, 2012
Subject: Sed

sed is an extremely useful Unix/Linux/*nix utility that allows you to manipulate a text stream. It is useful when working with Hadoop, as sed is often used to manipulate text prior to MapReduce.

sed practice

name Akbar
state California
state CA
OS Linux, OS X, Windows
blog http://akbarahmed.com

Substitution (Find and Replace)

The main sed command that you’ll use frequently is s, which stands for substitute.

Let’s start with a basic example.

Substitute Linux with Ubuntu

sed -i 's/Linux/Ubuntu/' practice01.txt

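If you’re using a Mac, the command above won’t work as written. The BSD version of sed expects a backup-file suffix after -i, for example -i '' for no backup. The following command writes the result to a new file instead of editing in place. It works with both BSD sed and the GNU sed that ships with Ubuntu.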

sed 's/Linux/Ubuntu/' practice01.txt > practice01-output.txt

Let’s check our work.

cat practice01-output.txt

It’s important to understand each component of a command, including the options. In the commands above we used the following:

  • sed: This is the sed utility
  • -i: “In place”. -i tells sed to edit the file and save the changes back to the same file. Without -i, sed writes the result to standard output, which is why the second command uses > somefile to redirect the output into a file.
  • s: Substitute. sed searches each line for the first pattern (e.g., Linux) and replaces the match with the second (e.g., Ubuntu).
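If you want in-place editing that works with both GNU sed (Ubuntu) and BSD sed (Mac), one option is to attach a backup suffix directly to -i. Both versions accept that form. This is a minimal sketch using a throwaway file, demo.txt, which is just an example name. sed keeps the original as demo.txt.bak:

```shell
# Create a small test file.
printf 'OS Linux, OS X, Windows\n' > demo.txt

# -i.bak (no space) edits demo.txt in place and saves the original
# as demo.txt.bak; GNU and BSD sed both accept this form.
sed -i.bak 's/Linux/Ubuntu/' demo.txt

cat demo.txt       # OS Ubuntu, OS X, Windows
cat demo.txt.bak   # OS Linux, OS X, Windows
```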

Substitute all instances of a word

By default, sed replaces only the first match on each line.

Create a new file named practice02.txt by running the following command.

echo "sed is a stream editor. sed is a stream editor." > practice02.txt

Let’s begin by using the command we already learned to change ‘sed’ to ‘vi’.

sed 's/sed/vi/' practice02.txt > practice02-output.txt
cat practice02-output.txt

You should see output that looks like the following:
vi is a stream editor. sed is a stream editor.

Notice how only the first instance of ‘sed’ was changed to ‘vi’.
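The s command also accepts a number as a flag. In that case sed replaces only that occurrence on each line. For example, to change only the second ‘sed’:

```shell
# The trailing 2 tells s to replace only the second match on each line.
echo "sed is a stream editor. sed is a stream editor." | sed 's/sed/vi/2'
# Output: sed is a stream editor. vi is a stream editor.
```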

Let’s create a new practice file by running the following command. This time we’ll create 3 lines with the same text, and we’ll append a ‘cat’ command so that we can immediately see the contents of our file.

for i in 1 2 3; do echo "editorX is a stream editor. editorX is a stream editor." >> practice03.txt; done; cat practice03.txt

To make a global substitution (find and replace all), we need to add the g flag to the s command.

sed 's/editorX/editorY/g' practice03.txt > practice03-output.txt
cat practice03-output.txt

Limiting which lines are edited

sed makes it easy to control which lines are edited. For example, if our data has a header in the first line, we can limit editing to only that line.

sed '1s/editorX/myEditor/g' practice03.txt > practice03a-output.txt
cat practice03a-output.txt

Let’s now edit lines 2 to 3 only.

sed '2,3s/editorX/yourEditor/g' practice03.txt > practice03b-output.txt
cat practice03b-output.txt
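The reverse case is also common: you want to leave a header row alone and edit everything after it. In a sed address, $ means the last line, so 2,$ selects line 2 through the end of the file. This is a small sketch with a made-up CSV file named header-demo.csv:

```shell
# A header followed by two data rows.
printf 'name,state\nAkbar,California\nAkbar,CA\n' > header-demo.csv

# 2,$ addresses line 2 through the last line, so the header is untouched.
sed '2,$s/,/|/g' header-demo.csv
# Output:
# name,state
# Akbar|California
# Akbar|CA
```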

Wrap every line in double quotes

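This next command is important because it highlights the fact that you can use regular expressions (regex) with sed. In fact, regex support makes sed an extremely powerful tool for editing files. Here, the regex .* matches the entire line. In the replacement, & stands for whatever the regex matched, so each line is written back wrapped in double quotes.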

sed 's/.*/"&"/g' practice03.txt > practice03c-output.txt
cat practice03c-output.txt
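Because & inserts whatever the regular expression matched, you can also quote just part of a line. This is a small sketch using a line from practice01.txt. The regex [A-Z][A-Z]* (one or more uppercase letters) is my own choice for illustration:

```shell
# Whole line (.*) wrapped in quotes.
echo 'state CA' | sed 's/.*/"&"/'
# Output: "state CA"

# Only the first run of uppercase letters is wrapped.
echo 'state CA' | sed 's/[A-Z][A-Z]*/"&"/'
# Output: state "CA"
```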

While this post provides a quick intro to sed, it’ll be worth your while to learn it in detail, as sed is a core part of Linux’s text processing capabilities. Further, sed is an extremely useful tool for preprocessing files before submitting them to a MapReduce job in Hadoop.

What is sed?

Introduction

sed is short for Stream EDitor, a utility that allows you to parse and transform text one line at a time. sed is a useful tool, along with grep and awk, when manipulating text files. It is also often overlooked when working with Hadoop, although sed, awk and grep can help speed up processing times by preprocessing text before sending it to a MapReduce job.
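Here is a small example of that kind of preprocessing. The following sketch strips comment lines and blank lines before the data would be handed to a job. The input file raw.txt is made up for illustration. Each -e adds one more command to the sed script, and sed applies the commands to every line in turn:

```shell
# Sample input with a comment, a blank line, and two data lines.
printf '# sample header comment\n\nAkbar,California\nAkbar,CA\n' > raw.txt

# /^#/d deletes lines starting with #; /^$/d deletes empty lines.
sed -e '/^#/d' -e '/^$/d' raw.txt > clean.txt
cat clean.txt
# Output:
# Akbar,California
# Akbar,CA
```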

Install JDK 7 u5 on Ubuntu 12.04 LTS (as a secondary JDK)

Introduction

I had installed JDK 6.0 update 31 in an earlier post. However, I now need to write a Java application that requires the features available in JDK 7.

In this post, I will install JDK 7 update 5 as a secondary JDK, while JDK 6.0 u31 will be the primary JDK. It’s perfectly normal to have multiple JDKs on a single machine to support the requirements of different applications. Fortunately, it’s easy to use a different JDK on a per-application basis.

Download

I have a 64-bit version of Ubuntu 12.04 LTS installed, so the instructions below apply only to this OS.

  1. Download the Java JDK from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1637583.html.
  2. Click Accept License Agreement
  3. Click jdk-7u5-linux-x64.tar.gz
  4. Log in to Oracle.com with your Oracle account
  5. Download the JDK to your ~/Downloads directory
  6. After downloading, open a terminal, then enter the following commands.

Installation

Open a terminal, then enter the following commands:

cd ~/Downloads
tar -xzf jdk-7u5-linux-x64.tar.gz

Note:
The jvm directory is used to organize all JDK/JVM versions in a single parent directory. As this is our second JDK, we’ll assume that the jvm directory already exists.

sudo mv jdk1.7.0_05 /usr/lib/jvm

Each of the next three update-alternatives --install commands is split across two lines. The trailing backslash (\) continues the command onto the second line.

sudo update-alternatives --install "/usr/bin/java" "java"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/java" 2
sudo update-alternatives --install "/usr/bin/javac" "javac"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/javac" 2
sudo update-alternatives --install "/usr/bin/javaws" "javaws"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/javaws" 2
sudo update-alternatives --config java

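You will see output similar to the following, although it will differ on your system. JDK 7 was registered with the higher priority (2), so auto mode now selects it as the default. To keep JDK 6 as the primary JDK, find the number for the JDK 6 installation (/usr/lib/jvm/jdk1.6.0_31/bin/java) in the list.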

There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                               Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/java   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/java   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/java   2         manual mode

Press enter to keep the current choice[*], or type selection number:

On my system I entered 1 to keep JDK 1.6.0 u31 as my primary JDK (use the number that is appropriate for your system). To enter 1, press 1 on the keyboard, then press Enter.

sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/javac   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/javac   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/javac   2         manual mode

Press enter to keep the current choice[*], or type selection number:

I entered 1 then pressed Enter to keep JDK 1.6.0 u31 as my primary javac command.

sudo update-alternatives --config javaws
There are 2 choices for the alternative javaws (providing /usr/bin/javaws).

  Selection    Path                                 Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/javaws   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/javaws   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/javaws   2         manual mode

Press enter to keep the current choice[*], or type selection number:

I entered 1 then pressed Enter to keep JDK 1.6.0 u31 as my primary javaws command.

As a final step, let’s test each of the commands to ensure everything is set up correctly.

java -version

The output should be:
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

javac -version

The output should be:
javac 1.6.0_31

javaws -version

The output should be the following line, followed by a long usage message:
Java(TM) Web Start 1.6.0_31

That’s it: JDK 7 u5 is installed as a secondary JDK.
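JDK 6 stays the system default. To run a single application with JDK 7, one approach is to point JAVA_HOME and PATH at the JDK 7 directory for that one command only, without touching the alternatives. This is a minimal sketch that assumes the application honors JAVA_HOME or picks java up from the PATH. The helper name with_jdk is hypothetical:

```shell
# Hypothetical helper: run one command with a specific JDK without
# changing the system-wide update-alternatives settings.
with_jdk() {
    jdk_home=$1
    shift
    # JAVA_HOME and PATH are set only in the environment of this command.
    JAVA_HOME="$jdk_home" PATH="$jdk_home/bin:$PATH" "$@"
}

# Usage:
# with_jdk /usr/lib/jvm/jdk1.7.0_05 java -version
```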

Install MySQL 5.5 on Ubuntu 12.04 LTS

Introduction

Installing the MySQL package on Ubuntu is extremely simple.

Installation

Open a terminal and enter the following commands.

sudo apt-get install mysql-client mysql-navigator mysql-server

Type Y to accept the additional packages. Press Enter.

After downloading and during installation, the MySQL configuration dialogs will display in the terminal.

In the first dialog, press Enter.

Enter a password for the MySQL root user. Press Enter.

Re-enter the root password. Press Enter.

That’s it, MySQL is now installed and ready for use.

Create a .bash_aliases file

Introduction

This is my personal .bash_aliases file that is mainly used for Cloudera CDH4 (Hadoop) and Pentaho. As a result, many of my aliases are specific to these software packages.

I plan to update this post as my .bash_aliases file expands. I will also push my .bash_aliases file into Git to make it easier to keep up with changes to the file.

How to create a .bash_aliases file

vi ~/.bash_aliases

Paste the following into the file.


# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Personal: ~/.bash_aliases
# Akbar S. Ahmed
#
# Last modified: 2012.06.25
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# -----------------------------------------------
# General
# -----------------------------------------------

alias c='clear' # Clear the screen
alias df='df -Th' # Disk free space
alias du='du -h' # Disk usage
alias h='history' # Bash history
alias j='jobs -l' # Current running jobs

# -----------------------------------------------
# ls
# -----------------------------------------------

alias lx='ls -lXB' # Sort by extension
alias lk='ls -lSr' # Sort by size (small to big)
alias lc='ls -ltcr' # Sort by change time (old to new)
alias lu='ls -ltur' # Sort by access time (old to new)
alias lt='ls -ltr' # Sort by modification time (old to new)

# -----------------------------------------------
# Hadoop Admin (sudo)
# -----------------------------------------------

alias shcat='sudo -u hdfs hadoop fs -cat' # Output a file to standard out
alias shchown='sudo -u hdfs hadoop fs -chown' # Change ownership
alias shchmod='sudo -u hdfs hadoop fs -chmod' # Change permissions
alias shls='sudo -u hdfs hadoop fs -ls' # List files
alias shmkdir='sudo -u hdfs hadoop fs -mkdir' # Make a directory

# -----------------------------------------------
# Hadoop (regular user)
# -----------------------------------------------

alias hcat='hadoop fs -cat' # Output a file to standard out
alias hchown='hadoop fs -chown' # Change ownership
alias hchmod='hadoop fs -chmod' # Change permissions
alias hls='hadoop fs -ls' # List files
alias hmkdir='hadoop fs -mkdir' # Make a directory

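Save the file, then load the aliases into your current shell. Ubuntu’s default ~/.bashrc also sources ~/.bash_aliases in every new interactive shell.

source ~/.bash_aliases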
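To confirm that the aliases were loaded, you can ask the shell for each definition with the alias builtin. It exits with a non-zero status for undefined names. This is a small sketch. check_aliases is a hypothetical helper that prints any names that are not defined:

```shell
# Hypothetical helper: print "missing: NAME" for each alias that is not
# defined in the current shell, and return 1 if any are missing.
check_aliases() {
    status=0
    for name in "$@"; do
        if ! alias "$name" >/dev/null 2>&1; then
            echo "missing: $name"
            status=1
        fi
    done
    return $status
}

# Usage, after sourcing ~/.bash_aliases:
# check_aliases c lt hls shls && echo "all aliases loaded"
```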

Install Pentaho Design Studio 4.0 on Ubuntu 12.04 LTS Desktop

Introduction

Pentaho Design Studio (PDS) is a BI plugin for Eclipse. I’m going to download the complete package as Pentaho was nice enough to integrate the plugin with Eclipse for us.

Download

To download Pentaho Design Studio (PDS), run the following command from your ~/Downloads directory:

wget http://downloads.sourceforge.net/project/pentaho/Design%20Studio/4.0.0-stable/pds-ce-linux-64-4.0.0-stable.tar.gz

If you’d rather not use wget, open the same URL in a web browser and save the file to your ~/Downloads directory instead.

Installation

Note:
I am going to assume that you have downloaded the file listed above into the Downloads directory in your Home directory.

Open a terminal and enter the following commands:

cd
mkdir bin
cd ~/Downloads
tar -xzf pds-ce-linux-64-4.0.0-stable.tar.gz
mv design-studio/ ~/bin/pds-ce-linux-64-4.0.0
cd ~/bin
ln -s pds-ce-linux-64-4.0.0 design-studio
vi ~/.profile

Near the bottom of the file you should see the PATH variable. Append :$HOME/bin/design-studio to the end of the PATH.

For example, my PATH was:
PATH="$HOME/bin:$PATH"

…which I updated to:
PATH="$HOME/bin:$PATH:$HOME/bin/design-studio"

It’s better to append :$HOME/bin/design-studio to the end of the PATH rather than the beginning, so that we don’t accidentally override another installation of Eclipse. Also, because we’ll create a symlink named pds in the next step, PDS remains accessible even if another Eclipse installation appears earlier in the PATH.
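One drawback of editing the PATH line by hand is that sourcing ~/.profile again appends the directory a second time. A guarded append avoids the duplicate and still adds the directory at the end of the PATH. This is a minimal sketch. path_append is a hypothetical helper that you would define in ~/.profile yourself:

```shell
# Hypothetical helper: append a directory to the end of PATH only if it
# is not already present, so re-sourcing ~/.profile adds no duplicates.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

# Usage in ~/.profile:
# path_append "$HOME/bin/design-studio"
```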

Next, we’ll create a symlink named pds so that we can type a shorter command to open Pentaho Design Studio.

cd ~/bin/design-studio
ln -s eclipse pds

Finally, source your profile to update your environment.

source ~/.profile

Now just type pds and press the Enter key:

pds