Find and Replace Text with sed


sed provides a quick and easy way to find and replace text via its substitute command (‘s’).

Sample File

Copy and paste the following text into a file named practice01.txt.

Author: Akbar S. Ahmed
Date: July 1, 2012
Subject: Sed

sed is an extremely useful Unix/Linux/*nix utility that allows you to manipulate a text stream. It is useful when working with Hadoop, as sed is often used to manipulate text prior to MapReduce.

sed practice

name Akbar
state California
state CA
OS Linux, OS X, Windows

Substitution (Find and Replace)

The main sed command that you’ll use frequently is s, which stands for substitute.

Let’s start with a basic example.

Substitute Linux with Ubuntu

sed -i 's/Linux/Ubuntu/' practice01.txt

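If you’re using a Mac, the command above needs to be adjusted, because the BSD version of sed expects a backup suffix argument after -i. Fortunately, the following command, which writes the result to a new file instead of editing in place, works on both the Mac and Ubuntu.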

sed 's/Linux/Ubuntu/' practice01.txt > practice01-output.txt

Let’s check our work.

cat practice01-output.txt

It’s important to understand each component of a command, including the options. In our command above we used the following:

  • sed: This is the sed utility
  • -i: “In place”. -i means edit and save changes to the same file. In the two commands above, you’ll notice that we have to use > somefile to redirect the output when we don’t use -i.
  • s: Substitute. In s/Linux/Ubuntu/, the first word (Linux) is the text to search for and the second word (Ubuntu) is its replacement.
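
If you do want to edit in place on both a Mac and Ubuntu, one option is to attach a backup suffix directly to -i. The sketch below assumes a scratch directory created with mktemp (not part of the original steps) so that your real practice file is left alone.

```shell
# Work on a scratch copy so the real practice file is untouched.
workdir=$(mktemp -d)
printf 'OS Linux, OS X, Windows\n' > "$workdir/practice01.txt"

# -i.bak edits the file in place and keeps the original as practice01.txt.bak.
# A suffix attached directly to -i is accepted by both GNU sed and BSD sed.
sed -i.bak 's/Linux/Ubuntu/' "$workdir/practice01.txt"

cat "$workdir/practice01.txt"       # the edited file
cat "$workdir/practice01.txt.bak"   # the untouched backup
```

The backup file is a handy safety net while you are still experimenting with a new expression.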

Substitute all instances of a word

By default, sed only replaces the first instance of a word on a given line.

Create a new file named practice02.txt by running the following command.

echo "sed is a stream editor. sed is a stream editor." > practice02.txt

Let’s begin by using the command we already learned to change ‘sed’ to ‘vi’.

sed 's/sed/vi/' practice02.txt > practice02-output.txt
cat practice02-output.txt

You should see output that looks like the following:
vi is a stream editor. sed is a stream editor.

Notice how only the first instance of ‘sed’ was changed to ‘vi’.

Let’s create a new practice file by running the following command. This time we’ll create 3 lines with the same text, and we’ll append a ‘cat’ command so that we can immediately see the contents of our file.

for i in 1 2 3; do echo "editorX is a stream editor. editorX is a stream editor." >> practice03.txt; done; cat practice03.txt

To make a global substitution (find and replace all), we need to add the ‘g’ (global) flag to the end of the ‘s’ command.

sed 's/editorX/editorY/g' practice03.txt > practice03-output.txt
cat practice03-output.txt
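
Besides ‘g’, the ‘s’ command also accepts a number as a flag, which replaces only that occurrence on each line. Here is a small sketch that reuses the sentence from practice02.txt and changes only the second ‘sed’. The variable names are just for illustration.

```shell
line='sed is a stream editor. sed is a stream editor.'

# The trailing 2 tells s to replace only the second match on each line.
second_only=$(printf '%s\n' "$line" | sed 's/sed/vi/2')

printf '%s\n' "$second_only"
# prints: sed is a stream editor. vi is a stream editor.
```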

Limiting which lines are edited

sed allows us to easily control which lines are edited. For example, if our data has a header row in the first row, then we can limit editing to only the first row.

sed '1s/editorX/myEditor/g' practice03.txt > practice03a-output.txt
cat practice03a-output.txt

Let’s now edit lines 2 to 3 only.

sed '2,3s/editorX/yourEditor/g' practice03.txt > practice03b-output.txt
cat practice03b-output.txt
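
The opposite case is just as common: leave the header row alone and edit everything after it. In a line range, $ stands for the last line, so 2,$ means "line 2 through the end of the file". The sample data below is made up for this sketch.

```shell
data=$(mktemp)
printf 'name,Linux distro\nAkbar,Linux\nSam,Linux\n' > "$data"

# 2,$ limits the substitution to line 2 through the last line,
# so the word Linux in the header row is not touched.
body_only=$(sed '2,$s/Linux/Ubuntu/g' "$data")

printf '%s\n' "$body_only"
```

Keep the expression in single quotes so that the shell does not try to expand the $.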

Wrap every line in double quotes

This next command is important because it highlights the fact that you can use regex with sed. In fact, the use of regex with sed provides you with an extremely powerful tool to edit files.

sed 's/.*/"&"/g' practice03.txt > practice03c-output.txt
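
In the command above, .* matches the whole line and & in the replacement stands for whatever was matched. When you need only part of the match, you can capture it with \( and \) and refer back to it with \1, \2 and so on. The following sketch uses two lines from the practice file. The key="value" output format is just an example I picked.

```shell
# & is the entire matched text, so every line ends up wrapped in quotes.
quoted=$(printf 'state California\nstate CA\n' | sed 's/.*/"&"/')
printf '%s\n' "$quoted"

# \(...\) captures part of the match; \1 is the first capture, \2 the second.
kv=$(printf 'state California\n' | sed 's/^\([A-Za-z]*\) \(.*\)$/\1="\2"/')
printf '%s\n' "$kv"
# prints: state="California"
```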

While this post provides a quick intro to sed, it’ll be worth your while to learn it in detail as sed is a core part of Linux’s text processing capabilities. Further, sed is an extremely useful tool to preprocess files before submitting them to a MapReduce job in Hadoop.

What is sed?


sed is short for Stream EDitor, which is a utility that allows you to parse and transform text one line at a time. sed is a useful tool, along with grep and awk, when manipulating text files. It is also often overlooked when working with Hadoop, although the use of sed, awk and grep can help speed up processing times by preprocessing text before sending it to a MapReduce job.
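
To give a feel for how the three tools fit together, here is a minimal preprocessing sketch. The input rows, the semicolon delimiter and the comment line are all invented for the example: grep drops the comment line, sed normalizes the delimiter, and awk picks out the second column.

```shell
cleaned=$(printf '# exported 2012-07-01\nakbar;California\nsam;New York\n' \
    | grep -v '^#' \
    | sed 's/;/,/g' \
    | awk -F, '{ print $2 }')

# One state name per line, ready to be handed to the next stage.
printf '%s\n' "$cleaned"
```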

Install JDK 7 u5 on Ubuntu 12.04 LTS (as a secondary JDK)


I had installed JDK 6.0 update 31 in an earlier post. However, I now need to write a Java application that requires the features available in JDK 7.

In this post, I will install JDK 7 update 5 as a secondary JDK, while JDK 6.0 u31 will be the primary JDK. It’s perfectly normal to have multiple JDKs on a single machine to support the requirements of different applications. Fortunately, it’s easy to use a different JDK on a per application basis.


I have a 64 bit version of Ubuntu 12.04 LTS installed, so the instructions below only apply to this OS.

  1. Go to the Oracle Java JDK download page.
  2. Click Accept License Agreement
  3. Click jdk-7u5-linux-x64.tar.gz
  4. Log in with your Oracle account
  5. Download the JDK to your ~/Downloads directory
  6. After downloading, open a terminal, then enter the following commands.



cd ~/Downloads
tar -xzf jdk-7u5-linux-x64.tar.gz

The jvm directory is used to organize all JDK/JVM versions in a single parent directory. As this is our 2nd JDK, we’ll assume that the jvm directory already exists.

sudo mv jdk1.7.0_05 /usr/lib/jvm

The next 3 commands are split across 2 lines per command due to width limits in the blog’s theme.

sudo update-alternatives --install "/usr/bin/java" "java"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/java" 2
sudo update-alternatives --install "/usr/bin/javac" "javac"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/javac" 2
sudo update-alternatives --install "/usr/bin/javaws" "javaws"  \
	"/usr/lib/jvm/jdk1.7.0_05/bin/javaws" 2
sudo update-alternatives --config java

You will see output similar to the following (although it’ll differ on your system). Read through the list and find the number for the JDK installation that you want to use as your primary JDK.

There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                               Priority   Status
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/java   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/java   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/java   2         manual mode

Press enter to keep the current choice[*], or type selection number:

On my system I entered 1 to keep JDK 1.6.0 u31 as my primary JDK (use the number that is appropriate for your system). To enter 1, press 1 on the keyboard, then press Enter.

sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                Priority   Status
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/javac   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/javac   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/javac   2         manual mode

Press enter to keep the current choice[*], or type selection number:

I entered 1 then pressed Enter to keep JDK 1.6.0 u31 as my primary javac command.

sudo update-alternatives --config javaws
There are 2 choices for the alternative javaws (providing /usr/bin/javaws).

  Selection    Path                                 Priority   Status
* 0            /usr/lib/jvm/jdk1.7.0_05/bin/javaws   2         auto mode
  1            /usr/lib/jvm/jdk1.6.0_31/bin/javaws   1         manual mode
  2            /usr/lib/jvm/jdk1.7.0_05/bin/javaws   2         manual mode

Press enter to keep the current choice[*], or type selection number:

I entered 1 then pressed Enter to keep JDK 1.6.0 u31 as my primary javaws command.

As a final step, let’s test each of the commands to ensure everything is set up correctly.

java -version

The output should be:
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

javac -version

The output should be:
javac 1.6.0_31

javaws -version

The output should be:
Java(TM) Web Start 1.6.0_31, which is followed by a long usage message.

That’s it, the JDK 7 u5 is installed.
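
Since JDK 6 is still the primary JDK, an application that needs JDK 7 has to be pointed at it explicitly. One way to do that, sketched below, is a small wrapper that sets JAVA_HOME and PATH for a single command only. The function name run_with_jdk is my own invention, and the sketch assumes the JDK 7 directory used earlier in this post.

```shell
# Run one command with the given JDK first on PATH, without touching
# the system-wide alternatives or the current shell's environment.
run_with_jdk() {
    jdk_home=$1
    shift
    env JAVA_HOME="$jdk_home" PATH="$jdk_home/bin:$PATH" "$@"
}

# Typical use once JDK 7 is in place:
#   run_with_jdk /usr/lib/jvm/jdk1.7.0_05 java -version

# Demonstration that works even without a JDK: print the JAVA_HOME
# that the child process sees.
seen=$(run_with_jdk /usr/lib/jvm/jdk1.7.0_05 sh -c 'printf "%s\n" "$JAVA_HOME"')
printf '%s\n' "$seen"
```

Because the variables are set only for the child process, every other command in the same terminal keeps using JDK 6.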

Install MySQL 5.5 on Ubuntu 12.04 LTS


Installing the MySQL package on Ubuntu is extremely simple.


Open a terminal and enter the following commands.

sudo apt-get install mysql-client mysql-navigator mysql-server

Type Y to accept the additional packages. Press Enter.

After downloading and during installation, the MySQL configuration dialogs will display in the terminal.

In the first dialog, press Enter.

Enter a password for the MySQL root user. Press Enter.

Reenter the root password. Press Enter.

That’s it, MySQL is now installed and ready for use.

Create a .bash_aliases file


This is my personal .bash_aliases file that is mainly used for Cloudera CDH4 (Hadoop) and Pentaho. As a result, many of my aliases are specific to these software packages.

I plan to update this post as my .bash_aliases file expands. I will also push my .bash_aliases file into Git to make it easier to keep up with changes to the file.

How to create a .bash_aliases file

vi ~/.bash_aliases

Paste the following into the file.

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Personal: ~/.bash_aliases
# Akbar S. Ahmed
# Last modified: 2012.06.25
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# -----------------------------------------------
# General
# -----------------------------------------------

alias c='clear' # Clear the screen
alias df='df -Th' # Disk free space
alias du='du -h' # Disk usage
alias h='history' # Bash history
alias j='jobs -l' # Current running jobs

# -----------------------------------------------
# ls
# -----------------------------------------------

alias lx='ls -lXB' # Sort by extension
alias lk='ls -lSr' # Sort by size (small to big)
alias lc='ls -ltcr' # Sort by change time (old to new)
alias lu='ls -ltur' # Sort by access time (old to new)
alias lt='ls -ltr' # Sort by date (old to new)

# -----------------------------------------------
# Hadoop Admin (sudo)
# -----------------------------------------------

alias shcat='sudo -u hdfs hadoop fs -cat' # Output a file to standard out
alias shchown='sudo -u hdfs hadoop fs -chown' # Change ownership
alias shchmod='sudo -u hdfs hadoop fs -chmod' # Change permissions
alias shls='sudo -u hdfs hadoop fs -ls' # List files
alias shmkdir='sudo -u hdfs hadoop fs -mkdir' # Make a directory

# -----------------------------------------------
# Hadoop (regular user)
# -----------------------------------------------

alias hcat='hadoop fs -cat' # Output a file to standard out
alias hchown='hadoop fs -chown' # Change ownership
alias hchmod='hadoop fs -chmod' # Change permissions
alias hls='hadoop fs -ls' # List files
alias hmkdir='hadoop fs -mkdir' # Make a directory

Save the file, then run the following command to load the aliases into your current shell.

source ~/.bash_aliases
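
On Ubuntu, the stock ~/.bashrc loads ~/.bash_aliases automatically when the file exists, using a guard like the one below. This sketch writes two of the aliases to a temporary file (a stand-in for the real ~/.bash_aliases), sources it, and then asks the shell to print one definition as a quick check.

```shell
# A throwaway aliases file so that the real ~/.bash_aliases is not touched.
demo_aliases=$(mktemp)
cat > "$demo_aliases" <<'EOF'
alias c='clear'
alias hls='hadoop fs -ls'
EOF

# Source the file only if it exists.
if [ -f "$demo_aliases" ]; then
    . "$demo_aliases"
fi

# With a name as its argument, alias prints that alias's definition.
alias hls
```

If the last command reports that the alias is not found, check that the quotes in the file are plain single quotes and not typographic ones picked up while copying from a web page.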

Install Pentaho Design Studio 4.0 on Ubuntu 12.04 LTS Desktop


Pentaho Design Studio (PDS) is a BI plugin for Eclipse. I’m going to download the complete package as Pentaho was nice enough to integrate the plugin with Eclipse for us.


To download the Pentaho Design Studio (PDS) either run the following command, or follow the bulleted steps below.


Or follow the steps below if you don’t want to use the wget command shown above.


I am going to assume that you have downloaded the file listed above into the Downloads directory in your Home directory.

Open a terminal and enter the following commands:

mkdir -p ~/bin
cd ~/Downloads
tar -xzf pds-ce-linux-64-4.0.0-stable.tar.gz
mv design-studio/ ~/bin/pds-ce-linux-64-4.0.0
cd ~/bin
ln -s pds-ce-linux-64-4.0.0 design-studio
vi ~/.profile

Near the bottom of the file you should see the PATH variable. Append :$HOME/bin/design-studio to the end of the PATH.

For example, my PATH was:

…which I updated to:

It’s better to append :$HOME/bin/design-studio to the end of the PATH than to the beginning, so that we don’t accidentally step on another installation of Eclipse. Also, because we launch PDS through a symlink named pds (created in the next step), PDS is less likely to become inaccessible due to another Eclipse installation that is earlier in the PATH.
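
If you would rather script the PATH change than edit it by hand, the following sketch is one way to write it. It is not the exact line from my .profile. It appends the directory only when it is not already listed, so sourcing the file more than once does not keep growing PATH.

```shell
pds_dir="$HOME/bin/design-studio"

# Wrapping both sides in colons lets us match whole PATH entries only.
case ":$PATH:" in
    *":$pds_dir:"*) ;;                # already present, nothing to do
    *) PATH="$PATH:$pds_dir" ;;       # append at the end
esac
export PATH

printf '%s\n' "$PATH"
```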

Next, we’ll create a symlink named pds so that we can type a shorter command to open Pentaho Design Studio.

cd ~/bin/design-studio
ln -s eclipse pds

Finally, source your profile to update your environment.

source ~/.profile

Now just type pds and press the Enter key to start Pentaho Design Studio.


Install Pentaho BI Server 4.5 on Ubuntu 12.04 LTS Desktop

Overview: What is Pentaho?

Pentaho is an open source Business Intelligence (BI) suite that comes with either commercial support or community support.

This post provides instructions for the Pentaho community edition suite.

Create a pentaho user and group

Open a terminal and run the following commands:

sudo addgroup pentaho
sudo adduser --system --ingroup pentaho --disabled-login pentaho

Install Java

Follow the JDK installation instructions that are listed in the following post: Install Java JDK 6.0 update 31 on Ubuntu 12.04 LTS

Install the Pentaho BI Server

Now that we have Java installed we can get on with our main task of installing the Pentaho BI Server.

  1. Download the Pentaho BI Server. I’m using the current stable build for x64 Linux, which is biserver-ce-4.5.0-stable.tar.gz.
  2. Open a terminal and enter the following commands:
sudo mkdir /opt/pentaho
cd ~/Downloads
gunzip biserver-ce-4.5.0-stable.tar.gz
tar xf biserver-ce-4.5.0-stable.tar
sudo mv biserver-ce /opt/pentaho/biserver-ce-4.5.0
sudo mv administration-console /opt/pentaho/administration-console-ce-4.5.0
cd /opt/pentaho
sudo ln -s biserver-ce-4.5.0 biserver-ce
sudo ln -s administration-console-ce-4.5.0 administration-console
sudo chown -R pentaho:pentaho /opt/pentaho
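
The versioned directory plus an unversioned symlink is what makes later upgrades painless: you unpack the new release next to the old one and repoint the link. The sketch below shows the idea in a scratch directory. The 4.6.0 version number is hypothetical and only there to illustrate the switch.

```shell
demo=$(mktemp -d)
mkdir "$demo/biserver-ce-4.5.0" "$demo/biserver-ce-4.6.0"   # 4.6.0 is hypothetical

# Initial install: the unversioned name points at 4.5.0.
ln -s biserver-ce-4.5.0 "$demo/biserver-ce"

# Upgrade: -f replaces the existing link, and -n stops ln from following
# the old link and creating the new one inside the 4.5.0 directory.
ln -sfn biserver-ce-4.6.0 "$demo/biserver-ce"

current=$(readlink "$demo/biserver-ce")
printf '%s\n' "$current"
# prints: biserver-ce-4.6.0
```

Rolling back is the same command with the old directory name.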

Start the Pentaho Server

Open a terminal, then enter the following commands:

The following command is only required if you downloaded a Windows .zip file by accident. If this is the case, then none of the .sh files will be executable.
sudo find /opt/pentaho/ -type f -name '*.sh' -exec chmod 744 '{}' \+

cd /opt/pentaho/biserver-ce
sudo -u pentaho ./

Login to the Pentaho User Console

  1. Open a web browser to http://localhost:8080.
  2. Click Evaluation Login and select a user type to login as.

Login to the Pentaho Administration Console

Open a terminal, then enter the following commands:

cd /opt/pentaho/administration-console
sudo -u pentaho ./
  1. Open a web browser to http://localhost:8099.
  2. Enter a User Name of admin.
  3. Enter a Password of password.
  4. Click Log In.

That’s it. The core Pentaho BI server is installed and ready for development. However, a good next step is to change the database that Pentaho uses and install the Pentaho Design Studio (PDS), but we’ll leave that for future posts.

Hadoop also uses port 8080, so you will either need to use a different port for the Pentaho User Console or change the Hadoop MapReduce ShuffleHandler port.

Important update to earlier post on MySQL Installation

I have updated an earlier post on how to install MySQL 5.5 on Ubuntu 10.04 LTS.

Importantly, the new instructions include how to add the MySQL 5.5 libs to the loader path. This is very important as it’s highly likely that you’ll build software that depends on these MySQL libraries.

The updated instructions can be viewed at: MySQL 5.5 on Ubuntu 10.04 LTS.

Change the default ssh port on Ubuntu

Changing the default port of ssh is not a huge improvement in security, but I’ve found it to be a useful tool in keeping log files free from failed login attempts with username root on port 22 (and I hope you do spend the time to review your log files!). A large number of scripts scan the default ssh port of 22 looking for known vulnerabilities. Of course, you should keep ssh fully patched; however, rapidly growing log files are a problem all their own.

One of the easiest ways to keep your log files from filling up with failed login attempts is to change the ssh port.

vi /etc/ssh/sshd_config

Update the port to a new value, such as:

Port 876
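
If you prefer to script the change instead of opening vi, sed can rewrite the Port line whether or not it is commented out. The sketch below runs against a scratch file with made-up contents rather than the real /etc/ssh/sshd_config, so it is safe to try as a regular user.

```shell
scratch=$(mktemp)
printf '#Port 22\nPermitRootLogin no\n' > "$scratch"   # sample contents, not a real config

# ^#\{0,1\}Port matches "Port ..." with or without a leading "#".
sed 's/^#\{0,1\}Port .*/Port 876/' "$scratch" > "$scratch.new"

new_port_line=$(grep '^Port' "$scratch.new")
printf '%s\n' "$new_port_line"
# prints: Port 876
```

On a real server you would apply the same expression to /etc/ssh/sshd_config as root. Running sshd -t afterwards checks the configuration file for errors before you reload the daemon.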

Once you’ve updated sshd, you may also wish to update the ssh client configuration for convenience:

vi /etc/ssh/ssh_config

Uncomment the line with Port, and set it to the same value that you set in the sshd_config file:

Port 876

Lastly, reload the sshd daemon:

/etc/init.d/ssh reload

Open a 2nd ssh session to the server to ensure everything is working.

I recommend you keep the original session open in case you get something wrong in your configuration.

Install Python 2.7.2 from source on Ubuntu 10.04 LTS

The first thing I did was to create a wwwuser account that I plan to run Pyramid under. As a result, I am intentionally installing Python 2.7.2 under only one user account, and am leaving the system-wide Python installation unchanged.

useradd wwwuser
passwd wwwuser
cd /home
mkdir wwwuser
chown wwwuser:wwwuser wwwuser

Copy all of the hidden files into the /home/wwwuser folder. I did this from my desktop files.

vi /etc/passwd
Update the shell to be:

su - wwwuser
ln -s .profile .bash_profile
mkdir bin
mv Python-2.7.2.tgz bin
cd bin
tar -xzf Python-2.7.2.tgz
cd Python-2.7.2
./configure --prefix /home/wwwuser/bin/Python-2.7.2
make install

vi ~/.profile
Update path to:

source ~/.profile

which python

python --version

The output of the python --version command should now be Python 2.7.2.