sed provides a quick and easy way to find and replace text via its substitute command (s).
Copy and paste the following text into a file named practice01.txt.
Author: Akbar S. Ahmed
Date: July 1, 2012
sed is an extremely useful Unix/Linux/*nix utility that allows you to manipulate a text stream. It is useful when working with Hadoop, as sed is often used to manipulate text prior to MapReduce.
OS Linux, OS X, Windows
Substitution (Find and Replace)
The main sed command that you’ll use frequently is s, which stands for substitute.
Let’s start with a basic example.
Substitute Linux with Ubuntu
sed -i 's/Linux/Ubuntu/' practice01.txt
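If you’re using a Mac, you’ll need to adjust the command above, because the BSD version of sed that ships with OS X expects a backup-file suffix after -i (pass -i '' to skip the backup). Alternatively, the following command avoids -i altogether by redirecting the output to a new file. It works with both BSD sed and the GNU sed on Ubuntu.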
sed 's/Linux/Ubuntu/' practice01.txt > practice01-output.txt
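Let’s check our work by printing the output file:
cat practice01-output.txt
If you used the -i version instead, run cat practice01.txt.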
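Another way to check is diff, which prints only the lines that differ between two files. Below is a self-contained sketch. The file names check01.txt and check01-output.txt are made up so the example doesn’t touch your practice files.

```shell
# Hypothetical input file (a made-up name, not the tutorial's practice01.txt)
printf 'Date: July 1, 2012\nOS Linux, OS X, Windows\n' > check01.txt
sed 's/Linux/Ubuntu/' check01.txt > check01-output.txt

# diff lists only the changed lines (< from the input, > from the output).
# It exits with status 1 whenever the files differ, so "|| true" keeps that
# expected result from being treated as an error in a script.
diff check01.txt check01-output.txt || true
```

Only the second line should be reported, since the Date line contains no match.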
It’s important to understand each component of a command, including its options. The commands above use the following:
- sed: This is the sed utility
- -i: “In place”. -i means edit the file and save the changes back to that same file. Without -i, sed writes its result to standard output (your terminal), which is why the second command uses > practice01-output.txt to redirect the output into a file.
- s: Substitute. The command has the form s/search/replacement/: the first term (e.g. Linux) is what we search for, and the second (e.g. Ubuntu) is what replaces it.
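To see the difference -i makes, here is a minimal sketch on a throwaway file (demo.txt is a made-up name). It uses -i.bak, with the suffix attached and no space, which both GNU sed and the BSD sed on a Mac accept; the untouched original is kept as demo.txt.bak.

```shell
# demo.txt is a made-up throwaway file
printf 'OS Linux\n' > demo.txt

# Without -i: the edited text goes to standard output; demo.txt is untouched
sed 's/Linux/Ubuntu/' demo.txt
cat demo.txt    # still prints: OS Linux

# With -i: demo.txt itself is rewritten, and the original is saved
# as demo.txt.bak
sed -i.bak 's/Linux/Ubuntu/' demo.txt
cat demo.txt    # now prints: OS Ubuntu
```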
Substitute all instances of a word
By default, sed replaces only the first match on each line.
Create a new file named practice02.txt by running the following command.
echo "sed is a stream editor. sed is a stream editor." > practice02.txt
Let’s begin by using the command we already learned to change ‘sed’ to ‘vi’.
sed 's/sed/vi/' practice02.txt > practice02-output.txt
Because the output was redirected, run cat practice02-output.txt to view it. You should see the following:
vi is a stream editor. sed is a stream editor.
Notice how only the first instance of ‘sed’ was changed to ‘vi’.
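Note that sed matches characters, not whole words: the “first instance” is the first place the pattern appears anywhere on the line, even inside a longer word. A quick sketch with a made-up sentence piped straight into sed, so no file is needed:

```shell
# "sed" first appears inside "sedentary", so that is the match that changes;
# the standalone word "sed" at the end of the line is left alone
result=$(echo "a sedentary user runs sed" | sed 's/sed/vi/')
echo "$result"    # a vientary user runs sed
```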
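Let’s create a new practice file by running the following command. This time we’ll create 3 lines with the same text, and we’ll append a cat command so that we can immediately see the contents of our file. Because the command uses >>, running it a second time appends three more lines, so delete practice03.txt first if you need to start over.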
for i in 1 2 3; do echo "editorX is a stream editor. editorX is a stream editor." >> practice03.txt; done; cat practice03.txt
To make a global substitution (find and replace all), add the g (global) flag after the final slash of the s command.
sed 's/editorX/editorY/g' practice03.txt > practice03-output.txt
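One way to confirm that g changed every occurrence is to count what’s left with grep -c, which counts matching lines. This sketch rebuilds a three-line sample under a made-up name (practice03-demo.txt) using printf and >, so it can be re-run safely, and contrasts the result with a run that omits g.

```shell
# Rebuild a three-line sample under a made-up name; ">" overwrites the file
line="editorX is a stream editor. editorX is a stream editor."
printf '%s\n' "$line" "$line" "$line" > practice03-demo.txt

sed 's/editorX/editorY/g' practice03-demo.txt > practice03-demo-output.txt

# Expect 0 lines still containing editorX (grep exits with status 1 when
# nothing matches, hence "|| true") and 3 lines containing editorY
grep -c editorX practice03-demo-output.txt || true
grep -c editorY practice03-demo-output.txt

# For contrast: without g, every line still contains one editorX
sed 's/editorX/editorY/' practice03-demo.txt | grep -c editorX
```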
Limiting which lines are edited
sed makes it easy to control which lines are edited. For example, if our data has a header in the first row, we can limit editing to just that row.
sed '1s/editorX/myEditor/g' practice03.txt > practice03a-output.txt
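As a slightly more realistic sketch, here is a made-up CSV whose header and data rows both contain the text os. The address 1 means only the header is renamed. The column names are purely illustrative.

```shell
# A made-up CSV: a header row plus data rows, one of which also contains "os"
csv='name,os
akbar,os x
lee,linux'

# Address 1 limits the substitution to the first (header) row
result=$(printf '%s\n' "$csv" | sed '1s/os/operating_system/')
printf '%s\n' "$result"
```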
Let’s now edit lines 2 to 3 only.
sed '2,3s/editorX/yourEditor/g' practice03.txt > practice03b-output.txt
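If you don’t know how many lines a file has, sed also accepts $ as an address meaning “the last line”, so 2,$ covers everything except a header row. A sketch on a made-up four-line file (ranges-demo.txt is a hypothetical name):

```shell
# Four made-up lines (practice03.txt has only three)
line="editorX is a stream editor."
printf '%s\n' "$line" "$line" "$line" "$line" > ranges-demo.txt

# 2,$ means line 2 through the last line, however long the file is.
# Single quotes stop the shell from treating $ as a variable.
sed '2,$s/editorX/yourEditor/' ranges-demo.txt > ranges-demo-output.txt
cat ranges-demo-output.txt
```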
Wrap every line in double quotes
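This next command is important because it highlights the fact that you can use regular expressions (regex) with sed. In fact, regex support is what makes sed such a powerful tool for editing files. Here, .* matches the entire line, and & in the replacement stands for whatever was matched, so each line is replaced by itself wrapped in double quotes.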
sed 's/.*/"&"/g' practice03.txt > practice03c-output.txt
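To try the pattern without a file, pipe a couple of made-up lines into it. This sketch drops the g flag, which seems unnecessary here because .* already matches each whole line in a single match.

```shell
# Each input line is matched in full by .* and re-emitted as "&", i.e.
# the original text surrounded by double quotes
result=$(printf 'Linux\nOS X\n' | sed 's/.*/"&"/')
printf '%s\n' "$result"
```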
While this post provides only a quick intro to sed, it’s worth learning in detail, because sed is a core part of Linux’s text-processing toolkit. It’s also an extremely useful tool for preprocessing files before submitting them to a MapReduce job in Hadoop.