How to add numbers with Pig

Introduction

We’re going to start with a very simple Pig script that reads a file containing 2 numbers per line, separated by a tab. The script loads each line, assigns the 2 numbers to the fields x and y, and then adds them together.

Create the Sample Input File

cd
vi pig-practice01.txt

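Paste the following into pig-practice01.txt. The 2 numbers on each line must be separated by a single tab, because tab is the delimiter PigStorage() uses by default.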

5	1
6	4
3	2
1	1
9	2
3	8
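Pasting from a browser into vi can silently turn the tabs into spaces. If that happens, PigStorage() will not split each line into x and y as intended, and the sums will come out empty. As a minimal sketch, assuming a POSIX shell with printf and awk, you can write the file with explicit tab characters instead and then check that every line has exactly 2 tab-separated fields:

```shell
# Write the six sample rows with real tab characters (\t) between the numbers.
printf '5\t1\n6\t4\n3\t2\n1\t1\n9\t2\n3\t8\n' > pig-practice01.txt

# Sanity check: report any line that does not split into exactly 2 tab-separated fields.
# No output means every line is in the format PigStorage() expects by default.
awk -F'\t' 'NF != 2 { print "bad line " NR ": " $0 }' pig-practice01.txt
```

Run it from the same directory you would otherwise use for vi, so the rest of the steps find the file where they expect it.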

Create the Input and Output Directories in HDFS

We’re going to create 2 directories to hold the input to and output from our first Pig script.

hadoop fs -mkdir pig01-input
hadoop fs -mkdir pig01-output

Put Data File into HDFS

hadoop fs -put pig-practice01.txt pig01-input

Now, let’s check that the file was copied from the local file system to HDFS correctly.

hadoop fs -ls pig01-input
hadoop fs -cat pig01-input/pig-practice01.txt

Write the Pig Latin Script

vi practice01.pig

Paste the following code into practice01.pig.

/*
Add 2 numbers together
*/

-- Load the practice file from HDFS
A = LOAD 'pig01-input/pig-practice01.txt' USING PigStorage() AS (x:int, y:int);

-- Add x and y 
B = FOREACH A GENERATE x + y;

-- Store the output in HDFS
STORE B INTO 'pig01-output/results' USING PigStorage();

Run the Pig Script

pig practice01.pig

View the Results

hadoop fs -ls pig01-output/results

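Pig writes the results to part-* files in the output directory. When the script runs as a map-only MapReduce job, as this one does, the file is named part-m-00000.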

hadoop fs -cat pig01-output/results/part-m-00000
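To double-check the Pig output, you can compute the same sums locally from the original file. This is a minimal sketch assuming awk is available; the function name expected_sums is hypothetical, chosen only for this example. It mirrors B = FOREACH A GENERATE x + y; by adding the 2 tab-separated columns of each line:

```shell
# expected_sums FILE (hypothetical helper): print x + y for each
# tab-separated line of FILE, mirroring FOREACH A GENERATE x + y.
expected_sums() {
  awk -F'\t' '{ print $1 + $2 }' "$1"
}
```

With the sample data, expected_sums pig-practice01.txt prints 6, 10, 5, 2, 11 and 11, one per line. Running expected_sums pig-practice01.txt > expected.txt and then hadoop fs -cat pig01-output/results/part-m-00000 | diff - expected.txt should print nothing if the results match. For a small map-only job like this one, the rows should come out in input order.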

Additional Reading
