We’re going to start with a very simple Pig script that reads a file containing 2 numbers per line, separated by a comma. The Pig script will read each line, load the 2 numbers into separate fields, and then add them together.
Create the Sample Input File
Paste the following into pig-practice01.txt.
5,1
6,4
3,2
1,1
9,2
3,8
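If you'd rather not paste by hand, the file can also be written from the shell. This is a minimal sketch, assuming a POSIX shell with printf and awk, and assuming the data is the six comma-separated pairs shown above. The awk check is an optional extra (not part of the original tutorial) that confirms every line has exactly two fields before the file goes into HDFS.

```shell
#!/bin/sh
# Write the six comma-separated pairs to pig-practice01.txt in the current directory
printf '%s\n' '5,1' '6,4' '3,2' '1,1' '9,2' '3,8' > pig-practice01.txt

# Optional sanity check: flag any line that does not have exactly two comma-separated fields.
# awk exits with status 1 if a bad line was found, 0 otherwise.
awk -F',' 'NF != 2 { bad = 1; print "bad line " NR ": " $0 } END { exit bad }' pig-practice01.txt \
  && echo "pig-practice01.txt looks good"
```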
Create the Input and Output Directories in HDFS
We’re going to create 2 directories to store the input to and output from our first Pig script.
hadoop fs -mkdir pig01-input
hadoop fs -mkdir pig01-output
Put the Data File into HDFS
hadoop fs -put pig-practice01.txt pig01-input
Now, let’s check that the file was copied from the local file system into HDFS correctly.
hadoop fs -ls pig01-input
hadoop fs -cat pig01-input/pig-practice01.txt
Write the Pig Latin Script
Paste the following code into practice01.pig.
/* Add 2 numbers together */
-- Load the practice file from HDFS, splitting each line on the comma
A = LOAD 'pig01-input/pig-practice01.txt' USING PigStorage(',') AS (x:int, y:int);
-- Add x and y
B = FOREACH A GENERATE x + y;
-- Store the output in HDFS
STORE B INTO 'pig01-output/results' USING PigStorage();
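To predict what the Pig job should produce, here is a rough local equivalent using awk instead of Pig. It is only an illustration of the same per-line logic, not how Pig executes the script: -F',' plays the role of PigStorage(','), and print $1 + $2 plays the role of FOREACH A GENERATE x + y. The file name expected-results.txt is a hypothetical choice for this sketch.

```shell
#!/bin/sh
# Local stand-in for the Pig script (awk, not Pig); no Hadoop needed.
# Recreate the sample input so this block is self-contained.
printf '%s\n' '5,1' '6,4' '3,2' '1,1' '9,2' '3,8' > pig-practice01.txt

# Split each line on commas ($1 = x, $2 = y) and emit x + y, one sum per line,
# mirroring LOAD ... USING PigStorage(',') followed by FOREACH A GENERATE x + y.
awk -F',' '{ print $1 + $2 }' pig-practice01.txt > expected-results.txt

cat expected-results.txt
```

For well-formed input like this, the Pig output should contain the same six sums. The two tools diverge on bad data: if a field can't be cast to int, Pig loads it as null (so the sum is null), whereas awk treats a non-numeric field as 0.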
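Run the Pig Script
Run the script from the directory that contains practice01.pig. By default, Pig runs in MapReduce mode against HDFS.
pig practice01.pig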
View the Results
hadoop fs -ls pig01-output/results
Pig writes the results to the part-* file(s) in that directory. Because this script has no reduce phase, the file name starts with part-m-.
hadoop fs -cat pig01-output/results/part-m-00000