How to view files in HDFS (hadoop fs -ls)

Introduction

The hadoop fs -ls command allows you to view the files and directories in your HDFS filesystem, much as the ls command works on Linux / OS X / *nix.

Default Home Directory in HDFS
A user’s home directory in HDFS is located at /user/userName. For example, my home directory is /user/akbar.

List the Files in Your Home Directory

hadoop fs -ls defaults to /user/userName, so you can leave the path blank to view the contents of your home directory.

hadoop fs -ls

Recursively List Files

The following command will recursively list all files in the /tmp/hadoop-yarn directory.

hadoop fs -ls -R /tmp/hadoop-yarn

Show List Output in Human Readable Format

Human readable format will show each file’s size, such as 1461, as 1.4k.

hadoop fs -ls -h /user/akbar/input

You will see output similar to:
-rw-r–r–   1 akbar akbar       1.4k 2012-06-25 16:45 /user/akbar/input/core-site.xml
-rw-r–r–   1 akbar akbar       1.8k 2012-06-25 16:45 /user/akbar/input/hdfs-site.xml
-rw-r–r–   1 akbar akbar       1.3k 2012-06-25 16:45 /user/akbar/input/mapred-site.xml
-rw-r–r–   1 akbar akbar       2.2k 2012-06-25 16:45 /user/akbar/input/yarn-site.xml

List Information About a Directory

By default, hadoop fs -ls shows the contents of a directory. But what if you want to view information about the directory, not the directory’s contents?

To show information about a directory, use the -d option.

hadoop fs -ls -d /user/akbar

Output:
drwxr-xr-x   – akbar akbar          0 2012-07-07 02:28 /user/akbar/

Compare the output above to the output without the -d option:
drwxr-xr-x   – akbar akbar          0 2012-06-25 16:45 /user/akbar/input
drwxr-xr-x   – akbar akbar          0 2012-06-25 17:09 /user/akbar/output
-rw-r–r–   1 akbar akbar          3 2012-07-07 02:28 /user/akbar/text.hdfs

Show the Usage Statement

hadoop fs -usage ls

The output will be:

Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [ ...]

Advertisements

hdfs dfsadmin -metasave

Introduction

hdfs dfsadmin -metasave provides additional information compared to hdfs dfsadmin -report. With hdfs dfsadmin -metasave provides information about blocks, including>

  • blocks waiting for replication
  • blocks currently being replication
  • total number of blocks

hdfs dfsadmin -metasave filename.txt

Run the command with sudo -u hdfs prefixed to ensure you don’t get a permission denied error. CDH4 runs the namenode as the hdfs user by default. However if you have changed the

ssudo -u hdfs hdfs dfsadmin -metasave metasave-report.txt

You will see output similar to:


Created file metasave-report.txt on server hdfs://localhost:8020

The output above initially confused me as I thought the metasave report was saved to the HDFS filesystem. However, it’s stating the the metasave report is saved into the /var/log/hadoop-hdfs directory on localhost.

cd /var/log/hadoop-hdfs
cat metasave-report.txt

You will see output similar to:


58 files and directories, 17 blocks = 75 total
Live Datanodes: 1
Dead Datanodes: 0
Metasave: Blocks waiting for replication: 0
Mis-replicated blocks that have been postponed:
Metasave: Blocks being replicated: 0
Metasave: Blocks 0 waiting deletion from 0 datanodes.
Metasave: Number of datanodes: 1
127.0.0.1:50010 IN 247241674752(230.26 GB) 323584(316 KB) 0% 220983930880(205.81 GB) Sat Jul 14 18:52:49 PDT 2012

hdfs dfsadmin -report

Introduction

hdfs dfsadmin -report outputs a brief report on the overall HDFS filesystem. It’s a userful command to quickly view how much disk is available, how many datanodes are running, and so on.

Command

Run the command with sudo -u hdfs prefixed to ensure you don’t get a permission denied error. CDH4 runs the namenode as the hdfs user by default. However if you have changed the

sudo -u hdfs hdfs dfsadmin -report

You will see output similar to:


Configured Capacity: 247241674752 (230.26 GB)
Present Capacity: 221027041280 (205.85 GB)
DFS Remaining: 221026717696 (205.85 GB)
DFS Used: 323584 (316 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: freshstart
Decommission Status : Normal
Configured Capacity: 247241674752 (230.26 GB)
DFS Used: 323584 (316 KB)
Non DFS Used: 26214633472 (24.41 GB)
DFS Remaining: 221026717696 (205.85 GB)
DFS Used%: 0%
DFS Remaining%: 89.4%
Last contact: Sat Jul 14 18:07:18 PDT 2012

Depricated Command

hadoop dfsadmin -report is a deprecated command. If you enter hadoop dfsadmin -report, you will see the report with the following note at the top of the output.


DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.