Most Useful Apache Hadoop HDFS Commands

8 June 2017

This post describes some of the basic Apache Hadoop HDFS commands one would need when working in a Hadoop Cluster.

Create a directory in HDFS at given path(s)

Syntax: $hadoop fs -mkdir <paths>


$hadoop fs -mkdir /home/hduser/dir1 /home/hduser/dir2
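In Hadoop 2 and later, -mkdir fails when a parent directory is missing; the -p flag creates parents as needed, much like Unix mkdir -p. Below is a minimal sketch of an idempotent "create if missing" step. It assumes the hadoop CLI is on the PATH, and the function name ensure_hdfs_dirs is hypothetical, not part of Hadoop.

```shell
# Hypothetical helper: create each HDFS directory only if it is missing.
# Assumes the `hadoop` CLI is on the PATH and the cluster is reachable.
ensure_hdfs_dirs() {
  local dir
  for dir in "$@"; do
    # -test -d exits with 0 when the path exists and is a directory.
    if hadoop fs -test -d "$dir"; then
      echo "exists: $dir"
    else
      # -p creates missing parent directories as well.
      hadoop fs -mkdir -p "$dir" && echo "created: $dir"
    fi
  done
}
```

It could be called as ensure_hdfs_dirs /home/hduser/dir1 /home/hduser/dir2.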

List the contents of a directory in HDFS.

$hadoop fs -ls /home/hduser

List all files and subdirectories under an HDFS directory recursively.

$ hdfs dfs -ls -R /data/movies_data
-rw-r--r--   1 maria_dev hdfs    2893177 2018-11-04 22:09 /data/movies_data/movies_data.csv

Upload a file in HDFS from Local path.

//Copy a single source file, or multiple source files, from the local file system to HDFS.

Syntax: hadoop fs -put <local file system source> ... <HDFS_dest_Path>

$hadoop fs -put /home/hduser/HadoopJob/input/74-0.txt /user/hduser/input

# With a relative path (use "./" for a path relative to the current local directory)
$hadoop fs -put ./HadoopJob/input/accesslogs.log /user/hduser/input
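By default -put refuses to overwrite an existing destination file; the -f flag overwrites it. The following is a small sketch of an upload step that checks the local file first. It assumes the hadoop CLI is on the PATH, and upload_to_hdfs is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: upload one local file into an HDFS directory.
# Assumes the `hadoop` CLI is on the PATH.
upload_to_hdfs() {
  local src="$1"
  local dest_dir="$2"
  # Fail early with a clear message instead of leaving it to -put.
  if [ ! -f "$src" ]; then
    echo "no such local file: $src" >&2
    return 1
  fi
  # -f overwrites the destination if it already exists.
  hadoop fs -put -f "$src" "$dest_dir"
}
```

For example: upload_to_hdfs ./HadoopJob/input/accesslogs.log /user/hduser/input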

Download/copy a file from HDFS to the local file system

  • get

Syntax: $hadoop fs -get <hdfs_source> <local_destination_path>

$hadoop fs -get /home/hduser/dir3/file1.txt /home/

  • copyToLocal

Syntax: $hadoop fs -copyToLocal <hdfs_source> <local_destination_path>

$ hdfs dfs -copyToLocal /data/movies_data/movies_data.csv /home/maria_dev/tutorials
$ ls
movies_data.csv

See or Read contents of a file

$hadoop fs -cat /home/hduser/dir1/abc.txt

Copy a file from the local file system to HDFS

//Similar to put, except that the source must be on the local file system. Multiple sources (files or directories) are allowed, in which case the destination must be a directory.

$hadoop fs -copyFromLocal <localsrc> URI   //Syntax
$hadoop fs -copyFromLocal /home/hduser/abc.txt  /home/hduser/abc.txt

Move file from source to destination.

$hadoop fs -mv <src> <dest>   //Syntax
$hadoop fs -mv /home/hduser/dir1/abc.txt /home/hduser/dir2

Removing files and directories in HDFS

Remove the files specified as arguments. In Hadoop 2 and later, -rm without -r does not delete directories; use -rmdir to delete an empty directory.

Syntax: $hadoop fs -rm <argument>

$hadoop fs -rm /home/hduser/dir1/abc.txt

Remove files and directories recursively

Usage:
$hadoop fs -rm -r <arg>      //-rmr <arg> is the older, deprecated form
$hadoop fs -rm -R /home/hduser/
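A recursive delete is easy to get wrong, so it can help to wrap it in a small guard. The sketch below is only an illustration: it assumes the hadoop CLI is on the PATH, and safe_hdfs_rm_r is a hypothetical name. It refuses an empty argument, the root directory and relative paths before calling -rm -r.

```shell
# Hypothetical guard around recursive delete; assumes `hadoop` is on the PATH.
safe_hdfs_rm_r() {
  local target="$1"
  case "$target" in
    ""|"/")
      # Refuse an empty argument or the filesystem root.
      echo "refusing to delete '$target'" >&2
      return 1
      ;;
    /*)
      # -r deletes the directory and everything below it.
      hadoop fs -rm -r "$target"
      ;;
    *)
      # Relative paths resolve against the HDFS home directory,
      # so this sketch insists on absolute paths.
      echo "use an absolute path: $target" >&2
      return 1
      ;;
  esac
}
```

If the trash feature is enabled on the cluster, deleted files are moved to the trash rather than removed immediately.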

Display last few lines of a file.

The -tail option displays the last kilobyte of a file on stdout, similar to the Unix tail command.

$hadoop fs -tail /home/hduser/dir1/abc.txt

Display the aggregate length or disk usage of a file or HDFS path

Syntax: hadoop fs -du /<Directory Path>

hadoop fs -du /home/hduser/dir1/abc.txt

Display the HDFS usage in Human Readable Format

Syntax: hdfs dfs -du -h <path>

$ hdfs dfs -du -h /data/retail_application
590      /data/retail_application/categories_dim
51.5 M   /data/retail_application/customer_addresses_dim
4.4 M    /data/retail_application/customers_dim
17.4 K   /data/retail_application/date_dim
7.4 M    /data/retail_application/email_addresses_dim
131.4 M  /data/retail_application/order_lineitems
69.4 M   /data/retail_application/orders
99       /data/retail_application/payment_methods
22.3 M   /data/retail_application/products_dim
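Without -h, -du prints sizes in bytes, which makes the output easy to sort. The following sketch lists the largest entries under a path. It assumes the hadoop CLI is on the PATH and that paths contain no spaces; largest_hdfs_entries is a hypothetical name.

```shell
# Hypothetical helper: list the N largest entries directly under an HDFS path.
# Assumes `hadoop` is on the PATH. Without -h, -du prints sizes in bytes,
# so the first column can be sorted numerically. The size is taken from the
# first field and the path from the last field (paths without spaces).
largest_hdfs_entries() {
  local dir="$1"
  local n="${2:-3}"
  hadoop fs -du "$dir" |
    sort -k1,1 -n -r |
    head -n "$n" |
    awk '{ print $1, $NF }'
}
```

For example, largest_hdfs_entries /data/retail_application 3 would print the three largest entries with their sizes in bytes.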

Count the number of directories, files and bytes under a path

Syntax: hadoop fs -count <Filepath>

~$hadoop fs -count <Filepath> 
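The default output of -count has four columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. A small sketch that labels them is shown below. It assumes the hadoop CLI is on the PATH and the default layout (no -q option); describe_hdfs_count is a hypothetical name.

```shell
# Hypothetical helper: label the columns printed by `hadoop fs -count`.
# Assumes `hadoop` is on the PATH and the default (no -q) output layout:
# DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
describe_hdfs_count() {
  hadoop fs -count "$1" |
    awk '{ printf "%s: %s dirs, %s files, %s bytes\n", $4, $1, $2, $3 }'
}
```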

Empty the Trash

~$hadoop fs -expunge   //Empty the trash

Take an HDFS source directory and a local destination file as input, and concatenate the files in the source directory into the local destination file

~$hadoop fs -getmerge <HDFS source path> <Local file system Destination path>

Take a source file and output it in text format. The allowed formats are zip and TextRecordInputStream.

~$hadoop fs -text <Source Path>

Create a file of zero length

~$hadoop fs -touchz <path>

Check whether a file, path or directory exists

~$hadoop fs -test -[ezd] <pathname>
  hadoop fs -test -e <path>
  hadoop fs -test -z <pathname>
  hadoop fs -test -d <pathname>

  -e: check whether the path exists; returns 0 if true
  -z: check whether the file is zero length; returns 0 if true
  -d: check whether the path is a directory; returns 0 if true
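Because -test reports its result through the exit code (0 when the check holds), it fits naturally into shell conditionals. Here is a minimal sketch that classifies a path. It assumes the hadoop CLI is on the PATH; hdfs_path_kind is a hypothetical name.

```shell
# Hypothetical helper: classify an HDFS path using `hadoop fs -test`.
# Assumes `hadoop` is on the PATH. Each check exits with 0 when it holds.
hdfs_path_kind() {
  local p="$1"
  if ! hadoop fs -test -e "$p"; then
    echo "missing"
  elif hadoop fs -test -d "$p"; then
    echo "directory"
  elif hadoop fs -test -z "$p"; then
    echo "empty file"
  else
    echo "file"
  fi
}
```

Each call to hadoop starts a JVM, so this sketch favours clarity over speed.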

Returns the stat information on path

$hadoop fs -stat <local or HDFS path name>
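The -stat option also accepts a format string. As a hedged sketch, the helper below prints the name, size and modification date of a path using the format codes %n (name), %b (size in bytes) and %y (modification date). It assumes the hadoop CLI is on the PATH; hdfs_stat_summary is a hypothetical name.

```shell
# Hypothetical helper: print "<name> <size in bytes> <modification date>".
# Assumes `hadoop` is on the PATH. Format codes used: %n name,
# %b size in bytes, %y modification date.
hdfs_stat_summary() {
  hadoop fs -stat "%n %b %y" "$1"
}
```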

Display the capacity and free space of the file system, in bytes

~$hadoop fs -df <Directory Path>

Check application logs using the application ID

#Check all Logs
~$yarn logs -applicationId <Application ID>

View a Specific Log Type for a Running Application in Yarn

~$yarn logs -applicationId <Application ID> -log_files <log_file_type>

View only the stderr logs in YARN

~$yarn logs -applicationId <Application ID> -log_files stderr

The -log_files option also supports Java regular expressions, so the following format returns all types of log files (the pattern is quoted so that the shell does not expand it):

~$yarn logs -applicationId <Application ID> -log_files ".*"
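The log output can be piped into ordinary Unix tools. The following sketch counts the stderr lines that mention an exception. It assumes the yarn CLI is on the PATH and that the logs are available to the yarn logs command (for finished applications this typically requires log aggregation); count_stderr_exceptions is a hypothetical name.

```shell
# Hypothetical helper: count stderr lines that mention an exception.
# Assumes `yarn` is on the PATH and the application logs are available.
count_stderr_exceptions() {
  local app_id="$1"
  # grep -c prints the number of matching lines (0 when there are none).
  yarn logs -applicationId "$app_id" -log_files stderr | grep -c "Exception"
}
```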

Disable the NameNode Safe mode

The command below takes the NameNode out of safe mode. It can be executed only by a Hadoop admin or the Hadoop operations team.

sudo su hdfs -l -c 'hdfs dfsadmin -safemode leave'
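Before forcing the NameNode out of safe mode, it is worth checking whether it is in safe mode at all; hdfs dfsadmin -safemode get reports the current state. A minimal sketch follows. It assumes the hdfs CLI is on the PATH and that the caller already has HDFS superuser rights; leave_safemode_if_on is a hypothetical name.

```shell
# Hypothetical helper: leave safe mode only if the NameNode reports it is on.
# Assumes `hdfs` is on the PATH and the caller has HDFS superuser rights.
leave_safemode_if_on() {
  local status
  status="$(hdfs dfsadmin -safemode get)"
  case "$status" in
    *"Safe mode is ON"*)
      hdfs dfsadmin -safemode leave
      ;;
    *)
      echo "nothing to do: $status"
      ;;
  esac
}
```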
