Manage files in HDFS using WEBHDFS REST APIs

Web services have become indispensable in modern application development for exchanging data across applications and web applications. Various application programming interfaces (APIs) have emerged to expose web services, and Representational State Transfer (REST), the style used by browsers, is the logical choice for building them.
In this post I share my understanding and experience of using the WebHDFS REST API.
What is WebHDFS?
  • Hadoop provides a native Java API to support file system operations such as creating, renaming, or deleting files and directories; opening, reading, or writing files; setting permissions; and so on.
  • This works well for applications running within the Hadoop cluster. However, when an external application needs to exchange data/files with HDFS and perform operations such as creating directories, writing files into them, or reading the content of a file stored on HDFS, a separate API is required.
  • Hortonworks developed an additional API to support these requirements, based on standard REST functionality.
  • WebHDFS is based on standard HTTP operations like GET, PUT, POST and DELETE.
[Figure 1]
  • Authentication can be based on the user.name query parameter (passed as part of the HTTP query string) or, if security is turned on, on Kerberos.
  • Not much configuration is needed to enable WebHDFS; we only have to add the property shown in the snippet after this list to hdfs-site.xml.
  • Regular operations such as creating a directory, listing directories, opening a directory/file, and deleting a directory/file are straightforward.
  • We just have to pass the appropriate operation as op=<operation_type> in the WebHDFS URL (see the example URL after this list).
  • Creating/uploading a file to HDFS is a little more involved, so let us see how to upload a file into HDFS using the WebHDFS REST API.
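The property referenced above is dfs.webhdfs.enabled; a minimal hdfs-site.xml snippet (verify the property against your Hadoop version) looks like this:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>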
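As an example of the URL format, listing the status of a directory is a plain HTTP GET like the one below (the NameNode host, the default HTTP port 50070 and the user name are illustrative assumptions); other operations follow the same pattern with a different op value and HTTP verb:

http://<namenode-host>:50070/webhdfs/v1/test1/files?op=LISTSTATUS&user.name=hdfs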
Upload a file into HDFS using the WebHDFS REST API in a Java Jersey application
The Jersey RESTful Web Services framework is an open-source framework for developing RESTful web services in Java; it supports the JAX-RS APIs and serves as the JAX-RS reference implementation.
The stack I used here is:
  • Java version 1.7
  • Jersey version 1.16
  • Hadoop version 2.6.0
  • Jetty version 9.3.11
Step 1:
  • Create a Java web project set up as a Jersey REST application.
Step 2:
  • Write a controller exposing a createFile()/uploadFile() endpoint; a hedged sketch of such a controller class follows below.
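The original controller source is not reproduced here, so the following is only a minimal sketch of what such a Jersey 1.x resource could look like. The package, class name, the /work/createFile path, the file_name header and the "file" form part are assumptions chosen to match the test URL and inputs used later, and it assumes the jersey-multipart module is on the classpath:

package com.example.webhdfs;                      // hypothetical package name

import java.io.InputStream;

import javax.ws.rs.Consumes;
import javax.ws.rs.HeaderParam;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

import com.sun.jersey.multipart.FormDataParam;

@Path("/work")
public class FileUploadController {

    // file_name is read from a header parameter; the file content arrives
    // as the multipart form part named "file".
    @POST
    @Path("/createFile")
    @Consumes(MediaType.MULTIPART_FORM_DATA)
    public Response createFile(@HeaderParam("file_name") String fileName,
                               @FormDataParam("file") InputStream fileContent) {
        new WebHDFSFileUploadService().uploadFile(fileName, fileContent);
        return Response.ok("File uploaded to HDFS as " + fileName).build();
    }
}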

Step 3:
Write a service method that uploads the file to HDFS using HttpURLConnection
• WebHDFSFileUploadService.java (a hedged sketch follows below)
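The original service source is likewise not shown, so the sketch below illustrates the two-step WebHDFS CREATE flow with HttpURLConnection: first a PUT to the NameNode (without following redirects) to obtain the DataNode location from the 307 response, then a PUT of the file content to that location. The NameNode address, port 50070 and the user name are assumptions; adjust them to your cluster:

package com.example.webhdfs;                      // hypothetical package name

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHDFSFileUploadService {

    // NameNode web address and HDFS user are assumptions for this sketch.
    private static final String NAMENODE = "http://namenode-host:50070";
    private static final String USER = "hdfs";

    public void uploadFile(String fileName, InputStream fileContent) {
        try {
            // Step 1: ask the NameNode where to write; it answers with a
            // 307 redirect whose Location header points at a DataNode.
            URL createUrl = new URL(NAMENODE + "/webhdfs/v1/" + fileName
                    + "?op=CREATE&overwrite=true&user.name=" + USER);
            HttpURLConnection nnConn = (HttpURLConnection) createUrl.openConnection();
            nnConn.setRequestMethod("PUT");
            nnConn.setInstanceFollowRedirects(false);   // we need the redirect URL, not the redirect itself
            String dataNodeUrl = nnConn.getHeaderField("Location");
            nnConn.disconnect();

            // Step 2: stream the file content to the DataNode URL.
            HttpURLConnection dnConn = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
            dnConn.setRequestMethod("PUT");
            dnConn.setDoOutput(true);
            dnConn.setRequestProperty("Content-Type", "application/octet-stream");
            try (OutputStream out = dnConn.getOutputStream()) {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = fileContent.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
            // 201 Created indicates the file was written to HDFS.
            if (dnConn.getResponseCode() != HttpURLConnection.HTTP_CREATED) {
                throw new RuntimeException("WebHDFS upload failed: HTTP " + dnConn.getResponseCode());
            }
            dnConn.disconnect();
        } catch (Exception e) {
            throw new RuntimeException("Error uploading file to HDFS", e);
        }
    }
}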

Step 4:
Test your application
• Build a WAR file of your application and deploy it to the Jetty server
• Open any REST console and send a request to the URL below
o http://<host>:<port>/work/createFile
o Inputs:
   - Header param file_name: the uploaded file will be saved under this name/path in HDFS
   - Choose a file to upload from the REST console
  • Example: the input I gave here is:
    • file_name : test1/files/sample-test.txt
    • file : test-file.txt (selected from my local directory)
  • I chose the file test-file.txt from my local directory; it will be saved to HDFS at the given path: /test1/files/sample-test.txt
  • Then check whether the file was uploaded to the given path, using the hadoop fs command
  • Log in to the cluster (for example via PuTTY) and run the ls command shown after this list
  • You will see the sample-test.txt file in the given path
  • To view the content of the file, run the cat command shown after this list
  • It will display the file content on the screen
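The original post does not reproduce the exact commands; based on the example path above, they would look like this (the first lists the directory, the second prints the file content):

hadoop fs -ls /test1/files
hadoop fs -cat /test1/files/sample-test.txt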
Thus the upload of a file to HDFS using the WebHDFS API from a Java Jersey application is successful.
Here I used HttpURLConnection to upload my files to HDFS.
