How to use Drill to parse ResourceManager Rest API results


Goal:

The ResourceManager REST API exposes detailed information about YARN applications, YARN cluster metrics, and more.
This article is a simple demo of how to use Drill to query the JSON output of those REST APIs.
One example use case is showing the largest YARN applications that are currently running.

Env:

Drill 1.8
Hadoop 2.7.0

Solution:

1. Save the current output of the RM REST API as a JSON file on MFS (or HDFS).

curl -v -X GET -H "Accept: application/json" http://s1.poc.com:8088/ws/v1/cluster/apps > /mapr/mysuper.cluster.com/tmp/restapi/data.json
Here "s1.poc.com" is the RM node.

2. Use Drill to parse the JSON file to answer the questions a cluster admin wants to ask.

For example, to show the largest YARN applications that are currently running:
with tmp as
(
  select flatten(t.apps.app) as col from dfs.tmp.`restapi/data.json` t
)
select tmp.col.id,
  tmp.col.`user` as `user`,
  tmp.col.runningContainers as `runningContainers`,
  tmp.col.allocatedMB as `allocatedMB`,
  tmp.col.allocatedVCores as `allocatedVCores`
from tmp
where tmp.col.state='RUNNING'
order by tmp.col.runningContainers desc;
The result is:
+---------------------------------+-------+--------------------+--------------+------------------+
|             EXPR$0              | user  | runningContainers  | allocatedMB  | allocatedVCores  |
+---------------------------------+-------+--------------------+--------------+------------------+
| application_1475192050844_0003  | mapr  | 4                  | 16384        | 4                |
| application_1475192050844_0004  | mapr  | 1                  | 2048         | 1                |
+---------------------------------+-------+--------------------+--------------+------------------+
2 rows selected (0.525 seconds)
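
Following the same flatten pattern, other admin questions can be answered from the same file. For example, here is a sketch of a query that sums the currently allocated resources per user (the casts are defensive, in case Drill reads the JSON numbers as a different type):

with tmp as
(
  select flatten(t.apps.app) as col from dfs.tmp.`restapi/data.json` t
)
select tmp.col.`user` as `user`,
  sum(cast(tmp.col.allocatedMB as bigint)) as `totalAllocatedMB`,
  sum(cast(tmp.col.allocatedVCores as bigint)) as `totalAllocatedVCores`
from tmp
where tmp.col.state='RUNNING'
group by tmp.col.`user`;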
