Troubleshoot Oozie MapReduce jobs
This article provides troubleshooting steps for Oozie MapReduce job failures. YARN is used in this example.
For example, if the Oozie MapReduce job below fails, which logs should we check for root cause analysis (RCA)?
[root@admin]# oozie job -info 0000031-140711123346649-oozie-oozi-W
Job ID : 0000031-140711123346649-oozie-oozi-W
------------------------------------------------------------------------------------------------
Workflow Name : map-reduce-wf-pi
App Path      : hdfs://nameservice1/user/root/examples/apps/map-reduce_pi
Status        : KILLED
Run           : 0
User          : root
Group         : -
Created       : 2014-07-16 21:15 GMT
Started       : 2014-07-16 21:15 GMT
Last Modified : 2014-07-16 21:17 GMT
Ended         : 2014-07-16 21:17 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------
ID                                            Status  Ext ID                  Ext Status     Err Code
------------------------------------------------------------------------------------------------
0000031-140711123346649-oozie-oozi-W@:start:  OK      -                       OK             -
------------------------------------------------------------------------------------------------
0000031-140711123346649-oozie-oozi-W@mr-node  ERROR   job_1404818506021_0063  FAILED/KILLED  -
------------------------------------------------------------------------------------------------
0000031-140711123346649-oozie-oozi-W@fail     OK      -                       OK             E0729
------------------------------------------------------------------------------------------------
1. Check the Oozie log first.
oozie job -log 0000031-140711123346649-oozie-oozi-W
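The Oozie log is usually long, so it helps to filter it down to the lines that reveal the failure. Below is a minimal sketch; the log excerpt is a hypothetical sample of what `oozie job -log` might print, not the actual output of this job.

```shell
# Hypothetical excerpt of `oozie job -log <workflow-id>` output; a real
# log is much longer. We keep only the lines that typically explain why
# the launcher or action failed.
oozie_log='2014-07-16 21:17:01,123 INFO ActionStartXCommand: Start action [0000031-140711123346649-oozie-oozi-W@mr-node]
2014-07-16 21:17:22,456 WARN MapReduceActionExecutor: Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.MapReduceMain], exit code [1]
2014-07-16 21:17:23,789 WARN ActionEndXCommand: ERROR is considered as FAILED for SLA'

# Keep only the lines that mention an error, a failed state, or an exit code.
errors=$(printf '%s\n' "$oozie_log" | grep -E 'ERROR|FAILED|exit code')
printf '%s\n' "$errors"
```

In practice you would pipe the real command through the same filter: `oozie job -log <workflow-id> | grep -E 'ERROR|FAILED|exit code'`.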
2. Check the related MapReduce job log.
mapred job -logs job_1404818506021_0063
3. Check the related map and reduce attempt logs.
Firstly, identify the map and reduce attempt IDs:

[root@admin]# mapred job -list-attempt-ids job_1404818506021_0063 map completed
attempt_1404818506021_0063_m_000000_0
[root@admin]# mapred job -list-attempt-ids job_1404818506021_0063 reduce completed

Then check the attempt log(s):
mapred job -logs job_1404818506021_0063 attempt_1404818506021_0063_m_000000_0
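When a job has more than one failed attempt, the two commands above can be combined into a loop. This is only a sketch: the attempt list below is a hypothetical copy of the sample output above, and the actual log fetch needs a live cluster, so it is left commented out.

```shell
# Sketch: iterate over attempt IDs and fetch each attempt's log.
job_id='job_1404818506021_0063'
# Hypothetical sample of `mapred job -list-attempt-ids $job_id map completed`:
attempts='attempt_1404818506021_0063_m_000000_0'

out=''
for a in $attempts; do
  out="${out}would fetch logs for $a"
  echo "would fetch logs for $a"
  # mapred job -logs "$job_id" "$a"   # requires a live cluster
done
```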
4. Check YARN container logs.
4.1 Firstly, identify the YARN application ID and all of its child application IDs from the Oozie web UI.
4.2 Check the status of all the YARN applications to see which one of them failed.
[root@admin]# yarn application -status application_1404818506021_0063
14/07/16 15:56:28 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm148
Application Report :
	Application-Id : application_1404818506021_0063
	Application-Name : oozie:launcher:T=map-reduce:W=map-reduce-wf-pi:A=mr-node:ID=0000031-140711123346649-oozie-oozi-W
	Application-Type : MAPREDUCE
	User : root
	Queue : root.root
	Start-Time : 1405545360622
	Finish-Time : 1405545383113
	Progress : 100%
	State : FINISHED
	Final-State : SUCCEEDED
	Tracking-URL : http://admin.xxx.com:19888/jobhistory/job/job_1404818506021_0063
	RPC Port : 31561
	AM Host : hdw2.xxx.com
	Diagnostics :
[root@admin]# yarn application -status application_1404818506021_0064
14/07/16 16:15:17 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm148
Application Report :
	Application-Id : application_1404818506021_0064
	Application-Name : oozie:action:T=map-reduce:W=map-reduce-wf-pi:A=mr-node:ID=0000031-140711123346649-oozie-oozi-W
	Application-Type : MAPREDUCE
	User : root
	Queue : root.root
	Start-Time : 1405545382007
	Finish-Time : 1405545422206
	Progress : 100%
	State : FINISHED
	Final-State : FAILED
	Tracking-URL : http://admin.xxx.com:19888/jobhistory/job/job_1404818506021_0064
	RPC Port : 8699
	AM Host : hdw1.xxx.com
	Diagnostics : Task failed task_1404818506021_0064_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

We can see that YARN application application_1404818506021_0064 failed.
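Note that the application's State can be FINISHED while its Final-State is FAILED, so it is the Final-State field that matters. A small sketch of extracting it, using a trimmed, hypothetical copy of the report above:

```shell
# Sketch: pull the Final-State and Diagnostics out of a
# `yarn application -status` report. The report here is a trimmed,
# hypothetical sample.
report='State : FINISHED
Final-State : FAILED
Diagnostics : Task failed task_1404818506021_0064_m_000000'

# Split each line on " : " and print the value of the Final-State field.
final_state=$(printf '%s\n' "$report" | awk -F' : ' '/Final-State/ {print $2}')
echo "Final-State is $final_state"
```

Against a live cluster the same filter would be: `yarn application -status <app-id> | grep -E 'Final-State|Diagnostics'`.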
4.3 Check the logs of the failed YARN application.
yarn logs -applicationId application_1404818506021_0064
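Aggregated container logs can run to thousands of lines, so grep for the exception chain first. The two-line container log below is a hypothetical excerpt standing in for the real `yarn logs` output:

```shell
# Sketch: filter a saved container log down to its exception chain.
# The sample log content here is hypothetical.
container_log='WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.example.QuasiMonteCarlo$QmcMapper not found'

root_cause=$(printf '%s\n' "$container_log" | grep -E 'Exception|Caused by')
printf '%s\n' "$root_cause"
```

With a live cluster: `yarn logs -applicationId <app-id> | grep -E 'Exception|Caused by'`.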
In this example, the root cause is in the YARN container logs:

WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.example.QuasiMonteCarlo$QmcMapper not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:722)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.example.QuasiMonteCarlo$QmcMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 8 more
5. Check the ResourceManager and NodeManager logs.
The ResourceManager log is on the active ResourceManager node. We can identify the node on which the task attempt's container failed, and then check the NodeManager log on that node.
For example:
[root@hdw2 ~]# ls /var/log/hadoop-yarn
container
hadoop-cmf-yarn-NODEMANAGER-hdw2.xxx.com.log.out
hadoop-cmf-yarn-RESOURCEMANAGER-xxx.viadea.com.log.out
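Once the failing container ID is known, grep the NodeManager log for it to follow the container's lifecycle. Both the log line and the container ID in this sketch are hypothetical samples:

```shell
# Sketch: trace a container's state transitions in the NodeManager log.
# The log line and container ID below are hypothetical.
nm_log='2014-07-16 16:16:40 INFO ContainerImpl: Container container_1404818506021_0064_01_000002 transitioned from RUNNING to EXITED_WITH_FAILURE'
container_id='container_1404818506021_0064_01_000002'

printf '%s\n' "$nm_log" | grep "$container_id"
```

On the real node this would be, for example: `grep <container-id> /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-*.log.out`.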