Friday, May 25, 2012

Debugging Hadoop Task tracker, Job tracker, Data Node or Name Node

Hadoop conf/hadoop-env.sh has following environment variables 

  1. HADOOP_NAMENODE_OPTS
  2. HADOOP_SECONDARYNAMENODE_OPTS, 
  3. HADOOP_DATANODE_OPTS
  4. HADOOP_BALANCER_OPTS
  5. HADOOP_JOBTRACKER_OPTS
  6. HADOOP_TASKTRACKER_OPTS


You can use them to start the remote debugger so that you can connection and debug any of the above servers. Unfortunately, Hadoop tasks are started through a separate JVM by the task tracker, and you cannot use this method to debug your map or reduce function as they run in separate JVMs. 

To debug task tracker, do following steps. 

1. Edit conf/hadoop-env.sh to have following

export HADOOP_TASKTRACKER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"

2. Start Hadoop (bin/start-dfs.sh and bin/start-mapred.sh)
3. It will block waiting for debug connection
4. Connect to the server using Eclipse "Remote Java Application" in the Debug configurations and add the break points
5. Run a map reduce Job 

No comments: