Graft is a debugging and testing tool for programs written for Apache Giraph. In particular, Graft helps users find bugs in their Computation.compute() and Master.compute() methods that cause an incorrect computation on the graph, such as incorrect messages being sent between vertices, vertices being assigned incorrect values, or aggregators being updated incorrectly. Graft is NOT designed for identifying performance bottlenecks in Giraph programs or in Giraph itself. For more information, visit Graft's wiki.
The following are required to build and run Graft:
- Protocol Buffers
- JDK 7
- Maven 3
- Git
Please check the wiki page for detailed installation instructions.
Make sure everything required for Giraph is also installed, such as:
- Hadoop
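As a quick sanity check before building, the following sketch reports which of the required tools are missing from your PATH. The executable names (protoc, java, mvn, git, hadoop) are assumptions about how these tools are typically installed, and the helper name check_tools is made up for this example. It does not check versions.

```shell
# Report any required tool that is not on PATH.
# The executable names passed in below are assumptions; adjust them for your setup.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Print the missing tools (if any) and return non-zero when something is absent.
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all tools found"
}

check_tools protoc java mvn git hadoop || true
```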
Graft must be built as a module for Giraph trunk, so let's grab a copy of it:
git clone https://github.com/apache/giraph.git -b trunk
cd giraph
mvn -DskipTests --projects .,giraph-core install
Get a copy of Graft as the giraph-debugger module in Giraph trunk:
git clone https://github.com/semihsalihoglu/graft.git giraph-debugger
cd giraph-debugger
mvn -DskipTests compile
Add the current directory to PATH, so we can easily run giraph-debug later:
PATH=$PWD:$PATH
You can add the line to your shell configuration. For example, if you use bash:
echo PATH=$PWD:\$PATH >>~/.bash_profile
Now, let's debug an example Giraph job.
Before we move on, let's download a small sample graph:
curl -L http://ece.northwestern.edu/~aching/shortestPathsInputGraph.tar.gz | tar xfz -
hadoop fs -put shortestPathsInputGraph shortestPathsInputGraph
The hadoop fs command above needs a running Hadoop cluster. Either configure your system to use an existing cluster, or start one on your local machine with the following command:
start-all.sh
Next, let's compile the giraph-examples module:
cd ../giraph-examples
mvn -DskipTests compile
Here's how you would typically launch a Giraph job with the GiraphRunner class (the simple shortest paths example):
hadoop jar \
target/giraph-examples-*.jar org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip shortestPathsInputGraph \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op shortestPathsOutputGraph.$RANDOM \
-w 1 \
-ca giraph.SplitMasterWorker=false \
#
Now, you can launch the Giraph job in debugging mode by simply replacing the first two words (hadoop jar) of the command with giraph-debug:
giraph-debug \
target/giraph-examples-*.jar org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
# ... the rest is the same as above
Find the job identifier in the output, e.g., job_201405221715_0005, and copy it for later.
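If you prefer not to copy the identifier by hand, a pattern match can pull it out of saved output. The sketch below is only an illustration: the sample line is a hypothetical stand-in for the launcher's output (the real format may differ), and the only assumption is that the ID has the job_<digits>_<digits> shape shown above.

```shell
# Hypothetical sample line standing in for the launcher's output;
# the real output format may differ.
sample_output="... Running job: job_201405221715_0005 ..."

# Extract the first token that looks like a Hadoop job ID (job_<digits>_<digits>).
job_id=$(printf '%s\n' "$sample_output" | grep -o 'job_[0-9][0-9]*_[0-9][0-9]*' | head -n 1)
echo "$job_id"
```

In practice you would save the real output of the giraph-debug run to a file of your choosing and run the same grep on that file.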
You can optionally specify the supersteps and vertex IDs to debug:
giraph-debug -S{0,1,2} -V{1,2,3,4,5} -S 2 \
target/giraph-examples-*.jar org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
# ...
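The braces in the command above are expanded by the shell, not by giraph-debug, so this form assumes a shell with brace expansion such as bash. The following sketch shows what the launcher receives after expansion:

```shell
# Bash brace expansion turns -S{...} and -V{...} into separate words
# before the command is run (plain sh may not expand braces, hence bash -c).
expanded=$(bash -c 'echo -S{0,1,2} -V{1,2,3,4,5}')
echo "$expanded"
# prints: -S0 -S1 -S2 -V1 -V2 -V3 -V4 -V5
```

In a shell without brace expansion, writing the expanded options out by hand should be equivalent.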
Launch the debugger GUI with the following command:
giraph-debug gui
Then open http://localhost:8000 in your web browser and, once the job has finished, paste the job ID to browse it.
If necessary, you can specify a different port number when you launch the GUI:
giraph-debug gui 12345
You can access all the information recorded by the debugging Giraph job using the following commands:
giraph-debug list
giraph-debug list job_201405221715_0005
giraph-debug dump job_201405221715_0005 0 6
giraph-debug mktest job_201405221715_0005 0 6 Test_job_201405221715_0005_S0_V6
giraph-debug mktest-master job_201405221715_0005 0 TestMaster_job_201405221715_0005_S0