Jaeho Shin edited this page Aug 10, 2019 · 51 revisions

GRAFT:
A Distributed Debugger and Testing Tool for Apache Giraph

On this page you can find information about the following:

Other pages of this wiki cover the following topics:

Related Publication:

Related Talk:

Overview

Graft is a debugging and testing tool for programs written for Apache Giraph. In particular, Graft helps users find bugs in their Computation.compute() and Master.compute() methods that result in an incorrect computation on the graph, such as incorrect messages being sent between vertices, vertices being assigned incorrect values, or aggregators being updated incorrectly. Note that Graft is NOT designed for identifying performance bottlenecks in Giraph programs or in Giraph itself. For an overview of Apache Giraph, please see the Giraph documentation and the original Pregel paper. Below we describe how Graft helps users debug and test their Giraph jobs. For the motivation behind Graft's approach to debugging and testing, please see the section on motivation for Graft's approach.

Debugging Giraph programs

Graft's debugging functionality is based on three ideas: capturing a very small number of (buggy looking) contexts under which the user's compute() methods executed, visualizing these captured contexts in an easy and intuitive fashion, and reproducing them in a compilable program so that the user can step through the compute() methods under exactly these contexts. The compilable program is a JUnit test case, which contains code that reproduces the context and then calls the appropriate compute() method. For step-by-step debugging, users run their IDE's debugger (such as Eclipse's debugger) on the generated unit test.

The context captured for both Computation.compute() and Master.compute() includes a set of aggregators, the number of the superstep in which the compute() method was executed, some global statistics over the graph, such as total vertex and edge counts, and other program-specific configuration that was in effect when the user ran their Giraph program. In addition, the context captured for Computation.compute() includes the previous vertex value and the set of incoming messages.

Graft has four core components:

  1. DebugConfig.java: Users specify programmatically, through a DebugConfig object, when to capture a context in a Giraph job. Using DebugConfig, the user can specify the following: i) a set of vertices for which the Computation.compute() contexts will be captured; ii) a set of supersteps in which to capture contexts; iii) message value constraints, to capture the contexts in which incorrect messages are sent; iv) vertex value constraints, to capture the contexts in which incorrect values are assigned to vertices; and v) whether to capture contexts when a vertex throws an exception. If the program contains a Master.compute(), Graft always captures its context in every superstep.

  2. Graft Instrumenter: Users submit programs to be debugged through Graft's command line interface, specifying either a custom DebugConfig or one of the common ones. Graft's Instrumenter instruments the user's compute() methods according to the given DebugConfig. The instrumented code is then launched as a normal Giraph job, either locally or on a cluster. During the computation, the instrumented code captures the contexts that match the DebugConfig.

  3. Graft GUI: Graft's GUI enables users to visually inspect the contexts that were captured during the debug run and spot incorrect vertex values, messages, or aggregators. Users can start visualizing the captured contexts from the first superstep, move forward to later supersteps, and come back, even while the Giraph job is still running. The GUI displays the captured vertices, the set of captured aggregators for different supersteps, the message and vertex value constraints that were violated, and the exceptions that were thrown during the computation. Users can view the captured vertices in a node-link format if there is a small number of them, or in a tabular view when a large number of vertex contexts are captured. At the core of Graft's approach to debugging is the understanding that only a very small number of vertex contexts are needed to debug programs. As a result, Graft limits the number of vertex contexts that can be captured, yet there can still be up to a few hundred captured vertex contexts, for which the tabular view is preferable to the node-link view.

  4. Context Reproduction for Step-By-Step Debugging: Once the user spots an incorrect message, vertex value, or aggregator using the GUI, he can use Graft's GUI or command line interface to generate a compilable program that reproduces the context under which the Computation.compute() or Master.compute() method produced the incorrect message, vertex value, or aggregator value. The compilable program is a JUnit test case which first reproduces the captured context and then calls the appropriate compute() method. The user can drag, or copy and paste, this test case into an IDE and, using the IDE's debugger, step into the code to debug his compute() method.
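
The capture and reproduce steps above can be illustrated with a minimal, self-contained Java sketch. It does not use Graft's or Giraph's real classes: VertexContext, CapturePolicy, and the toy compute() method are hypothetical stand-ins, and the rule that combines the DebugConfig options (selected vertex and superstep, or any violated constraint) is an assumption made for illustration only.

```java
import java.util.List;
import java.util.Set;
import java.util.function.DoublePredicate;

/** Hypothetical sketch; none of these types belong to Graft or Giraph. */
public class CapturePolicySketch {

    /** The parts of a Computation.compute() context listed in the text (simplified). */
    static final class VertexContext {
        final long superstep;
        final long vertexId;
        final double previousValue;
        final List<Double> incomingMessages;

        VertexContext(long superstep, long vertexId, double previousValue,
                      List<Double> incomingMessages) {
            this.superstep = superstep;
            this.vertexId = vertexId;
            this.previousValue = previousValue;
            this.incomingMessages = incomingMessages;
        }
    }

    /** Stand-in for options i) to iv) of a DebugConfig. */
    static final class CapturePolicy {
        final Set<Long> vertexIds;
        final Set<Long> supersteps;
        final DoublePredicate messageIsValid;
        final DoublePredicate vertexValueIsValid;

        CapturePolicy(Set<Long> vertexIds, Set<Long> supersteps,
                      DoublePredicate messageIsValid, DoublePredicate vertexValueIsValid) {
            this.vertexIds = vertexIds;
            this.supersteps = supersteps;
            this.messageIsValid = messageIsValid;
            this.vertexValueIsValid = vertexValueIsValid;
        }

        /** Assumed rule: capture if vertex and superstep are selected, or a constraint fails. */
        boolean shouldCapture(VertexContext ctx, double newValue, List<Double> outgoing) {
            boolean selected = vertexIds.contains(ctx.vertexId)
                    && supersteps.contains(ctx.superstep);
            boolean badValue = !vertexValueIsValid.test(newValue);
            boolean badMessage = outgoing.stream().anyMatch(m -> !messageIsValid.test(m));
            return selected || badValue || badMessage;
        }
    }

    /** Toy compute(): the new value is the minimum of the previous value and all messages. */
    static double compute(VertexContext ctx) {
        double min = ctx.previousValue;
        for (double m : ctx.incomingMessages) {
            min = Math.min(min, m);
        }
        return min;
    }
}
```

A generated test case plays the same role as rebuilding a VertexContext by hand and calling compute() on it: the captured inputs are fixed in code, so a debugger can step through the method under exactly the conditions that produced the suspicious value.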

Testing Giraph programs

Graft's GUI also helps users construct small graphs that they can use for running end-to-end tests. Using the GUI, users can generate code snippets that wrap the constructed graph in a Giraph TestGraph object, and then copy and paste the TestGraph into their test suites.

Motivation for Graft's Approach

Graft's capture-visualize-reproduce approach to debugging is motivated by the observation that the two most common debugging cycles that Giraph users go through, both described below, follow a similar three-step process: logging a very small amount of context (i.e., messages, vertex values, aggregators), inspecting log files, and then inspecting the Giraph code to find the bug:

  1. In the early development cycle, the user first writes a Computation.compute() and optionally a Master.compute() method. Then he constructs a very small graph (say, several vertices and edges) inside a text file and runs Giraph locally using this small graph as input. The user then goes through a three-step process to notice and fix a bug: 1) he puts println statements inside the compute() methods to log the messages that are being sent, the values that are being assigned to vertices, and/or the aggregator values that are being registered and updated; 2) he inspects the log file of the local Hadoop worker to look at the dumped information and tries to notice an incorrect message, vertex value, or aggregator value; and 3) once an incorrect message, vertex value, or aggregator value is noticed, he inspects his compute() methods to spot and fix the part of the code that may have produced the incorrect data. During this cycle, the user catches some bugs that can be identified on small graphs.

  2. In later stages of the development cycle, the user encounters bugs that only appear on large graphs in a distributed cluster. Typically, bugs in a distributed cluster are noticed in two ways: a) an exception is thrown during the computation, e.g. a NullPointerException or an IllegalArgumentException caused by a division by 0; or b) the user writes a script or runs an extra Hadoop job to do a sanity check on the output of the program and the sanity check fails, e.g. he notices that, contrary to what he expects, not all final vertex values are positive and/or the value of a particular vertex is not 0. After noticing a bug, the user takes the ID of a single problematic vertex v, one that either throws an exception or does not seem to have a correct value, and goes through the same three-step debugging process to find the bug: 1) put println statements in his code, this time only for v, to log some context information about v for one or more supersteps; 2) find the correct Hadoop log file that contains the output of the println statements, inspect the context under which v threw an exception or was assigned a wrong value, and see if any data in the context looks wrong; and 3) inspect the Computation.compute() and/or Master.compute() methods to find the bug in the code that may have produced the exception, the wrong assignment, or the wrong data in the context.

In both of the above debugging cycles, the programmer essentially captures a very small number of contexts under which Computation.compute() or Master.compute() was executed, inspects these contexts, and then inspects the compute() methods to find the bug. Graft's capture-visualize-reproduce approach aims to simplify each of the three common debugging steps:

  • capture: Using DebugConfig, users specify when to capture the context of the compute() methods. They may specify a set of vertex IDs, and Graft will capture the contexts of the specified vertices. They may also specify vertex value or message value integrity constraints, and Graft will capture the contexts under which these constraints are violated. The contexts under which exceptions are thrown are always captured.
  • visualize: Graft GUI simplifies visualizing the captured contexts.
  • reproduce: The generated JUnit test cases reproduce a captured context under which a bug is exposed and allow programmers to do step-by-step debugging into their compute() methods.

Limitations

The main limitation of Graft is that it can capture only the context that Giraph exposes to the vertices and the master, which includes vertex values, messages, aggregators, superstep numbers, and some common graph state information, such as the total number of vertices and edges in the graph. Although it is not recommended, Giraph programs can depend on other state, such as static fields of a class, which may be initialized by some vertex at the beginning of the computation. Graft will not be able to capture the value of such static fields and reproduce it for the programmer.
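
To make this limitation concrete, here is a small, self-contained Java sketch. It uses no Graft or Giraph classes; the static field scale and the toy compute() method are hypothetical. It shows how a replay that restores only the vertex value and the incoming messages can produce a different result when the original run also depended on a static field.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of state that lives outside the captured context. */
public class StaticStateSketch {

    /** Hidden state: not a vertex value, message, or aggregator, so it is not captured. */
    static double scale = 1.0;

    /** Toy compute(): sums the previous value and the messages, then applies the static field. */
    static double compute(double previousValue, List<Double> messages) {
        double sum = previousValue;
        for (double m : messages) {
            sum += m;
        }
        return sum * scale;
    }

    /** The original job: some vertex changed scale before this call was made. */
    static double originalRun() {
        scale = 0.0;
        return compute(1.0, Arrays.asList(2.0, 3.0));
    }

    /** The replay: same value and messages, but scale is back at its class default. */
    static double replayedRun() {
        scale = 1.0;
        return compute(1.0, Arrays.asList(2.0, 3.0));
    }
}
```

In this sketch originalRun() returns 0.0 while replayedRun() returns 6.0, so a test built only from the captured inputs would not exhibit the behavior seen in the original job.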

Another limitation of Graft is that it currently does not include any visualizations to observe changes made to the topology of the input graph, such as edge and vertex removals or additions.

Please send your questions to the Graft developer email list: [email protected]