-
-
Notifications
You must be signed in to change notification settings - Fork 710
Using Geometric Encoding based Context Sensitive Points to Analysis (geomPTA)
Contributed by Xiao Xiao ([email protected])
geomPTA is a context sensitive points-to analysis based on SPARK. It uses the call graph generated by SPARK to build the full context sensitivity model for subsequent analysis. The cycles on call graph are handled by a special form of 1CFA to gain better precision. This is necessary because Java programs intend to have large cycles. The full algorithm details can be found in our technical report, where the evaluation results are only for reference because we are continuing to improve geomPTA.
Specifying the options "cg.spark enabled:true" and "cg.spark geom-pta:true" can start geomPTA. An note is that the option "cg.spark simplify-offline" should be false. A useful option is "cg.spark geom-runs", which is 1 by default. This option controls how many times the geomPTA will be iterated, where more iterations mean better precision. Usually, setting "cg.spark geom-runs:2" can obtain high precision and that does not cost 2X running time.
By default, geomPTA does not compute the context sensitive points-to information for all variables that are seen by SPARK. First, geomPTA performs a local equivalence merging to merge the pointers that are local to the same function and have same points-to information. This process is similar to the OVS algorithm. Second, geomPTA scans the whole program and artificially marks the functions in the packages with the names starting by "java.", "sun.", and etc. (For full list, please see function isJavaLibraryClass
in class SootClass
) as library functions. For rest of the functions, they are called application functions, although not all are really from user's code. We call the pointers in library functions that could impact the points-to information of pointers in application code supporting pointers and the pointers in application code core pointers. In order to soundly refine the call graph, the base pointers at virtual callsites are also marked as core pointers.
When geomPTA finished, only the core pointers gain the context sensitive points-to information. For non-core pointers, only the context insensitive points-to information computed by SPARK is available. If you want to change this default behavior to compute points-to information for all pointers, please set the option "cg.spark geom-app-only:false".
geomPTA also updates the call graph computed by SPARK, which can be obtained by Scene.v().getCallGraph()
. After geomPTA, the virtual calls (including the interface calls) will be refined with up-to-date points-to information. The spurious call edges will be removed and the unreachable methods will also be cleaned. Generally speaking, the refined call graph will be much more precise than that generated by SPARK.
Note: If you iterate the call graph edges obtained by
Scene.v().getCallGraph().listener()
, the call edges removed in the updated call graph may also be present in the iterator. This is because all the edges ever added to the call graph are tracked.
SPARK provides a set of overloading functions of reachingObject
for querying points-to information of a given local variable, global variable, and an object field. The querying result is saved in PointsToSet
, an interface that defines basic function for accessing the points-to result. However, it doesn't permit programmers to iterate the objects contained in the set. To do this, you can cast a PointsToSet
object returned by reachingObject
to the type PointsToSetInternal
, which has a forall
utility function that accepts a user defined visitor. Sample code for querying points-to information for pointer l
is as follows:
PointsToAnalysis pta = Scene.v().getPointsToAnalysis();
GeomPointsTo geomPTA = (GeomPointsTo)pta;
PointsToSetInternal pts = (PointsToSetInternal)geomPTA.reachingObjects(l)
pts.forall( new P2SetVisitor() {
public void visit(Node n) {
// Do what you like with n, which is in the type of AllocNode
}
});
If only SPARK is enabled, the cast GeomPointsTo geomPTA = (GeomPointsTo)pta
should be changed to PAG sparkPTA = (PAG)pta
.
In addition to the SPARK querying interface, geomPTA also provides its own interface soot.jimple.spark.geom.geomPA.GeomQueries
for querying context sensitive points-to information in more sophisticated usage scenarios. In the following, we assume:
- The pointer is
l
; - The enclosing function of
l
isfunc(l)
.
The most common way for specifying context for l
is using the last K call edges to func(l)
. We provide two functions for this kind of query:
public boolean contextsByCallChain(Edge[] callEdgeChain, Local l, PtSensVisitor visitor)
public boolean contextByCallChain(Edge[] callEdgeChain, Local l, SparkField field, PtSensVisitor visitor)
The two functions are similar except that the first function queries variable l
and the second function queries the expression l.field
. The call edges given in callEdgeChain
are in the order that callEdgeChain[0]
is the farthest call edge in the chain and callEdgeChain[k-1]
is direct call edge of func(l)
. The last parameter visitor
is a container that stores the querying result. Usually, you can use following code to create a container:
PtSensVisitor visitor = new Obj_full_extractor();
For this query, we specify any edge e
and use all the paths from main
to func(l)
that pass e
as the contexts for l
. The functions related to this query are:
public boolean contexsByAnyCallEdge( Edge sootEdge, Local l, PtSensVisitor visitor )
public boolean contextsByAnyCallEdge(Edge sootEdge, Local l, SparkField field, PtSensVisitor visitor)
This query is special to the full context model, because we do not know how far is from the given edge e
to func(l)
. Therefore, it is hard to answer it with a kCFA analysis as implemented in Paddle. One application of this query is proving that, can a given library function call could call back to application code? To answer the query, we first collect all candidate call back callsites reachable from the given entry library call. Then, for each candidate call back site p.foo(...)
, we check if the virtual call p.foo
can resolve to an application function, considering the points-to information of p
under the contexts that must include the given entry callsite. Sample code for this application can be found here.
The most common query is deciding if pointer l1
is an alias of pointer l2
under any contexts. We provide three functions for this query:
public boolean isAliasCI(Local l1, Local l2)
public boolean isAlias(IVarAbstraction pn1, IVarAbstraction pn2)
public boolean isAlias(Local l1, Local l2)
Of course, we can also test if two pointers can point to same object under specified contexts. For this purpose, we should borrow the querying interfaces given above and use the hasNonEmptyIntersection
function provided by the class PtSensVisitor
. Sample code is:
PtSensVisitor pts1 = new Obj_full_extractor();
PtSensVisitor pts2 = new Obj_full_extractor();
if (contexsByAnyCallEdge( Edge e1, Local l1, PtSensVisitor pts1 ))
if (contexsByAnyCallEdge( Edge e2, Local l2, PtSensVisitor pts2 ))
boolean ans = pts1.hasNonEmptyIntersection(pts2)
In the code, ans == true
means the two pointers form an alias.
geomPTA is still under development and please always use the develop version of soot to gain newest features. Currently, geomPTA can easily analyze subjects with JDK 1.7 and be aware of Android multi-threading. For any bugs or ideas, please contact me.
Also check out Soot's webpage.
NOTE: If you find any bugs in those tutorials (or other parts of Soot) please help us out by reporting them in our issue tracker.
- Home
- Getting Help
- Tutorials
- Reference Material
- General Notions
- Getting Started
- A Few Uses of Soot
- Using Soot as a Command-Line Tool
- Using the Soot Eclipse Plugin
- Using Soot as a Compiler Framework
- Building Soot
- Coding Conventions
- Contributing to Soot
- Updating the Soot Web Page
- Reporting Bugs
- Preparing a New Soot Release