All about Soot (draft)


1. Preliminaries

The JVM's method invocation instructions (the original four, plus invokedynamic):

  • invokespecial: calls constructors, superclass methods (super.m()), and private methods
  • invokevirtual: normal instance method call (virtual dispatch on the receiver's runtime class)
  • invokeinterface: like invokevirtual, but the method is declared in an interface. It is harder to optimize, and the JVM additionally checks that the receiver implements the interface.
  • invokestatic: calls static methods
  • invokedynamic (since Java 7): lets dynamically typed languages run on the JVM (Java itself is statically typed)
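To see which instruction a call site compiles to, a small class can exercise each kind. The comments give the instruction javac typically emits for Java 8 and later. Treat this as an assumption and check your own build with javap -c, because compiler versions differ in details. InvokeBase and InvokeKinds are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class InvokeBase {
    String greet() { return "base"; }
}

class InvokeKinds extends InvokeBase {
    static int twice(int x) { return 2 * x; }

    @Override
    String greet() {
        // super.greet(): invokespecial (superclass method, no virtual dispatch)
        // toUpperCase(): invokevirtual on java.lang.String
        return super.greet().toUpperCase();
    }

    static List<String> demo() {
        List<String> out = new ArrayList<>();   // invokespecial ArrayList.<init> (constructor)
        InvokeKinds k = new InvokeKinds();      // invokespecial InvokeKinds.<init>
        out.add(k.greet());                     // greet: invokevirtual; add: invokeinterface (static type List)
        out.add(String.valueOf(twice(21)));     // twice and String.valueOf: invokestatic
        Supplier<String> sup = () -> "lambda";  // lambda creation: invokedynamic
        out.add(sup.get());                     // invokeinterface Supplier.get
        return out;
    }
}
```

Calling InvokeKinds.demo() returns [BASE, 42, lambda]. The interesting part is the bytecode printed by javap -c InvokeKinds, not the return value.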

2. Basic concepts

Soot has its own class path, which by default is empty. When specifying class path of Soot using -cp, do not use ~. Instead, use absolute or relative paths.

In Jimple, the text inside angle brackets is a method signature: <class-name: return-type method-name(parameter-type1,...)>
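The format can be illustrated with a hypothetical helper that is not part of Soot. It assembles such a signature string, separating parameter types with commas and no spaces, as in the signatures Soot prints. A string of this form is what Scene.v().getMethod(String) expects.

```java
import java.util.List;

// Hypothetical helper (not a Soot API): formats a method signature as it appears
// between angle brackets in Jimple.
final class JimpleSignatures {
    static String signature(String className, String returnType,
                            String methodName, List<String> paramTypes) {
        return "<" + className + ": " + returnType + " " + methodName
                + "(" + String.join(",", paramTypes) + ")>";
    }
}
```

Here are two examples. The call signature("java.lang.String", "int", "indexOf", Arrays.asList("java.lang.String", "int")) yields <java.lang.String: int indexOf(java.lang.String,int)>. A no-argument constructor looks like <java.lang.Object: void <init>()>.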

2.1. Three types of classes

There are three kinds of classes (these are classes analyzed by Soot, not the ones owned by Soot):

  • argument class: a class specified explicitly as an argument on the Soot command line; it also becomes an application class
  • application class: classes that Soot analyzes, transforms, and turns into output files
  • library class: classes which are referred to, directly or indirectly, by the application classes, but which are not themselves application classes. Only used for type resolution.

Since argument classes automatically become application classes, there are essentially only two kinds of classes: application classes and library classes.

When you use the -app option, Soot also processes all classes referenced by these classes. It will not, however, process any classes in the JDK, i.e. classes in the java.* and com.sun.* packages. To include those too, use the special -i option, e.g. -i java.

2.2. Packs & phases

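Soot's execution is divided into several packs, each of which groups a number of phases.

A pack's name encodes its role. The first letter names the IR (j for Jimple, s for Shimple, b for Baf, g for Grimp). The second letter gives the role listed below. The trailing p stands for pack; for example, jtp is the Jimple transformation pack.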

  • b: body creation
  • t: user-defined transformations. This is of special interest since it allows us to inject custom analysis.
  • o: optimizations
  • a: annotation (attribute generation)

2.2.1. Whole Program Analysis Packs

In whole-program mode (-w), the following packs are run before the packs above:

  • wjpp: whole-Jimple pre-processing pack (the leading w stands for whole-program)
  • cg: call-graph generation
  • wjtp: whole Jimple transformation pack
  • wjop: whole Jimple optimization pack (this is disabled unless -W is specified)
  • wjap: whole Jimple annotation pack

The information generated in these packs is made available to the rest of Soot through Scene.v().

2.2.2. CLI Options

To show help:

  • -pl, -phase-list: Print list of available phases
  • -ph PACK, -phase-help PACK: Print help for the specified PACK, which can be either a whole pack (e.g. jop) or a specific phase (e.g. jop.cpf)

To set an option of a pack, use -p (or -phase-option) in the form -p PACK OPTION:VALUE, which sets PACK's OPTION to VALUE. For example, to turn off all user-defined intra-procedural transformations (pack jtp):

soot -p jtp enabled:false ...

3. Building Soot

mvn clean compile assembly:single

3.1. Javadoc

mvn javadoc:javadoc

4. Soot on the command line

soot -v -process-dir code/ -d out
soot -cp . -pp Circle
soot -cp . -pp Circle -p cg.spark verbose:true,on-fly-cg:true

CLI options are defined in src/main/xml/options/soot_options.xml.

5. Different IRs

Figure 1: Soot IRs (image: ir.jpg)

5.1. Baf

Baf is

  • a compact representation of bytecode
  • stack-based

The common interface is soot.baf.Inst.

Available optimizations are in soot.baf.toolkits.base.

5.2. Jimple

Jimple is

  • typed: all local variables are typed
  • stackless
  • 3-address (statements reference at most 3 local variables or constants)
    • this requires linearization of some complex expressions, e.g. a*b + c*d is converted to multiple 3-address statements.
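Linearization can be sketched outside Soot using a tiny expression tree (Expr, Leaf, BinOp and Linearizer are hypothetical names). Every nested operation is evaluated into a fresh temporary before it is used, so each emitted statement has at most one operator and three operands. The temporary names $t0, $t1, ... are an assumption; real Jimple names its stack temporaries by type, e.g. $i0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression tree: a leaf (local or constant) or a binary operation.
abstract class Expr {}

final class Leaf extends Expr {
    final String name;
    Leaf(String name) { this.name = name; }
}

final class BinOp extends Expr {
    final String op;
    final Expr left, right;
    BinOp(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
}

final class Linearizer {
    private final List<String> stmts = new ArrayList<>();
    private int nextTemp = 0;

    // Returns an immediate (local or constant) holding e's value; a nested
    // operation is first computed into a fresh temporary.
    private String immediate(Expr e) {
        if (e instanceof Leaf) return ((Leaf) e).name;
        String temp = "$t" + nextTemp++;
        emit(temp, (BinOp) e);
        return temp;
    }

    // Emits "target = l op r" after reducing both operands to immediates.
    private void emit(String target, BinOp b) {
        String l = immediate(b.left);
        String r = immediate(b.right);
        stmts.add(target + " = " + l + " " + b.op + " " + r);
    }

    // Linearizes "target = e" into a list of three-address statements.
    List<String> linearize(String target, Expr e) {
        stmts.clear();
        nextTemp = 0;
        if (e instanceof Leaf) stmts.add(target + " = " + ((Leaf) e).name);
        else emit(target, (BinOp) e);
        return new ArrayList<>(stmts);
    }
}
```

For x = a*b + c*d, linearize("x", ...) produces three statements: $t0 = a * b, $t1 = c * d, and x = $t0 + $t1.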

For a complete explanation of Jimple, see Section 7 (Jimple).

5.3. Shimple

Shimple is

  • the SSA (Static Single Assignment) version of Jimple: each local variable has exactly one static point of definition.
    • this introduces Phi nodes, which merge the different definitions of a variable that reach a control-flow join point.
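In straight-line code, the single-assignment property can be obtained by renaming alone. Every definition gets a new version, and every use refers to the latest version. The sketch below does only this and is not Shimple's implementation; the classes and the x_1 naming scheme are hypothetical. Phi nodes become necessary only where control flow merges, which this sketch deliberately excludes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical straight-line assignment "target = rhs tokens" (locals, constants, operators).
final class Assign {
    final String target;
    final List<String> rhs;
    Assign(String target, String... rhs) { this.target = target; this.rhs = Arrays.asList(rhs); }
}

final class StraightLineSsa {
    // Renames locals so that each one is defined exactly once. Without branches no Phi nodes are needed.
    static List<String> toSsa(List<Assign> stmts, Set<String> locals) {
        Map<String, Integer> version = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Assign s : stmts) {
            StringBuilder rhs = new StringBuilder();
            for (String tok : s.rhs) {
                // Uses refer to the latest definition; never-defined locals are inputs (version 0).
                String renamed = locals.contains(tok) ? tok + "_" + version.getOrDefault(tok, 0) : tok;
                if (rhs.length() > 0) rhs.append(' ');
                rhs.append(renamed);
            }
            // The definition gets a fresh version only after the uses were renamed.
            int v = version.getOrDefault(s.target, 0) + 1;
            version.put(s.target, v);
            out.add(s.target + "_" + v + " = " + rhs);
        }
        return out;
    }
}
```

The input x = a + 1; x = x * 2; y = x + a becomes x_1 = a_0 + 1; x_2 = x_1 * 2; y_1 = x_2 + a_0.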

5.4. Grimp

Grimp preserves the new operator (allocation and constructor call form a single expression) and complex expressions (no linearization).

5.5. Dava

6. Main implementation classes

These are implementation classes of Soot, i.e. they are owned by Soot. For a classification of the classes analyzed by Soot, see Section 2.1. Figure 2 shows call relations among some of the most important classes.

Figure 2: Main class relationships (image: main-class-relation.jpg)
  • Scene: manages the SootClasses of the application being analyzed.
  • SootClass: Soot's representation of a Java class. SootClasses are usually created by a Scene, but can also be constructed manually through the given constructors.

    // for methods
    SootMethod getMethod(String subsignature);
    SootMethod getMethod(String name, List<Type> parameterTypes);
    SootMethod getMethodByName(String name);
    int getMethodCount();
    List<SootMethod> getMethods();
    // fields have analogous accessors, e.g.
    Chain<SootField> getFields();
    
  • SootMethod
    • Body, JimpleBody
  • SootField
  • Unit
  • UnitGraph
    • ExceptionalUnitGraph: use ExceptionalUnitGraphFactory.createExceptionalUnitGraph() to create

6.1. Scene

Scene is a singleton (accessed through Scene.v()) that holds all classes being analyzed, each represented by a SootClass. Each SootClass may contain several methods (SootMethod), and each method may have a Body that holds its statements (Units).

There are two scene classes:

  • soot.Scene: manages all the SootClasses being analyzed.
  • soot.ModuleScene: a subclass of Scene used to analyze Java 9 modules.

Methods of soot.Scene:

  • loadClassAndSupport(String className): loads the given class and all the required support classes.
  • loadNecessaryClass(String name): loads the class with its support classes and marks it as an application class:

    protected void loadNecessaryClass(String name) {
        loadClassAndSupport(name).setApplicationClass();
    }
    
  • loadNecessaryClasses(): loads the set of classes that Soot needs, including those specified on the command line. This is the standard way of initialising the list of classes Soot should use.

    The classes specified on the command line include:

    • individual classes named on the command line. For example, for java soot.Main -cp . -pp A B, opts.classes() returns the list {"A", "B"}:

      for (String name : opts.classes()) {
          loadNecessaryClass(name);
      }
      
    • -process-dir: all classes found under the given directories (phantom classes are not made application classes):

      for (String path : opts.process_dir()) {
          for (String cl : SourceLocator.v().getClassesUnder(path)) {
              SootClass theClass = loadClassAndSupport(cl);
              if (!theClass.isPhantom) {
                  theClass.setApplicationClass();
              }
          }
      }
      

6.2. SootMethod

Methods of SootMethod:

  • getActiveBody() throws an exception when no active body is present. Since bodies are created while the packs run, it cannot be called before PackManager.v().runPacks() in Main.
  • retrieveActiveBody() will construct an active body if none is present.

6.2.1. Printing a Method

In soot.Body::toString(), Printer.v().printTo() is used to print a method body:

Printer.v().printTo(this, writerOut);

6.3. SootField

6.4. Graph

Hierarchy of some graph types (partial; (I) = interface, (C) = class; a class is listed under every interface it implements):

DirectedBodyGraph (I)
    ExceptionalGraph (I)
        CompleteUnitGraph (C)
        ExceptionalUnitGraph (C)
            CompleteUnitGraph (C)
        CompleteBlockGraph (C)
        ExceptionalBlockGraph (C)
            CompleteBlockGraph (C)
    CompleteUnitGraph (C)
    ExceptionalUnitGraph (C)
        CompleteUnitGraph (C)
    BriefUnitGraph (C)
    TrapUnitGraph (C)
    UnitGraph (C)
        ExceptionalUnitGraph (C)
            CompleteUnitGraph (C)
        BriefUnitGraph (C)
        TrapUnitGraph (C)

7. Jimple

A complete description of the Jimple grammar can be found in Figures 2.9 and 2.10 of the Sable thesis.

The common interface is soot.jimple.Stmt.

There are 15 kinds of Stmt (Stmt is a subinterface of Unit):

  • Core statements
    • NopStmt
    • DefinitionStmt: the type of its left operand is either a primitive type (PrimType) or a reference-like type (RefLikeType). To check which:

      if (defStmt.getLeftOp().getType() instanceof RefLikeType) {
          // ...
      }
      
      • IdentityStmt: assigns parameters and this reference to local variables. This ensures that all local variables have at least one definition point.

        r0 := @this;
        i1 := @parameter0;
        
      • AssignStmt
  • Intra-procedural control-flow statements
    • IfStmt

      if r1 != null goto label0;
      

      In a BranchedFlowAnalysis, there are two flows out of an IfStmt: the fall-through flow (condition false) and the branch flow (condition true).

    • GotoStmt
    • SwitchStmt
      • TableSwitchStmt
      • LookupSwitchStmt
  • Inter-procedural control-flow statements
    • InvokeStmt
    • ReturnStmt
    • ReturnVoidStmt
  • Monitor statements: for mutual exclusion
    • EnterMonitorStmt
    • ExitMonitorStmt
  • ThrowStmt: throws an exception
  • RetStmt: not used; returns from a JSR
    • JSR and RET are JVM instructions for subroutines. They are effectively obsolete: class files of version 51.0 (Java 7) or later may not contain jsr or jsr_w instructions. According to this mail and its reply, JVM subroutines "cause huge problems with analysis and optimization"; Jimple's JSR inliner removes them.

Locals whose names start with a dollar sign ($) represent stack positions, not local variables of the original program. Those without $ represent real local variables; e.g. i0 in the main method corresponds to a in the Java source.

The main structure of a Jimple method (from Section 2.3.6 of the Sable thesis):

  • All local variables are declared at the top of the method.
  • Identity statements follow the local variable declarations; they mark the local variables that have values upon method entry.
  • Then comes the method body, which consists mostly of assignment statements.
  • See the javadoc "Hierarchy For Package soot.jimple.internal": all statements are subclasses of soot.AbstractUnit via soot.jimple.internal.AbstractStmt.

7.1. FieldRef

FieldRef is divided into InstanceFieldRef and StaticFieldRef:

FieldRef (I)
|- InstanceFieldRef (I)
|  |- JInstanceFieldRef (C, for Jimple)
|  |- GInstanceFieldRef (C, for Grimp)
|  `- ...
|- StaticFieldRef (C)
`- ...

7.2. Labels

Labels are not Units themselves; Printer generates them for branch targets when a body is printed.

8. Body

Body has three chains

  • Units chain: the actual code. Jimple provides the Stmt implementation of Unit, while Baf provides the Inst implementation.
  • Locals chain: local variables
  • Traps chain: trap handlers, in the form of

    catch java.lang.Exception from label0 to label1 with label2;
    

9. Value

Value

  • Local: a local variable
    • JimpleLocal
  • Expr: expression. An Expr carries out some action on one or several Values and returns another Value.
    • package soot.jimple
      • BinopExpr
      • NewExpr
      • NewArrayExpr
      • NewMultiArrayExpr
    • package soot.jimple.internal
      • JCastExpr
  • Immediate
    • Constant
  • Ref
    • ParameterRef
    • CaughtExceptionRef
    • ThisRef

9.1. ValueBox

A ValueBox is a pointer to some value. It can be visualized as a box containing some value.

  • getValue(): dereferences the pointer
  • setValue(): mutates value in the box
  • A Unit has both def boxes and use boxes:
    • getUseBoxes() returns a list of ValueBoxes, one for each Value used in the unit.
    • getDefBoxes() returns the ValueBoxes of all Values defined in the unit.
    • For example, for the unit x = y*z there are 3 use boxes: [y*z] (an Expr), [y] (a Local), and [z] (another Local). There is one def box: [x] (a Local). The brackets ([]) represent the boxes.

For example, the following Java code

int a = 12;
int b = 24;
int x = a * b;

is translated to

a = 12;
b = 24;
temp$0 = a * b;
x = temp$0;

The def and use boxes of each statement are as follows:

a = 12
  Def:
    LinkedVariableBox[JimpleLocal: a]
  Use:
    LinkedRValueBox[IntConstant: 12]

b = 24
  Def:
    LinkedVariableBox[JimpleLocal: b]
  Use:
    LinkedRValueBox[IntConstant: 24]

temp$0 = a * b
  Def:
    LinkedVariableBox[JimpleLocal: temp$0]
  Use:
    LinkedRValueBox[JMulExpr: a * b]
    ImmediateBox[JimpleLocal: a]
    ImmediateBox[JimpleLocal: b]

x = temp$0
  Def:
    LinkedVariableBox[JimpleLocal: x]
  Use:
    LinkedRValueBox[JimpleLocal: temp$0]
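The nesting shown above can be mimicked with a hypothetical mini-IR; these are not Soot's classes. For an assignment, the use list holds the whole right-hand side first, then each operand of an expression, in the same order as the boxes listed for temp$0 = a * b. The def list holds only the left-hand local.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mini-IR (not Soot's classes): a value is a variable or a binary expression.
abstract class Val {}

final class Var extends Val {
    final String name;
    Var(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class BinExpr extends Val {
    final String op;
    final Val left, right;
    BinExpr(String op, Val left, Val right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return left + " " + op + " " + right; }
}

final class AssignUnit {
    final Var lhs;
    final Val rhs;
    AssignUnit(Var lhs, Val rhs) { this.lhs = lhs; this.rhs = rhs; }

    // Defined values: only the left-hand local (cf. getDefBoxes()).
    List<Val> defs() { return Collections.<Val>singletonList(lhs); }

    // Used values: the right-hand side itself, then the operands nested in it (cf. getUseBoxes()).
    List<Val> uses() {
        List<Val> out = new ArrayList<>();
        out.add(rhs);
        if (rhs instanceof BinExpr) {
            BinExpr b = (BinExpr) rhs;
            out.add(b.left);
            out.add(b.right);
        }
        return out;
    }
}
```

For x = y * z, uses() prints as [y * z, y, z] and defs() as [x].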

10. Type

Class hierarchy of Type:

Type
|- PrimType: including int, float, char ...
|  |- BooleanType
|  |- CharType
|  |- IntType
|  `- ...
|- RefLikeType
|  |- ArrayType: array reference
|  |- NullType
|  `- RefType: simple reference
`- VoidType: void

11. Analyses

11.1. Off-The-Shelf Analyses

  • Null Pointer Checker
    • jap.npc
    • jap.npcolorer
  • Array Bound Checker
    • jap.abc
  • Liveness Analysis
    • jap.lvtagger

11.2. Custom Analyses

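Inject custom inter-procedural analyses into the wjtp pack and intra-procedural analyses into the jtp pack.

import soot.Main;
import soot.PackManager;
import soot.Transform;

public class MySootMainExtension {
    public static void main(String[] args) {
        // Inject the analysis tagger into the jtp pack. The phase name must be
        // prefixed with the pack name ("jtp."). MyAnalysisTagger is the user's
        // own BodyTransformer, exposed as a singleton through instance().
        PackManager.v().getPack("jtp")
            .add(new Transform("jtp.myanalysistagger",
                               MyAnalysisTagger.instance()));
        // Invoke soot.Main with the given arguments
        Main.main(args);
    }
}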

11.2.1. Very Busy Expressions Analysis

The goal of Very Busy Expressions analysis is to compute, for each program point, the expressions that are very busy at its exit.

An expression is very busy if, no matter what path is taken, the expression is always used before any of the variables occurring in it are redefined.

This is a must analysis: if the expression \(e\) is not used along even one of the paths, it is not considered very busy.

This is a backward analysis, as the fact at node \(d\) is deduced from its successor nodes.
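This follows the usual textbook formulation. Let \(\mathit{gen}(n)\) be the expressions evaluated at node \(n\), and \(\mathit{kill}(n)\) the expressions containing a variable that \(n\) redefines. The analysis solves the equations below, using intersection as the meet because it is a must analysis:

\[
\mathit{VB}_{\mathrm{exit}}(n) =
\begin{cases}
\emptyset & \text{if } n \text{ has no successors} \\
\bigcap_{s \in \mathit{succ}(n)} \mathit{VB}_{\mathrm{entry}}(s) & \text{otherwise}
\end{cases}
\qquad
\mathit{VB}_{\mathrm{entry}}(n) = \mathit{gen}(n) \cup \big(\mathit{VB}_{\mathrm{exit}}(n) \setminus \mathit{kill}(n)\big)
\]

\(\mathit{gen}(n)\) is added after removing \(\mathit{kill}(n)\). In x = x + 1, for instance, x + 1 is evaluated before x is redefined.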

Consider expression \(e = x + y\) and a path from node \(s\) to node \(p\). If \(x\) or \(y\) is redefined along the path, then \(e\) is not very busy at \(s\), even if \(p\) uses \(e\).
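These equations can be solved by a minimal round-robin solver in plain Java. The sketch is not built on Soot's BackwardFlowAnalysis. The CFG encoding is hypothetical: VbNode holds one optional assignment target plus the expressions it evaluates, written as strings such as "a-b". Every entry value starts at the full set of expressions (the top element for a must analysis) and shrinks until nothing changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical CFG node: assigns to `def` (or null) and evaluates the expressions in `gen`.
final class VbNode {
    final String def;
    final Set<String> gen;
    final List<Integer> succ;
    VbNode(String def, List<String> gen, Integer... succ) {
        this.def = def;
        this.gen = new TreeSet<>(gen);
        this.succ = Arrays.asList(succ);
    }
}

final class VeryBusy {
    // Variables occurring in an expression string, e.g. "a-b" -> {a, b}.
    static Set<String> varsOf(String expr) {
        Set<String> vars = new HashSet<>();
        for (String tok : expr.split("[^A-Za-z0-9_]+")) {
            if (!tok.isEmpty() && Character.isLetter(tok.charAt(0))) vars.add(tok);
        }
        return vars;
    }

    // Returns VB_entry for every node, iterating the equations to a fixed point.
    static List<Set<String>> analyze(List<VbNode> cfg) {
        Set<String> universe = new TreeSet<>();
        for (VbNode n : cfg) universe.addAll(n.gen);
        List<Set<String>> entry = new ArrayList<>();
        for (int i = 0; i < cfg.size(); i++) entry.add(new TreeSet<>(universe)); // start from top
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = cfg.size() - 1; i >= 0; i--) { // backward: visit later nodes first
                VbNode n = cfg.get(i);
                Set<String> result = new TreeSet<>();   // VB_exit is empty at exit nodes
                if (!n.succ.isEmpty()) {
                    result.addAll(universe);
                    for (int s : n.succ) result.retainAll(entry.get(s)); // meet = intersection
                }
                if (n.def != null) result.removeIf(e -> varsOf(e).contains(n.def)); // minus kill(n)
                result.addAll(n.gen); // plus gen(n)
                if (!result.equals(entry.get(i))) {
                    entry.set(i, result);
                    changed = true;
                }
            }
        }
        return entry;
    }
}
```

Take the program if (a > b) { x = b - a; y = a - b; } else { y = b - a; a = 0; x = a - b; } and encode it as nodes 0 to 5 in that order. Only b-a is very busy before the test. a-b is evaluated on both branches, but on the else branch a = 0 redefines a first, which is exactly the situation described above.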


Author: thebesttv
Created: 2022-11-15 12:51
Modified: 2023-01-22 11:40
Generated: 2024-06-11 02:39
Version: Emacs 29.3 (Org mode 9.6.15)
Raw: soot.org