\documentclass{article}
%% \usepackage{indentfirst}
\usepackage{fullpage}
\usepackage{html}

\begin{document}

\title{Adding attributes to class files via Soot}
\author{Feng Qian (\htmladdnormallink{fqian@sable.mcgill.ca} {mailto:fqian@sable.mcgill.ca})\\
Patrick Lam (\htmladdnormallink{plam@sable.mcgill.ca} {mailto:plam@sable.mcgill.ca})}
\date{\today}
\maketitle

Soot can annotate class files: for instance, it can add information about which array bounds checks and null pointer checks are redundant. We anticipate that users of Soot may wish to add new attributes to class files. This tutorial uses the array bounds check attribute to illustrate the internal structure of Soot annotation and describes how to add new attributes via Soot.

Before reading this tutorial, readers should be familiar with the basic Soot classes, like {\tt SootClass}, {\tt SootField}, {\tt SootMethod}, {\tt Body}, and {\tt Unit}. The other Soot \htmladdnormallink{tutorials}{http://www.sable.mcgill.ca/soot/tutorial} explain these classes.

\section{Structure of annotation}

In release 1.2.0, we introduced a scheme for class file annotation in Soot. The general description of our annotation framework can be found in our CASCON paper, ``\htmladdnormallink{A Framework for Optimizing Java Using Attributes}{http://www.sable.mcgill.ca/publications/\#report2000-2}''. This tutorial explains practical issues related to implementing new attributes.

We first introduce the classes used for annotation. These classes reside in the {\tt soot.tagkit} package.

\begin{description}
\item[Host] interface\\
Any class which implements the {\tt Host} interface promises that it can hold tags.
\begin{verbatim}
public interface Host {
    /** Get a list of tags associated with the current object. */
    public List getTags();

    /** Returns a tag with the given name. */
    public Tag getTag(String aName);

    /** Adds a tag. */
    public void addTag(Tag t);

    /** Removes a tag with the given name.
     */
    public void removeTag(String name);

    /** Returns true if this host has a tag with the given name. */
    public boolean hasTag(String aName);
}
\end{verbatim}

\item[AbstractHost] class\\
Soot provides a default implementation of the {\tt Host} interface in the form of {\tt AbstractHost}. Unless you have a pressing desire to provide the functionality yourself, any classes which you would like to implement {\tt Host} should subclass {\tt AbstractHost}.

In Soot, the classes {\tt SootClass}, {\tt SootField}, {\tt SootMethod}, {\tt Body}, and {\tt Unit} inherit from {\tt AbstractHost}. Instances of these classes know how to carry tags, in the form of the {\tt Tag} interface.

\item[Tag] interface\\
In Soot, we represent any annotation by an object whose type implements the {\tt Tag} interface. This interface defines one method: {\tt getName()} returns the unique name of the tag (note that tag names must not conflict with each other).
\begin{verbatim}
public interface Tag {
    /** Returns the tag's name. */
    public String getName();
}
\end{verbatim}

\item[Attribute] interface\\
The {\tt Attribute} interface extends {\tt Tag}; it promises that the associated tag has attribute-like data which can be read and written as an array of bytes.
\begin{verbatim}
public interface Attribute extends Tag {
    /** Returns the tag's raw data. */
    public byte[] getValue() throws AttributeValueException;

    /** Sets the value of the attribute from a byte[]. */
    public void setValue(byte[] v);
}
\end{verbatim}

A {\tt Tag} which is not an {\tt Attribute} could be used to store arbitrary Soot information about a {\tt Host}. An {\tt Attribute} is something that would go in a class file.

\item[TagAggregator] class\\
The array-bounds check analysis annotates individual instructions as it discovers whether or not their bounds checks are required. More generally, analyses will attach attributes directly to the {\tt Unit}s in question.
However, the Java class file structure does not make any provisions for directly attaching attributes to bytecodes, and attaching attributes directly to bytecodes would in any case be inefficient. Hence, we designed our attributes so that they would attach to a method in a tabular format: only one actual attribute is required per method body and tag type; this meta-attribute contains information about a number of different instructions.

An implementation of {\tt TagAggregator} promises that it can combine all tags of some type into one big aggregated attribute, which can be attached to a method's code attribute. One implementation of a {\tt TagAggregator} is {\tt soot.jimple.toolkits.annotation.tags.ArrayNullTagAggregator}.

One of the things that an aggregator must do is decide which bytecode instructions a tag will be associated with, since each Jimple statement may turn into several bytecode instructions. Two subclasses of {\tt TagAggregator} make this choice differently: {\tt ImportantTagAggregator} attaches the tags to important instructions (those containing field or array references or method invocations), and {\tt FirstTagAggregator} attaches them to the first bytecode instruction for the Jimple statement.

\item[Base64] tool class\\
This utility class allows the encoding of raw bytes to base64-encoded characters and the decoding of base64 characters back to raw bytes.

\item[JasminAttribute] abstract class\\
Attributes are generated by analysis phases in the form of strings containing labels in the unit body and their values; for instance, we might have the attribute
\verb+"%label2%Aw==%label3%Ag==%label4%Ag=="+
associated with a method body. In order to include this attribute in a class file, exact PC values are needed for the labels. The {\tt JasminAttribute} class provides a {\tt decode} method which takes a string of (label, value) pairs and a map from labels to PCs and emits raw data, ready for inclusion in a class file. This method is called by Jasmin after the PC values are known.
Any attribute which uses (label, value) pairs can subclass {\tt JasminAttribute} to get output to class files for free; other attributes hoping to be output to class files must subclass {\tt JasminAttribute} and override the {\tt decode} method.

The abstract {\tt getJasminValue()} method must return a string that can be included when outputting a {\tt .jasmin} file. This string later gets decoded by {\tt decode()}.
\begin{verbatim}
public abstract class JasminAttribute implements Attribute {
    public static byte[] decode(String attr, Hashtable labelToPc);
    abstract public String getJasminValue(Map instToLabel);
}
\end{verbatim}

\item[CodeAttribute] class\\
This class provides an implementation of the abstract {\tt getJasminValue()} method of {\tt JasminAttribute}. The {\tt getJasminValue()} method must return a string reflecting the contents of its {\tt CodeAttribute}. It may use the provided {\tt instToLabel} map to convert {\tt Unit}s into labels used in the returned {\tt String}.
\begin{verbatim}
public class CodeAttribute extends JasminAttribute {
    public String getJasminValue(Map instToLabel);
}
\end{verbatim}
This type of attribute is intended to be used for attributes associated with code.

\item[GenericAttribute] class\\
Besides code attributes, the Java class file format allows three other kinds of attributes: they may be associated with methods, fields, and classes. Soot supports these attributes via the {\tt GenericAttribute} class. Any such attribute can be created with an attribute name and a byte array value; it can then be attached to a {\tt SootClass}, {\tt SootField}, or {\tt SootMethod}.
\begin{verbatim}
public class GenericAttribute implements Attribute {
    public GenericAttribute(String name, byte[] value);
    public String getName();
    public byte[] getValue();
}
\end{verbatim}
\end{description}

The above classes provide APIs useful for adding new attributes. Soot attributes are represented as {\tt Tag}s, and are attached to {\tt Host}s.
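
As a rough illustration of the (label, value) encoding that {\tt JasminAttribute} works with, the following standalone Java sketch unpacks the example string shown earlier and lays the values out against program counters. It does not use Soot: the class {\tt LabelValueSketch}, its methods, the label-to-PC numbers, and the record layout (a two-byte PC followed by the value bytes) are all assumptions made for illustration, not Soot's actual {\tt decode} implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelValueSketch {
    /** Splits "%label%base64%label%base64..." into label -> raw bytes, keeping order. */
    static Map<String, byte[]> parse(String attr) {
        Map<String, byte[]> pairs = new LinkedHashMap<>();
        // The leading '%' produces an empty first token, so real tokens start at index 1.
        String[] tokens = attr.split("%");
        for (int i = 1; i + 1 < tokens.length; i += 2) {
            pairs.put(tokens[i], Base64.getDecoder().decode(tokens[i + 1]));
        }
        return pairs;
    }

    /**
     * Emits one record per label: a 2-byte big-endian PC followed by the value bytes.
     * This layout is an assumption for illustration only.
     */
    static byte[] toTable(Map<String, byte[]> pairs, Map<String, Integer> labelToPc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> e : pairs.entrySet()) {
            int pc = labelToPc.get(e.getKey());
            out.write((pc >> 8) & 0xFF);
            out.write(pc & 0xFF);
            out.write(e.getValue(), 0, e.getValue().length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> pairs = parse("%label2%Aw==%label3%Ag==%label4%Ag==");
        for (Map.Entry<String, byte[]> e : pairs.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()[0]);
        }

        // Hypothetical PCs; real ones are known only once the method has been assembled.
        Map<String, Integer> labelToPc = new HashMap<>();
        labelToPc.put("label2", 4);
        labelToPc.put("label3", 9);
        labelToPc.put("label4", 14);

        StringBuilder hex = new StringBuilder();
        for (byte b : toTable(pairs, labelToPc)) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
    }
}
```

In this sketch the three base64 values decode to the single bytes 3, 2, and 2. The decoding step matters only when an attribute is written out to a class file; the tags themselves are simply attached to the host they describe.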
An exception is {\tt CodeAttribute}. Because the tags for {\tt CodeAttribute} are attached to units, a {\tt TagAggregator} is used to combine them. {\tt TagAggregator}s are {\tt BodyTransformer}s, and are generally included in the {\tt tag} pack.

\section{Adding method attributes in Soot}

Adding a code attribute is non-trivial, as it requires that an aggregator be provided. We first give a trivial example of adding a method attribute via {\tt GenericAttribute}. The code can be found in {\tt ashes.examples.addattributes}. It can also be downloaded at:

\htmladdnormallink{\tt http://www.sable.mcgill.ca/soot/tutorial/addattributes/Main.java}{Main.java}

We proceed by adding a new phase, called {\tt jtp.annotexample}, to the {\tt jtp} {\tt Pack}.
\begin{verbatim}
package ashes.examples.addattributes;

import soot.*;
import soot.tagkit.*;
import java.util.*;

public class AnnExample {
    public static void main(String[] args) {
        /* adds the transformer. */
        PackManager.v().getPack("jtp").add(
            new Transform("jtp.annotexample", AnnExampleWrapper.v()));

        /* invokes Soot */
        soot.Main.main(args);
    }
}
\end{verbatim}

The {\tt AnnExampleWrapper} is a subclass of {\tt BodyTransformer} that implements the {\tt internalTransform} method. It simply adds the string ``Hello world!'' as an attribute to every method. The attribute has the name ``Example''.
\begin{verbatim}
public class AnnExampleWrapper extends BodyTransformer {
    private static AnnExampleWrapper instance = new AnnExampleWrapper();

    private AnnExampleWrapper() {}

    public static AnnExampleWrapper v() {
        return instance;
    }

    /* attaches an "Example" attribute to the method that owns this body. */
    public void internalTransform(Body body, String phaseName, Map options) {
        SootMethod method = body.getMethod();
        String attr = "Hello world!";
        Tag example = new GenericAttribute("Example", attr.getBytes());
        method.addTag(example);
    }
}
\end{verbatim}

We recompile {\tt foo} and annotate it with new attributes.\\
{\tt java AnnExample foo}\\
The annotated class file has an ``Example'' attribute for each method. The string ``Hello world!'' appears as its raw bytes, printed in hexadecimal.
\begin{verbatim}
public class foo extends java.lang.Object
filename                foo
compiled from           foo.jasmin
compiler version        45.3
access flags            33
constant pool           14 entries
ACC_SUPER flag          true

Attribute(s):
        SourceFile(foo.jasmin)

2 methods:
        public void <init>()
            <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
        void footest()
            <(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>

public void <init>()
<(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
Code(max_stack = 1, max_locals = 1, code_length = 5)
0:    aload_0
1:    invokespecial     java.lang.Object.<init> ()V (2)
4:    return

void footest()
<(Unknown attribute Example: 48 65 6c 6c 6f 20 77 6f 72 6c 64 21)>
Code(max_stack = 3, max_locals = 1, code_length = 7)
0:    iconst_2
1:    newarray
3:    iconst_0
4:    iconst_1
5:    iastore
6:    return
\end{verbatim}

\section{The Array Bounds Check Annotation Example}

In this section, we will use the array bounds check attribute to illustrate the process of creating a new code attribute. The classes in this example are located in the {\tt soot.jimple.toolkits.annotation.arraycheck} and {\tt soot.jimple.toolkits.annotation.nullcheck} packages.

Clearly we must be able to represent whether or not an array reference is safe. To do this, we first created the {\tt ArrayCheckTag} class implementing (a subclass of) {\tt Tag}.
It is not an {\tt Attribute} because the information is not in a form suitable for adding to a class file, and setting the information directly is meaningless.

{\tt ArrayCheckTag} has a constructor with boolean parameters representing the lower and upper array bounds checks. If a parameter is {\tt true}, the respective bounds check is needed. The {\tt getValue()} method converts the boolean values to a single byte whose lowest two bits represent the bounds checks: bit 0 for the lower bound and bit 1 for the upper bound.
\begin{verbatim}
/**
 * This tag represents the two bounds checks of an array reference.
 * The value true indicates that a check is needed.
 */
public ArrayCheckTag(boolean lower, boolean upper) {
    lowerCheck = lower;
    upperCheck = upper;
}

/**
 * Returns the value of this tag as a one-byte array for inclusion in
 * the class file.
 */
public byte[] getValue() {
    byte[] value = new byte[1];
    value[0] = 0;
    if (lowerCheck)
        value[0] |= 0x01;
    if (upperCheck)
        value[0] |= 0x02;
    return value;
}
\end{verbatim}

We designed an algorithm to analyze array bounds checks. The final phase of this algorithm attaches the analysis results to the various units as tags. This is accomplished with the following code:
\begin{verbatim}
Tag checkTag = new ArrayCheckTag(lowercheck, uppercheck);
stmt.addTag(checkTag);
\end{verbatim}

As previously explained, code tags are attached to units, but units themselves do not have attributes. Thus, an aggregator is needed to group the tags into an attribute. By this point, a null pointer check elimination algorithm has already run, attaching {\tt NullCheckTag}s to units. An {\tt ArrayNullTagAggregator} will collect the {\tt NullCheckTag}s and {\tt ArrayCheckTag}s, combining the two kinds of tags on each selected instruction into a single {\tt ArrayNullCheckTag}, and gather these into one attribute per method body.
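
The one-byte encoding produced by {\tt ArrayCheckTag.getValue()} above can be tried out in isolation. The following standalone sketch mirrors the bit layout from that method (0x01 for the lower check, 0x02 for the upper check); the class {\tt BoundsBitsSketch} and its helper methods are hypothetical names introduced here and are not part of Soot.

```java
public class BoundsBitsSketch {
    // Bit layout taken from ArrayCheckTag.getValue() as shown in this tutorial.
    static final int LOWER = 0x01;
    static final int UPPER = 0x02;

    /** Packs the two "check needed" flags into one byte. */
    static byte encode(boolean lowerCheck, boolean upperCheck) {
        byte value = 0;
        if (lowerCheck) {
            value |= LOWER;
        }
        if (upperCheck) {
            value |= UPPER;
        }
        return value;
    }

    /** True if the lower bounds check is still required. */
    static boolean needsLower(byte value) {
        return (value & LOWER) != 0;
    }

    /** True if the upper bounds check is still required. */
    static boolean needsUpper(byte value) {
        return (value & UPPER) != 0;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean lower : flags) {
            for (boolean upper : flags) {
                byte v = encode(lower, upper);
                System.out.println("lower=" + lower + " upper=" + upper
                        + " -> " + v
                        + " (needsLower=" + needsLower(v)
                        + ", needsUpper=" + needsUpper(v) + ")");
            }
        }
    }
}
```

A value of 0 therefore means that both checks are redundant, and a value of 3 means that both are still needed. The combined {\tt ArrayNullCheckTag} mentioned above has the following shape: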
\begin{verbatim}
public class ArrayNullCheckTag implements OneByteCodeTag {
    private final static String NAME = "ArrayNullCheckTag";

    public String getName();
    public byte[] getValue();
    public byte accumulate(byte other);
}
\end{verbatim}

The {\tt ArrayNullTagAggregator} extends the {\tt TagAggregator} class. It is called while Baf is generating its backend code. The {\tt wantTag()} method returns true for tags that are to be considered by this aggregator. The {\tt considerTag()} method accumulates one (unit, tag) pair, typically encountered during Baf's traversal of the units.
\begin{verbatim}
public class ArrayNullTagAggregator extends TagAggregator {
    public ArrayNullTagAggregator(Singletons.Global g) {}

    public static ArrayNullTagAggregator v() {
        return G.v().ArrayNullTagAggregator();
    }

    public boolean wantTag(Tag t) {
        return (t instanceof OneByteCodeTag);
    }

    public void considerTag(Tag t, Unit u) {
        Inst i = (Inst) u;
        /* only invokes, field references and array references are kept. */
        if (!(i.containsInvokeExpr()
              || i.containsFieldRef()
              || i.containsArrayRef()))
            return;

        OneByteCodeTag obct = (OneByteCodeTag) t;

        /* start a new combined tag when a new unit is reached. */
        if (units.size() == 0 || units.getLast() != u) {
            units.add(u);
            tags.add(new ArrayNullCheckTag());
        }
        ArrayNullCheckTag anct = (ArrayNullCheckTag) tags.getLast();
        anct.accumulate(obct.getValue()[0]);
    }

    public String aggregatedName() {
        return "ArrayNullCheckAttribute";
    }
}
\end{verbatim}

We examine the annotation process on a simple example class, {\tt foo}.
\begin{verbatim}
public class foo {
    void footest() {
        int[] c = new int[2];
        c[0] = 1;
    }
}
\end{verbatim}

After compilation with {\tt javac}, we can use the {\tt JavaClass} tool to inspect the contents of the class file.
\begin{verbatim}
public class foo extends java.lang.Object
filename                foo
compiled from           foo.java
compiler version        45.3
access flags            33
constant pool           14 entries
ACC_SUPER flag          true

Attribute(s):
        SourceFile(foo.java)

2 methods:
        public void <init>()
        void footest()

public void <init>()
Code(max_stack = 1, max_locals = 1, code_length = 5)
0:    aload_0
1:    invokespecial     java.lang.Object.<init> ()V (3)
4:    return

Attribute(s) =
LineNumber(0, 1)

void footest()
Code(max_stack = 3, max_locals = 2, code_length = 9)
0:    iconst_2
1:    newarray
3:    astore_1
4:    aload_1
5:    iconst_0
6:    iconst_1
7:    iastore
8:    return

Attribute(s) =
LineNumber(0, 5), LineNumber(4, 7), LineNumber(8, 3)
\end{verbatim}

Great. Next, we annotate the class by executing
\begin{verbatim}
[plam@kerala soot]$ java soot.Main -A both foo
\end{verbatim}
and inspect the annotated class file.
\begin{verbatim}
public class foo extends java.lang.Object
filename                foo
compiled from           foo.jasmin
compiler version        45.3
access flags            33
constant pool           14 entries
ACC_SUPER flag          true

Attribute(s):
        SourceFile(foo.jasmin)

2 methods:
        public void <init>()
        void footest()

public void <init>()
Code(max_stack = 1, max_locals = 1, code_length = 5)
0:    aload_0
1:    invokespecial     java.lang.Object.<init> ()V (2)
4:    return

Attribute(s) =
(Unknown attribute ArrayNullCheckAttribute: 00 01 00)

void footest()
Code(max_stack = 3, max_locals = 1, code_length = 7)
0:    iconst_2
1:    newarray
3:    iconst_0
4:    iconst_1
5:    iastore
6:    return

Attribute(s) =
(Unknown attribute ArrayNullCheckAttribute: 00 05 00)
\end{verbatim}

We can see that an {\tt ArrayNullCheckAttribute} has been added to the class file, and we can read the attribute data in hexadecimal. For {\tt footest()}, the bytes {\tt 00 05 00} appear to pair the offset of the {\tt iastore} instruction (5) with the value 0, meaning that no checks are needed for that array store.

\section*{Known shortcomings}

Soot cannot currently preserve existing attributes in a class file when transforming and annotating it. In the {\tt foo} example, the {\tt LineNumber} debug information produced by {\tt javac} is lost after annotation.

\section*{History}
\begin{itemize}
\item October 6, 2000: Initial version.
\item May 31, 2003: Updated for Soot 2.0.
\end{itemize} \end{document}