Pipeline#
Overview#
The rev.ng pipeline is composed of steps. Each step runs a series of pipes. Each pipe works on one or more containers. Certain steps have an artifact, which represents their output.
Analyses, when scheduled, run at a certain point in the pipeline, after a specified step.
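The parent relationships listed in the Steps section can be modelled as a plain mapping. The sketch below is illustrative only: the `PARENT` dictionary copies a subset of the step names from this page, `steps_up_to` is a hypothetical helper and not part of rev.ng, and it assumes that reaching a step means going through all of its ancestors.

```python
# Parent of each step, copied from a subset of the "Steps" section of this page.
# This is a hand-written model, not data produced by rev.ng.
PARENT = {
    "initial": None,
    "lift": "initial",
    "isolate": "lift",
    "enforce-abi": "isolate",
    "emit-cfg": "isolate",
    "process-assembly": "isolate",
    "disassemble": "process-assembly",
    "recompile": "lift",
}


def steps_up_to(step):
    """Return the chain of steps from the root of the tree down to `step`."""
    chain = []
    while step is not None:
        chain.append(step)
        step = PARENT[step]
    return list(reversed(chain))


print(" -> ".join(steps_up_to("disassemble")))
# prints: initial -> lift -> isolate -> process-assembly -> disassemble
```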
The following tree reports the structure of the pipeline, its steps and analyses.
Containers#
The pipeline declares the following containers:
hex.dump (type: hex-dump)
cross-relations.yml (type: binary-cross-relations)
module.bc.zstd (type: llvm-container)
input (type: binary)
object.o (type: object)
output (type: translated)
assembly-internal.yml.tar.gz (type: function-assembly-internal)
assembly.ptml.tar.gz (type: function-assembly-ptml)
call-graph.svg.yml (type: call-graph-svg)
call-graph-slice.svg.tar.gz (type: call-graph-slice-svg)
cfg.svg.tar.gz (type: function-control-flow-graph-svg)
cfg.yml.tar.gz (type: cfg)
types-and-globals.h (type: model-header)
helpers.h (type: helpers-header)
decompiled.c (type: decompiled-c-code)
decompiled.tar.gz (type: decompile)
recompilable-archive.tar.gz (type: recompilable-archive)
module.mlir (type: mlir-module)
model-type-definitions.tar.gz (type: model-type-definitions)
Steps#
initial
step#
Analyses:
- verify-diff(): Verifies whether a diff for the model would apply correctly.
- import-binary(input): This analysis inspects the input binary and, if its format is among the supported ones, imports all the available information into the model. If debug info is available, it is imported as well. In particular, from the binary image we import Segments, the EntryPoint and, if symbols are available, Function Entry and names. We currently support ELF, PE/COFF and Mach-O. From debug information, all the data structures and function prototypes are imported. We currently support DWARF and CodeView (.pdb) debug info.
- import-well-known-models(): Imports the prototypes of certain well-known functions from the C standard library.
- convert-functions-to-cabi(): Converts as many RawFunctionTypes as possible to CABIFunctionTypes, using the requested ABI.
lift
step#
Parent step: initial
Artifact: The root function produced by the lifting phase. It's a single large function containing all of the executable code identified in the binary.
Pipes:
lift(input, module.bc.zstd)
llvm-pipe(module.bc.zstd)
globaldce
Analyses:
- detect-abi(module.bc.zstd): This analysis creates new functions from targets of function calls and from code addresses found in the code or in memory that would otherwise be unreachable. Additionally, the analysis builds, for each function lacking a prototype, a RawFunctionType by automatically identifying the list of arguments passed through registers and the list of return values. This analysis doesn't handle stack arguments.
isolate
step#
Parent step: lift
Artifact: This artifact contains an LLVM function for each function in the input program. The functions still employ global variables (CSVs) to pass and return data. Therefore, they lack arguments and return values.
Pipes:
collect-cfg(module.bc.zstd, cfg.yml.tar.gz)
isolate(cfg.yml.tar.gz, module.bc.zstd)
attach-debug-info-to-isolated(cfg.yml.tar.gz, module.bc.zstd)
process-call-graph(cfg.yml.tar.gz, cross-relations.yml)
enforce-abi
step#
Parent step: isolate
Artifact: This artifact contains an LLVM function for each function in the input program. The functions no longer use global variables (CSVs) to communicate: each register is promoted to a local variable, an argument and/or a return value.
This means that, for instance, a function using the SystemV ABI for x86-64 that has two uint8_t arguments will have two 64-bit registers, not two 8-bit registers. This reflects the fact that, in the considered ABI, two uint8_t arguments are passed in the rdi and rsi registers.
The stack pointer is an exception: it's still used as a CSV. As a consequence, stack arguments are not promoted to actual arguments: they are accessed with pointer arithmetic w.r.t. the stack pointer CSV.
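The register-level view described above can be sketched as follows. This is a minimal illustration, not rev.ng code: `register_arguments` is a hypothetical helper that only models integer arguments fitting in one register of the x86-64 SystemV ABI, and it deliberately refuses anything that would spill to the stack.

```python
# Integer argument registers of the x86-64 SystemV ABI, in order.
SYSV_INT_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
REGISTER_BITS = 64


def register_arguments(argument_types):
    """Map each integer argument to the register that carries it.

    Hypothetical helper: it only handles integer arguments that fit in
    one register and that do not spill to the stack (at most six).
    """
    if len(argument_types) > len(SYSV_INT_ARG_REGISTERS):
        raise ValueError("stack arguments are not modelled in this sketch")
    return [
        (register, REGISTER_BITS, c_type)
        for register, c_type in zip(SYSV_INT_ARG_REGISTERS, argument_types)
    ]


for register, bits, c_type in register_arguments(["uint8_t", "uint8_t"]):
    print(f"{c_type} travels in {register} ({bits} bits)")
```

Even though each source-level argument is 8 bits wide, at this stage the function would expose one full 64-bit argument per register.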
Pipes:
llvm-pipe(module.bc.zstd)
drop-root
enforce-abi(cfg.yml.tar.gz, module.bc.zstd)
llvm-pipe(module.bc.zstd)
strip-debug-info-from-helpers
promote-csvs
mem2reg
inline-helpers
attach-debug-info-to-abi-enforced(cfg.yml.tar.gz, module.bc.zstd)
llvm-pipe(module.bc.zstd)
promote-csvs
remove-exceptional-functions
emit-cfg
step#
Parent step: isolate
Artifact: This artifact is an archive containing one YAML file for each function. Each document contains information about the control-flow graph of the corresponding function.
hexdump
step#
Parent step: isolate
Artifact: This artifact contains a hex dump of each segment in the input binary.
Pipes:
hex-dump(input, module.bc.zstd, cfg.yml.tar.gz, hex.dump)
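For readers unfamiliar with the idea, a hex dump of a segment can be sketched as below. The exact layout of the hex.dump container is not specified on this page, so the `address: bytes` format, the `hex_dump` function and the sample segment are all assumptions made for illustration.

```python
def hex_dump(data, start_address=0, width=16):
    """Format `data` as lines of 'address: hex bytes' (assumed layout)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"{start_address + offset:08x}: {hex_bytes}")
    return "\n".join(lines)


# A hypothetical 20-byte segment loaded at address 0x400000.
print(hex_dump(bytes(range(20)), start_address=0x400000))
```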
render-svg-call-graph
step#
Parent step: isolate
Artifact: This artifact is an SVG representing the call graph of the input program.
Pipes:
yield-call-graph(cross-relations.yml, call-graph.svg.yml)
render-svg-call-graph-slice
step#
Parent step: isolate
Artifact: This artifact is an archive of SVG files. Each file represents a subset of the call graph, considering only the functions that call or are called by the given function, directly or indirectly.
Pipes:
yield-call-graph-slice(cfg.yml.tar.gz, cross-relations.yml, call-graph-slice.svg.tar.gz)
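One way to read the notion of a call graph slice is as the union of the transitive callers and the transitive callees of a function. The sketch below follows that reading; the `CALLS` graph and both helper functions are hypothetical, and the actual pipe may compute the slice differently.

```python
def reachable(graph, start):
    """Return `start` plus every node reachable from it in `graph`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def call_graph_slice(calls, function):
    """Functions that `function` calls or is called by, directly or indirectly."""
    # Build the reversed graph so that callers can be walked the same way.
    called_by = {}
    for caller, callees in calls.items():
        for callee in callees:
            called_by.setdefault(callee, set()).add(caller)
    return reachable(calls, function) | reachable(called_by, function)


# Hypothetical call graph: each key calls the functions in its value.
CALLS = {
    "main": {"parse", "run"},
    "parse": {"read_byte"},
    "run": {"step"},
    "step": set(),
    "read_byte": set(),
}
print(sorted(call_graph_slice(CALLS, "run")))
# prints: ['main', 'run', 'step']
```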
process-assembly
step#
Parent step: isolate
Pipes:
process-assembly(input, cfg.yml.tar.gz, assembly-internal.yml.tar.gz)
disassemble
step#
Parent step: process-assembly
Artifact: This artifact is an archive of PTML files. Each file represents the disassembly of the given function.
Pipes:
yield-assembly(assembly-internal.yml.tar.gz, assembly.ptml.tar.gz)
render-svg-cfg
step#
Parent step: process-assembly
Artifact: This artifact is an archive of SVG files. Each file represents the CFG of the given function.
Pipes:
yield-cfg(assembly-internal.yml.tar.gz, cfg.svg.tar.gz)
recompile
step#
Parent step: lift
Artifact: This artifact is a translated version of the input binary. Specifically, it's an ELF executable for Linux x86-64 containing the root function plus the required runtime.
Pipes:
link-support(module.bc.zstd)
llvm-pipe(module.bc.zstd)
O2
llvm-pipe(module.bc.zstd)
drop-opaque-return-address
compile(module.bc.zstd, object.o)
link-for-translation(input, object.o, output)
recompile-isolated
step#
Parent step: isolate
Artifact: This artifact is a translated version of the input binary. Specifically, it's an ELF executable for Linux x86-64 containing the root function, all the isolated functions, plus the required runtime.
Pipes:
llvm-pipe(module.bc.zstd)
invoke-isolated-functions
link-support(module.bc.zstd)
llvm-pipe(module.bc.zstd)
O2
llvm-pipe(module.bc.zstd)
drop-opaque-return-address
compile-isolated(module.bc.zstd, object.o)
link-for-translation(input, object.o, output)
remove-lifting-artifacts
step#
Parent step: enforce-abi
Pipes:
llvm-pipe(module.bc.zstd)
dce
remove-lifting-artifacts
promote-init-csv-to-undef
promote-stack-pointer
step#
Parent step: remove-lifting-artifacts
Pipes:
llvm-pipe(module.bc.zstd)
measure-stack-size-at-call-sites
promote-stack-pointer
early-optimize
step#
Parent step: promote-stack-pointer
Pipes:
llvm-pipe(module.bc.zstd)
dce
remove-extractvalues
simplify-cfg-with-hoist-and-sink
dse
instcombine
remove-extractvalues
sroa
instsimplify
jump-threading
licm
unreachableblockelim
instcombine
remove-extractvalues
early-cse
simplify-cfg-with-hoist-and-sink
early-type-shrinking
type-shrinking
early-cse
instsimplify
gvn
instsimplify
dse
dce
drop-opaque-return-address
simplify-switch
step#
Parent step: early-optimize
Artifact: This artifact contains an LLVM function for each function in the input program. The stack pointer has been promoted to a local variable and initialized with the result of an opaque function call.
Pipes:
simplify-switch(input, module.bc.zstd)
detect-stack-size
step#
Parent step: simplify-switch
Pipes:
llvm-pipe(module.bc.zstd)
remove-stack-alignment
instrument-stack-accesses
instcombine
remove-extractvalues
loop-rotate
loop-simplify
compute-stack-accesses-bounds
Analyses:
- detect-stack-size(module.bc.zstd): This analysis, for each function, identifies the size of the stack frame and the amount of stack used to pass arguments.
segregate-stack-accesses
step#
Parent step: detect-stack-size
Pipes:
llvm-pipe(module.bc.zstd)
hoist-struct-phis
segregate-stack-accesses
late-optimize
step#
Parent step: segregate-stack-accesses
Pipes:
llvm-pipe(module.bc.zstd)
cleanup-stack-size-markers
dce
sroa
instcombine
remove-extractvalues
sroa
simplify-cfg-with-hoist-and-sink
loop-rotate
loop-rewrite-with-canonical-induction-variable
simplify-cfg-with-hoist-and-sink
loop-simplify
instcombine
remove-extractvalues
early-cse
dce
strip-dead-prototypes
split-overflow-intrinsics
dce
make-segment-ref
step#
Parent step: late-optimize
Artifact: This artifact contains an LLVM function for each function in the input program. The functions have an argument for each argument in the input prototype.
Unlike upstream artifacts, the arguments are not tied to the register containing them. So, if a function using the x86-64 SystemV ABI has two uint8_t arguments, they will appear as two distinct arguments, as opposed to being merged in a single argument representing rdi.
Additionally, this artifact correctly represents each stack argument in the function prototype.
Pipes:
make-segment-ref(input, module.bc.zstd)
Analyses:
- analyze-data-layout(module.bc.zstd): This analysis inspects the memory accesses performed by the input program and detects the layout of data structures. The produced data structures are the result of merging the information obtained from each function interprocedurally.
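To give an intuition for what merging information across functions can mean, here is a deliberately simplified sketch. It assumes each function reports the (offset, size) pairs it accesses through the same base pointer; the `merge_accesses` helper and the `OBSERVED` data are hypothetical, and the real analysis is considerably more sophisticated than a set union.

```python
def merge_accesses(accesses_by_function):
    """Merge (offset, size) accesses observed in several functions.

    Returns the sorted list of distinct fields and the minimum struct
    size that covers all of them. Toy model only.
    """
    fields = set()
    for accesses in accesses_by_function.values():
        fields.update(accesses)
    ordered = sorted(fields)
    size = max((offset + width for offset, width in ordered), default=0)
    return ordered, size


# Hypothetical observations: two functions touch the same structure.
OBSERVED = {
    "init": [(0, 8), (8, 4)],
    "update": [(8, 4), (16, 8)],
}
fields, size = merge_accesses(OBSERVED)
print(fields, size)
# prints: [(0, 8), (8, 4), (16, 8)] 24
```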
canonicalize
step#
Parent step: make-segment-ref
Pipes:
llvm-pipe(module.bc.zstd)
hoist-struct-phis
remove-llvmassume-calls
dce
remove-pointer-casts
make-model-gep
dce
twoscomplement-normalization
peephole-opt-for-decompilation
ternary-reduction
exit-ssa
make-local-variables
remove-load-store
fold-model-gep
dce
switch-to-statements
make-model-cast
implicit-model-cast
operatorprecedence-resolution
pretty-int-formatting
remove-broken-debug-information
decompile
step#
Parent step: canonicalize
Artifact: This artifact is an archive of PTML files representing the C code of the program's functions.
Pipes:
helpers-to-header(module.bc.zstd, helpers.h)
model-to-header(input, types-and-globals.h)
decompile(module.bc.zstd, cfg.yml.tar.gz, decompiled.tar.gz)
Analyses:
- import-from-c(): This analysis, given a snippet of C code representing an individual type, parses it and imports it into the model, possibly replacing an existing type.
decompile-to-single-file
step#
Parent step: decompile
Artifact: This artifact is a single PTML file representing the decompiled C code of the whole program, including the body of all of the program's functions.
Pipes:
decompile-to-single-file(decompiled.tar.gz, decompiled.c)
emit-recompilable-archive
step#
Parent step: canonicalize
Artifact: This artifact is an archive containing all the files necessary to recompile the decompiled C code of the input program. These files are not in PTML; they are plain C.
It contains:
- functions.c: the decompile-to-single-file artifact;
- types-and-globals.h: see the emit-model-header artifact;
- helpers.h: see the emit-helpers-header artifact;
- attributes.h: a helper header file defining a set of annotations used by the decompiled C source files;
- primitive-types.h: a header defining all the primitive types.
Pipes:
decompile-to-directory(module.bc.zstd, cfg.yml.tar.gz, recompilable-archive.tar.gz)
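A quick sanity check on such an archive could look like the sketch below, which uses Python's standard tarfile module. The file names come from the list above; the `missing_files` and `build_demo_archive` helpers are hypothetical, and since the directory layout inside the archive is not specified here, only base names are compared.

```python
import io
import tarfile

# File names listed on this page for the recompilable archive.
EXPECTED_FILES = {
    "functions.c",
    "types-and-globals.h",
    "helpers.h",
    "attributes.h",
    "primitive-types.h",
}


def missing_files(archive_bytes):
    """Return the expected file names that are absent from a .tar.gz archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        # Compare base names only: the directory layout is an unknown here.
        present = {name.rsplit("/", 1)[-1] for name in archive.getnames()}
    return EXPECTED_FILES - present


def build_demo_archive(names):
    """Build an in-memory .tar.gz holding empty files, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in names:
            archive.addfile(tarfile.TarInfo(name))
    return buffer.getvalue()


demo = build_demo_archive(["functions.c", "helpers.h"])
print(sorted(missing_files(demo)))
# prints: ['attributes.h', 'primitive-types.h', 'types-and-globals.h']
```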
emit-helpers-header
step#
Parent step: canonicalize
Artifact: This artifact contains the declarations of all the helpers used by the decompiled code.
Pipes:
helpers-to-header(module.bc.zstd, helpers.h)
emit-model-header
step#
Parent step: initial
Artifact: This artifact contains all the declarations of types, functions and segments defined in the binary.
Pipes:
model-to-header(input, types-and-globals.h)
emit-type-definitions
step#
Parent step: canonicalize
Artifact: This artifact is an archive of plain C headers. Each file contains the declaration of a type defined for this binary.
This artifact is designed to be used as the initial input of the import-from-c analysis. In fact, this artifact is designed to be easily editable by the end-user; it's not designed to represent valid C code, unlike the emit-model-header artifact.
Pipes:
generate-model-type-definition(input, model-type-definitions.tar.gz)
cleanup-ir
step#
Parent step: make-segment-ref
Artifact: This artifact contains one LLVM function for each function defined in this binary.
The output is similar to the output of make-segment-ref, but it's cleaned up from rev.ng-specific artifacts in order to be more easily consumed as standard LLVM IR.
This is an appropriate artifact on top of which to write analyses, such as a taint analysis.
Pipes:
llvm-pipe(module.bc.zstd)
instcombine
cleanup-ir
dce
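As a reminder of what a taint analysis does, the sketch below propagates taint forward over a toy straight-line program. It does not operate on LLVM IR: the `(destination, operands)` instruction tuples, the `tainted_values` function and the `PROGRAM` sample are all hypothetical, and control flow and reassignment are ignored.

```python
def tainted_values(instructions, sources):
    """Forward taint propagation over straight-line three-address code.

    Each instruction is (destination, operands). A destination becomes
    tainted when at least one of its operands is tainted.
    """
    tainted = set(sources)
    for destination, operands in instructions:
        if any(operand in tainted for operand in operands):
            tainted.add(destination)
    return tainted


# Hypothetical program: `input` is the taint source, `k` is a constant.
PROGRAM = [
    ("a", ["input"]),
    ("b", ["a", "k"]),
    ("c", ["k"]),
    ("d", ["b", "c"]),
]
print(sorted(tainted_values(PROGRAM, {"input"})))
# prints: ['a', 'b', 'd', 'input']
```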
convert-to-mlir
step#
Parent step: make-segment-ref
Artifact: This artifact is an MLIR module with one function for each function defined in the binary.
Pipes:
llvm-pipe(module.bc.zstd)
prepare-llvmir-for-mlir
import-llvm-to-mlir(module.bc.zstd, module.mlir)
import-clift-types
step#
Parent step: convert-to-mlir
Artifact: A test artifact to import types into a Clift module.
Pipes:
import-clift-types(cfg.yml.tar.gz, module.mlir)
Analysis lists#
- revng-initial-auto-analysis