Pipeline#
Overview#
The rev.ng pipeline is composed of steps. Each step runs a series of pipes. Each pipe works on one or more containers. Certain steps have an artifact, which represents their output.
Analyses, when scheduled, run at a certain point in the pipeline, after a specified step.
The following tree reports the structure of the pipeline, its steps and analyses.
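The parent relationships stated in the step sections of this page can be collected into a small table and walked programmatically. The following is a minimal sketch, not part of rev.ng: the names `PARENT`, `steps_up_to` and `render_tree` are hypothetical, and the table is transcribed from the "Parent step" entries on this page.

```python
# Child step -> parent step, transcribed from the "Parent step" entries on this page.
PARENT = {
    "lift": "initial",
    "isolate": "lift",
    "enforce-abi": "isolate",
    "emit-cfg": "isolate",
    "hexdump": "isolate",
    "render-svg-call-graph": "isolate",
    "render-svg-call-graph-slice": "isolate",
    "process-assembly": "isolate",
    "disassemble": "process-assembly",
    "render-svg-cfg": "process-assembly",
    "recompile": "lift",
    "recompile-isolated": "isolate",
    "remove-lifting-artifacts": "enforce-abi",
    "promote-stack-pointer": "remove-lifting-artifacts",
    "early-optimize": "promote-stack-pointer",
    "simplify-switch": "early-optimize",
    "detect-stack-size": "simplify-switch",
    "legacy-segregate-stack-accesses": "detect-stack-size",
    "make-segment-ref": "legacy-segregate-stack-accesses",
    "canonicalize": "make-segment-ref",
    "embed-statement-comments": "canonicalize",
    "decompile": "embed-statement-comments",
    "decompile-to-single-file": "decompile",
    "emit-recompilable-archive": "embed-statement-comments",
    "emit-helpers-header": "canonicalize",
    "emit-model-header": "initial",
    "emit-type-definitions": "canonicalize",
    "cleanup-ir": "make-segment-ref",
    "segregate-stack-accesses": "detect-stack-size",
    "emit-c": "segregate-stack-accesses",
    "emit-c-as-single-file": "emit-c",
}


def steps_up_to(step):
    """Return the chain of steps from the root step down to `step`."""
    chain = [step]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return list(reversed(chain))


def render_tree(root="initial", depth=0):
    """Return the step tree as a list of indented lines."""
    lines = ["  " * depth + root]
    for child, parent in PARENT.items():
        if parent == root:
            lines.extend(render_tree(child, depth + 1))
    return lines


print(" -> ".join(steps_up_to("disassemble")))
# initial -> lift -> isolate -> process-assembly -> disassemble
```

Calling `print("\n".join(render_tree()))` prints the whole tree with one step per line, indented by depth.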
Containers#
The pipeline declares the following containers:
- hex.dump(type:hex-dump)
- cross-relations.yml(type:binary-cross-relations)
- functions.bc.zstd(type:llvm-container)
- root.bc.zstd(type:llvm-container)
- root-with-functions.bc.zstd(type:llvm-container)
- input(type:binary)
- object.o(type:object)
- output(type:translated)
- assembly-internal.yml.tar.gz(type:function-assembly-internal)
- assembly.ptml.tar.gz(type:function-assembly-ptml)
- call-graph.svg.yml(type:call-graph-svg)
- call-graph-slice.svg.tar.gz(type:call-graph-slice-svg)
- cfg.svg.tar.gz(type:function-control-flow-graph-svg)
- cfg.yml.tar.gz(type:cfg)
- types-and-globals.h(type:model-header)
- helpers.h(type:helpers-header)
- decompiled.c(type:decompiled-c-code)
- decompiled.tar.gz(type:decompile)
- recompilable-archive.tar.gz(type:recompilable-archive)
- functions.mlir(type:clift-module)
- model-type-definitions.tar.gz(type:model-type-definitions)
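Each declaration pairs a container name with a container type, and several containers can share a type. As an illustration only, the notation used on this page can be parsed with a regular expression; `parse_containers` and `group_by_type` are hypothetical helper names, not rev.ng APIs.

```python
import re
from collections import defaultdict

# Matches the "name(type:kind)" notation used in the container list on this page.
DECLARATION = re.compile(r"([\w.-]+)\(type:([\w-]+)\)")


def parse_containers(text):
    """Return (name, type) pairs found in `text`, in order of appearance."""
    return DECLARATION.findall(text)


def group_by_type(declarations):
    """Group container names by their declared type."""
    groups = defaultdict(list)
    for name, kind in declarations:
        groups[kind].append(name)
    return dict(groups)


sample = (
    "functions.bc.zstd(type:llvm-container)"
    "root.bc.zstd(type:llvm-container)"
    "input(type:binary)"
)
print(group_by_type(parse_containers(sample)))
# {'llvm-container': ['functions.bc.zstd', 'root.bc.zstd'], 'binary': ['input']}
```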
Steps#
initial step#
Analyses:
- verify-diff(): Verifies whether a diff for the model would apply correctly.
- import-binary(input): This analysis inspects the input binary and, if its format is among the supported ones, imports all the available information into the model. If debug info is available, it is imported as well.
  In particular, from the binary image we import Segments, the EntryPoint and, if symbols are available, FunctionEntry and names. We currently support ELF, PE/COFF and Mach-O.
  From debug information, all the data structures and function prototypes are imported. We currently support DWARF and CodeView (.pdb) debug info.
- import-well-known-models(): Import the prototype of certain well-known functions from the C standard library.
- convert-functions-to-cabi(): Convert as many RawFunctionTypes as possible to CABIFunctionTypes, using the requested ABI.
lift step#
Parent step: initial
Artifact: The root function produced by the lifting phase.
It's a single large function containing all of the executable code
identified in the binary.
Pipes:
- lift(input, root.bc.zstd)
- llvm-pipe(root.bc.zstd)
  - globaldce
Analyses:
- detect-abi(root.bc.zstd): This analysis creates new functions from targets of function calls and from code addresses found in the code or in memory that would otherwise be unreachable.
  Additionally, the analysis builds, for each function lacking a prototype, a RawFunctionType by automatically identifying the list of arguments passed through registers and the list of return values.
  This analysis doesn't handle stack arguments.
isolate step#
Parent step: lift
Artifact: This artifact contains an LLVM function for each function in the input program. The functions still employ global variables (CSVs) to pass and return data. Therefore, they lack arguments and return values.
Pipes:
- collect-cfg(root.bc.zstd, cfg.yml.tar.gz)
- isolate(cfg.yml.tar.gz, root.bc.zstd, functions.bc.zstd)
- attach-debug-info-to-isolated(cfg.yml.tar.gz, functions.bc.zstd)
- process-call-graph(cfg.yml.tar.gz, cross-relations.yml)
enforce-abi step#
Parent step: isolate
Artifact: This artifact contains an LLVM function for each function in the input program. The functions no longer use global variables (CSVs) to communicate: each register is promoted to a local variable, an argument and/or a return value.
This means that, for instance, a function using the SystemV ABI for
x86-64 that has two uint8_t arguments will have two 64-bit
register arguments, not two 8-bit arguments.
This reflects the fact that, in the considered ABI, two uint8_t
arguments are passed in the rdi and rsi registers.
The stack pointer is an exception: it's still used as a CSV. As a consequence, stack arguments are not promoted to actual arguments: they are accessed with pointer arithmetic w.r.t. the stack pointer CSV.
Pipes:
- enforce-abi(cfg.yml.tar.gz, functions.bc.zstd)
- llvm-pipe(functions.bc.zstd)
  - strip-debug-info-from-helpers
  - promote-csvs
  - mem2reg
  - inline-helpers
- attach-debug-info-to-abi-enforced(cfg.yml.tar.gz, functions.bc.zstd)
- llvm-pipe(functions.bc.zstd)
  - promote-csvs
  - remove-exceptional-functions
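The register-level view described for the enforce-abi artifact can be illustrated with a simplified model of SystemV x86-64 argument passing. This is a sketch under stated assumptions, not rev.ng code: it only considers integer-class scalar arguments (no floating point values, no structs), and the function names are hypothetical. The register order is the one defined by the SystemV x86-64 ABI.

```python
# Integer argument registers of the SystemV x86-64 ABI, in assignment order.
INTEGER_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
REGISTER_BITS = 64


def assign_integer_arguments(argument_types):
    """Pair each integer argument with the register holding it, or with "stack"."""
    assignment = []
    for index, type_name in enumerate(argument_types):
        if index < len(INTEGER_ARG_REGISTERS):
            location = INTEGER_ARG_REGISTERS[index]
        else:
            # Simplification: every argument past the sixth goes on the stack.
            location = "stack"
        assignment.append((type_name, location))
    return assignment


def register_level_prototype(argument_types):
    """One full-width value per register used; stack arguments are left out."""
    return [
        (location, REGISTER_BITS)
        for _, location in assign_integer_arguments(argument_types)
        if location != "stack"
    ]


print(register_level_prototype(["uint8_t", "uint8_t"]))
# [('rdi', 64), ('rsi', 64)]
```

The two uint8_t arguments show up as two 64-bit registers, and a seventh integer argument would not appear at all, mirroring the fact that stack arguments are not promoted at this stage.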
emit-cfg step#
Parent step: isolate
Artifact: This artifact is an archive containing one YAML file for each function. Each document contains information about the control-flow graph of the corresponding function.
hexdump step#
Parent step: isolate
Artifact: This artifact contains a hex dump of each segment in the input binary.
Pipes:
hex-dump(input, functions.bc.zstd, cfg.yml.tar.gz, hex.dump)
render-svg-call-graph step#
Parent step: isolate
Artifact: This artifact is an SVG representing the call graph of the input program.
Pipes:
yield-call-graph(cross-relations.yml, call-graph.svg.yml)
render-svg-call-graph-slice step#
Parent step: isolate
Artifact: This artifact is an archive of SVG files. Each file represents the subset of the call graph containing only the functions that call, or are called by, the given function, directly or indirectly.
Pipes:
yield-call-graph-slice(cfg.yml.tar.gz, cross-relations.yml, call-graph-slice.svg.tar.gz)
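The notion of a call graph slice described above can be sketched as the union of two reachability queries: one following calls forward from the given function, one following them backward. This is an illustrative sketch, not the implementation of the yield-call-graph-slice pipe; the toy call graph and the function names are made up for the example.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def call_graph_slice(calls, function):
    """Functions that `function` calls, or is called by, directly or indirectly."""
    # Build the reversed graph: callee -> set of callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return reachable(calls, function) | reachable(callers, function)


# Hypothetical call graph: caller -> set of callees.
calls = {
    "main": {"parse", "run"},
    "parse": {"read_byte"},
    "run": {"log"},
    "unused": {"log"},
}
print(sorted(call_graph_slice(calls, "run")))
# ['log', 'main', 'run']
```

Note that `unused` is excluded from the slice of `run`: it shares a callee with `run`, but it neither calls `run` nor is called by it.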
process-assembly step#
Parent step: isolate
Pipes:
process-assembly(input, cfg.yml.tar.gz, assembly-internal.yml.tar.gz)
disassemble step#
Parent step: process-assembly
Artifact: This artifact is an archive of PTML files. Each file represents the disassembly of the given function.
Pipes:
yield-assembly(assembly-internal.yml.tar.gz, assembly.ptml.tar.gz)
render-svg-cfg step#
Parent step: process-assembly
Artifact: This artifact is an archive of SVG files. Each file represents the CFG of the given function.
Pipes:
yield-cfg(assembly-internal.yml.tar.gz, cfg.svg.tar.gz)
recompile step#
Parent step: lift
Artifact: This artifact is a translated version of the input binary.
Specifically, it's an ELF executable for Linux x86-64 containing the
root function plus the required runtime.
Pipes:
- link-support(root.bc.zstd)
- llvm-pipe(root.bc.zstd)
  - O2
- llvm-pipe(root.bc.zstd)
  - drop-opaque-return-address
- compile(root.bc.zstd, object.o)
- link-for-translation(input, object.o, output)
recompile-isolated step#
Parent step: isolate
Artifact: This artifact is a translated version of the input binary.
Specifically, it's an ELF executable for Linux x86-64 containing the
root function, all the isolated functions plus the required
runtime.
Pipes:
- invoke-isolated-functions(root.bc.zstd, functions.bc.zstd, root-with-functions.bc.zstd)
- link-support(root-with-functions.bc.zstd)
- llvm-pipe(root-with-functions.bc.zstd)
  - O2
- llvm-pipe(root-with-functions.bc.zstd)
  - drop-opaque-return-address
- compile-isolated(root-with-functions.bc.zstd, object.o)
- link-for-translation(input, object.o, output)
remove-lifting-artifacts step#
Parent step: enforce-abi
Pipes:
- llvm-pipe(functions.bc.zstd)
  - dce
  - remove-lifting-artifacts
  - promote-init-csv-to-undef
promote-stack-pointer step#
Parent step: remove-lifting-artifacts
Pipes:
- llvm-pipe(functions.bc.zstd)
  - measure-stack-size-at-call-sites
  - promote-stack-pointer
early-optimize step#
Parent step: promote-stack-pointer
Pipes:
- llvm-pipe(functions.bc.zstd)
  - dce
  - remove-extractvalues
  - simplify-cfg-with-hoist-and-sink
  - dse
  - instcombine
  - remove-extractvalues
  - sroa
  - instsimplify
  - jump-threading
  - licm
  - unreachableblockelim
  - instcombine
  - remove-extractvalues
  - early-cse
  - simplify-cfg-with-hoist-and-sink
  - early-type-shrinking
  - type-shrinking
  - early-cse
  - instsimplify
  - gvn
  - instsimplify
  - dse
  - dce
  - drop-opaque-return-address
simplify-switch step#
Parent step: early-optimize
Artifact: This artifact contains an LLVM function for each function in the input program. The stack pointer has been promoted to a local variable and initialized with the result of an opaque function call.
Pipes:
simplify-switch(input, functions.bc.zstd)
detect-stack-size step#
Parent step: simplify-switch
Pipes:
- llvm-pipe(functions.bc.zstd)
  - remove-stack-alignment
  - instrument-stack-accesses
  - instcombine
  - remove-extractvalues
  - loop-rotate
  - loop-simplify
  - compute-stack-accesses-bounds
Analyses:
detect-stack-size(functions.bc.zstd): This analysis, for each function, identifies the size of the stack frame and the amount of stack used to pass arguments.
legacy-segregate-stack-accesses step#
Parent step: detect-stack-size
Pipes:
- llvm-pipe(functions.bc.zstd)
  - hoist-struct-phis
  - legacy-segregate-stack-accesses
  - cleanup-stack-size-markers
  - dce
  - sroa
  - instcombine
  - remove-extractvalues
  - sroa
  - simplify-cfg-with-hoist-and-sink
  - loop-rotate
  - loop-rewrite-with-canonical-induction-variable
  - simplify-cfg-with-hoist-and-sink
  - loop-simplify
  - instcombine
  - remove-extractvalues
  - early-cse
  - dce
  - strip-dead-prototypes
  - split-overflow-intrinsics
  - dce
  - remove-llvmassume-calls
make-segment-ref step#
Parent step: legacy-segregate-stack-accesses
Artifact: This artifact contains an LLVM function for each function in the input program. The functions have an argument for each argument in the input prototype.
Unlike upstream artifacts, the arguments are not tied to the
register containing them. So, if a function using the x86-64
SystemV ABI has two uint8_t arguments, they will appear as two
distinct arguments, as opposed to being merged in a single argument
representing rdi.
Additionally, this artifact correctly represents each stack argument in the function prototype.
Pipes:
make-segment-ref(input, functions.bc.zstd)
Analyses:
analyze-data-layout(functions.bc.zstd): This analysis inspects the memory accesses performed by the input program and detects the layout of data structures. The produced data structures are the result of merging the information obtained from each function interprocedurally.
canonicalize step#
Parent step: make-segment-ref
Pipes:
- llvm-pipe(functions.bc.zstd)
  - hoist-struct-phis
  - remove-llvmassume-calls
  - dce
  - remove-pointer-casts
  - make-model-gep
  - dce
  - twoscomplement-normalization
  - peephole-opt-for-decompilation
  - ternary-reduction
  - exit-ssa
  - make-local-variables
  - remove-load-store
  - fold-model-gep
  - dce
  - legacy-switch-to-statements
  - make-model-cast
  - implicit-model-cast
  - operatorprecedence-resolution
  - pretty-int-formatting
  - discard-broken-debug-information
embed-statement-comments step#
Parent step: canonicalize
Pipes:
embed-statement-comments(functions.bc.zstd)
decompile step#
Parent step: embed-statement-comments
Artifact: This artifact is an archive of PTML files representing the C code of the program's functions.
Pipes:
- helpers-to-header(functions.bc.zstd, helpers.h)
- model-to-header(input, types-and-globals.h)
- decompile(functions.bc.zstd, cfg.yml.tar.gz, decompiled.tar.gz)
Analyses:
- import-from-c(): This analysis, given a snippet of C code representing an individual type, parses it and imports it into the model, possibly replacing an existing type.
- llm-rename(decompiled.tar.gz): Rename the specified function(s) bodies using an LLM.
decompile-to-single-file step#
Parent step: decompile
Artifact: This artifact is a single PTML file representing the decompiled C code of the whole program, including the bodies of all of the program's functions.
Pipes:
decompile-to-single-file(decompiled.tar.gz, decompiled.c)
emit-recompilable-archive step#
Parent step: embed-statement-comments
Artifact: This artifact is an archive containing all the files necessary to recompile the decompiled C code of the input program. These files are not in PTML; they are plain C.
It contains:
- functions.c: the decompile-to-single-file artifact;
- types-and-globals.h: see the emit-model-header artifact;
- helpers.h: see the emit-helpers-header artifact;
- attributes.h: a helper header file defining a set of annotations used by the decompiled C source files;
- primitive-types.h: a header defining all the primitive types.
Pipes:
decompile-to-directory(functions.bc.zstd, cfg.yml.tar.gz, recompilable-archive.tar.gz)
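The list of files in the emit-recompilable-archive artifact lends itself to a quick sanity check with Python's standard tarfile module. The sketch below is illustrative only: `missing_files` and `build_toy_archive` are hypothetical helpers, the archive used in the demo is a toy one built in memory, and the comparison looks at base names only because the internal directory layout of the real archive is not specified on this page.

```python
import io
import tarfile

# File names listed in the artifact description on this page.
EXPECTED_FILES = {
    "functions.c",
    "types-and-globals.h",
    "helpers.h",
    "attributes.h",
    "primitive-types.h",
}


def missing_files(archive_file):
    """Return the expected file names absent from a gzip-compressed tar archive."""
    with tarfile.open(fileobj=archive_file, mode="r:gz") as archive:
        # Keep only the base name: the directory layout is an assumption.
        present = {name.rsplit("/", 1)[-1] for name in archive.getnames()}
    return EXPECTED_FILES - present


def build_toy_archive(file_names):
    """Build an in-memory .tar.gz with placeholder files, for demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in file_names:
            content = b"/* placeholder */\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    buffer.seek(0)
    return buffer


toy = build_toy_archive(["functions.c", "helpers.h", "attributes.h"])
print(sorted(missing_files(toy)))
# ['primitive-types.h', 'types-and-globals.h']
```

With a real archive, the same function can be applied to a file opened in binary mode, for example `open("recompilable-archive.tar.gz", "rb")`.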
emit-helpers-header step#
Parent step: canonicalize
Artifact: This artifact contains the declarations of all the helpers used by the decompiled code.
Pipes:
helpers-to-header(functions.bc.zstd, helpers.h)
emit-model-header step#
Parent step: initial
Artifact: This artifact contains all the declarations of types, functions and segments defined in the binary.
Pipes:
model-to-header(input, types-and-globals.h)
emit-type-definitions step#
Parent step: canonicalize
Artifact: This artifact is an archive of plain C headers.
Each file contains the declaration of a type defined for this
binary.
This artifact is designed to be used as the initial input of the
import-from-c analysis. In fact, this artifact is designed to be
easily editable by the end-user; it's not designed to represent
valid C code, unlike the emit-model-header artifact.
Pipes:
generate-model-type-definition(input, model-type-definitions.tar.gz)
cleanup-ir step#
Parent step: make-segment-ref
Artifact: This artifact contains one LLVM function for each function defined in this binary.
The output is similar to the output of make-segment-ref but it's
cleaned up from rev.ng-specific artifacts in order to be more easily
consumed as standard LLVM IR.
This is an appropriate artifact on top of which to write analyses, such as a taint analysis.
Pipes:
- llvm-pipe(functions.bc.zstd)
  - instcombine
  - cleanup-ir
  - dce
segregate-stack-accesses step#
Parent step: detect-stack-size
Pipes:
- llvm-pipe(functions.bc.zstd)
  - hoist-struct-phis
  - segregate-stack-accesses
  - cleanup-stack-size-markers
  - dce
  - sroa-noarrays
  - instcombine-noarrays
  - remove-extractvalues
  - sroa-noarrays
  - simplify-cfg-with-hoist-and-sink
  - loop-rotate
  - loop-rewrite-with-canonical-induction-variable
  - simplify-cfg-with-hoist-and-sink
  - loop-simplify
  - instcombine-noarrays
  - remove-extractvalues
  - early-cse
  - dce
  - strip-dead-prototypes
  - split-overflow-intrinsics
  - dce
emit-c step#
Parent step: segregate-stack-accesses
Artifact: This artifact contains a C function for each function in the input program.
Pipes:
- llvm-pipe(functions.bc.zstd)
  - remove-llvmassume-calls
  - dce
  - exit-ssa
  - switch-to-statements
  - remove-constant-array-returns
  - dagify
  - inline-divergent-scopes
  - enforce-single-exit
  - materialize-trivial-goto
  - select-scope
  - inline-divergent-scopes
  - enforce-single-exit
  - materialize-trivial-goto
- llvm-to-clift(functions.bc.zstd, functions.mlir)
- clift-optimization(functions.mlir)
- model-verify-clift(functions.mlir)
- import-model-names(functions.mlir)
- emit-c(functions.mlir, decompiled.tar.gz)
emit-c-as-single-file step#
Parent step: emit-c
Artifact: This artifact is a single PTML file representing the decompiled C code of the whole program, including the bodies of all of the program's functions. (Clift backend)
Pipes:
decompile-to-single-file(decompiled.tar.gz, decompiled.c)
Analysis lists#
- revng-initial-auto-analysis