Pipeline#

Overview#

The rev.ng pipeline is composed of steps. Each step runs a series of pipes. Each pipe works on one or more containers. Certain steps have an artifact, which is essentially their output.

Analyses, when scheduled, run at a certain point in the pipeline, after a specified step.

The following tree reports the structure of the pipeline, its steps and analyses.

[Figure: pipeline tree rooted at the initial step. Legend: Step, Step with artifact, Analysis.]
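The tree can also be read as a parent map. The following is a minimal sketch, not rev.ng code: it copies a subset of the "Parent step" entries from the Steps section and walks them from a target step up to the root. The helper name `steps_to_run` is hypothetical, and the sketch assumes that producing a step's artifact requires running its ancestors first.

```python
# Parent of each step, copied from the "Parent step" entries in the Steps
# section. Only a subset is listed to keep the sketch short.
PARENT = {
    "lift": "initial",
    "isolate": "lift",
    "recompile": "lift",
    "enforce-abi": "isolate",
    "hexdump": "isolate",
    "process-assembly": "isolate",
    "disassemble": "process-assembly",
    "render-svg-cfg": "process-assembly",
    "emit-model-header": "initial",
}


def steps_to_run(target: str) -> list[str]:
    """Return the chain of steps from the root down to `target`.

    Hypothetical helper: it only follows the parent links above.
    """
    chain = [target]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return list(reversed(chain))


print(" -> ".join(steps_to_run("disassemble")))
# prints: initial -> lift -> isolate -> process-assembly -> disassemble
```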

Containers#

The pipeline declares the following containers:

  • hex.dump (type: hex-dump)
  • cross-relations.yml (type: binary-cross-relations)
  • module.bc.zstd (type: llvm-container)
  • input (type: binary)
  • object.o (type: object)
  • output (type: translated)
  • assembly-internal.yml.tar.gz (type: function-assembly-internal)
  • assembly.ptml.tar.gz (type: function-assembly-ptml)
  • call-graph.svg.yml (type: call-graph-svg)
  • call-graph-slice.svg.tar.gz (type: call-graph-slice-svg)
  • cfg.svg.tar.gz (type: function-control-flow-graph-svg)
  • cfg.yml.tar.gz (type: cfg)
  • types-and-globals.h (type: model-header)
  • helpers.h (type: helpers-header)
  • decompiled.c (type: decompiled-c-code)
  • decompiled.tar.gz (type: decompile)
  • recompilable-archive.tar.gz (type: recompilable-archive)
  • module.mlir (type: mlir-module)
  • model-type-definitions.tar.gz (type: model-type-definitions)
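In the Steps section, pipes are written as `name(container, ...)`, where each argument is one of the containers declared above. As an illustration only, the sketch below parses that notation and rejects undeclared container names. The function `parse_pipe` and the regular expression are assumptions made for this example, not part of rev.ng.

```python
import re

# Container names exactly as declared in the list above.
DECLARED_CONTAINERS = {
    "hex.dump", "cross-relations.yml", "module.bc.zstd", "input", "object.o",
    "output", "assembly-internal.yml.tar.gz", "assembly.ptml.tar.gz",
    "call-graph.svg.yml", "call-graph-slice.svg.tar.gz", "cfg.svg.tar.gz",
    "cfg.yml.tar.gz", "types-and-globals.h", "helpers.h", "decompiled.c",
    "decompiled.tar.gz", "recompilable-archive.tar.gz", "module.mlir",
    "model-type-definitions.tar.gz",
}

# Matches the documentation notation "pipe-name(arg, arg, ...)".
PIPE_RE = re.compile(r"^\s*([\w-]+)\((.*)\)\s*$")


def parse_pipe(line: str) -> tuple[str, list[str]]:
    """Split a pipe line into its name and the containers it works on."""
    match = PIPE_RE.match(line)
    if match is None:
        raise ValueError(f"not a pipe invocation: {line!r}")
    name, raw_args = match.groups()
    containers = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
    unknown = [c for c in containers if c not in DECLARED_CONTAINERS]
    if unknown:
        raise ValueError(f"undeclared containers: {unknown}")
    return name, containers


print(parse_pipe("lift(input, module.bc.zstd)"))
```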

Steps#

initial step#

Analyses:

  • apply-diff(): Apply a diff to the model.

  • verify-diff(): Verify whether a diff for the model would apply correctly.

  • set-global(): Replace the model with a new one.

  • verify-global(): Verify if the given model is valid.

  • import-binary(input): This analysis inspects the input binary and, if its format is among the supported ones, imports all the available information into the model. If debug info is available, it is imported as well.

    In particular, from the binary image we import Segments, the EntryPoint and, if symbols are available, Function Entry and names. We currently support ELF, PE/COFF and Mach-O.

    From debug information, all the data structures and function prototypes are imported. We currently support DWARF and CodeView (.pdb) debug info.

  • import-well-known-models(): Import the prototypes of certain well-known functions from the C standard library.

  • convert-functions-to-cabi(): Convert as many RawFunctionTypes as possible to CABIFunctionTypes, using the requested ABI.
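The relationship between verify-diff and apply-diff can be pictured with a toy model. This page does not describe the actual diff format used by rev.ng, so everything below is an assumption: the model is a flat dictionary, a diff is a list of `{"key", "old", "new"}` changes touching distinct keys, and the address values are made up.

```python
from copy import deepcopy


def verify_diff(model: dict, diff: list[dict]) -> bool:
    """Return True if every change's expected old value matches the model."""
    return all(model.get(change["key"]) == change["old"] for change in diff)


def apply_diff(model: dict, diff: list[dict]) -> dict:
    """Return a copy of the model with the diff applied, or raise if it does not fit."""
    if not verify_diff(model, diff):
        raise ValueError("diff does not apply to this model")
    updated = deepcopy(model)
    for change in diff:
        if change["new"] is None:
            # A missing new value means "remove this key" in this toy format.
            updated.pop(change["key"], None)
        else:
            updated[change["key"]] = change["new"]
    return updated


# Toy model and diff; the address values are made up.
model = {"EntryPoint": "0x1000"}
diff = [{"key": "EntryPoint", "old": "0x1000", "new": "0x2000"}]

updated = apply_diff(model, diff)
print(updated)
print(verify_diff(updated, diff))  # the same diff no longer applies
```

Verifying before applying lets a caller reject a stale diff without modifying the model.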

lift step#

Parent step: initial

Artifact: The root function produced by the lifting phase. It's a single large function containing all of the executable code identified in the binary.

Pipes:

  • lift(input, module.bc.zstd)
  • llvm-pipe(module.bc.zstd)
    • globaldce

Analyses:

  • detect-abi(module.bc.zstd): This analysis creates new functions from targets of function calls and from code addresses found in the code or in memory that would otherwise be unreachable.

    Additionally, the analysis builds, for each function lacking a prototype, a RawFunctionType by automatically identifying the list of arguments passed through registers and the list of return values.

    This analysis doesn't handle stack arguments.

isolate step#

Parent step: lift

Artifact: This artifact contains an LLVM function for each function in the input program. The functions still employ global variables (CSVs) to pass and return data. Therefore, they lack arguments and return values.

Pipes:

  • collect-cfg(module.bc.zstd, cfg.yml.tar.gz)
  • isolate(cfg.yml.tar.gz, module.bc.zstd)
  • attach-debug-info-to-isolated(cfg.yml.tar.gz, module.bc.zstd)
  • process-call-graph(cfg.yml.tar.gz, cross-relations.yml)

enforce-abi step#

Parent step: isolate

Artifact: This artifact contains an LLVM function for each function in the input program. The functions no longer use global variables (CSVs) to communicate: each register is promoted to a local variable, an argument and/or a return value.

This means that, for instance, a function using the SystemV ABI for x86-64 that has two uint8_t arguments will take two 64-bit arguments (one per register), not two 8-bit ones. This reflects the fact that, in the considered ABI, two uint8_t arguments are passed in the rdi and rsi registers.

The stack pointer is an exception: it's still used as a CSV. As a consequence, stack arguments are not promoted to actual arguments: they are accessed with pointer arithmetic w.r.t. the stack pointer CSV.
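The register-level view described above can be sketched as follows. The register order is the standard integer argument order of the x86-64 SystemV ABI; the helper name `classify_arguments` and the restriction to integer arguments of at most 64 bits are simplifications made for this example, not rev.ng behavior.

```python
# Integer argument registers of the x86-64 SystemV ABI, in passing order.
SYSV_X86_64_INT_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
REGISTER_BITS = 64


def classify_arguments(arg_bits: list[int]) -> tuple[list[tuple[str, int]], int]:
    """Map source-level integer arguments to register-level arguments.

    Returns the (register, width) pairs for arguments passed in registers
    and the number of arguments left on the stack. Simplification: every
    argument is an integer that fits in a single register.
    """
    if any(bits > REGISTER_BITS for bits in arg_bits):
        raise ValueError("arguments wider than one register are not modeled")
    in_registers = [
        (register, REGISTER_BITS)
        for register, _ in zip(SYSV_X86_64_INT_ARG_REGISTERS, arg_bits)
    ]
    on_stack = len(arg_bits) - len(in_registers)
    return in_registers, on_stack


# Two uint8_t arguments: each occupies a full 64-bit register.
print(classify_arguments([8, 8]))
# Eight int arguments: six in registers, two still reached via the stack pointer.
print(classify_arguments([32] * 8))
```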

Pipes:

  • llvm-pipe(module.bc.zstd)
    • drop-root
  • enforce-abi(cfg.yml.tar.gz, module.bc.zstd)
  • llvm-pipe(module.bc.zstd)
    • strip-debug-info-from-helpers
    • promote-csvs
    • mem2reg
    • inline-helpers
  • attach-debug-info-to-abi-enforced(cfg.yml.tar.gz, module.bc.zstd)
  • llvm-pipe(module.bc.zstd)
    • promote-csvs
    • remove-exceptional-functions

emit-cfg step#

Parent step: isolate

Artifact: This artifact is an archive containing one YAML file for each function. Each document contains information about the control-flow graph of the corresponding function.

hexdump step#

Parent step: isolate

Artifact: This artifact contains a hex dump of each segment in the input binary.

Pipes:

  • hex-dump(input, module.bc.zstd, cfg.yml.tar.gz, hex.dump)

render-svg-call-graph step#

Parent step: isolate

Artifact: This artifact is an SVG representing the call graph of the input program.

Pipes:

  • yield-call-graph(cross-relations.yml, call-graph.svg.yml)

render-svg-call-graph-slice step#

Parent step: isolate

Artifact: This artifact is an archive of SVG files. Each file represents a subset of the call graph containing only the functions that call, or are called by, the given function, directly or indirectly.
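One way to compute such a slice is a reachability search in both directions. This is a sketch of the idea under that assumption, not the code of the yield-call-graph-slice pipe; the function names and the example call graph are hypothetical.

```python
from collections import deque


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from `start`, including it."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def call_graph_slice(calls: dict[str, set[str]], function: str) -> set[str]:
    """Functions that `function` calls, or that call it, directly or indirectly."""
    # Invert the edges so the same search can walk from callee to caller.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return reachable(calls, function) | reachable(callers, function)


# Hypothetical call graph: caller -> set of callees.
CALLS = {
    "main": {"parse", "run"},
    "parse": {"helper"},
    "run": {"helper"},
    "unused": {"log"},
}

print(sorted(call_graph_slice(CALLS, "run")))  # ['helper', 'main', 'run']
```

Note that `parse` is excluded from the slice of `run`: it shares a callee with `run` but neither calls it nor is called by it.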

Pipes:

  • yield-call-graph-slice(cfg.yml.tar.gz, cross-relations.yml, call-graph-slice.svg.tar.gz)

process-assembly step#

Parent step: isolate

Pipes:

  • process-assembly(input, cfg.yml.tar.gz, assembly-internal.yml.tar.gz)

disassemble step#

Parent step: process-assembly

Artifact: This artifact is an archive of PTML files. Each file represents the disassembly of the given function.

Pipes:

  • yield-assembly(assembly-internal.yml.tar.gz, assembly.ptml.tar.gz)

render-svg-cfg step#

Parent step: process-assembly

Artifact: This artifact is an archive of SVG files. Each file represents the CFG of the given function.

Pipes:

  • yield-cfg(assembly-internal.yml.tar.gz, cfg.svg.tar.gz)

recompile step#

Parent step: lift

Artifact: This artifact is a translated version of the input binary. Specifically, it's an ELF executable for Linux x86-64 containing the root function plus the required runtime.

Pipes:

  • link-support(module.bc.zstd)
  • llvm-pipe(module.bc.zstd)
    • O2
  • llvm-pipe(module.bc.zstd)
    • drop-opaque-return-address
  • compile(module.bc.zstd, object.o)
  • link-for-translation(input, object.o, output)

recompile-isolated step#

Parent step: isolate

Artifact: This artifact is a translated version of the input binary. Specifically, it's an ELF executable for Linux x86-64 containing the root function, all the isolated functions and the required runtime.

Pipes:

  • llvm-pipe(module.bc.zstd)
    • invoke-isolated-functions
  • link-support(module.bc.zstd)
  • llvm-pipe(module.bc.zstd)
    • O2
  • llvm-pipe(module.bc.zstd)
    • drop-opaque-return-address
  • compile-isolated(module.bc.zstd, object.o)
  • link-for-translation(input, object.o, output)

remove-lifting-artifacts step#

Parent step: enforce-abi

Pipes:

  • llvm-pipe(module.bc.zstd)
    • dce
    • remove-lifting-artifacts
    • promote-init-csv-to-undef

promote-stack-pointer step#

Parent step: remove-lifting-artifacts

Pipes:

  • llvm-pipe(module.bc.zstd)
    • measure-stack-size-at-call-sites
    • promote-stack-pointer

early-optimize step#

Parent step: promote-stack-pointer

Pipes:

  • llvm-pipe(module.bc.zstd)
    • dce
    • remove-extractvalues
    • simplify-cfg-with-hoist-and-sink
    • dse
    • instcombine
    • remove-extractvalues
    • sroa
    • instsimplify
    • jump-threading
    • licm
    • unreachableblockelim
    • instcombine
    • remove-extractvalues
    • early-cse
    • simplify-cfg-with-hoist-and-sink
    • early-type-shrinking
    • type-shrinking
    • early-cse
    • instsimplify
    • gvn
    • instsimplify
    • dse
    • dce
    • drop-opaque-return-address

simplify-switch step#

Parent step: early-optimize

Artifact: This artifact contains an LLVM function for each function in the input program. The stack pointer has been promoted to a local variable and initialized with the result of an opaque function call.

Pipes:

  • simplify-switch(input, module.bc.zstd)

detect-stack-size step#

Parent step: simplify-switch

Pipes:

  • llvm-pipe(module.bc.zstd)
    • remove-stack-alignment
    • instrument-stack-accesses
    • instcombine
    • remove-extractvalues
    • loop-rotate
    • loop-simplify
    • compute-stack-accesses-bounds

Analyses:

  • detect-stack-size(module.bc.zstd): This analysis, for each function, identifies the size of the stack frame and the amount of stack used to pass arguments.

segregate-stack-accesses step#

Parent step: detect-stack-size

Pipes:

  • llvm-pipe(module.bc.zstd)
    • hoist-struct-phis
    • segregate-stack-accesses

late-optimize step#

Parent step: segregate-stack-accesses

Pipes:

  • llvm-pipe(module.bc.zstd)
    • cleanup-stack-size-markers
    • dce
    • sroa
    • instcombine
    • remove-extractvalues
    • sroa
    • simplify-cfg-with-hoist-and-sink
    • loop-rotate
    • loop-rewrite-with-canonical-induction-variable
    • simplify-cfg-with-hoist-and-sink
    • loop-simplify
    • instcombine
    • remove-extractvalues
    • early-cse
    • dce
    • strip-dead-prototypes
    • split-overflow-intrinsics
    • dce

make-segment-ref step#

Parent step: late-optimize

Artifact: This artifact contains an LLVM function for each function in the input program. The functions have an argument for each argument in the input prototype.

Unlike upstream artifacts, the arguments are not tied to the register containing them. So, if a function using the x86-64 SystemV ABI has two uint8_t arguments, they will appear as two distinct arguments, as opposed to being merged in a single argument representing rdi.

Additionally, this artifact correctly represents each stack argument in the function prototype.

Pipes:

  • make-segment-ref(input, module.bc.zstd)

Analyses:

  • analyze-data-layout(module.bc.zstd): This analysis inspects the memory accesses performed by the input program and detects the layout of data structures. The produced data structures are the result of merging the information obtained from each function interprocedurally.
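The idea of merging per-function observations into a single layout can be illustrated with a deliberately simple sketch. The real analysis is far more involved; here every memory access is assumed to be an `(offset, size)` pair relative to the same base pointer, and the function names `f` and `g` are made up.

```python
def merge_layouts(per_function_accesses: dict[str, list[tuple[int, int]]]) -> list[tuple[int, int]]:
    """Merge (offset, size) accesses seen in several functions into one field list.

    When two functions access the same offset with different sizes, the
    larger size is kept. This is a toy policy chosen for the example.
    """
    fields: dict[int, int] = {}
    for accesses in per_function_accesses.values():
        for offset, size in accesses:
            fields[offset] = max(size, fields.get(offset, 0))
    return sorted(fields.items())


# Hypothetical observations: f touches offsets 0 and 8, g touches 8 and 16.
observed = {
    "f": [(0, 8), (8, 4)],
    "g": [(8, 4), (16, 1)],
}

print(merge_layouts(observed))  # [(0, 8), (8, 4), (16, 1)]
```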

canonicalize step#

Parent step: make-segment-ref

Pipes:

  • llvm-pipe(module.bc.zstd)
    • hoist-struct-phis
    • remove-llvmassume-calls
    • dce
    • remove-pointer-casts
    • make-model-gep
    • dce
    • twoscomplement-normalization
    • peephole-opt-for-decompilation
    • ternary-reduction
    • exit-ssa
    • make-local-variables
    • remove-load-store
    • fold-model-gep
    • dce
    • switch-to-statements
    • make-model-cast
    • implicit-model-cast
    • operatorprecedence-resolution
    • pretty-int-formatting
    • remove-broken-debug-information

decompile step#

Parent step: canonicalize

Artifact: This artifact is an archive of PTML files representing the C code of the program's functions.

Pipes:

  • helpers-to-header(module.bc.zstd, helpers.h)
  • model-to-header(input, types-and-globals.h)
  • decompile(module.bc.zstd, cfg.yml.tar.gz, decompiled.tar.gz)

Analyses:

  • import-from-c(): This analysis, given a snippet of C code representing an individual type, parses it and imports it into the model, possibly replacing an existing type.

decompile-to-single-file step#

Parent step: decompile

Artifact: This artifact is a single PTML file representing the decompiled C code of the whole program, including the bodies of all of the program's functions.

Pipes:

  • decompile-to-single-file(decompiled.tar.gz, decompiled.c)

emit-recompilable-archive step#

Parent step: canonicalize

Artifact: This artifact is an archive containing all the files necessary to recompile the decompiled C code of the input program. These files are not in PTML, they are plain C.

It contains:

  • functions.c: the decompile-to-single-file artifact;
  • types-and-globals.h: see the emit-model-header artifact;
  • helpers.h: see the emit-helpers-header artifact;
  • attributes.h: a helper header file defining a set of annotations used by the decompiled C source files;
  • primitive-types.h: a header defining all the primitive types.

Pipes:

  • decompile-to-directory(module.bc.zstd, cfg.yml.tar.gz, recompilable-archive.tar.gz)
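A consumer of this artifact might want to check that the archive holds the five files listed above before trying to compile it. The sketch below does that with the standard `tarfile` module. It assumes the archive is a gzip-compressed tar, as the `.tar.gz` container name suggests, and it ignores any directory prefix inside the archive; the helper names are hypothetical, and the demo archive is built in memory with empty files.

```python
import io
import tarfile
from pathlib import PurePosixPath

# File names listed in the artifact description above.
EXPECTED_FILES = {
    "functions.c",
    "types-and-globals.h",
    "helpers.h",
    "attributes.h",
    "primitive-types.h",
}


def missing_files(archive_bytes: bytes) -> set[str]:
    """Return the expected file names that are absent from the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        present = {
            PurePosixPath(member.name).name
            for member in archive.getmembers()
            if member.isfile()
        }
    return EXPECTED_FILES - present


def make_demo_archive(names) -> bytes:
    """Build an in-memory .tar.gz with one empty file per name (demo only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in names:
            archive.addfile(tarfile.TarInfo(name))  # size 0, regular file
    return buffer.getvalue()


print(missing_files(make_demo_archive(EXPECTED_FILES)))  # set()
print(missing_files(make_demo_archive(["functions.c", "helpers.h"])))
```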

emit-helpers-header step#

Parent step: canonicalize

Artifact: This artifact contains the declarations of all the helpers used by the decompiled code.

Pipes:

  • helpers-to-header(module.bc.zstd, helpers.h)

emit-model-header step#

Parent step: initial

Artifact: This artifact contains all the declarations of types, functions and segments defined in the binary.

Pipes:

  • model-to-header(input, types-and-globals.h)

emit-type-definitions step#

Parent step: canonicalize

Artifact: This artifact is an archive of plain C headers. Each file contains the declaration of a type defined for this binary. This artifact is designed to be used as the initial input of the import-from-c analysis. In fact, this artifact is designed to be easily editable by the end-user; it's not designed to represent valid C code, unlike the emit-model-header artifact.

Pipes:

  • generate-model-type-definition(input, model-type-definitions.tar.gz)

cleanup-ir step#

Parent step: make-segment-ref

Artifact: This artifact contains one LLVM function for each function defined in this binary.

The output is similar to the output of make-segment-ref, but rev.ng-specific artifacts have been cleaned up so that it can be consumed more easily as standard LLVM IR.

This is an appropriate artifact on top of which to write analyses, such as a taint analysis.

Pipes:

  • llvm-pipe(module.bc.zstd)
    • instcombine
    • cleanup-ir
    • dce

convert-to-mlir step#

Parent step: make-segment-ref

Artifact: This artifact is an MLIR module with one function for each function defined in the binary.

Pipes:

  • llvm-pipe(module.bc.zstd)
    • prepare-llvmir-for-mlir
  • import-llvm-to-mlir(module.bc.zstd, module.mlir)

import-clift-types step#

Parent step: convert-to-mlir

Artifact: A test artifact to import types into a Clift module.

Pipes:

  • import-clift-types(cfg.yml.tar.gz, module.mlir)

Analysis lists#