The Compilation Process
The following is a brief explanation of how the source code of a program is converted to an executable binary image.¹ Keep in mind that this is a greatly simplified account of a rather complex process. Three key programs convert the code from text to executable: the compiler, the assembler, and the linker.
First, the text file containing the C program is processed by the compiler front-end, which consists of a preprocessor, a lexical analyser, a parser, and (optionally) an optimiser.
• The preprocessor performs a number of text-conversion and text-replacement tasks. It inserts the contents of header files, replaces symbolic constants with their values, and expands macros.
• The lexical analyser reads the preprocessed file, which is still a stream of unprocessed characters, and groups the characters into tokens (such as keywords, operators, variable names, etc.).
• The parser takes the stream of tokens and orders them logically into cohesive groups called expressions. This ordering forms a tree-like structure, so the output of the parser is often described as a set of expression trees.
• The optimiser is an optional compilation stage that reorders expressions (and possibly substitutes equivalent ones) to produce faster and/or smaller code. It may also allocate some variables to registers for faster access. Further optimisation may take place after the code-generation phase below.
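To make the front-end stages more concrete, the following is a minimal sketch of a toy front-end for expressions built from numbers, single-letter variables, + and *. It is not how any real compiler is written: every name in it (Node, parse_expr, fold, and so on) is hypothetical, the lexical analysis is folded into the parser for brevity, there is no preprocessor, and the only optimisation shown is constant folding (replacing an operator applied to two constants by the result).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical expression-tree node for a toy language. */
typedef struct Node {
    char kind;               /* 'n' = number, 'v' = variable, '+' or '*' = operator */
    int value;               /* used when kind == 'n' */
    char name;               /* used when kind == 'v' (single-letter names only) */
    struct Node *lhs, *rhs;  /* operands, used when kind is an operator */
} Node;

static Node *new_node(char kind, int value, char name, Node *lhs, Node *rhs)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind; n->value = value; n->name = name;
    n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* Lexical analysis (simplified): advance *p to the start of the next token. */
static void skip_space(const char **p)
{
    while (isspace((unsigned char)**p)) (*p)++;
}

/* primary := number | letter */
static Node *parse_primary(const char **p)
{
    skip_space(p);
    if (isdigit((unsigned char)**p)) {
        int v = 0;
        while (isdigit((unsigned char)**p)) v = v * 10 + (*(*p)++ - '0');
        return new_node('n', v, 0, NULL, NULL);
    }
    assert(isalpha((unsigned char)**p));   /* the sketch has no error recovery */
    return new_node('v', 0, *(*p)++, NULL, NULL);
}

/* term := primary { '*' primary } */
static Node *parse_term(const char **p)
{
    Node *lhs = parse_primary(p);
    for (skip_space(p); **p == '*'; skip_space(p)) {
        (*p)++;                            /* consume '*' */
        lhs = new_node('*', 0, 0, lhs, parse_primary(p));
    }
    return lhs;
}

/* expr := term { '+' term }. Because terms are parsed first,
   '*' ends up lower in the tree and so binds more tightly than '+'. */
Node *parse_expr(const char **p)
{
    Node *lhs = parse_term(p);
    for (skip_space(p); **p == '+'; skip_space(p)) {
        (*p)++;                            /* consume '+' */
        lhs = new_node('+', 0, 0, lhs, parse_term(p));
    }
    return lhs;
}

/* Optimisation (constant folding): an operator whose operands are both
   numbers is replaced, in place, by a single number node. */
Node *fold(Node *n)
{
    if (n->kind != '+' && n->kind != '*') return n;
    n->lhs = fold(n->lhs);
    n->rhs = fold(n->rhs);
    if (n->lhs->kind == 'n' && n->rhs->kind == 'n') {
        int v = (n->kind == '+') ? n->lhs->value + n->rhs->value
                                 : n->lhs->value * n->rhs->value;
        free(n->lhs);
        free(n->rhs);
        n->kind = 'n'; n->value = v; n->lhs = n->rhs = NULL;
    }
    return n;
}

/* Release a tree once it is no longer needed. */
void free_tree(Node *n)
{
    if (n == NULL) return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}
```

For the input "2 + 3 * 4", parse_expr builds a tree with + at the root and * beneath it, and fold reduces the whole tree to a single number node holding 14. For "x + 2 * 3", only the right-hand subtree is folded (to 6), because the value of x is not known at compile time.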
The next step in compilation is code generation (also called the compiler back-end), after which the code is processed by an assembler and a linker to produce the executable program.
• The compiler back-end converts the expression trees to assembler code. This code consists of low-level, machine-dependent instructions.
• The assembler translates the assembler code to object code (machine instructions in binary form, not yet linked into a complete program).
• The linker merges the object code produced from all the source files that make up the program, along with code from any libraries the program uses. The result is a binary “executable image” of the program.
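The code-generation step can be illustrated with a similarly small sketch. It walks an expression tree and writes out assembler-like text, one instruction per line. The Node type and the emit function are hypothetical, and the PUSH, LOAD, ADD and MUL mnemonics belong to an invented stack machine rather than to any real processor; a real back-end must also deal with registers, addressing modes, and much else. The assembler and linker stages are not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression-tree node; a real compiler's tree carries far more. */
typedef struct Node {
    char kind;                     /* 'n' = number, 'v' = variable, '+' or '*' = operator */
    int value;                     /* used when kind == 'n' */
    char name;                     /* used when kind == 'v' */
    const struct Node *lhs, *rhs;  /* operands, used when kind is an operator */
} Node;

/* Append assembler text for the tree rooted at n to the string in out,
   which has room for cap bytes and must already be NUL-terminated.
   Operands are emitted before their operator (a post-order walk), so on
   the invented stack machine each operator finds its inputs on top of
   the stack. */
void emit(const Node *n, char *out, size_t cap)
{
    size_t used = strlen(out);
    assert(used < cap);
    switch (n->kind) {
    case 'n':
        snprintf(out + used, cap - used, "PUSH %d\n", n->value);
        break;
    case 'v':
        snprintf(out + used, cap - used, "LOAD %c\n", n->name);
        break;
    default:                       /* '+' or '*' */
        emit(n->lhs, out, cap);
        emit(n->rhs, out, cap);
        used = strlen(out);
        snprintf(out + used, cap - used, "%s\n", n->kind == '+' ? "ADD" : "MUL");
        break;
    }
}
```

Given a hand-built tree for 2 + x * 3 and an empty buffer, emit produces the lines PUSH 2, LOAD x, PUSH 3, MUL, ADD, in that order. An assembler for such a machine would then translate each mnemonic into its binary encoding to produce object code.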