Tutorial : C - The C Preprocessor

The C Preprocessor

The C preprocessor is a simple macro processor that conceptually processes the source text of a C program before the compiler proper reads the source program...

The preprocessor is controlled by special preprocessor command lines, which are lines of the source file beginning with the character #. Lines that do not contain preprocessor commands are called lines of the program text...

The preprocessor removes all preprocessor command lines from the source file and makes additional transformations on the source file as directed by the commands, such as expanding macro calls that occur within the source program text. The resulting preprocessed source text must then be a valid C program.

The syntax of preprocessor commands is completely independent of (though in some ways similar to) the syntax of the rest of the C language.

The C preprocessor performs a variety of text replacement operations on the source text before it is parsed by the C compiler. These operations include replacing symbolic names with constants, expansion of macros, and inclusion of header-files. A preprocessor directive, such as #define, has file scope, meaning that defined names are visible from their point of declaration until the end of the source file in which they appear.

The operations of the preprocessor are not subject to the syntactical rules of the C language, and so should be used sparingly and with care. In particular, the text substitution of macro expansion changes the lexical structure of the source code and may produce unexpected behaviour if certain pitfalls are not avoided.

 

 

File Inclusion

One of the most common uses of the preprocessor is for the inclusion of files into the source text. These are usually header files, but may be any text file. The file include command is

#include

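and it is followed by a file name enclosed in either <> or "". The result of this command after preprocessing is as if the command were replaced by the text of the specified file. By convention, the <> form is used for standard library and system headers, while the "" form is used for the program's own headers, which the preprocessor typically looks for relative to the including file first.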

 

 

Symbolic Constants

The preprocessor may be used to define symbolic constants via the #define command. The general form of a symbolic constant definition is as follows

#define name replacement text

whereby name becomes a placeholder for the character sequence replacement text. Given this command, the preprocessor will replace all subsequent occurrences of the token name in the source file by the sequence replacement text. Some example symbolic constants are

#define BUFFERSIZE 	256
#define MIN_VALUE 	-32
#define PI 			3.14159

A name created by #define can be undefined using the directive #undef. This name may then be used to represent a different sequence of replacement text.

Style Note. Symbolic constants are usually given uppercase names to differentiate them from variable and function names. The #define command is also used to define macros, and these names are also usually uppercase.

C provides several different mechanisms for defining names for numerical constants. The preprocessor directive #define can create names for constants of any type; the type qualifier const can define constant variables of any type; and the keyword enum can define constants of integer type. Each of these is better suited to certain situations than the others. The #define command is most powerful, and can be used in any situation where the other two might be used, but it is also the most dangerous as the preprocessor does not respect C syntax rules. Variables qualified by const are generally preferred but, as these variables are not considered compile-time constants, they have one significant limitation; namely, a variable of type const int cannot be used to define the size of an array (other than a C99 variable-length array, which is permitted only for local, non-static arrays).

#define ARRAYSIZE 10
const int ArraySize = 10;
double array1[ARRAYSIZE]; /* Valid. */
double array2[ArraySize]; /* Invalid. */

An enumeration constant does not suffer this limitation and, between them, const and enum can satisfy all symbolic constant operations for which a #define might be used. As const and enum are part of the C language proper, and abide by its rules, they are to be preferred over #define in general. For example,

#define PI 3.14159
#define ARRAYSIZE 10
const double Pi = 3.14159; /* Preferred */
enum { ARRAY_SIZE = 10 }; /* Preferred; named differently so the macro above does not rewrite it */

 

 

Macros

Symbolic constants defined by the #define command are a simple form of macro: a symbolic name that is expanded into an expression via text substitution. The C preprocessor provides a more sophisticated type of macro definition by allowing the macro name to be followed by a set of arguments enclosed in parentheses. For example,

#define MAX(x,y) ((x)>(y) ? (x) : (y))

Although it looks like a function call, a macro behaves in a quite different manner. The preprocessor replaces the macro name with the defined replacement text, and substitutes the argument variables in the specified locations. Thus, MAX might be used as follows

int a=4, b= -7, c;
c = MAX(a,b);

and the preprocessor will expand the code to

int a=4, b= -7, c;
c = ((a)>(b) ? (a) : (b));

Note. In a macro definition, the parentheses immediately following the macro name must be directly adjacent to the name without whitespace. For example, the definition

#define MAX (x,y) ((x)>(y) ? (x) : (y))

is not equivalent to the previous macro definition, but will simply replace all occurrences of the name MAX with the text: (x,y) ((x)>(y) ? (x) : (y)).

Macros are typically used for one of two reasons. The first is speed. Macros can perform function-like operations without the overhead of a function call because the code is expanded inline. With modern fast machines, using macros for speed is less important than it used to be. The second use of macros is to implement a kind of generic function; that is, to define a function-like expression that bypasses the C type constraints and can be passed parameters of any type. For example, the macro MAX will work correctly if a, b, and c are of type double, or of any other arithmetic type, whereas an equivalent function

int max(int x, int y)
{
	return x > y ? x : y;
}

will only accept integer parameters.

 

 

Macro Basics

Consider the following simple macros.

#define SQR(x) ((x)*(x))
#define SGN(x) (((x)<0) ? -1 : 1)
#define ABS(x) (((x)<0) ? -(x) : (x))
#define ISDIGIT(x) ((x) >= '0' && (x) <= '9')
#define NELEMS(array) (sizeof(array) / sizeof(array[0]))

SQR calculates the square of its argument, SGN calculates the sign of its argument, ABS converts its argument to an absolute value, ISDIGIT equals 1 if the argument value is between the character code for 0 and the character code for 9, and NELEMS computes the number of elements in an array.

Macros should be used with care. The preprocessor is a powerful but rather blunt instrument, and it is easy to use macros incorrectly. Macros are subject to three main dangers. The first is that the passed arguments may have surprising precedence after macro expansion. For example, if SQR were defined as

#define SQR(x) x * x

then the following expression

int a = 7, b;
b = SQR(a+1);

will expand to

b = a+1 * a+1; /* b equals 7 + 1*7 + 1 = 15, not the expected 64 */

For this reason, macro arguments should be heavily parenthesised as with the set of examples above. The second danger is that arguments with side-effects may be evaluated multiple times after macro expansion. For example,

b = ABS(a++);

will expand to

b = (((a++)<0) ? -(a++) : (a++));

so that a is incremented twice, which is not the expected behaviour. To avoid this sort of problem, it is good practice never to use expressions with side-effects as macro arguments.

The final danger is that the ability to bypass the C type-checking system is a double-edged sword. It permits greater flexibility, but also prevents the compiler from catching some type-mismatch bugs. In general, functions are to be preferred over macros as they are safer. However, with a little care, macros can be used without significant trouble, when required.

 

 

More Macros

There are many neat and ingenious macros to be found in existing source code, and there is much to be learned from other people's inventions. The following two examples are simple and clever.

#define CLAMP(val,low,high) ((val)<(low) ? (low) : (val) > (high) ? (high) : (val))
#define ROUND(val) ((val)>0 ? (int)((val)+0.5) : -(int)(0.5-(val)))

The first, CLAMP, uses two nested ?: expressions to bound a value val so that if it is less-than low it becomes equal to low, and if it is greater-than high it becomes equal to high, otherwise it remains unchanged. The second macro, ROUND, rounds a floating-point value to the nearest integer. It performs this operation using the truncation properties of casting a double to an int, but contains one clever subtlety. The truncation by casting to int is straightforward if the value is positive, but machine dependent if the value is negative (see Section 2.7). ROUND gets around this problem by subtracting the negative value from 0.5, thus making a positive value, and then negating the answer.

Another clever macro trick is used to make macros behave more like functions. Consider the following macro that swaps two variables (using an additional temporary value).

#define SWAP(x,y,tmp) { tmp=x; x=y; y=tmp; }

This operation might be used as in the next example

int a=4, b=-1, temp;
SWAP(a, b, temp);

However, this macro will not behave in a function-like manner if used in an if-statement

if (a > b) SWAP(a, b, temp); /* Won't compile */
else a = b;

as this code will be expanded to incorrect C syntax.

if (a > b) { temp=a; a=b; b=temp; }
;
else a = b;

A solution to this problem is to wrap the body of the macro in a do-while statement, which will consume the offending semicolon.

#define SWAP(x,y,tmp) do { tmp=x; x=y; y=tmp; } while (0)

An alternative solution is to wrap the macro in an if-else statement.

#define SWAP(x,y,tmp) if(1) { tmp=x; x=y; y=tmp; } else

A variant of SWAP does away with defining an explicit temporary variable by simply passing the variable type to the macro.

#define SWAP(x,y,type) do { type tmp=x; x=y; y=tmp; } while (0)

This might be used as

SWAP(a, b, double);

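Finally, a very tricky bitwise technique allows us to perform the swap operation without any temporary variable at all. (However, this variant is only valid if x and y are integer variables of the same type, and it must not be applied to the same object twice, as in SWAP(a, a), since that sets the value to zero.)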

#define SWAP(x,y) do { x^=y; y^=x; x^=y; } while (0)

 

 

More Complex Macros

Normally a macro definition extends to the end of the line beginning with the command #define. However, long macros can be split over several lines by placing a \ at the end of the line to be continued. For example,

#define ERROR(condition, message) \
	if (condition) printf("%s", message)

A more interesting example, adapted from [KP99, page 240], performs a timing loop on a section of code using the standard library function clock(), which returns the processor time used so far in clock ticks. Dividing a tick count by CLOCKS_PER_SEC converts it to seconds.

#define TIMELOOP(CODE) { \
	t0 = clock(); \
	for (i = 0; i < n; ++i) { CODE; } \
	printf("%7ld ", (long)(clock() - t0)); /* elapsed clock ticks */ \
}

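This macro might be used as follows, given that the variables t0, i, and n are declared in the enclosing scope and that <math.h> and <time.h> are included.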

TIMELOOP(y = sin(x));

It is possible to convert a token to a string constant by writing a macro that uses the # symbol in the manner of the following example.

#define PRINT_DEBUG(expr) printf(#expr " = %g\n", expr)

This macro when invoked will print the expression and its result, as the first instance of the expression is converted to a string constant by the preceding #. For example,

PRINT_DEBUG(x/y);

is expanded to

printf("x/y" " = %g\n", x/y);

Another rather obscure preprocessor operator is ##, which provides a way to concatenate two tokens. For example,

#define TEMP(i) temp ## i

might be used to create different temporary variable names

TEMP(1) = TEMP(2);

which, after preprocessing, becomes

temp1 = temp2;

The preprocessor defines a number of predefined macros. These are __LINE__, __FILE__, __DATE__, __TIME__, __STDC__, and __STDC_VERSION__. Notice that each of these names is prefixed and suffixed by double underscore characters. Determining the meaning of each of these macros is left as an exercise to the reader. To put the various aspects of this section together, consider the following macro,

#define PRINT_DEBUG(expr, type) \
printf("File: " __FILE__ \
"\nLine: %d\nExpr: " #expr \
" = %" type##TYPE "\n", __LINE__, (expr))

which, given a set of definitions for formatting different types, e.g.,

#define intTYPE "d"
#define doubleTYPE "f"

can be used as

PRINT_DEBUG(x/y, double);

which will print the source-file name, the statement line number, the expression and its value. Indeed a clever and useful debugging macro.

 

 

Conditional Compilation

The C preprocessor provides a series of directives for conditional compilation: #if, #elif, #else, #ifdef, #ifndef, and #endif. These commands cause the preprocessor to include or exclude sections of the source code from compilation depending on certain conditions. Conditional compilation is used for three main purposes: to optionally include debug code, to enclose non-portable code, and to guard against multiple inclusion of header files.

It is common to write short sections of code exclusively for debugging purposes; this is often called “instrumentation”. This code should be present in the program during a debug build but removed for the final release build. However, it is a good idea to not actually delete the code, but to optionally include it using a preprocessor condition so that it is still available if further debugging is required. Typically, during debug builds, one defines a symbolic constant DEBUG so that debug code is included. For example,

#ifdef DEBUG
	printf("Pointer %p points to value %f\n", (void *)pd, *pd);
#endif

When writing a program that contains non-portable code, it is good practice to isolate the nonportable parts in a separate source file. The different code sections for different machines are then enclosed in preprocessor conditions so that only the code for the specified machine is compiled. For example,

#if defined(_WIN32) || defined(__WIN32__) /* Code specific to Windows. */
return WaitForSingleObject(handle, 0) == WAIT_OBJECT_0;
#elif defined(__QNX__) || defined(__linux__) /* Code specific to QNX or Linux. */
if(flock(fd, LOCK_EX | LOCK_NB) == -1) return 0;
else return 1;
#endif

A header file should only be included once in any given source file (although it may appear in any number of different source files in the program). Otherwise certain symbols might obtain multiple definitions, which would result in a compilation error. This problem can occur if some header files include other header files, such that several headers are dependent on a common header file. To prevent the problem of multiple inclusion, the following preprocessor idiom is applied. Consider a header file aheader.h, which begins and ends with the preprocessor commands below.

#ifndef A_HEADER_H_
#define A_HEADER_H_
/* Contents of header file is contained here. */
#endif

By prefixing the file with #ifndef A_HEADER_H_, the header is included the first time (presuming A_HEADER_H_ is not defined previously), but A_HEADER_H_ is subsequently defined in the next line. This prevents any subsequent inclusion of the header for a given source-file. This idiom is known as a “header guard”.