Tutorial : C - Scope And Extent

Scope and Extent

The scope of a name refers to the part of the program within which the name can be used. That is, it describes the visibility of an identifier within the program. The extent of a variable or function refers to its lifetime in terms of when memory is allocated to store it, and when that memory is released.

The rules of scope and extent affect the way functions and data interact, and are central to the design of C programs. This chapter examines the various storage classes that control these properties. The focus is on the way in which control of scope and extent facilitates the writing of modular programs, and particularly the implementation of multiple-file programs.

 

Local Scope and Automatic Extent

A variable declared within a function has local scope by default. This means that it is local to the block in which it is defined, where a block is a code segment enclosed in braces {...}. Function arguments also have local scope. For example, in the following function

void afunction(int a, int b)
{
	double val;
	statements
	{
		int val2 = 5;
		statements
	} /* val2 goes out-of-scope here */
	statements
} /* a, b, val go out-of-scope here */

 

the variables a, b, val, and val2 all have local scope. The visibility of a local variable is the block in which it is defined. Thus local variables with the same name defined in different blocks or functions are unrelated.

A local variable has automatic extent, which means that its lifetime is from the point it is defined until the end of its block. At the point it is defined, memory is allocated for it on the “stack”; this memory is managed automatically by the compiler. If the variable is not explicitly initialised, then it will hold an indeterminate value that must not be relied upon (e.g., in the above, val has an arbitrary value, while val2 is initialised to 5). It is often good practice to initialise a local variable when it is declared. At the end of the block, the variable is destroyed and the memory recovered; the variable is said to go “out-of-scope”.
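As a rough illustration of automatic extent, the sketch below uses only local variables. The function name sum_to is hypothetical and chosen purely for this example. Each call creates fresh copies of total, i, and term, so no state carries over from one call to the next.

```c
/* Hypothetical helper: sums the integers 1..n using only automatic variables. */
int sum_to(int n)
{
	int total = 0;          /* created and initialised on every call */
	int i;

	for (i = 1; i <= n; ++i) {
		int term = i;   /* exists only inside this inner block */
		total += term;
	}                       /* term goes out-of-scope here */

	return total;           /* the value is copied out before total is destroyed */
}
```

Calling sum_to(4) twice in a row gives 10 both times, because total starts again from zero on each call.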

 

External Scope and Static Extent

External variables are defined outside of any function, and are thus potentially available to many functions. Functions themselves are always external, because C does not allow functions to be defined inside other functions.

A variable defined outside of any function is an external variable, by default. External variables and functions are visible over the entire (possibly multi-file) program; they have external scope (also called program scope). This means that a function may be called from any function in the program, and an external variable may be accessed or changed by any function. However, it is necessary to first declare a variable or function in each file before it is used.

The extern keyword is used to declare the existence of an external variable in one file when it is defined in another. Function prototypes may also be preceded by extern, but this is not essential as they are external by default. It is important to note the distinction between declaration and definition. A declaration refers to the specification of a variable or function, in particular its name and type. A definition is also a specification, but additionally involves the allocation of storage. A variable or function may be declared multiple times in a program (provided the declarations are non-conflicting) but may be defined only once. An example of external variables and functions shared across two source-files is shown below.

File one.c:
int globalvar; /* external variable definition */
extern double myvariable; /* external variable declaration (defined elsewhere) */
void myfunc(int idx); /* external function prototype (declaration) */
File two.c:
double myvariable = 3.2; /* external variable definition */
void myfunc(int idx)
/* Function definition */
{
	extern int globalvar; /* external variable declaration */
	...
}

 

Note. Each source file (i.e., a file with filename suffixed by .c) is compiled to form an object module. These are later combined by the linker to form a complete executable program. The identifiers of external variables and functions are visible to the linker, allowing them to be shared across separate object modules, and are said to have “external linkage”. The identifiers of non-external variables and functions are not visible to the linker, and so are private to a single source-file.

External variables and functions have static extent. This means that they are allocated memory and exist before the program starts (that is, before the execution of main()) and continue to exist until the program terminates. External variables that are not initialised explicitly are given the default value of zero (unlike local variables, which have arbitrary initial values by default). The value of an external variable is retained from one function call to the next.
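The two properties just described, default zero initialisation and retention of value between calls, can be sketched as follows. The names call_count, record_call, and calls_so_far are hypothetical and used only for illustration.

```c
int call_count;                 /* external variable: implicitly initialised to 0 */

void record_call(void)
{
	++call_count;           /* the change persists after this function returns */
}

int calls_so_far(void)
{
	return call_count;      /* any function in the program may read the value */
}
```

Before any call to record_call(), calls_so_far() returns 0; after two calls it returns 2.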

External variables are sometimes used as a convenient mechanism for avoiding long argument lists. They provide an alternative to function arguments and return values for communicating data between functions. They may also permit more natural semantics if two functions operate on the same data, but neither calls the other.

The following shows a situation where global variables might be convenient. Suppose you wish to get input from the keyboard one character at a time; the standard function getchar() provides this service. However, suppose you read in some characters and decide you are not yet ready to process them, and wish to push them back onto the input stream for a later time. This cannot be done directly, but can be simulated by storing the pushed-back characters in a buffer, and writing two functions that get and unget the characters, respectively.

#define BUFSIZE 100
char buffer[BUFSIZE]; /* buffer for pushed-back characters */
int bufidx = 0; /* buffer index */
int getch(void)
/* Get a character from stdin. */
{
	if (bufidx > 0) /* get pushed-back data first */
		return buffer[--bufidx];
	return getchar();
}
int ungetch(int c)
/* Simulate pushing a character back onto input stream */
{
	if (bufidx >= BUFSIZE) return -1; /* error: buffer full */
	buffer[bufidx++] = c;
	return 0;
}

 

The problem with external variables is that they tend to expose function internals, which can lead to strong dependencies between functions. Two functions are said to be “tightly coupled” if changes made to one function force changes on the other. This style of code violates the modular design principle of decoupled functions accessible only via well-defined interfaces. A further problem with external variables is that, since their scope is over the entire multi-file program, it is easy to write code where the same identifier is used to define two different external variables. Overuse of external variables is said to “clutter the global name-space”, and naming conflicts can arise affecting both functions and variables, as shown in the following example.

File one.c:
extern double myvariable;
float myname;
void myfunc(int idx);
File two.c:
extern int myvariable;
char myname(int c);
int myfunc(int idx);

 

As a rule, external variables are easy to overuse and should be avoided where possible.

 

The static Storage Class Specifier

The keyword static is a storage class specifier, but it is perhaps better viewed as a storage class qualifier as it imparts different properties depending on whether an object is a local variable, an external variable, or a function. Local variables keep their local visibility but gain static extent. They are initialised to zero by default and retain their values between function calls.

int increment(void)
{
	static int local_static; /* local scope, static extent, initial value 0 */
	return local_static++; /* 1st call will return: 0, 2nd: 1, 3rd: 2, ... */
}
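To make the contrast with automatic extent explicit, the following sketch places an automatic counter next to a static one. Both function names are hypothetical.

```c
int count_auto(void)
{
	int n = 0;              /* automatic: re-created and reset on every call */
	return ++n;             /* always returns 1 */
}

int count_static(void)
{
	static int n;           /* static extent: set to 0 once, then retained */
	return ++n;             /* returns 1, 2, 3, ... on successive calls */
}
```

In both functions the name n is visible only inside the function body; only the lifetime of the variable differs.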

 

External variables and functions that are qualified as static obtain file scope, which means their visibility is limited to a single source file. Their names are not exported to the linker and are not visible to object modules of other source files. This prevents unwanted access by code in other parts of the program and reduces the risk of naming conflicts. For example, the following declarations are unrelated and non-conflicting.

File one.c:
static double myvariable;
static void myfunc(int idx);
File two.c:
static int myvariable; /* no conflict with file one.c */
static int myfunc(int idx); /* no conflict */

 

The example of getch() and ungetch() in the previous section is one situation where static variables would constitute better design. The two functions would remain extern as they might be called from functions in other files, but the variables buffer and bufidx only require file-scope. Thus, static is to be preferred over extern where possible as it permits more modular design. As a rule, and where possible, static functions are preferred over external functions, static variables are preferred over external variables, and local variables are preferred over static variables.
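A possible revision of the getch() and ungetch() pair along these lines is sketched below. The only change from the earlier listing is the static qualifier on the two variables; the function bodies are the same.

```c
#include <stdio.h>

#define BUFSIZE 100

static char buffer[BUFSIZE];    /* file scope: hidden from other source files */
static int bufidx = 0;          /* file scope: index of the next free slot */

/* Get a character, taking pushed-back characters before reading stdin. */
int getch(void)
{
	if (bufidx > 0)
		return buffer[--bufidx];
	return getchar();
}

/* Push a character back; returns 0 on success, -1 if the buffer is full. */
int ungetch(int c)
{
	if (bufidx >= BUFSIZE)
		return -1;
	buffer[bufidx++] = c;
	return 0;
}
```

Code in other files can still call getch() and ungetch(), but can no longer read or corrupt buffer and bufidx directly. Pushed-back characters are returned in last-in, first-out order.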

Aside. Many programs today operate using multiple threads of control, meaning that various parts of the program operate concurrently, or over a timeslice so as to appear concurrent. Extreme caution is required if using functions that rely on static or external variables in such programs. Temporal dependencies on the value of external variables may be violated as the different threads are switched in and out. In general, variables with static extent are best avoided in multi-threaded programs (unless they are explicitly synchronised).

 

Scope Resolution and Name Hiding

It is possible for a program to have two variables of the same name with overlapping scope such that they are potentially visible from the same place. In such situations, one variable will hide the other. C specifies that the variable with more restricted scope will be visible, and the other hidden. In other words, the variable that is “more local” has dominant visibility. This situation is demonstrated in the following program.

 1 #include <stdio.h>
 2
 3 int modify(int, int);
 4
 5 int x=1, y=3, z=5; /* external variables */
 6
 7 int main(void)
 8 {
 9 	int x, z=0; /* local scope 1 */
10
11 	x = y + 1;
12 	while (x++ < 10) {
13 		int x, y = 0; /* local scope 2 */
14 		x = z % 5;
15 		printf("In loop \tx= %d\ty= %d\tz= %d\n", x, y++, z++);
16 	}
17
18 	printf("Before modify()\tx= %d\ty= %d\tz= %d\n", x, y, z);
19 	z = modify(x, y);
20 	printf("After modify()\tx= %d\ty= %d\tz= %d\n", x, y, z);
21 }
22
23 int modify(int a, int b)
24 {
25 	z += a + b;
26 	return z;
27 }

 

This program produces the following output.
In loop x= 0 y= 0 z= 0
In loop x= 1 y= 0 z= 1
In loop x= 2 y= 0 z= 2
In loop x= 3 y= 0 z= 3
In loop x= 4 y= 0 z= 4
In loop x= 0 y= 0 z= 5
Before modify() x= 11 y= 3 z= 6
After modify() x= 11 y= 3 z= 19

This output is a result of the C name-hiding rules. A brief discussion of the above example may help clarify their properties.

Line 11: Identifier x refers to x-local-scope-1, and y to y-global-scope.
Line 12: This line represents one of the more difficult instances of name-hiding: does x refer to the inner local block or the outer one? In fact, x refers to x-local-scope-1, not to x-local-scope-2, which is declared within the compound statement following the while, and is not in scope for the while conditional itself.
Lines 14–15: Identifiers x and y refer to the inner local-scope-2, and z refers to the outer local-scope-1.
Lines 18–20: Identifiers x and z refer to the outer local-scope-1, and y refers to the global-scope.
Lines 23–27: Identifiers a and b are local to the function block, while z is a global variable.

The problem of name hiding doesn’t end there. Because C’s scoping rules specify that the scope of an identifier begins at its point of declaration rather than the top of the block in which it is defined, a further non-intuitive name-hiding situation can occur. For example, consider the following code segment.

float i = 5.f;
{
	int j = i;
	int i = 0;
	...
}

 

Intuitively, one might think that j would be initialised by the inner-most i, the int. However, it turns out that int i does not exist until it is declared below j, so that j is initialised with the only i in scope at its declaration: float i = 5.f.
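The same situation can be wrapped in a small function to check the behaviour. The function name j_from_outer_i is hypothetical.

```c
int j_from_outer_i(void)
{
	float i = 5.f;
	{
		int j = i;      /* only the outer float i is in scope here, so j is 5 */
		int i = 0;      /* from this point on, i names the inner int */
		return j + i;   /* 5 + 0 */
	}
}
```

The function returns 5, which confirms that j was initialised from the outer float i and not from the inner int i.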

Name-hiding issues are easily avoided by defining appropriate variable names. It is bad practice to rely on the scope-resolution rules, and name-hiding leads to confusing and error-prone code. Avoid same-name identifiers for functions or variables that might share the same scope.

 

Summary of Scope and Extent Rules

Functions are extern by default, as are variables defined outside of any function. External functions and variables have external or program scope, which means they are visible across the entire (possibly multi-file) program. Functions and external variables that are declared static have file scope, which means their visibility is limited to the source file in which they are defined. Variables defined within a function or block have local scope, and are not visible outside of their enclosing block even if declared static.

Functions and external variables have static extent, meaning that they are created before program execution and exist until the program terminates. Local variables declared static also have static extent. Non-static local variables have local or automatic extent and are destroyed when they go out-of-scope. Variables with static extent are initialised to zero by default, but variables with automatic extent are not given a default initial value.

In general, a variable can be defined with a storage class extern, static, or auto. The general form of a variable definition is as follows,

<storage class> <type qualifier> <type> <identifier> = <value> ;

 

where the assignment to an initial value is optional in general, but in practice essential for variables qualified by const, since a const variable cannot be assigned a value after it is defined. For example,

static const double LightSpeed = 2.9979e8; /* m/s */

 

Additional scope and extent identities. It is possible to define variables with dynamic extent such that their lifetime is managed explicitly by the programmer. Dynamic memory management is performed via the standard library functions malloc() and free().
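A minimal sketch of dynamic extent follows. The helper name make_filled is hypothetical. The memory it allocates outlives the function call and exists until the caller passes the pointer to free().

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated array of n ints, each set
   to value, or NULL if n is zero or the allocation fails.
   The caller owns the memory and must release it with free(). */
int *make_filled(size_t n, int value)
{
	int *p;
	size_t i;

	if (n == 0)
		return NULL;
	p = malloc(n * sizeof *p);
	if (p == NULL)
		return NULL;
	for (i = 0; i < n; ++i)
		p[i] = value;
	return p;               /* the array remains valid after this function returns */
}
```

A caller might write int *a = make_filled(3, 7); use a[0] to a[2], and then call free(a) once the array is no longer needed.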

Some other instances of scope are mentioned here for identifiers that are not functions or variables. Preprocessor macros (see Chapter 10) have file scope from the point they are defined to the bottom of the file (unless undefined by #undef). Named labels, such as used by goto, have function scope.
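Both of these cases can be sketched in a few lines. The macro MAXVAL and the two functions are hypothetical names used only for illustration.

```c
#define MAXVAL 10               /* macro is visible from this line onwards */

int clamp_to_max(int v)
{
	return v > MAXVAL ? MAXVAL : v;
}

#undef MAXVAL                   /* MAXVAL is no longer defined below this line */

#ifdef MAXVAL
#error "not reached: MAXVAL was undefined above"
#endif

/* Returns the index of the first negative element, or -1 if there is none. */
int first_negative(const int *v, int n)
{
	int i;

	for (i = 0; i < n; ++i) {
		if (v[i] < 0)
			goto found;     /* a label may be used anywhere in its function */
	}
	return -1;
found:
	return i;
}
```

The label found is known throughout first_negative(), even before the line on which it appears, but it cannot be the target of a goto in any other function.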

 

Header Files

Identifiers must be declared in a source file before they can be used. Rather than typing declarations explicitly in each source file that uses them, it is generally convenient to collect common declarations in header files, and include the relevant headers in the source files as required. Inclusion of header files is performed by the C preprocessor as specified by the #include directive.

The #include preprocessor command causes the entire contents of a specified source text file to be processed as if those contents had appeared in place of the #include command.

Header files are used to store declarations shared over multiple source files including function prototypes, external variables, constants, macros, and user-defined data types. Collecting these declarations in header files avoids code duplication, and so avoids possible typographical errors and declaration mismatches. It also makes changes easier to enact as they only need to be made in one place.

Header file names are suffixed with .h by convention. The standard library headers are included using angle brackets to enclose the filename as follows.

#include <filename.h>

 

Angle brackets indicate that the header file is located in some “standard” place according to the compiler-implementation search rules. Usually this means that the header is in the compiler search path. Header files from other libraries may also be included using the angle bracket syntax if they too reside on the compiler search path. A second form of include syntax uses double quotes,

#include "filename.h"

 

which indicate that the header file is located in some “local” place, usually the current directory. The search for files included using the double-quote syntax begins in the local places and then looks in the standard places. The general intent of the " " form is to denote headers written by the application programmer, while the < > form indicates (usually standard) library headers.

 

Modular Programming: Multiple File Programs

The functions and external variables that make up a C program need not all be compiled at the same time; the source text of the program may be kept in several files, and previously compiled routines may be loaded from libraries.

Large-scale C programs are organised so that related functions and variables are grouped into separate source files. Grouping code by source file is central to C’s compilation model, which compiles each file separately to produce individual object modules, and these are later linked to form the complete program. Separate compilation, in conjunction with the C scoping rules, gives rise to the paradigm of modular programming.

This code organisation strategy works as follows. Each source file is a module containing related functions and variables. The declarations of functions and variables (and constants and data-types) to be shared with other modules are stored in an associated header file; this is called the public interface. Access to the module from other modules is restricted to the public interface.

Functions defined in a module that are called only by other functions within that module are declared static. These functions comprise the private interface—functions visible only from within the module, as part of the module’s internal implementation. Similarly, external variables used only within the module are declared static. These private interface declarations are not added to the header file, but are declared at the top of the source file.
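The organisation described above might look roughly like the following sketch of a small integer-stack module. It is condensed into one listing so that it compiles as a single file; in a real program the first part would live in a header and the second in a source file. The file names intstack.h and intstack.c, and all identifiers, are hypothetical.

```c
/* ---- intstack.h (hypothetical header): public interface ---- */
int stack_push(int value);      /* returns 0 on success, -1 if the stack is full */
int stack_pop(int *value);      /* returns 0 on success, -1 if the stack is empty */
int stack_size(void);

/* ---- intstack.c (hypothetical source file): implementation ---- */
#define STACK_MAX 8

static int items[STACK_MAX];    /* private data: file scope */
static int count;               /* private data: zero-initialised */

static int is_full(void)        /* private helper: not declared in the header */
{
	return count >= STACK_MAX;
}

int stack_push(int value)
{
	if (is_full())
		return -1;
	items[count++] = value;
	return 0;
}

int stack_pop(int *value)
{
	if (count == 0)
		return -1;
	*value = items[--count];
	return 0;
}

int stack_size(void)
{
	return count;
}
```

Client code sees only the three prototypes. The array, the counter, and is_full() could be renamed or replaced without any change to files that include the header.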

The advantages of modular programming are as follows.

• Groups of related functions and variables are collected together. This leads to more intuitive use of a library of code than just a disorganised set of functions. Modules represent a higher level of abstraction than functions.
• Implementation details are hidden behind a public interface. This is useful for shielding users from complex algorithms or from non-portable code. Changes to the implementation can later be made without affecting client code (e.g., using a different algorithm, or porting platform-specific code to another platform).
• Users of the module are prevented from accessing functions or variables that are designed only for private implementation (i.e., only for use internal to the module). This minimises the chance of incorrect use.
• Modules are decoupled from the rest of the program, allowing them to be built, tested, and debugged in isolation.
• Modules facilitate team program development where individuals can each work on different modules that make up the program.

It is difficult to state definitively the requirements for good modular design, but there are several rules-of-thumb that are generally applicable. First, it is desirable to minimise dependencies between modules. This involves, for example, minimising the use of external variables in the public interface, which tend to expose module implementation details and clutter the global namespace. Second, the public interface should be minimal; it should only contain functions required to use the module, not functions that are just part of the internal implementation. Similarly, variables, constants and data types that are not meant to be shared should not be part of the public interface, and should be declared in the source file, not the header file. Finally, it is good practice to restrict scope as much as possible, such that local variables are preferred over static variables which, in turn, are preferred over external variables, and static functions are preferred over external functions.