Tutorial : C - Structures And Unions

Structures and Unions

A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling. ... Structures help to organise complicated data, particularly in large programs, because they permit a group of related variables to be treated as a unit instead of as separate entities.

The C language provides means to create user-defined types called structures. These types are aggregates of basic types and other user-defined types, and permit a logical grouping of related data. This chapter discusses structures, their operations and implications, and the associated topics of type-definitions and unions.

 

 

Structures

A structure is declared using the keyword struct, and the internal organisation of the structure is defined by a set of variables enclosed in braces. Consider the following example, which declares a structure for representing two-dimensional points.

struct Point {
	int x;
	int y;
};

Style Note. By convention, structures should always be named with an uppercase first letter. This distinguishes them from variables and functions (lowercase first letter), and symbolic constants (all uppercase).

The variables x and y are called members of the structure named Point. Variables of type Point may be defined as a list of identifiers at the end of the struct definition,

struct Point {
	int x;
	int y;
} p1, p2, p3; /* Define three variables of type Point. */

or as subsequent definitions using the tag “struct Point”.

struct Point p1, p2, p3; /* Define three variables of type Point. */

When a structure is defined, its members may be initialised using brace notation as follows.

struct Point topleft = { 320, 0 };

Here the member x is initialised to 320 and the member y to 0. In general, the values within the initialiser list correspond to the structure member variables in order of their declaration. Individual members of a struct may be accessed via the member operator “.”. For example,

struct Point topleft;
topleft.x = 320;
topleft.y = 0;

is an alternative way to initialise the variable topleft. Thus, we might compute the distance between two points pt1 and pt2, using the standard library function sqrt(), as shown below.

struct Point delta = { pt1.x - pt2.x, pt1.y - pt2.y };
double distance = sqrt((double)delta.x * delta.x + (double)delta.y * delta.y);

Structures can be nested, such that the definition of one structure type may be composed, in part, of another structure type. For example, a rectangle might be described by its top-left and bottom-right

corners as follows.
	struct Rectangle {
	struct Point topleft;
	struct Point bottomright;
};

To access the lowest-level members of a variable of type Rectangle, therefore, requires two instances of the member operator.

struct Rectangle rect;
rect.topleft.x = 50;

 

 

Operations on Structures

The operations permitted on structures are a subset of the operations permitted on basic types (such as int, double, etc). Structures may be copied or assigned, but it is not possible to directly compare two structures.

struct Point p1 = { 0, 0 };
struct Point p2;
p2 = p1; /* Valid. structs may be assigned. */
if (p1 == p2) /* Invalid. structs may not be compared. */
	printf("Points are equal\n");
if (p1.x == p2.x && p1.y == p2.y) /* Valid. May compare basic types. */
	printf("Points are equal\n");

A structure may be passed to a function and may be returned by a function. struct Point point_difference(struct Point p1, struct Point p2)

/* Return the delta (dx, dy) of p2 with respect to p1. */
{
	p2.x -= p1.x;
	p2.y -= p1.y;
	return p2;
}

As with any other variable, structures are passed by value. Thus, in the above example, p1 and p2 are copies of the passed arguments, and the changes to p2 within the function do not affect the value of the associated variable in the calling function.

struct Point a = {5,10}, b = {20,30}, c;
c = point_difference(a, b); /* c = {15,20}, b is unchanged. */

Passing structures by value can be inefficient if the structure is large, and it is generally more efficient to pass a pointer to a struct rather than making a copy. Defining structure pointers and obtaining the address of a struct variable is the same as for basic types.

struct Point pt = { 50, 50 };
struct Point *pp;
pp = &pt;
(*pp).x = 100; /* pt.x is now 100. */

Note, the parentheses about (*pp).x are necessary to enforce the correct order-of-evaluation, so that the pointer-to-struct is dereferenced before attempting to access member x. To avoid this rather complicated dereferencing syntax, an alternative notation is provided to access structure members via a pointer. The -> operator permits the expression (*pp).x to be rewritten more simply as pp->x. As a further example, given the variable definitions

struct Rectangle rect, *pr = ▭

the following statements are equivalent.

rect.topleft.x = 50;
(*pr).topleft.x = 50;
pr->topleft.x = 50;

 

 

Arrays of Structures

As for basic types, it is possible to create an array of structures.

struct Point pa[10];

The elements of the array may be initialised by a brace-enclosed initialiser list and, when doing so, the size of the array need not be specified. The example below defines an array of four elements.

struct Point pa[] = {
	{0, 0}, {0, 240}, {320, 240}, {320, 0}
};

The size of a structure type in bytes may be computed using the sizeof operator, and the number of elements in the above array can be found using the idiom

int nelems = sizeof(pa) / sizeof(pa[0]);

As for basic types, pointer arithmetic for structure types is managed by the compiler, so that the operations

struct Point *pp = pa;
for (; pp != pa + sizeof(pa) / sizeof(pa[0]); ++pp)
printf("%d %d\n", pp->x, pp->y);

will effectively print the (x,y) values of each element in the array. The compiler knows the size of the structure type and determines the appropriate address offsets accordingly.

Important. The size of a structure might not equal the sum of its constituent parts. For example, if we assume a char is one byte and an int is four bytes, the following structure type

struct Stype {
	char c;
	int i;
};

might not have size five bytes. Most real machines require objects to satisfy certain alignment restrictions (e.g., integers must be located at even addresses), and the compiler will pad the structure with unnamed “holes” to ensure each member is properly aligned. Thus, the above example might have size six or eight bytes rather than five bytes. The sizeof operator returns the correct value.

 

 

Self-Referential Structures
A structure definition may not contain an object of its own type,

struct Node {
	int item;
	struct Node next; /* Invalid. Cannot define an object of an incomplete type. */
}

but it may refer to a pointer of its own type.

struct Node {
	int item;
	struct Node *next; /* Valid. May define a pointer to an incomplete type. */
}

The ability of a structure to refer to incomplete types (including itself) via a pointer is a vital property for constructing a number of important data-structures. These include linked-lists, binary trees, graphs, hash tables, and many more.

The following example demonstrates a simple linked-list implementation. Linked-lists come in two main varieties: singly-linked and doubly-linked. The former consists of a series of nodes containing a value and a pointer to another node. Each node is defined as follows.

struct List {
	int item;
	struct List *next;
};

In this example, the value contained by the node is an integer variable, but it could be a variable of any type. The set of nodes are connected such that the next pointer for each node points to the address of the next node in the list. The end of the list is marked by a node whose next pointer is NULL.

To traverse a singly linked-list, it is necessary to hold a pointer to the first node in the list; this pointer refers to the head of the list. The subsequent nodes may be accessed by iterating through each node in turn. For example, the code segment below shows how to reach the end node.

struct List *node = head;
while (node->next != NULL)
node = node->next;

Linked-lists are useful for their ability to be grown and shrunk easily without issues of memory reallocation and data copying. It is straightforward to add and delete elements, even to splice elements into the middle to the list. The function below demonstrates a list-growing implementation that adds a new node to the end of the list. The first argument may be a pointer to the head of the list, or a pointer to any node midway along the list, and the function will find the end and attach a new node to it. If the function is passed NULL, it creates a new head node and returns a pointer to it.

1 struct List *insert back(struct List *node, int item)
2 /* Add new item to the end of the list. Return pointer to the new List node. */
3 {
4 	/* Allocate memory for new node. */
5 	struct List *newnode = (struct List *) malloc(sizeof (struct List));
6 	if (newnode == NULL)
7 		return NULL; /* allocation failed */
8
9 	newnode−>item = item;
10 	newnode−>next = NULL;
11
12 	/* If list is not empty, find end of list and attach new node. */
13 	if (node) {
14 		while (node−>next)
15 		node = node−>next;
16 		node−>next = newnode;
17 	}
18
19 	return newnode;
20 }

The following function inserts a node at a specified location in the list. A pointer to the node one before the insertion is passed to the function, and its next pointer is used to splice the new node into the next position in the list.

1 struct List *insert after(struct List *node, int item)
2 /* Add new item to the next position in the list after node. */
3 {
4 	/* Allocate memory for new node. */
5 	struct List *newnode = (struct List *) malloc(sizeof (struct List));
6 	if (newnode == NULL)
7 		return NULL; /* allocation failed */
8
9 	/* If list is not empty, splice new node into list. */
10 	if (node) {
11 		newnode−>next = node−>next;
12 		node−>next = newnode;
13 	}
14 	else newnode−>next = NULL;
15
16 	newnode−>item = item;
17 	return newnode;
18 }

A doubly linked-list is similar to a singly linked-list except that each node also contains a pointer to the previous node in the list as follows.

struct List {
	int item;
	struct List *next;
	struct List *prev;
};

The enables the list to be constructed, and permits traversal up the list as well as down. In particular, doubly linked-lists make the deletion of elements simpler and more efficient than is possible with a singly linked-list.

 

 

Typedefs

In effect typedef is like #define, except that since it is interpreted by the compiler, it can cope with textual substitutions beyond the capabilities of the preprocessor. The keyword typedef provides a means for creating new data type names. It does not create a new type, but merely adds a new name for some existing type. For example,

typedef int Length;

makes the name Length a synonym for int. The type Length may then be used wherever the type

int might be used.
Length len, maxlen;
Length lengths[50];

In relation to structures, the ability to define type synonyms permits a significant improvement in structure declaration syntax. For example, the following declaration

typedef struct Point {
	int x;
	int y;
} Point;

permits subsequent definitions of type Point to be written as

Point pt1, pt2;

without the preceding struct keyword. This simplification is more marked for self-referencing structures, such as for a linked-list.

typedef struct list_t List;
	struct list_t {
	int item;
	List *next;
};

Link-list nodes are subsequently defined as type List. Notice that the typedefed name need not match the original structure tag; and for non-self-referencing structures, such as Point above, the structure-tag may be neglected altogether.

There are four main reasons for using typedef. The first is to improve the appearance of declaration syntax as shown above. This is perhaps best exemplified by the ability of typedef to break up complicated declarations such as arrays-of-pointers-to-functions. The second reason for using typedef is that type-synonyms often provide better in-code documentation than a plain type. For example, the type Length gives a better indication of a variable’s purpose than just an int, and makes the code more readable. The third use of typedef is to shield a program from portability problems.

If typedefs are used for data types that may be machine dependent, only the typedefs need change when the program is moved. One common situation is to use typedef names for various integer quantities, then make an appropriate set of choices of short, int, and long for each host machine. Types like size_t and ptrdiff_t from the standard library are examples. 

The typedef names localise the definition of non-portable types to a single location, rather than having them sprinkled throughout the code. For example, on a 32-bit machine, two-byte and fourbyte integer types may be defined as follows,

typedef short INT16;
typedef int INT32;

while, on a 16-bit machine, the same size integers are defined as

typedef int INT16;
typedef long INT32;

The fourth reason for using typedef is to facilitate a basic form of generic programming. In the linked-list examples above, each node contains an item of type int. By defining the contained type with a typedef, the list can be made to contain objects of different types.

typedef int ValueType;
typedef struct List {
	ValueType item;
	struct List *next;
} List;

The algorithms that use the link-list must also use the typedefed name.

List *insert_back(List *node, ValueType item);
List *insert_after(List *node, ValueType item);

With these alterations, the code can be made to operate on a different type with the change of a single line of code.

 

 

Object-Oriented Programming Style

Functions and structures work together to facilitate a basic style of object-oriented programming in terms of encapsulation and modularity. The synergy of these features permits a more modular coding style than is possible with either feature in isolation.

Encapsulation refers to the ability to hide the implementation details of a code module so that only a high-level abstraction of the code is visible to the client.3 The client knows what a module does but does not need to know how it does it. Functions facilitate modularity by hiding algorithm details behind a well-defined interface. Structures perform a similar task by hiding data representation details within a higher-level type definition. Used in combination, it is possible to define abstract data types and manipulate them via functions, thus hiding implementation details and presenting only a conceptual view to the client via the public interface.

This design principle is evident in the previous linked-list example, where the client need only be aware of a List type object, and various functions for adding elements to the list. The client does not need to know anything about the actual representation of the linked-list, or the algorithms for adding the elements.

The following is another simple example of object-oriented program design. Consider a structure that represents a complex number type.

typedef struct {
	double real;
	double imag;
} Complex;

The set of functions shown below perform a variety of operations on these complex number variables. They perform object creation, addition and multiplication of two complex numbers, in-place addition and multiplication, and comparison of two complex numbers. In each case, the client does not need to know how the functions perform their tasks, but simply what type of result is expected when they return.

1 Complex make complex(double r, double i)
2 /* Construct a Complex object with the initial values r and i. */
3 {
4 		Complex c;
5 		c.real = r;
6 		c.imag = i;
7 		return c;
8 }
9
10 Complex add complex(Complex a, Complex b)
11 /* Add two complex numbers and return the result. */
12 {
13 		Complex sum;
14 		sum.real = a.real + b.real;
15 		sum.imag = a.imag + b.imag;
16 		return sum;
17 }
18
19 Complex mult complex(Complex a, Complex b)
20 /* Multiply two complex numbers and return the result. */
21 {
22 		Complex prod;
23 		prod.real = a.real*b.real − a.imag*b.imag;
24 		prod.imag = a.real*b.imag + a.imag*b.real;
25 		return prod;
26 }
27
28 void addequal complex(Complex* a, const Complex* b)
29 /* Add two complex numbers and store the result in a. */
30 {
31 		a−>real += b−>real;
32 		a−>imag += b−>imag;
33 }
34
35 void multequal complex(Complex* a, const Complex* b)
36 /* Multiply two complex numbers and store the result in a. */
37 {
38 		a−>real = a−>real*b−>real − a−>imag*b−>imag;
39 		a−>imag = a−>real*b−>imag + a−>imag*b−>real;
40 }
41
42 int is equal(Complex a, Complex b)
43 /* Return TRUE if the values of a and b are equal. */
44 {
45 		return a.real == b.real && a.imag == b.imag;
46 }

Expandable Array Revisited

we examined an expandable array that grows on demand as new elements are added to it. The code presented was modular and flexible, but has two major limitations. First, it permits a program to have only one vector object and, second, the vector may only contain items of type int. Using structures and typedef, we can overcome these two problems, allowing us to create any number of vectors, and use the same code to produce vectors of different types.

The header file vector.h for the expandable vector is shown below. It is similar to the original code, but includes a typedef ValueType for the contained type and a struct Vector defining the format of the vector type. By simply changing line 4 to typedef char ValueType;

the vector can be made to contain elements of type char, and similarly for other types. The structure Vector groups together the three essential variables to represent a vector object: its size, capacity, and a pointer to the allocated array itself.

It is important to realise that because Vector objects are manipulated exclusively by the associated functions declared in lines 13 to 25, the user does not actually need to know how the Vector type is composed.4 All that the client needs to understand is what task each function in the public interface performs. Access to the Vector internals is managed by functions like get_element() and get_size(), etc. Because of this decoupling, the internal make up of struct Vector may be changed and, similarly, the function algorithms may be changed, all without affecting client code.

1 /* Vector: an expandable vector type that contains elements of type ValueType.
2 */
3
4 typedef double ValueType;
5
6 typedef struct {
7 		ValueType *data; /* pointer to vector elements */
8 		int size; /* current size of vector */
9 		int capacity; /* current reserved memory for vector */
10 } Vector;
11
12 /* Vector creation and destruction. */
13 Vector create vector(int capacity);
14 void release vector(Vector *v);
15
16 /* Vector access operations. */
17 int push back(Vector *v, ValueType item);
18 ValueType pop back(Vector *v);
19 ValueType* get element(Vector *v, int index);
20
21 /* Manual resizing operations. */
22 int get size(Vector *v);
23 int set size(Vector *v, int size);
24 int get capacity(Vector *v);
25 int set capacity(Vector *v, int size);

The operations performed by the public interface are essentially the same as for the original vector implementation. Two new functions

Vector create_vector(int capacity);
void release_vector(Vector *v);

are used to initialise the internal data of a vector object and to release the data of a vector object, respectively. Having created a vector object, it may be passed to the other functions (via a pointer) to perform various operations.

The implementation of the expandable vector involves only minor modifications from the original implementation. All references to int-type elements have been changed to ValueType, and operations are carried out using a pointer to a vector object. The code of source file vector.c is shown below.

1 #include "vector.h"
2 #include <stdlib.h>
3 #include <assert.h>
4
5 static const int StartSize = 5; /* initial vector capacity */
6 static const float GrowthRate = 1.6f; /* geometric growth of vector capacity */
7
8 Vector create vector(int capacity)
9 /* Initialise a vector with the specified capacity. */
10 {
11 		Vector v;
12 		v.size = 0;
13 		v.data = (ValueType *)malloc(capacity*sizeof (ValueType));
14 		v.capacity = (v.data == NULL)?0: capacity;
15 		return v;
16 }
17
18 void release vector(Vector *v)
19 /* Release memory owned by vector. */
20 {
21 		free(v−>data);
22 		v−>size = 0;
23 		v−>capacity = 0;
24 }
25
26 int push back(Vector *v, ValueType item)
27 /* Add element to back of vector. Return index of new element if successful, and -1 if fails. */
28 {
29 /* If out-of-space, allocate more. */
30 		if (v−>size == v−>capacity) {
31 		int newsize = (v−>capacity == 0) ? StartSize : (int)(v−>capacity*GrowthRate + 1.0);
32 		ValueType *p = (ValueType *)realloc(v−>data, newsize*sizeof (ValueType));
33 		if (p == NULL)
34 			return −1;
35
36 		v−>capacity = newsize; /* allocate succeeds, update data-structure */
37 		v−>data = p;
38 }
39
40 /* We have enough room. */
41 v−>data[v−>size] = item;
42 		return v−>size++;
43 }
44
45 ValueType pop back(Vector *v)
46 /* Return element from back of vector, and remove it from the vector. */
47 {
48 		assert(v−>size > 0);
49 		return v−>data[−−v−>size];
50 }
51
52 ValueType* get element(Vector *v, int index)
53 /* Return pointer to the element at the specified index. */
54 {
55 		assert(index >= 0 && index < v−>size);
56 		return v−>data + index;
57 }
58
59 /* Manual size operations. */
60 int get size(Vector *v) { return v−>size; }
61 int get capacity(Vector *v) { return v−>capacity; }
62
63 int set size(Vector *v, int size)
64 /* Set vector size. Return 0 if successful, -1 if fails. */
65 {
66 if (size > v−>capacity) {
67 		ValueType *p = (ValueType *)realloc(v−>data, size*sizeof (ValueType));
68 		if (p == NULL)
69 			return −1;
70
71 		v−>capacity = size; /* allocate succeeds, update data-structure */
72 		v−>data = p;
73 }
74
75 v−>size = size;
76 return 0;
77 }
78
79 int set capacity(Vector *v, int size)
80 /* Shrink or grow allocated memory reserve for array.
81 * A size of 0 deletes the array. Return 0 if successful, -1 if fails. */
82 {
83 		if (size != v−>capacity) {
84 		ValueType *p = (ValueType *)realloc(v−>data, size*sizeof (ValueType));
85 		if (p == NULL && size > 0)
86 			return −1;
87
88 			v−>capacity = size;
89 			v−>data = p;
90 		}
91
92 		if (size < v−>size)
93 		v−>size = size;
94 		return 0;
95 }

The result of this code is an expandable vector library that can create and operate on any number of vector objects. These vectors can contain elements of type ValueType which is specified at compile time.

The code has an object-oriented style, where a high-level data type is defined using a structure (Vector), and operations on the structure are performed exclusively by a set of associated functions. The client is shielded from the data representation and algorithmic details; these aspects may be changed without affecting client code. The synergy of structures, functions and file-modularity permits highly modular and flexible code design.

This code still has one limitation remaining. Vectors can be made to contain elements of different types, but not at the same time. In a given program, all vectors must contain the same type. This limitation can be overcome using the facility of the generic object pointer void*. We return to the topic of generic programming.

 

 

Unions

A union is a variable that may hold (at different times) objects of different types and sizes, with the compiler keeping track of size and alignment requirements. Unions provide a way to manipulate different kinds of data in a single area of storage, without embedding any machine-dependent information in the program

The declaration of a union type is similar to the declaration of a struct type. For example,

union Utype {
	int ival;
	float fval;
	char *sval;
};
union Utype x, y, z; /* Define variables of type Utype. */

Accessing members of a union type is also the same as for structures, with the . member operator for union objects and the -> operator for pointers to union objects.

However, the behaviour of union variables is quite different to structures. Where a struct defines a group of related variables and provides storage for all of its members, a union provides storage for a single variable, which may be one of several types. In the above example, the compiler will allocate sufficient memory to store the largest of the types int, float, and char *. At any given time, a Utype variable holds a value for one of the three possible types, and it is the programmers responsibility to keep track of which type that might be. The type retrieved must be the same as the type most recently stored, and the result of retrieving a different type is implementation dependent.

union Utype x;
x.fval = 56.4; /* x holds type float. */
printf("%f\n", x.fval); /* OK. */
printf("%d\n", x.ival); /* Implementation dependent. */

Unions are usually used for one of three purposes. The first is to create a “variant” array—an array that can hold heterogeneous elements. This can be performed with a reasonable degree of type safety by wrapping the union within a structure which records the union variable’s current type.
For example,

typedef union { /* Heterogeneous type. */
	int ival;
	float fval;
} Utype;
enum { INT, FLOAT }; /* Define type tags. */
typedef struct {
	int type; /* Tag for the current stored type. */
	Utype val; /* Storage for variant type. */
} VariantType;
VariantType array[50]; /* Heterogeneous array. */
array[0].val.ival = 56; /* Assign value. */
array[0].type = INT; /* Mark type. */
...
for (i = 0; i < sizeof(array)/sizeof(array[0]); ++i) /* Print array. */
	if (array[i].type == INT)
		printf("%d\n", array[i].val.ival);
	else if (array[i].type == FLOAT)
		printf("%f\n", array[i].val.fval);
	else
		printf("Unknown type\n");

Checking for the correct type remains the programmer’s responsibility, but encoding the variable type in a structure eases the pain of recording the current state.

The second use of a union is to enforce the alignment of a variable to a particular address boundary. This is a valuable property for implementing memory allocation functions. And the third key use of a union is to get “under the hood” of C’s type system to discover something about the computer’s underlying data representation. For example, to print the representation of a floating-point number, one might use the following function (assuming int and float are both four-bytes).

void print_representation(float f)
/* Print internal representation of a float (adapted from H&S page 145). */
{
	union { float f; int i; } fi = f;
	printf("The representation of %e is %#x\n", fi.f, fi.i);
}

Both the second and third uses of unions described here are advanced topics, and a more complete discussion is beyond the scope of this text.