Tutorial: C - Arrays and Strings

Arrays and Strings

An array is a group of variables of a particular type occupying a contiguous region of memory. In C, array elements are numbered from 0, so that an array of size N is indexed from 0 to N − 1. An array must contain at least one element, and it is an error to define an empty array.

double empty[0]; /* Invalid in standard C (some compilers accept it as an extension). */



Array Initialisation

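As with any other variable, an array may be defined inside a function (local scope) or outside all functions (external scope), and may have automatic or static extent. Arrays with static extent (external arrays, and local arrays declared static) have their elements initialised to zero by default, but arrays with automatic extent are not initialised by default, so their elements have indeterminate values until they are assigned.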

It is possible to initialise an array explicitly when it is defined by using an initialiser list. This is a list of values of the appropriate type enclosed in braces and separated by commas. For example,

int days[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

If the number of values in the initialiser list is less than the size of the array, the remaining elements of the array are initialised to zero. Thus, to initialise the elements of an array with local extent to zero, it is sufficient to write

int localarray[SIZE] = {0};

It is an error to have more initialisers than the size of the array.

If the size of an array with an initialiser list is not specified, the array will automatically be allocated memory to match the number of elements in the list. For example,

int days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

the size of this array will be twelve. The size of an array may be computed via the sizeof operator,

int size = sizeof(days); /* size equals 12 * sizeof(int) */

which gives the number of bytes (that is, units of sizeof(char)) allocated for the array. A common C idiom is to use sizeof to determine the number of elements in an array, as in the following example.

nelems = sizeof(days) / sizeof(days[0]);
for(i = 0; i<nelems; ++i)
	printf("Month %d has %d days.\n", i+1, days[i]);

This idiom is invariant to changes in the array size, and computes the correct number of elements even if the type of the array changes. For this reason, an expression of the form

sizeof(array) / sizeof(array[0])

is preferred over, for example,

sizeof(array) / sizeof(int)

as array might one day become an array of type unsigned long.

Note. sizeof will only return the size of the whole array if its operand is the array name itself. In most expressions an array name is automatically converted to a pointer to the array's first element (the main exceptions being when it is the operand of sizeof or of the unary & operator), so any other reference to the array yields a pointer rather than an array. For example,

int *pdays = days;
int size1 = sizeof(days); /* size1 equals 12 * sizeof(int) */
int size2 = sizeof(days + 1); /* size2 equals sizeof(int *) */
int size3 = sizeof(pdays); /* size3 equals sizeof(int *) */

Similarly, if an array is passed to a function, it is converted to a pointer; a parameter declared as int days[] actually has type int *.

int count_days(int days[], int len)
{
	int total = 0, i;
	/* assert will fail: sizeof(days) equals sizeof(int *), and len equals 12 */
	assert(sizeof(days) / sizeof(days[0]) == len);
	for (i = 0; i < len; ++i)
		total += days[i];
	return total;
}



Character Arrays and Strings

Character arrays are special. They have certain initialisation properties not shared with other array types because of their relationship with strings. Of course, character arrays can be initialised in the normal way using an initialiser list.

char letters[] = { 'a', 'b', 'c', 'd', 'e' };

But they may also be initialised using a string constant, as follows.

char letters[] = "abcde";

The string initialisation automatically appends a \0 character, so the above array is of size 6, not 5. It is equivalent to writing,

char letters[] = { 'a', 'b', 'c', 'd', 'e', '\0' };

Thus, writing

char letters[5] = "abcde"; /* OK but bad style. */

while not an error in C, is very poor style: the array has room for the five letters but not for the terminating '\0', so letters does not hold a string and must not be passed to functions, such as strlen() or printf() with %s, that expect one.

An important property of string constants is that they are allocated memory: they have an address and may be referred to by a char * pointer. Constants of other types are not stored as addressable objects, so it is not possible to make a pointer point at them. So the following code is incorrect.

double *pval = 9.6; /* Invalid. Won’t compile. */
int *parray = { 1, 2, 3 }; /* Invalid. Won’t compile. */

However, it is perfectly valid to initialise a character pointer with a string constant.

char *str = "Hello World!\n"; /* Correct. But array is read-only. */

This is because a string constant has static extent (memory is allocated for the array before the program begins execution, and exists until program termination), and a string constant expression evaluates to a pointer to the beginning of this array.

Note. A string constant should be treated as a constant array: its memory may be read-only, even though in C its type is char[N] rather than const char[N]. The result of attempting to change the value of an element of a string constant is undefined. For example,

char *str = "This is a string constant";
str[11] = 'p'; /* Undefined behaviour. */

The static extent of string constants leads to the possibility of various unusual code constructs. For example, it is legitimate for a function to return a pointer to a string constant; the string is not destroyed at the end of the function block.

char *getHello(void)
/* Return a pointer to an array defined within the function */
{
	char *phello = "Hello World\n";
	return phello;
}

It is also valid to index a string constant directly, as demonstrated in the following function, which prints an unsigned value x in base b and, for bases 11 to 36, correctly substitutes letters for digits where required.

#include <stdio.h>
#include <assert.h>
#include <limits.h>

/* Enough room for the longest result: base 2 needs one digit per bit. */
#define BUF_SIZE ((int)(sizeof(unsigned) * CHAR_BIT))

void print_base_b(unsigned x, unsigned b)
/* Convert value x to a representation in base b (2 <= b <= 36). */
{
	char buf[BUF_SIZE];
	unsigned q = x;
	int i = 0;
	assert(b >= 2 && b <= 36);

	/* Calculate digit for each place in base b */
	do {
		assert(i < BUF_SIZE);
		x = q;
		q = x/b;
		buf[i++] = "0123456789abcdefghijklmnopqrstuvwxyz"[x - q*b];
	} while (q > 0);

	/* Print digits, in reverse order (most-significant place first) */
	for (--i; i >= 0; --i)
		printf("%c", buf[i]);
	printf("\n");
}

So, through a pointer to a string constant, the string is read-only. However, a character array initialised by a string constant is writable. This is because, for an array definition, the compiler first allocates memory for the character array and then copies the elements of the string constant into this memory region. Note that the only time a string is copied automatically by the compiler is when a char array is initialised. In every other situation, a string has to be copied explicitly, character by character or with functions such as strcpy() or memcpy().

A collection of valid operations on various array types is shown below.

short val = 9;
short *pval = &val; /* OK */
double array[] = {1.0, 2.0, 3.0 };
double *parray = array; /* OK */
char str[] = "Hello World!\n"; /* Correct. Array is read-write. */
str[1] = 'a'; /* OK */




Strings and the Standard Library

The standard library contains many functions for manipulating strings, most of which are declared in the header file <string.h>. This section describes several of the more commonly used functions.
• size_t strlen(const char *s). Returns the number of characters in string s, excluding the terminating '\0' character. The unsigned type size_t is used instead of plain int to cater for strings that are longer than the maximum representable int.
• char *strcpy(char *s, const char *t). Copies the string t, including its terminating '\0', into the character array s, and returns s. The array s must be large enough to hold the copy.
• int strcmp(const char *s, const char *t). Performs a lexicographical comparison of strings s and t, and returns a negative value if s < t, a positive value if s > t, and zero if s == t.
• char *strcat(char *s, const char *t). Appends the string t to the end of string s, and returns s. The first character of t overwrites the '\0' character at the end of s, so s must have room for both strings plus a terminating '\0'.
• char *strchr(const char *s, int c). Returns a pointer to the first occurrence of character c in string s. If c is not present, NULL is returned.
• char *strrchr(const char *s, int c). Like strchr(), but returns a pointer to the last occurrence of c in s.
• char *strstr(const char *s, const char *t). Searches for the first occurrence of substring t in string s. If found, it returns a pointer to the beginning of the substring in s; otherwise it returns NULL.

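The functions strncpy(), strncmp(), and strncat() perform the same tasks as their counterparts strcpy(), strcmp(), and strcat(), respectively, but take an extra argument n that limits the operation to at most n characters: strncpy() copies at most n characters of t, strncat() appends at most n characters of t (and always adds a terminating '\0'), and strncmp() compares at most the first n characters of the two strings. Beware that if t has n or more characters, strncpy() does not write a terminating '\0' into s.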

A standard function that can perform the operations of both strcpy() and strcat(), and much more, is sprintf(), which is declared in <stdio.h>. It is a general-purpose string formatting function that behaves identically to printf(), but writes the resulting formatted string (with a terminating '\0') into a character array rather than sending it to stdout. The array must be large enough to hold the result, since sprintf() has no way of checking its size.

Aside. In general, the concatenation of two strings requires the use of a function like strcat(). However, string constants may be concatenated at compile time by placing them adjacent to one another. For example, "this is " "a string" is equivalent to "this is a string". Compile-time concatenation is useful for writing long strings, since typing a multi-line string constant like

"this is
	a string"

is an error. An alternative way to write multi-line string constants is to write

"this is \
a string"

where the second half of the string begins in the first column of the next line, with no preceding white-space (any leading white-space would become part of the string). This is one occasion where white-space matters in a C program. Usually the adjacency method is preferred over the '\' method.



Arrays of Pointers

Since pointers are themselves variables, they can be stored in arrays just as other variables can. For example, an array of N pointers to ints has the following syntax.

int *parray[N];

Each pointer in an array of pointers behaves as any ordinary pointer would. It might point to an object or to an array, be NULL, or hold an invalid address.

double val = 9.7;
double array[] = { 3.2, 4.3, 5.4 };
double *pa[] = { &val, array+1, NULL };

In the above example, element pa[i] is a pointer to a double, and *pa[i] is the double variable that it points to. The dereferenced *pa[0] is equal to 9.7, and *pa[1] is 4.3, but pa[2] is equal to NULL and may not be dereferenced.

If an element in an array of pointers also points to an array, the elements of the pointed-to array may be accessed in a variety of different ways. Consider the following example.

int a1[] = { 1, 2, 3, 4 };
int a2[] = { 5, 6, 7 };
int *pa[] = { a1, a2 }; /* pa stores pointers to beginning of each array. */
int **pp = pa; /* Pointer-to-a-pointer holds address of beginning of pa. */
int *p = pa[1]; /* Pointer to the second array in pa. */
int val;

val = pa[1][1]; /* equivalent operations: val = 6 */
val = pp[1][1];
val = *(pa[1] + 1);
val = *(pp[1] + 1);
val = *(*(pp+1) + 1);
val = p[1];

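Notice that in these expressions pa and pp behave identically; the difference is that pa is an array name and pp is a pointer variable. That is, pp may be modified (for example, ++pp) while pa may not, and sizeof(pa) gives the size of the whole array whereas sizeof(pp) gives the size of a pointer.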

Arrays of pointers are useful for grouping related pointers together. Typically these are one of three types: pointers to large objects (such as structs), pointers to arrays, or pointers to functions.

For the first two categories, most interesting applications of pointer arrays involve the use of dynamic memory, which is discussed in Chapter 9. However, a simple example of an array of pointers to arrays is the following program which obtains a number from the user and prints the name of the corresponding month.

#include <stdio.h>

int main(void)
{
	char *months[] = { "Illegal", "January", "February", "March",
		"April", "May", "June", "July", "August", "September",
		"October", "November", "December" };
	int i, j;

	printf("Input an integer between 1 and 12: ");
	if (scanf("%d", &i) != 1 || i < 1 || i > 12)
		i = 0;	/* unreadable or out-of-range input selects "Illegal" */

	printf("Month number %d is %s.\n", i, months[i]); /* print string */

	printf("The letters of the month are: ");
	for (j = 0; months[i][j] != '\0'; ++j) /* access elements using [ ][ ] */
		printf("%c ", months[i][j]);
	printf("\n");
	return 0;
}

A common application for an array of function pointers is the construction of a “dispatch” table, which is used to invoke a particular function depending on the value of an array index. Each function in the array must have the same interface. The following example shows the syntax for defining an array of function pointers.

int (*pf[10])(char *);

This statement defines a variable pf, which is an array of ten pointers to functions, each of which takes a char * argument and returns an int. The example program below uses an array of function pointers to perform simple arithmetic operations. Each function takes two double arguments and returns a double.

#include <stdio.h>
#include <assert.h>

double add(double a, double b) { return a + b; }
double sub(double a, double b) { return a - b; }
double mult(double a, double b) { return a * b; }
double div(double a, double b) { assert(b != 0.0); return a / b; }

int main(void)
{
	int i;
	double val1, val2;
	double (*pf[ ])(double,double) = { add, sub, mult, div };

	printf("Enter two floating-point values, and an integer between 0 and 3: ");
	if (scanf("%lf%lf%d", &val1, &val2, &i) != 3)
		return 1;	/* input could not be read */
	if (i < 0 || i > 3) i = 0;

	printf("Performing operation %d on %3.2f and %3.2f equals %3.2f\n",
		i, val1, val2, pf[i](val1, val2));
	return 0;
}

The declaration syntax for function pointers and arrays of function pointers rapidly becomes very complicated. A good mechanism for simplifying declarations is to break them up using typedef. The keyword typedef is used for creating new data-type names. For example, in the following declaration,

typedef char * String;

the name String becomes a synonym for char *, and can be used to define variables of this type.

String message = "This is a string.";

With regard to complicated function pointer declarations, different parts of the declaration can be given a name using typedef, so that the combined whole is more readable. For example, the declaration in the program above can be rewritten as

typedef double (*Arithmetic)(double,double);
Arithmetic pf[] = { add, sub, mult, div };

The name Arithmetic becomes a synonym for a function pointer of the specified type, and defining an array of such function pointers is simple. We discuss further uses of typedef



Multi-dimensional Arrays

C provides rectangular multi-dimensional arrays, although in practice they are used much less than arrays of pointers.

A multi-dimensional array is defined using multiple adjacent square brackets, and the elements of the array may be initialised with values enclosed in curly braces, as in the following example.

float matrix[3][4] = {
	{ 2.4, 8.7, 9.5, 2.3 },
	{ 6.2, 4.8, 5.1, 8.9 },
	{ 7.2, 1.6, 4.4, 3.6 }
};

The braces of the initialiser list are nested such that the inner braces enclose each row of the two-dimensional array. In memory, a multi-dimensional array is laid out row by row (row-major order) as a single contiguous one-dimensional array. For example, the layout in memory of matrix is equivalent to the following 1-D array.

float oneD[] = { 2.4, 8.7, 9.5, 2.3, 6.2, 4.8, 5.1, 8.9, 7.2, 1.6, 4.4, 3.6 };

As with one-dimensional arrays, multi-dimensional arrays may be defined without a specific size. However, only the left-most subscript (i.e., the number of rows) may be left unspecified; the other dimensions must be given definite values. For example,

float matrix[][4] = { /* The 4 must be specified. */
	{ 2.4, 8.7, 9.5, 2.3 },
	{ 6.2, 4.8, 5.1, 8.9 }
};

To access an element of a multi-dimensional array, the correct notation is to enclose each subscript in a separate pair of square brackets. This differs from many other programming languages, which use comma-separated subscripts.

float a = matrix[1][2]; /* Correct. */
float b = matrix[1,2]; /* Wrong: 1,2 is a comma expression, so this means matrix[2]. */

A multi-dimensional array may have any number of dimensions, and higher-dimensional initialiser lists involve a deeper level of nested braces for each dimension. The following example illustrates the general format.

short array3d[4][2][3] = {
	{ { 0, 1, 2 }, { 3, 4, 5 } },
	{ { 6, 7, 8 }, { 9, 10, 11 } },
	{ { 12, 13, 14 }, { 15, 16, 17 } },
	{ { 18, 19, 20 }, { 21, 22, 23 } }
};

Note that the above array could have been defined with the left-most subscript unspecified,

short array3d[][2][3] = ... ;

but the other dimensions must be fully specified.

An example program using multi-dimensional arrays is given below. This program defines a two-dimensional array to represent a 3-by-3 matrix and passes it to a function which computes the product of two matrices.

1 #include <stdio.h>
2
3 #define SIZE 3
4
5 void multiply(int (*r)[SIZE], const int a[ ][SIZE], const int b[SIZE][SIZE])
6 /* Multiply two (SIZE by SIZE) matrices, a and b, and store result in r.
7  * The result matrix, r, must be zeroed before-hand. */
8 {
9 	int i, j, k;
10
11 	for (i = 0; i<SIZE; ++i)
12 		for (j = 0; j<SIZE; ++j)
13 			for (k = 0; k<SIZE; ++k)
14 				r[i][j] += a[i][k] * b[k][j];
15 }
16
17 int main(void)
18 {
19 	int m1[ ][SIZE] = {
20 		{ 1, 2, 3 },
21 		{ 4, 5, 6 },
22 		{ 7, 8, 9 }
23 	};
24
25 	int m2[SIZE][SIZE] = {0};
26 	int i, j;
27
28 	multiply(m2, m1, m1);
29 	for (i=0; i<SIZE; ++i) {
30 		for (j=0; j<SIZE; ++j)
31 			printf("%3d ", m2[i][j]);
32 		printf("\n");
33 	}
34 }

The output for this program is the square of matrix m1.
 30  36  42
 66  81  96
102 126 150


Notice, on line 5, that the function interface for multiply() shows three equivalent parameter declarations: a pointer to an array, a 2-D array with an unspecified left subscript, and a fully specified 2-D array, respectively. Notice that the pointer to an array must specify the size of the non-left-most subscript. In addition, notice the parentheses around the pointer-to-an-array identifier, int (*r)[SIZE]; they distinguish it from an array of pointers, int *r[SIZE].

Often multi-dimensional arrays and arrays of pointers may be used in an identical fashion. This can be a source of confusion to novice C programmers. For example, given the following array definitions,

char a[] = { 1, 2, 3 };
char b[] = { 4, 5, 6 };
char *c[] = { a, b };
char d[][3] = { { 1, 2, 3 }, { 4, 5, 6 } };

the subscripts c[i][j] and d[i][j] give the same results for every valid i and j.

However, multi-dimensional arrays and arrays of pointers are different in both their representation and application. First, a multi-dimensional array occupies a single contiguous region of memory, while an array of pointers may point to disparate memory locations. Second, a multi-dimensional array is rectangular (each row is the same length), while an array of pointers may refer to arrays of different lengths (or to nothing at all, via NULL). Finally, when a multi-dimensional array is passed to a function, all but the left-most subscript must be specified in the parameter declaration, while an array of pointers has no such requirement.

In summary, arrays of pointers are usually the more flexible (and often more efficient) option, and so are used far more frequently than multi-dimensional arrays.