Tutorial : C - Input And Output

Input and Output

 

The C language provides no direct facilities for input and output (IO); instead, these operations are supplied as functions in the standard library. This chapter describes the most commonly used functions. It also discusses command-shell redirection which, while non-standard, is widely supported, and command-line arguments, which enable a program to receive instructions from its calling environment.

It is important to realise that this chapter does not present all the functions related to IO in the standard library and, of the functions it does discuss, it does not cover every detail.

 

 

Formatted IO

The standard functions for formatted IO, printf() and scanf(), have been mentioned at a cursory level in previous chapters. These functions are very powerful and possess a level of sophistication beyond the scope of this text (although we will touch on these more complex aspects briefly). However, for most common purposes, they are intuitive, flexible and simple to use.

 

 

Formatted Output: printf()

The function printf() is a general purpose print function that converts and formats its arguments to a character string, and prints the result to standard output (typically the screen). The general interface for printf() is

int printf(const char *format, arg1, arg2, ...);

The first argument is a format string, which defines the layout of the printed text. This is followed by zero or more optional arguments, with the number of arguments, and their type, being determined by the contents of the format string. The return value is the number of characters printed, unless an error occurs during output, in which case the return value is negative.

The format string is composed of ordinary characters and conversion specification characters. The former are printed verbatim, while the latter are used to control the conversion of the optional arguments following the format string. Conversion specifications are identified by a % character followed by a number of optional fields and terminated by a type conversion character. A simple example is

printf("%d green %s sitting on a wall.\n", 10, "bottles");

where the ordinary characters “green” and “sitting on a wall.\n” are printed verbatim, and the conversion specifiers %d and %s insert the additional arguments at the appropriate locations. The type conversion character must match its associated argument type; in the example, the %d indicates an integer argument and the %s indicates a string argument.

There are different conversion characters for ints (d, i, c), unsigned ints (u, o, x), doubles (f, e, g), strings (s), and pointers (p). Details of these may be found in any C reference text. To print a % character, the conversion specification %% is used.

Between the % and the type conversion character there may exist a number of optional fields. These control the formatting of the converted argument. Consider, for example, the conversion specifier %-#012.4hd

The first field is a set of flags, which modify the meaning of the conversion operation (e.g., make the argument left-justified, or pad with zeros). The second field specifies a minimum width reserved for the converted argument (in characters), and so provides padding for under-sized values. The third field is a precision specification, which has various different meanings for integers, floating point values and strings. The fourth field is a size modifier, which indicates conversion to a longer or shorter type than the default conversion types (where the default types are, for example, int or double).

Again, a good reference text will have more information regarding these conversion specifications. Most often a printf() conversion involves just the % and a conversion character, and it rarely gets more complex than the following example,

printf("Value = %-10.3f radians.\n", fval);

which, if passed the floating point value 3.14159, would print

Value = 3.142      radians.

where the converted floating point value is left-justified (due to the - flag), is padded with sufficient space for 10 characters, and displays 3 digits after the decimal point.
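The effect of the flag, width and precision fields can be checked with a small sketch. It uses sprintf() (covered later under String Formatting) rather than printf(), so that the formatted text lands in a buffer where it can be inspected. The function name format_radians and the 64-character buffer suggested in the comment are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: format fval exactly as in the example above.
 * buf must be large enough for the result; 64 characters is ample here.
 * Returns the number of characters stored, excluding the final \0. */
int format_radians(char *buf, double fval)
{
    /* '-' left-justifies, 10 is the minimum width, .3 is the precision. */
    return sprintf(buf, "Value = %-10.3f radians.\n", fval);
}
```

For fval equal to 3.14159, the converted value 3.142 occupies five characters, so five spaces of padding follow it before the space that is already present in the format string.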

It is essential that the type conversion specifier matches the type of the argument, as the compiler cannot catch type-mismatches for variable-length argument lists. However, there is no need for conversion characters for types float and short, etc, as these types are automatically converted to double and int, respectively, by the usual argument promotion rules.

Aside : The ability to write functions with variable-length argument lists is not restricted to implementers of the standard library. The standard provides facilities that enable application programmers to write functions with these same capabilities. The standard header stdarg.h contains a set of macro definitions that define how to step through an argument list. The declaration of a variable-length argument list is marked by an ellipsis (...) in the function interface, and typically the type and number of arguments are specified using a format string, as in the following example.

int varfunc(char *format, ...);

Note, the ellipsis declaration may only appear at the end of an argument list. The implementation of such functions using the macros from stdarg.h is beyond the scope of this text.

 

 

Formatted Input: scanf()

The scanf() function is the input analog of printf(), providing many of the same conversion specifications in the opposite direction (although there are differences, so be wary). It obtains data from standard input, which is typically the keyboard. The general interface for scanf() is

int scanf(const char *format, ...);

This is identical to printf() in form, with a format string and a variable argument list, but an important difference is that the arguments for scanf() must be pointer types. This allows the input data to be stored at the address designated by the pointer using pass-by-reference semantics. For example,

double fval;
scanf("%lf", fval); /* Wrong */
scanf("%lf", &fval); /* Correct, store input in fval. */

scanf() reads characters from standard input and interprets them according to the format string specification. It stops when it exhausts the format string, or when some input fails to match a conversion specification. Its return value is the number of values successfully assigned in its variable-length argument list, or EOF if end-of-file or an error occurs before any value is assigned. If a conflict occurs between a conversion specification and the actual input, the character causing the conflict is left unread and is processed by the next standard input operation.

The mechanics of the format string and its conversion specifications are even more complicated for scanf() than for printf(), and there are many details and caveats that will not be discussed here. Most of the conversion characters for printf()—d, i, o, x, c, u, f, e, g, s, p, etc—have similar meanings for scanf(), but there are certain differences, some subtle. Thus, one should not use the documentation for one as a guide for the other. Some of these differences are as follows.

• Where printf() has four optional fields, scanf() has only two. It has the width and size modifier fields but not the flags and precision fields.

• For printf() the width field specifies a minimum reserve of space (i.e., padding), while for scanf() it defines a maximum limit on the number of characters to be read.

• An asterisk character (*) may be used in place of the width field for both printf() and scanf(), but with different meanings. For printf() it allows the width field to be determined by an additional argument, but for scanf() it suppresses assignment of an input value to its argument.

• The conversion character [ is not valid for printf(), but for scanf() it permits a scanset of characters to be specified, which allows scanf() to control exactly the characters it reads in.

• The size modifier field is typically neglected for printf(), but is vital for scanf(). For example, to read a float, one uses the conversion specifier %f. To read a double, the size modifier l (for long) must appear, %lf.

Of the above, the third and fourth points are rather advanced features that we will not dwell on further. However, the last point is important, and a common source of errors for new C programmers. The conversion specifier and size modifier must match the associated argument type or the result is undefined.

The scanf() format string consists of conversion specifiers, ordinary characters, and white-space. Where ordinary characters appear in the format string, they must match exactly the format of the input. For example, the following statement is used to read a date of the form dd/mm/yy.

int day, month, year;
scanf("%d/%d/%d", &day, &month, &year);

In general scanf() ignores white-space characters in its format string, and skips over white-space in stdin as it looks for input values. Exceptions to this rule arise with the %c and %[ conversion specifiers, which do not skip white-space. For example, if the user types in “one two” for each of the statements below, they will obtain different results.

char s[10], c;
scanf("%s%c", s, &c); /* s = "one", c = ' ' */
scanf("%s %c", s, &c); /* s = "one", c = 't' */

In the first case, the %c reads in the next character after %s leaves off, which is a space. In the second, the white-space in the format string causes scanf() to consume any white-space after “one”, leaving the first non-space character (t) to be assigned to c.
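The difference between the two format strings can be observed in a short sketch. It uses sscanf() (described later under String Formatting) so that the input "one two" can be supplied as a string instead of being typed. The function name compare_formats is hypothetical, and a width of 9 is added to %s to keep the 10-character buffer safe.

```c
#include <stdio.h>

/* Hypothetical helper: scan input twice, once with "%s%c" and once with
 * "%s %c", storing the character read by %c in *c1 and *c2 respectively.
 * Returns 1 if both scans assigned two values, otherwise 0. */
int compare_formats(const char *input, char *c1, char *c2)
{
    char s[10];

    /* No white-space before %c: the character right after the word is read. */
    if (sscanf(input, "%9s%c", s, c1) != 2)
        return 0;

    /* White-space before %c: any white-space after the word is skipped. */
    if (sscanf(input, "%9s %c", s, c2) != 2)
        return 0;

    return 1;
}
```

Called with the input "one two", the first scan stores a space in *c1 and the second stores the letter t in *c2.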

While the many details of scanf() formatting complicate a complete understanding, its basic use is quite simple. Rarely does an input statement get more complicated than

short a;
double b;
char c[20];
scanf("%hd %lf %s", &a, &b, c);

However, it is worth noting that the above form of string (%s) input is not ideal. A string is read up to the first white-space character unless terminated early by a width field. Thus, a very long input of consecutive non-space characters may overflow the string’s character buffer. To prevent overflow, a string conversion specification should always include a width field. Consider a situation where a user types in the words “small supererogatory” for the following input code.

char s1[10], s2[10], s3[10];
scanf("%9s %9s %9s", s1, s2, s3);

Notice the width fields are one less than the array sizes to allow room for the terminating \0. The first word “small” fits into s1, but the second word is over-long: its first nine characters “supererog” are placed in s2 and the rest “atory” goes into s3.
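The splitting behaviour described above can be reproduced without typing at the keyboard. The sketch below applies the same format string through sscanf() (described later under String Formatting); the function name split_words is an assumption made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read up to three words of at most nine characters
 * each from input. Each destination must hold at least 10 characters.
 * Returns the number of words stored. */
int split_words(const char *input, char *s1, char *s2, char *s3)
{
    /* The width 9 leaves room for the terminating \0 in a 10-char array. */
    return sscanf(input, "%9s %9s %9s", s1, s2, s3);
}
```

Given the input "small supererogatory", this stores three strings even though only two words were supplied, because the over-long second word is cut after nine characters and its remainder is picked up by the third conversion.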

A few final warnings about scanf(). First, keep in mind that the arguments in its variable length argument list must be pointers; forgetting the & in front of non-pointer variables is a very common mistake. Second, when there is a conflict between a conversion specification and the actual input, the offending character is left unread. Thus, an expression like

while (scanf("%d", &val) != EOF)

is dangerous as it will loop forever if there is a conflict. Third, while scanf() is a good choice when the exact format of the input is known, other input techniques may be better suited if the format may vary. For example, the combination of fgets() and sscanf(), described in the next section, is a useful alternative if the input format is not precisely known. The fgets() function reads a line of characters into a buffer, and sscanf() extracts the data, and can pick out different parts using multiple passes if necessary.
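One way to avoid the endless loop is to test for the number of values expected rather than for EOF. The sketch below does this with fscanf(), the file version of scanf() described later in this chapter, so that it works on any input stream including stdin. The function name sum_ints is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: add up the integers at the front of stream fp.
 * Stores how many integers were read in *count and returns their sum.
 * Reading stops at end-of-file, on an error, or at the first input
 * that is not an integer. */
long sum_ints(FILE *fp, int *count)
{
    long total = 0;
    int val;

    *count = 0;
    /* Comparing with 1 (one value expected) ends the loop on a conflict,
     * where fscanf() returns 0, as well as on EOF. */
    while (fscanf(fp, "%d", &val) == 1) {
        total += val;
        ++*count;
    }
    return total;
}
```

For input from the keyboard the call would be sum_ints(stdin, &count). The character that caused a conflict remains unread, so the caller can inspect it or skip it before trying again.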

 

 

String Formatting

The functions sprintf() and sscanf() perform essentially the same operations as printf() and scanf(), respectively, but, rather than interact with stdout or stdin, they operate on a character array argument. They present the following interfaces.

int sprintf(char *buf, const char *format, ...);
int sscanf(const char *buf, const char *format, ...);

The sprintf() function stores the resulting formatted string in buf and automatically appends this string with a terminating \0 character. It returns the number of characters stored (excluding \0). This function is very useful for a wide range of string manipulation operations. For example, the following code segment creates a format string at runtime, which prevents scanf() from overflowing its character buffer.

char buf[100], format[10];
sprintf(format, "%%%ds", (int)(sizeof(buf)-1)); /* Create format string "%99s". The cast is needed because sizeof yields a size_t, not the int that %d expects. */
scanf(format, buf); /* Get string from stdin. */

The input string is thus limited to not more than 99 characters plus 1 for the terminating \0.

The sscanf() function extracts values from the string buf according to the format string, and stores the results in the additional argument list. It behaves just like scanf() with buf replacing stdin as the source of input characters. An attempt to read beyond the end of string buf for sscanf() is equivalent to reaching the end-of-file for scanf(). The sscanf() function is often used in conjunction with a line input function, such as fgets(), as in the following example.

char buf[100];
double dval;
fgets(buf, sizeof(buf), stdin); /* Get a line of input, store in buf. */
sscanf(buf, "%lf", &dval); /* Extract a double from buf. */
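The fragment above ignores the return values of both functions. A slightly more careful version might look like the following sketch, which reports whether a line was available and whether it began with a number. The function name read_double_line and its return convention are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from fp and extract a double.
 * Returns 1 if a value was stored in *dval, 0 if the line did not
 * begin with a number, and EOF if no line could be read. */
int read_double_line(FILE *fp, double *dval)
{
    char buf[100];

    if (fgets(buf, sizeof(buf), fp) == NULL)
        return EOF;                  /* End-of-file or read error. */

    /* sscanf() returns the number of values assigned, here 1 or fewer. */
    return sscanf(buf, "%lf", dval) == 1 ? 1 : 0;
}
```

Because the whole line is consumed by fgets() before any conversion is attempted, a line that fails to convert does not get stuck in the input stream as it would with scanf().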

 

 

File IO

The C language is closely tied to the UNIX operating system; they were initially developed in parallel, and UNIX was implemented in C. Thus, much of the standard C library is modelled on UNIX facilities, and in particular the way it performs input and output by reading or writing to files.

In the UNIX operating system, all input and output is done by reading or writing files, because all peripheral devices, even keyboard and screen, are files in the file system. This means that a single homogeneous interface handles all communication between a program and peripheral devices.

 

 

Opening and Closing Files

A file is referred to by a FILE pointer, where FILE is a structure declaration defined with a typedef in header stdio.h.

This file pointer “points to a structure that contains information about the file, such as the location of a buffer, the current character position in the buffer, whether the file is being read or written, and whether errors or end-of-file have occurred”. All these implementation details are hidden from users of the standard library via the FILE type-name and the associated library functions.

A file is opened by the function fopen(), which has the interface

FILE *fopen(const char *name, const char *mode);

The first argument, name, is a character string containing the name of the file. The second is a mode string, which determines how the file may be used. There are three basic modes: read "r", write "w" and append "a". The first opens an existing file for reading, and fails if the file does not exist. The other two open a file for writing, and create a new file if it does not already exist. Opening an existing file in "w" mode first clears the file of its existing data (i.e., overwrites the existing file). Opening in "a" mode preserves the existing data and adds new data to the end of the file.

Each of these modes may include an additional “update” specification signified by a + character (i.e., "r+", "w+", "a+"), which enables the file stream to be used for both input and output. This ability is most useful in conjunction with the random access file operations described below under Random Access File Operations.

Some operating systems treat “binary” files differently to “text” files. (For example, UNIX handles binary and text files the same; Win32 represents them differently.) The standard C library caters for this variation by permitting a file to be explicitly marked as binary with the addition of a b character to the file-open mode (e.g., "rb" opens a binary file for reading).

If opening a file is successful, fopen() returns a valid FILE * pointer. If there is an error, it returns NULL (e.g., attempting to open a file for reading that does not exist, or attempting to open a file without appropriate permissions). As with other functions that return pointers to limited resources, such as the dynamic memory allocation functions, it is prudent to always check the return value for NULL.

To close a file, the file pointer is passed to fclose(), which has the interface

int fclose(FILE *fp);

This function breaks the connection with the file and frees the file pointer. It is good practice to free file pointers when a file is no longer needed as most operating systems have a limit on the number of files that a program may have open simultaneously. However, fclose() is called automatically for each open file when a program terminates.
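The open, check, use and close pattern described in this section might be sketched as follows. The function name count_chars and the wording of the error message are assumptions; getc(), which reads one character, is described under Sequential File Operations.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters in the named file.
 * Returns the count, or -1L if the file could not be opened. */
long count_chars(const char *name)
{
    long n = 0;
    FILE *fp = fopen(name, "r");

    if (fp == NULL) {                 /* Always check the result of fopen(). */
        fprintf(stderr, "Cannot open file: %s\n", name);
        return -1L;
    }

    while (getc(fp) != EOF)           /* Read until end-of-file (or error). */
        ++n;

    fclose(fp);                       /* Release the file when finished. */
    return n;
}
```

Closing the file inside the function keeps the number of simultaneously open files small, no matter how many times the function is called.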

 

 

Standard IO

When a program begins execution, there are three text streams predefined and open. These are standard input (stdin), standard output (stdout) and standard error (stderr). The first two signify “normal” input and output, and for most interactive environments are directed to the keyboard and screen, respectively. Their input and output streams are usually buffered, which means that characters are accumulated in a queue and sent in packets, minimising expensive system calls. Buffering may be controlled by the standard function setbuf(). The stderr stream is reserved for sending error messages. Like stdout it is typically directed to the screen, but its output is unbuffered.

 

 

Sequential File Operations

Once a file is opened, operations on the file—reading or writing—usually negotiate the file in a sequential manner, from the beginning to the end. The standard library provides a number of different operations for sequential IO.

The simplest functions process a file one character at a time. To write a character there are the functions

int fputc(int c, FILE *fp);
int putc(int c, FILE *fp);
int putchar(int c);

where calling putchar(c) is equivalent to calling putc(c, stdout). The functions putc() and fputc() are identical, but putc() is typically implemented as a macro for efficiency. These functions return the character that was written, or EOF if there was an error (e.g., the hard disk was full).

To read a character, there are the functions

int fgetc(FILE *fp);
int getc(FILE *fp);
int getchar(void);

which are analogous to the character output functions. Calling getchar() is equivalent to calling getc(stdin), and getc() is usually a macro implementation of fgetc(). These functions return the next character in the character stream unless either the end-of-file is reached or an error occurs. In these anomalous cases, they return EOF.

It is possible to push a character c back onto an input stream using the function

int ungetc(int c, FILE *fp);

The pushed back character will be read by the next call to getc() (or getchar() or fscanf(), etc) on that stream.
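A common use of ungetc() is to look at the next character without consuming it. The following is a minimal sketch of that idea; the function name peek_char is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character on fp without removing
 * it from the stream. Returns EOF at end-of-file or on error. */
int peek_char(FILE *fp)
{
    int c = getc(fp);

    if (c != EOF)
        ungetc(c, fp);    /* Put it back for the next read operation. */
    return c;
}
```

Only one character of push-back is guaranteed by the standard, so this sketch never pushes back more than the single character it has just read.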

Note. The symbolic constant EOF is returned by standard IO functions to signal either end-of-file or an IO error. For input functions, it may be necessary to determine which of these cases is being flagged. Two standard functions, feof() and ferror(), are provided for this task; they return non-zero if the prior EOF was due to end-of-file or to an error, respectively.
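As an illustration of telling the two cases apart, the sketch below copies one stream to another a character at a time and uses ferror() once the loop ends. The function name copy_stream and its return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: copy all characters from in to out.
 * Returns the number of characters copied, or -1L if a read or
 * write error occurred. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int c;                       /* int, not char, so EOF can be detected. */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1L;          /* Write error (e.g., disk full). */
        ++n;
    }

    if (ferror(in))
        return -1L;              /* EOF was caused by a read error. */
    return n;                    /* EOF was a genuine end-of-file. */
}
```

Calling copy_stream(stdin, stdout) would behave like a simple echo program with error checking added.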

Formatted IO can be performed on files using the functions

int fprintf(FILE *fp, const char *format, ...);
int fscanf(FILE *fp, const char *format, ...);

These functions are generalisations of printf() and scanf(), which are equivalent to the calls fprintf(stdout, format, ...) and fscanf(stdin, format, ...), respectively.

Characters can be read from a file a line at a time using the function

char *fgets(char *buf, int max, FILE *fp);

which reads at most max-1 characters from the file pointed to by fp and stores the resulting string in buf. It automatically appends a \0 character to the end of the string. The function returns when it encounters a \n character (i.e., a newline), or reaches the end-of-file, or has read the maximum number of characters. It returns a pointer to buf if successful, and NULL for end-of-file or if there was an error.
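One detail worth knowing is that fgets() keeps the newline character in buf when it reads one. The sketch below wraps fgets() to remove it; the function name read_line is hypothetical, and strcspn() from string.h is used to locate the newline.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (at most max-1 characters)
 * and remove the trailing newline, if any.
 * Returns buf, or NULL at end-of-file or on error. */
char *read_line(char *buf, int max, FILE *fp)
{
    if (fgets(buf, max, fp) == NULL)
        return NULL;

    /* strcspn() gives the index of the first '\n', or the string length
     * if there is none, so this assignment is always within the string. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

If a line is longer than max-1 characters, the remainder is left in the stream and is returned by the next call.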

Character strings may be written to a file using the function

int fputs(const char *str, FILE *fp);

which returns a non-negative value if successful and EOF if there was an error. Note, the string need not contain a \n character, and fputs() will not append one, so strings may be written to the same line with successive calls.

For reading and writing binary files, a pair of functions are provided that enable objects to be passed to and from files directly without first converting them to a character string. These functions are

size_t fread(void *ptr, size_t size, size_t nobj, FILE *fp);
size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *fp);

and they permit objects of any type to be read or written, including arrays and structures. For example, if a structure called Astruct were defined, then an array of such structures could be written to file as follows.

struct Astruct mystruct[10];
fwrite(mystruct, sizeof(struct Astruct), 10, fp);
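A round trip through fwrite() and fread() might be sketched as follows. The text does not say what Astruct contains, so the two members shown here are purely hypothetical, as are the function names save_structs and load_structs.

```c
#include <stdio.h>

/* Hypothetical definition; the members are assumptions for illustration. */
struct Astruct {
    int id;
    double value;
};

/* Write n structures from arr to fp. Returns 1 if all were written. */
int save_structs(const struct Astruct *arr, size_t n, FILE *fp)
{
    /* fwrite() returns the number of objects actually written. */
    return fwrite(arr, sizeof(struct Astruct), n, fp) == n;
}

/* Read n structures from fp into arr. Returns 1 if all were read. */
int load_structs(struct Astruct *arr, size_t n, FILE *fp)
{
    /* fread() returns the number of objects actually read. */
    return fread(arr, sizeof(struct Astruct), n, fp) == n;
}
```

The file should be opened in binary mode (e.g., "wb" and "rb"). Data written this way is generally only readable by a program built for the same platform, as the size and layout of the structure may differ elsewhere.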

 

 

Random Access File Operations

The previous file IO functions progress through a file sequentially. The standard library also provides a means to move back and forth within a file to any specified location. These file positioning functions are

long ftell(FILE *fp);
int fseek(FILE *fp, long offset, int from);
void rewind(FILE *fp);

The first, ftell(), returns the current position in the file stream. For binary files this value is the number of characters preceding the current position. For text files the value is implementation defined. In both cases the value is in a form suitable for the second argument of fseek(), and the value 0L represents the beginning of the file.

The second function, fseek(), sets the file position to a location specified by its second argument. This parameter is an offset, which shifts the file position relative to a given reference location. The reference location is given by the third argument and may be one of three values as defined by the symbolic constants SEEK_SET, SEEK_CUR, and SEEK_END. These specify the beginning of the file, the current file position, and the end of file, respectively. Having shifted the file position via fseek(), a subsequent read or write will proceed from this new position.

For binary files, fseek() may be used to move the file position to any chosen location. For text files, however, the set of valid operations is restricted to the following.

fseek(fp, 0L, SEEK_SET); /* Move to beginning of file. */
fseek(fp, 0L, SEEK_CUR); /* Move to current location (no effect). */
fseek(fp, 0L, SEEK_END); /* Move to end of file. */
fseek(fp, pos, SEEK_SET); /* Move to pos. */

In the last case, the value pos must be a position returned by a previous call to ftell(). Binary files, on the other hand, permit more arbitrary use, such as

fseek(fp, -4L, SEEK_CUR); /* Move back 4 bytes. */
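For a binary file of fixed-size objects, an offset from SEEK_SET gives direct access to any one of them. The sketch below assumes a file that contains nothing but int values written with fwrite(); the function name read_nth_int is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read the int at position index (counting from 0)
 * in a binary file consisting only of int values.
 * Returns 1 if a value was stored in *out, otherwise 0. */
int read_nth_int(FILE *fp, long index, int *out)
{
    /* Each int occupies sizeof(int) bytes, so object number index
     * starts index * sizeof(int) bytes from the beginning of the file. */
    if (fseek(fp, index * (long)sizeof(int), SEEK_SET) != 0)
        return 0;

    return fread(out, sizeof(int), 1, fp) == 1;
}
```

An index beyond the last value is not caught by fseek(), which may position past the end of the file, but by fread(), which then reads nothing.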

The program below shows an example of ftell() and fseek() to determine the length of a file in bytes. The file itself may be plain text, but it is opened as binary so that ftell() returns the number of characters to the end-of-file.

/* Compute the length of a file in bytes. From Snippets (ansiflen.c) */
#include <stdio.h>

long flength(char *fname)
{
    long length = -1L;    /* Returned if the file cannot be opened. */
    FILE *fptr;

    fptr = fopen(fname, "rb");
    if (fptr != NULL) {
        fseek(fptr, 0L, SEEK_END);    /* Move to the end of the file. */
        length = ftell(fptr);         /* Position at the end is the length. */
        fclose(fptr);
    }
    return length;
}

The third function, rewind(), returns the position to the beginning of the file. Calling rewind(fp) is equivalent to the statement fseek(fp, 0L, SEEK_SET).

Two other file positioning functions are available in the standard library: fgetpos() and fsetpos(). These perform essentially the same tasks as ftell() and fseek(), respectively, but are able to handle files too large for their positions to be representable by a long integer.
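These two functions record a position in an object of type fpos_t rather than a long. As a minimal sketch of their use, the following function reads a line and then restores the original position, so that the same line is read again by the next input operation. The function name peek_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read the next line into buf without advancing
 * the file position. Returns buf, or NULL if no line could be read
 * or the position could not be saved or restored. */
char *peek_line(char *buf, int max, FILE *fp)
{
    fpos_t pos;
    char *result;

    if (fgetpos(fp, &pos) != 0)      /* Remember where we are. */
        return NULL;

    result = fgets(buf, max, fp);

    if (fsetpos(fp, &pos) != 0)      /* Go back to the remembered position. */
        return NULL;
    return result;
}
```

Both functions return zero on success. The contents of an fpos_t object are not meant to be examined or computed with; it is only suitable for passing back to fsetpos().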

 

 

Command-Shell Redirection

Often programs are executed from a command-interpreter environment (also called a shell). Most operating systems possess such an interpreter. For example, Win32 has a DOS-shell and UNIX-like systems have various similar shell environments such as the C-shell, the Bourne-shell, the Korn-shell, etc. Most shells facilitate redirection of stdin and stdout using the operators < and >, respectively. Redirection is not part of the C language, but an operating system service that supports the C input-output model.

#include <stdio.h>

/* Write stdin to stdout */
int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}

Consider the example program above. It simply reads characters from stdin and forwards them to stdout. Normally this means the characters typed at the keyboard are echoed on the screen after the user hits the “enter” key. Assume the program executable is named “repeat”.

repeat
type some text 123
type some text 123

However, a file may be substituted for the keyboard by redirection.

repeat <infile.txt
display contents of infile.txt

Alternatively, a file may be substituted for the screen, or for both keyboard and screen as in the following example, which copies the contents of infile.txt to outfile.txt.

repeat <infile.txt >outfile.txt

Further redirection commands are >> and |. The former redirects stdout but, unlike >, appends the redirected output rather than overwriting the existing file contents. The latter is called a “pipe”, and it directs the stdout of one program to the stdin of another. For example,

prog1 | prog2

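Conceptually, prog1 executes first and its stdout is accumulated in a temporary buffer and, once the program has terminated, prog2 executes with this set of output as its stdin. (Some systems, including UNIX-like ones, instead run both programs concurrently and pass the data along as it is produced, but prog2 receives the same input either way.)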

The stderr stream is not redirected, and so will still print messages to the screen even if stdout is redirected.

 

 

Command-Line Arguments
 
The C language provides a mechanism for a program to obtain input parameters from the environment in which it executes. These are termed “command-line arguments” and are passed to the program when it begins execution. Until now we have specified the interface of main() as

int main(void)

which is one of the two interfaces permitted by the ISO C standard. The other consists of two arguments

int main(int argc, char *argv[])

where argc stands for argument count and argv stands for argument vector.

Style Note. The two parameters, of course, do not have to be named argc and argv, but this is a long-standing convention and should be upheld to assist code readability.

The value of argc is the number of command-line parameters passed to the program upon execution, including the program name. The parameter argv is a pointer to an array of character strings, where each string is one of the passed command-line parameters. The length of this array is argc+1 and the end element, argv[argc], is NULL. The value of argc is normally at least one, as argv[0] is the name by which the program was invoked. If argc equals one, then there are no command-line arguments after the program name.

Consider the example program below. This program simply takes the set of command-line strings referred to by argv and prints them to stdout. We will assume the program executable is named “echo”.

#include <stdio.h>

/* Print command-line arguments to stdout. */
int main(int argc, char *argv[])
{
    int i;
    for (i = 1; i < argc; ++i)
        printf("%s ", argv[i]);
    printf("\n");
    return 0;
}

When a program is invoked from a command-line, each white-space separated string of characters including the program name becomes a command-line argument. Thus, typing

echo one two three 123

stores the strings "echo", "one", "two", "three" and "123" in locations referred to by argv[0] to argv[4], respectively, and argc is equal to five. This program then prints all arguments except argv[0]. Note, redirection commands do not appear as command-line arguments, so

echo one >outfile.txt two three 123

will print

one two three 123

in the file named outfile.txt.
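Command-line arguments always arrive as strings, so a program that expects numbers must convert them. The sketch below combines argv with sscanf() from earlier in this chapter to add up the numeric arguments; the function name sum_numeric_args is hypothetical, and it is written as a separate function so that main() need only pass on its own argc and argv.

```c
#include <stdio.h>

/* Hypothetical helper: add up the command-line arguments that begin
 * with a number. The count of arguments that could not be converted
 * is stored in *skipped. */
double sum_numeric_args(int argc, char *argv[], int *skipped)
{
    double total = 0.0, val;
    int i;

    *skipped = 0;
    for (i = 1; i < argc; ++i) {          /* argv[0] is the program name. */
        if (sscanf(argv[i], "%lf", &val) == 1)
            total += val;
        else
            ++*skipped;
    }
    return total;
}
```

A main() function with the two-argument interface would call this as sum_numeric_args(argc, argv, &skipped). Note that sscanf() accepts a string such as "2abc" as the number 2, so stricter checking would be needed if that is not acceptable.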