dynamic_fgets: Reading long input lines in C

Accepting lines of input that are arbitrarily long is not something the C standard library was designed for. However, it can be an immensely useful feature to have. I recently came across this problem while rewriting the file input parts of libbitconvert. Here’s my solution, modeled after the C standard library’s fgets:

int dynamic_fgets(char** buf, int* size, FILE* file)
{
	char* offset;
	int old_size;

	if (!fgets(*buf, *size, file)) {
		return BCINT_EOF_FOUND;
	}

	if ((*buf)[strlen(*buf) - 1] == '\n') {
		return 0;
	}

	do {
		/* we haven't read the whole line so grow the buffer */
		old_size = *size;
		*size *= 2;
		*buf = realloc(*buf, *size);
		if (NULL == *buf) {
			return BCERR_OUT_OF_MEMORY;
		}
		offset = &((*buf)[old_size - 1]);

	} while ( fgets(offset, old_size + 1, file)
		&& offset[strlen(offset) - 1] != '\n' );

	return 0;
}

And here is an example of how to use it:

char* input;
int input_size = 2;
int rc;

input = malloc(input_size);
if (NULL == input) { return BCERR_OUT_OF_MEMORY; }
rc = dynamic_fgets(&input, &input_size, stdin);
if (BCERR_OUT_OF_MEMORY == rc) { return rc; }
/* use input */
free(input);

To show you how dynamic_fgets works, I’ll break it down line by line and then describe some of its features:

Line by line

First we read as much of the line as we can given the initial size of the input buffer. If the input is already at end-of-file, fgets returns NULL and we exit.

	if (!fgets(*buf, *size, file)) {
		return BCINT_EOF_FOUND;
	}

Since we haven’t exited, there is at least some useful information in the input buffer. If the input ends in a newline, then the input buffer was big enough to hold the entire line, so we are done.

	if ((*buf)[strlen(*buf) - 1] == '\n') {
		return 0;
	}
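
The check above assumes the string is never empty. As a hedged sketch, the test could be wrapped in a small helper that also handles a zero-length string; ends_with_newline is a hypothetical name and not part of libbitconvert.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in libbitconvert): returns 1 if s is
 * non-empty and its last character is a newline, otherwise 0. */
int ends_with_newline(const char* s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);
	/* len > 0 guards the s[len - 1] access against an empty string */
	return len > 0 && s[len - 1] == '\n';
}
```

With a helper like this, an empty string simply counts as "no newline yet" instead of indexing one byte before the start of the buffer.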

At this point, we know that the input buffer is too small to hold the entire line of input so we have to increase its size. I chose to double the size of the input each time it needs to grow.

	do {
		/* we haven't read the whole line so grow the buffer */
		old_size = *size;
		*size *= 2;
		*buf = realloc(*buf, *size);
		if (NULL == *buf) {
			return BCERR_OUT_OF_MEMORY;
		}

offset will store the position of the end of the data we have so far, which is where we will start reading the rest of the line. We use old_size - 1 because, when fgets stops for lack of space, it has filled the buffer and left the null terminator in the last byte, at index old_size - 1.

		offset = &((*buf)[old_size - 1]);

Here we read the next part of the input line, passing old_size + 1 as the size, which counts the null terminator that fgets appends. That many bytes fit because we overwrite the terminator byte left in the old buffer and then fill all of the new space, which is old_size bytes since we doubled the buffer size. We loop, increasing the buffer size, until the entire input line has been read, which is signaled by reaching end-of-file or by a newline as the last character at offset (and, thus, of the input buffer).

	} while ( fgets(offset, old_size + 1, file)
		&& offset[strlen(offset) - 1] != '\n' );
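
To make the old_size - 1 and old_size + 1 arithmetic concrete, here is a small sketch that computes the bookkeeping for one pass of the loop. The struct grow_step and the function next_step are hypothetical names used only for illustration; they are not part of dynamic_fgets.

```c
#include <assert.h>

/* Hypothetical record of the numbers used in one pass of the loop. */
struct grow_step {
	int new_size;      /* buffer size after doubling */
	int offset_index;  /* index where the next fgets starts writing */
	int read_count;    /* size argument passed to that fgets */
};

struct grow_step next_step(int old_size)
{
	struct grow_step s;

	assert(old_size >= 2);
	s.new_size = old_size * 2;
	s.offset_index = old_size - 1;  /* the old null terminator */
	s.read_count = old_size + 1;    /* terminator byte + old_size new bytes */
	return s;
}
```

Starting from a 2-byte buffer, the next read begins at index 1 with a size of 3, so it can write indexes 1 through 3 of the new 4-byte buffer. In general offset_index + read_count equals new_size, so the read ends exactly at the end of the grown buffer.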

Finally we return the success code. Note that it’s possible fgets reached end-of-file before we saw a newline character, but we don’t report it in order to be consistent with the interface of fgets. A subsequent call to dynamic_fgets would indicate end-of-file was found.

	return 0;

Features and caveats

For dynamic_fgets to work, it must be passed a dynamically-allocated input buffer. The user should check the return code to verify that no memory allocation errors occurred and to see if end-of-file has been reached. This necessarily deviates from the fgets interface because memory allocation attempts can result in errors.
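
One caveat on the allocation path: when realloc fails it returns NULL but leaves the old block allocated, so assigning its result straight to *buf loses the only pointer to that block. A minimal sketch of a safer growth step follows. grow_buffer is a hypothetical helper, not part of libbitconvert, and its -1 return value is an assumption made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: doubles *size and reallocates *buf.
 * Returns 0 on success. On failure returns -1 and leaves *buf and
 * *size untouched, so the caller can still use or free(*buf). */
int grow_buffer(char** buf, int* size)
{
	char* tmp;

	assert(buf != NULL && size != NULL && *size > 0);
	if (*size > INT_MAX / 2) {
		return -1;  /* doubling would overflow int */
	}
	tmp = realloc(*buf, (size_t)*size * 2);
	if (NULL == tmp) {
		return -1;  /* the old block is still valid */
	}
	*buf = tmp;
	*size *= 2;
	return 0;
}
```

With this shape, an out-of-memory return leaves the caller holding a valid buffer containing the part of the line read so far, which it can free as usual.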

Because realloc might change the location of the input buffer, the user must pass a char** instead of the char* that is normally passed to fgets.

The user must also pass an int* (rather than an int as in fgets), which allows dynamic_fgets to be run multiple times on the same input buffer without freeing and reallocating the buffer since the size is saved between calls to dynamic_fgets. This causes it to work in a semantically similar way to fgets, which many people use to read sequential lines into the same buffer (from where they are further processed) to save memory.
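
As a rough sketch of that reuse pattern, the helper below reads every line of a stream through a single buffer. count_lines is a hypothetical name, the two #define values are placeholders (the real constants live in libbitconvert and their values are not shown in this post), and the starting size of 16 is arbitrary. The body of dynamic_fgets repeats the listing above, plus an assert on its preconditions, only so that the block compiles on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder values, assumed for this sketch only. */
#define BCINT_EOF_FOUND     1
#define BCERR_OUT_OF_MEMORY (-1)

int dynamic_fgets(char** buf, int* size, FILE* file)
{
	char* offset;
	int old_size;

	assert(buf != NULL && *buf != NULL && size != NULL && *size >= 2);
	if (!fgets(*buf, *size, file)) {
		return BCINT_EOF_FOUND;
	}
	if ((*buf)[strlen(*buf) - 1] == '\n') {
		return 0;
	}
	do {
		/* we haven't read the whole line so grow the buffer */
		old_size = *size;
		*size *= 2;
		*buf = realloc(*buf, *size);
		if (NULL == *buf) {
			return BCERR_OUT_OF_MEMORY;
		}
		offset = &((*buf)[old_size - 1]);
	} while (fgets(offset, old_size + 1, file)
		&& offset[strlen(offset) - 1] != '\n');
	return 0;
}

/* Hypothetical helper: returns the number of lines in file, or
 * BCERR_OUT_OF_MEMORY. One buffer is reused for every line. */
int count_lines(FILE* file)
{
	int input_size = 16;  /* arbitrary starting size, at least 2 */
	char* input = malloc(input_size);
	int count = 0;
	int rc;

	if (NULL == input) { return BCERR_OUT_OF_MEMORY; }
	while (0 == (rc = dynamic_fgets(&input, &input_size, file))) {
		/* input holds one whole line; input_size may have grown */
		count++;
	}
	free(input);  /* input is NULL after a failed realloc; free(NULL) is a no-op */
	return (BCERR_OUT_OF_MEMORY == rc) ? rc : count;
}
```

The buffer is allocated once and freed once. Both input and input_size carry over between calls, so a long line early in the file means later lines need no further reallocation.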

Conclusion

Being able to read in arbitrary-length input lines is a very nice feature to have. I hope that people find dynamic_fgets useful in providing that.

Since it is part of libbitconvert, dynamic_fgets is licensed under the ISC licence. I expect dynamic_fgets to reside in libbitconvert’s bitconvert.c for the foreseeable future.

Comments or questions about dynamic_fgets would be appreciated.

7 Responses to “dynamic_fgets: Reading long input lines in C”


mulder
April 4, 2009 at 10:26 am

Hey, I wouldn’t use that buf[strlen(buf) - 1] == '\n' construct because it will break if there’s a 0 char in your input (in which case fgets() may not return NULL and strlen(buf) returns 0, so strlen(buf) - 1 is plenty out of bounds).

I’d rather use strchr()

if ((p = strchr(buf, '\n')) != NULL) {
*p = 0;
}

Your realloc() call has a memory leak if it fails to return a new memory block, so you should keep the original buffer in a temp variable so that you can still free() it if realloc() fails.

Anonymous Coward
August 25, 2009 at 4:26 am

How would you use dynamic_fgets in a loop to read multiple lines from a file? I am quite rusty at C, so it’s not jumping out at me. Thanks for your help.

Anonymous Coward
August 25, 2009 at 4:48 am

I answered my own question. I should have waited a bit before pulling the trigger on my question

input = malloc( input_size );
while( dynamic_fgets( &input, &input_size, config ) != BCINT_EOF_FOUND )
{
/* do something with input */
free( input );
input_size = 2;
input = malloc( input_size );
}
free( input );

ossguy
August 25, 2009 at 2:16 pm

You don’t need to free and reallocate input after each line. dynamic_fgets is designed so that you can reuse the buffer across multiple invocations, as long as you keep the old value of input_size. This will make your code much faster as expensive allocations are avoided.

Also, I would suggest setting input_size to something higher than 2 in a production environment. Ideally you would like to avoid reallocations when possible so setting input_size to a value slightly higher than your average line length would be best. You can tweak it further depending on the specific memory usage and time constraints of your program.

Anonymous Coward
August 25, 2009 at 8:05 pm

Great, thanks for the tips. I have spent too many recent years in the Perl world where I take things like this for granted

Anonymous Coward
September 1, 2009 at 4:20 am

If I do not free and reallocate after each line, then my last line of the file contains fragments of previous lines. This is especially true when having the “input_size” set to 256, which is much larger than the size of the lines I am reading.

Anonymous Coward
September 2, 2009 at 3:15 am

How could I modify your function to read a line from a binary file? I am having issues when I run into a binary sequence that looks like a NULL, so it thinks it has reached the end of the file.
