Accepting lines of input that are arbitrarily long is not something the C standard library was designed for. However, it can be an immensely useful feature to have. I recently came across this problem while rewriting the file input parts of libbitconvert. Here’s my solution, modeled after the C standard library’s fgets:
int dynamic_fgets(char** buf, int* size, FILE* file)
{
    char* offset;
    int old_size;

    if (!fgets(*buf, *size, file)) {
        return BCINT_EOF_FOUND;
    }

    if ((*buf)[strlen(*buf) - 1] == '\n') {
        return 0;
    }

    do {
        /* we haven't read the whole line so grow the buffer */
        old_size = *size;
        *size *= 2;
        *buf = realloc(*buf, *size);
        if (NULL == *buf) {
            return BCERR_OUT_OF_MEMORY;
        }
        offset = &((*buf)[old_size - 1]);
    } while ( fgets(offset, old_size + 1, file)
        && offset[strlen(offset) - 1] != '\n' );

    return 0;
}
And here is an example of how to use it:
char* input;
int input_size = 2;
int rc;

input = malloc(input_size);
if (NULL == input) {
    return BCERR_OUT_OF_MEMORY;
}

rc = dynamic_fgets(&input, &input_size, stdin);
if (BCERR_OUT_OF_MEMORY == rc) {
    return rc;
}

/* use input */

free(input);
To show you how dynamic_fgets works, I’ll break it down line by line and then describe some of its features:
First we read as much of the line as we can given the initial size of the input buffer. If the input is already at end-of-file, fgets returns NULL and we exit.
if (!fgets(*buf, *size, file)) { return BCINT_EOF_FOUND; }
Since we haven’t exited, there is at least some useful information in the input buffer. If the input ends in a newline, then the input buffer was big enough to hold the entire line, so we are done.
if ((*buf)[strlen(*buf) - 1] == '\n') { return 0; }
At this point, we know that the input buffer is too small to hold the entire line of input, so we have to increase its size. I chose to double the size of the input buffer each time it needs to grow.
do {
    /* we haven't read the whole line so grow the buffer */
    old_size = *size;
    *size *= 2;
    *buf = realloc(*buf, *size);
    if (NULL == *buf) {
        return BCERR_OUT_OF_MEMORY;
    }
offset will store the position of the end of the data we have so far, which is where we will start reading the rest of the line. We use old_size - 1 so that we start where the null terminator is.
offset = &((*buf)[old_size - 1]);
Here we continue reading the line, asking fgets for up to old_size + 1 bytes. That is exactly the room available: the null terminator byte reused from the old buffer plus all of the new space, which equals old_size because we doubled the buffer size. We loop, growing the buffer each time, until the entire input line has been read, which is signaled by reaching end-of-file or by a newline as the last character of offset (and, thus, of the input buffer).
} while ( fgets(offset, old_size + 1, file)
    && offset[strlen(offset) - 1] != '\n' );
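To make the size arithmetic concrete: if the old buffer held 4 bytes ("abc" plus the terminator), the doubled buffer holds 8, offset points at index 3, and indices 3 through 7 give 5 bytes, which is old_size + 1. The helper below is a hypothetical illustration of that calculation only; it is not part of dynamic_fgets.

```c
#include <stddef.h>

/* Hypothetical helper (not in libbitconvert): number of bytes available
   from offset to the end of the buffer after its size has been doubled. */
size_t room_after_doubling(size_t old_size)
{
    size_t new_size = old_size * 2;
    size_t offset_index = old_size - 1;  /* index of the old null terminator */

    return new_size - offset_index;      /* always old_size + 1 */
}
```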
Finally we return the success code. Note that it’s possible fgets reached end-of-file before we saw a newline character, but we don’t report it in order to be consistent with the interface of fgets. A subsequent call to dynamic_fgets would indicate end-of-file was found.
return 0;
For dynamic_fgets to work, it must be passed a dynamically-allocated input buffer. The user should check the return code to verify that no memory allocation errors occurred and to see if end-of-file has been reached. This necessarily deviates from the fgets interface because memory allocation attempts can result in errors.
Because realloc might change the location of the input buffer, the user must pass a char** instead of the char* that is normally passed to fgets.
The user must also pass an int* (rather than an int as in fgets), which allows dynamic_fgets to be run multiple times on the same input buffer without freeing and reallocating the buffer, since the size is saved between calls to dynamic_fgets. This causes it to work in a semantically similar way to fgets, which many people use to read sequential lines into the same buffer (from where they are further processed) to save memory.
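As a rough illustration of that reuse, the sketch below reads every line of a stream into a single buffer that is allocated once and freed once. It repeats dynamic_fgets so that it compiles on its own. The numeric values of BCINT_EOF_FOUND and BCERR_OUT_OF_MEMORY and the helper count_lines are assumptions made for this example, not libbitconvert's actual definitions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values; libbitconvert's real constants may differ. */
#define BCINT_EOF_FOUND      1
#define BCERR_OUT_OF_MEMORY (-1)

/* Same logic as dynamic_fgets, repeated to keep the sketch self-contained. */
int dynamic_fgets(char** buf, int* size, FILE* file)
{
    char* offset;
    int old_size;

    if (!fgets(*buf, *size, file)) {
        return BCINT_EOF_FOUND;
    }
    if ((*buf)[strlen(*buf) - 1] == '\n') {
        return 0;
    }
    do {
        old_size = *size;
        *size *= 2;
        *buf = realloc(*buf, *size);
        if (NULL == *buf) {
            return BCERR_OUT_OF_MEMORY;
        }
        offset = &((*buf)[old_size - 1]);
    } while (fgets(offset, old_size + 1, file)
             && offset[strlen(offset) - 1] != '\n');
    return 0;
}

/* Hypothetical helper: counts the lines in a stream and records the length
   of the longest one (excluding the newline), reusing one buffer throughout.
   Returns -1 if memory runs out. */
int count_lines(FILE* file, size_t* longest)
{
    int size = 8;  /* deliberately small so the buffer has to grow */
    char* input = malloc(size);
    int lines = 0;
    int rc;

    if (NULL == input) {
        return -1;
    }
    *longest = 0;
    /* input and size carry over from one call to the next */
    while ((rc = dynamic_fgets(&input, &size, file)) == 0) {
        size_t len = strcspn(input, "\n");
        if (len > *longest) {
            *longest = len;
        }
        lines++;
    }
    free(input);  /* input is NULL after an allocation failure; free(NULL) is safe */
    return (BCERR_OUT_OF_MEMORY == rc) ? -1 : lines;
}
```

The buffer only ever grows, so after the longest line has been seen no further reallocation takes place.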
Being able to read in arbitrary-length input lines is a very nice feature to have. I hope that people find dynamic_fgets useful in providing that.
Since it is part of libbitconvert, dynamic_fgets is licensed under the ISC licence. I expect dynamic_fgets to reside in libbitconvert’s bitconvert.c for the foreseeable future.
Comments or questions about dynamic_fgets would be appreciated.
Hey, I wouldn’t use that buf[strlen(buf) - 1] == '\n' construct because it will break if there’s a 0 char in your input (in which case fgets() may not return NULL and strlen(buf) returns 0, so strlen(buf) - 1 is plenty out of bounds).
I’d rather use strchr():
if ((p = strchr(buf, '\n')) != NULL) {
    *p = 0;
}
Your realloc() call has a memory leak if it fails to return a new memory block, so you should keep the original buffer in a temp variable so that you can still free() it if realloc() fails.
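A minimal sketch of how both suggestions might be applied, assuming a hypothetical variant named dynamic_fgets_checked (not part of libbitconvert) and assumed values for the two return codes. It finds the newline with strchr() and keeps the result of realloc() in a temporary, so the caller's buffer stays valid when allocation fails. Like the original, it assumes each chunk filled the buffer, so a zero byte in the input would still truncate the visible string, though without reading out of bounds.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values; libbitconvert's real constants may differ. */
#define BCINT_EOF_FOUND      1
#define BCERR_OUT_OF_MEMORY (-1)

/* Hypothetical variant of dynamic_fgets applying both suggestions. */
int dynamic_fgets_checked(char** buf, int* size, FILE* file)
{
    char* tail = *buf;  /* start of the most recently read chunk */

    if (!fgets(tail, *size, file)) {
        return BCINT_EOF_FOUND;
    }

    /* strchr never indexes before the chunk, even if the chunk is empty */
    while (NULL == strchr(tail, '\n')) {
        int old_size = *size;
        char* tmp = realloc(*buf, (size_t)old_size * 2);

        if (NULL == tmp) {
            /* *buf and *size are unchanged, so the caller can free(*buf) */
            return BCERR_OUT_OF_MEMORY;
        }
        *buf = tmp;
        *size = old_size * 2;

        /* continue at the old terminator, as in the original function */
        tail = *buf + old_size - 1;
        if (!fgets(tail, old_size + 1, file)) {
            break;  /* end-of-file before a newline; keep what we have */
        }
    }
    return 0;
}
```

With this variant the caller should free the buffer after an out-of-memory return, which differs from the original, where the pointer has already been overwritten with NULL.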
How would you use dynamic_fgets in a loop to read multiple lines from a file? I am quite rusty at C, so it’s not jumping out at me. Thanks for your help.
I answered my own question. I should have waited a bit before pulling the trigger on my question 🙂
input = malloc( input_size );
while( dynamic_fgets( &input, &input_size, config ) != BCINT_EOF_FOUND )
{
/* do something with input */
free( input );
input_size = 2;
input = malloc( input_size );
}
free( input );
You don’t need to free and reallocate input after each line. dynamic_fgets is designed so that you can reuse the buffer across multiple invocations, as long as you keep the old value of input_size. This will make your code much faster as expensive allocations are avoided.
Also, I would suggest setting input_size to something higher than 2 in a production environment. Ideally you would like to avoid reallocations when possible, so setting input_size to a value slightly higher than your average line length would be best. You can tweak it further depending on the specific memory usage and time constraints of your program.
Great, thanks for the tips. I have spent too many recent years in the Perl world where I take things like this for granted 🙂
If I do not free and reallocate after each line, then my last line of the file contains fragments of previous lines. This is especially true when having the “input_size” set to 256, which is much larger than the size of the lines I am reading.
How could I modify your function to read a line from a binary file? I am having issues when I run into a binary sequence that looks like a NULL, so it thinks it has reached the end of the file.
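One possible approach, sketched under assumptions rather than taken from libbitconvert: fgets and strlen both treat a zero byte as the end of the string, so data containing zero bytes needs a reader that tracks the length itself. The hypothetical helper read_line_len below reads byte by byte with fgetc and reports how many bytes it stored. The stream should be opened in binary mode ("rb") on platforms that distinguish it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to and including '\n' (or to end-of-file),
   growing *buf as needed. The byte count goes in *len, so embedded zero
   bytes are preserved; no terminator is appended.
   Returns 0 on success, 1 at end-of-file with nothing read,
   -1 if memory runs out (*buf stays valid in that case). */
int read_line_len(char** buf, size_t* cap, size_t* len, FILE* file)
{
    int c;

    *len = 0;
    while ((c = fgetc(file)) != EOF) {
        if (*len == *cap) {
            size_t new_cap = *cap ? *cap * 2 : 16;
            char* tmp = realloc(*buf, new_cap);

            if (NULL == tmp) {
                return -1;
            }
            *buf = tmp;
            *cap = new_cap;
        }
        (*buf)[(*len)++] = (char)c;
        if ('\n' == c) {
            break;
        }
    }
    return (0 == *len) ? 1 : 0;
}
```

The caller starts with *buf set to NULL and *cap set to 0, processes *len bytes after each successful call, and frees *buf once at the end.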