C Programming Skills #2

[Note: This was originally written for "FYI", the internal newsletter for the worldwide software engineering departments of Dendrite International]

Here again we have an example of a simple function, made much more complicated in an attempt to be "generic" by having it "make no assumptions" about the input. So what's wrong with it ?

#include <ctype.h>
/*
* Copy a database time into a target string 
* (i.e. "16.02.00.000000" -> "16:02")
*/
static void	CopyTime(char *pTarget, char *pSource)
{
	int	I;
	char	Tmp[6];  	Tmp[sizeof(Tmp) - 1] = '\0';
	for (I = 0; I < sizeof(Tmp) - 1; I++)
	{
		if (isdigit(pSource[I]))
			Tmp[I] = pSource[I];
		else
			Tmp[I] = ':';
	}
	strcat(pTarget, Tmp);  
}

This function is intended to take an input string in the form "HH.MM.SS.etc", but it goes to some length as to not assume the string is in that form. But it makes that assumption anyway -- it's just a little more hidden in the code. Try it with any form of bad input, and compare its results to those from this simplified version:

static char *CopyTime(char *pTarget, const char *pSource)
{
	memcpy(pTarget, pSource,5);
	pTarget[2] = ':';
	pTarget[5] = END_OF_STRING;
	return(pTarget);
}

Input	Output Orig. Method	Output New Method	Comparison
16.20.33.12345	16:20	16:20	Both OK
09H45M59S999T	09:45	09:45	Both OK
4.20.33.12345	4:20:	4.:0.	Both Bad
16.2.33.5555	16:2:	16:2.	Both Bad
GO4IT	::4::	GO:IT	Both Bad
HAPPYHAPPYJOYJOY	:::::	HA:PY	Both Bad

As you can see, the "robust" original version is no better at producing a correct string than is my simplified version, which will run faster, require less code space, and is more obvious in what it's doing. So what's the point of doing the long version ?

Of course, you might say there's not much virtue in being "equally bad". So, if we're going to doing extra work on the assumption that the input may be bad, why not do something useful. How about if we verified the input, and "corrected " it -- make sure we have a number before the punctuation, a number after the punctuation, and on output they both have two digits. If we can't, how about returning an error condition, so that we know we've got a problem.

As a side note, I've made two minor changes to the programming interface to the function. In my new version, the function declares the source string as "constant", and returns a character pointer. I've always felt that since we have the ability to return a value from a function, we might as well use it as much as we can. Here, the return value is the destination string, as per the C library standard. This allows the programmer to do such wild things as:

char buf[100];
strcat(strcat(strcat(strcpy(buf,FirstName)," "),MiddleInit),". "), LastName);

if you're into that kind of thing. An alternate use for the return value would be to offer a success or failure code.

A common misconception is that if a function parameter is declared as being a const char pointer, the actual value passed has to point to a const character array. In fact, the reverse is true. The "const" only says that this function won't change the contents where it's pointing, and that a constant character array can be used. However, if the "const" is omitted from the declaration, the compiler will have to assume that the function will change the contents, and therefore, it will have to reject any attempt to pass a const object. With the "const" is the declaration, const arrays can be used. The use of const arrays can be very handy, as they can be placed into read-only blocks of the program. This is helpful in Windows, where it allows any number of read-only segments, but prefers each program use only one 64K read-write segment. Also, on the Newton, where programs are distributed on FlashMemory cards, and potentially in ROM, the ability to position as much as possible in read-only memory could be important.

One other issue to deal with in the original listing is the array indexing. Treating a pointer as an array is very useful for random access in a table. However, here, all we need is linear access to the data, and for that, pointers are more efficient. By using array indexing, the compiler must calculate an offset into the buffer -- by multiplying if this is anything other than a character array -- and add it to the start of the buffer. By stepping through an array with a pointer, the compiler can use the register increment instruction-- usually the fastest instruction on the machine.

Putting together everything given above, we get the following. It's now larger & slower than the original (but not as bad as it could have been), but is genuinely more robust and bullet-proof than the first version.

static char *CopyTime(char *pTarget, const char *pSource)
{
	int	hours;
	int	mins;
	char	*pTmp;  
	hours = atoi(pSource); /* Everything before 1st nondigit */
	while (isdigit(pTmp))
	{
		if (*pTmp == END_OF_STRING)
			return(NULL);	/* all digits == ERROR */
		pTmp++;
	}

	/* Check here for specific character after digits ?*/
	pTmp++;

	if (isdigit(pTmp) == FALSE)
		return(NULL);	/* error is digit doesn't follow ':' */

	mins = atoi(pTmp);
	sprintf(pTarget, "%02d:%02d",hours, min);

	return(pTarget);
}

Input	Output Orig. Method	Output New Method	Comparison
16.20.33.12345	16:20	16:20	Both OK
09H45M59S999T	09:45	09:45	Both OK
4.20.33.12345	4:20:	04:20	NOW OK!
16.2.33.5555	16:2:	16:02	NOW OK!
GO4IT	::4::	NULL	Error Noted!
HAPPYHAPPYJOYJOY	:::::	NULL	Error Noted!

But, we're not quite done yet. Let us consider the stability of our input. If we are going to be getting these initial strings from user input, then the above routine would probably be the best, because we could get bad input at any time. But what if the input string came from another routine, which gets it values from the system clock. In this case, the input will either be good; or it will be bad, in which case, we'll fix the other routine and then it'll be good. However, the important point is, once it's "fixed", our input will be stable and reliable. So, what we really want is strong error checking while we are testing the program, and then, when everything is working, a fast and efficient version for when we release it to our customers. This reveals the beauty of the "assert" macro, of the standard C library.

"Assert()" tests for a condition, and if it's not met, lets you know there's trouble in no uncertain terms -- it gives you a message and kicks you out of the program. It's not very pretty. But the truly wondrous aspect of it, is that if you include the #define NDEBUG ("no debug"), all of the assert()s are magically removed from your program. (Actually, the preprocessor replaces them with "(void) NULL;" which the compiler then removes). Which means we can now use the following:

#include <assert.h>
static char *CopyTime(char *pTarget, const char *pSource)
{
	assert(pSource[0] >= '0' && pSource[0] <= '2');
	assert(pSource[1] >= '0' && pSource[1] <= '9');
	assert(pSource[2] == '.');
	assert(pSource[3] >= '0' && pSource[3] <= '5');
	assert(pSource[4] >= '0' && pSource[4] <= '9');

	memcpy(pTarget, pSource,5);

	pTarget[2] = ':';
	pTarget[5] = END_OF_STRING;

	return(pTarget);
}

Now, if at any point during testing, some part of the input string is bad, we'll get a message similar to the following:

Assertion failed: pSource[2] == '.', file testi.c, line 8
Abnormal program termination

But when we deliver it to a customer, all that runs is the fast short version.