James's Weblog

Comparison of output parameter behaviors

August 22, 2004 at 1:17 pm (PT) in Programming

Yesterday I wrote about why output parameters should be carefully documented. If an error occurs, the output values should be documented as:

being left in an undefined state
being set to a known, bogus value (e.g. NULL, 0, −1, 0xFFFFFFFF, …)
being left untouched

Which should you use?

Undefined state. Specifying that the output value is in undefined is certainly the easiest, laziest approach. It requires no additional effort for the function author, and if callers make assumptions about the output state, they take all the blame. Unfortunately, the lazy approach encourages future laziness. Especially if the function allocates resources, it becomes all too easy for a sloppy maintenance programmer to add an early exit-point and to create a resource leak.

Specifying that the output value be in a defined state can make a maintenance programmer more mindful. The function contract is not solely to benefit programmers writing callers; it also acts as a guide to anyone modifying the callee.

So if the output value is put in a defined state, what should that state be?

Known, bogus values. The primary advantage of using bogus values is convenience to the callers. Callers can check the value of the output argument instead of checking the returned error code; functions can use assertions to guard against bogus values as a mechanism for simple sanity checking.

Code that uses bogus values typically must set them at the exit-point for every error case:

/** foo -- Some function.
  *
  * PARAMETERS:
  *   someInput       : Some input value.
  *   OUT someOutputP : On success, set to some output value.
  *                     On error, set to BOGUS_VALUE.
  *
  * RETURNS:
  *   Some error code.
  */
err_t foo(int someInput, data_t* someOutputP)
{
    err_t err = ERR_NONE;


    err = initializeAndAcquireResources(someOutputP);


    if (err != ERR_NONE)
    {
        *someOutputP = BOGUS_VALUE;
        return err;
    }


    err = bar();


    if (err != ERR_NONE)
    {
        freeResources(someOutputP);
        *someOutputP = BOGUS_VALUE;
        return err;
    }


    /* ... et cetera ... */


    return ERR_NONE;
}

or sometimes it can be simplified slightly by setting the bogus value at the very beginning:

err_t foo(int someInput, data_t* someOutputP)
{
    err_t err = ERR_NONE;


    *someOutputP = BOGUS_VALUE;


    err = initializeAndAcquireResources(someOutputP);


    if (err != ERR_NONE)
    {
        return err;
    }


    err = bar();


    if (err != ERR_NONE)
    {
        freeResources(someOutputP);
        *someOutputP = BOGUS_VALUE;
        return err;
    }


    /* ... et cetera ... */


    return ERR_NONE;
}

or people who understand that gotos aren’t always evil will minimize the number of exit-points:

err_t foo(int someInput, data_t* someOutputP)
{
    err_t err = ERR_NONE;


    *someOutputP = BOGUS_VALUE;


    err = initializeAndAcquireResources(someOutputP);


    if (err != ERR_NONE)
    {
        goto exit;
    }


    err = bar();


    if (err != ERR_NONE)
    {
        goto exit;
    }


    /* ... et cetera ... */


exit:
    if (err != ERR_NONE)
    {
        freeResources(someOutputP);
        *someOutputP = BOGUS_VALUE;
    }
    return err;
}

The first version is hard to maintain. Anyone modifying the code to add another error-check must remember to set *someOutputP = BOGUS_VALUE. (The function specifies this behavior for someOutputP, so a maintenance programmer should be more mindful, but this still isn’t a guarantee against carelessness.)

The second version is slightly better, but there still is the problem that some exit-points may require setting (and freeing of) someOutputP. Worse, anyone maintaining the code must keep track of which exit-points require such actions and which don’t, and re-arranging code within the function becomes error-prone.

The third version is best but is still susceptible to a maintainer breaking the contract (and possibly leaking resources) by carelessly using an early return instead of using a goto.

Using bogus values has some other, general issues:

Exactly what is BOGUS_VALUE? For pointers, this can be NULL, but what about for other data types? Is a different bogus value defined for each abstract data type? This requires some effort and may be hard to keep consistent and intuitive.
Defining a bogus value requires shrinking the range of legal values. Treating 0xFFFFFFFF as bogus? Make sure you never need try to handle input values larger than 0xFFFFFFFE. Usually it’s not a big deal, but it’s yet another thing to watch out for.

Left untouched. Leaving output values untouched requires some additional legwork for the callee but is easier to maintain and is harder to break carelessly. This behavior also follows a transactional model where if the function fails, no state has changed.

The typical implementation for this approach is:

/** foo -- Some function.
  *
  * PARAMETERS:
  *   someInput       : Some input value.
  *   OUT someOutputP : On success, set to some output value.
  *                     On error, left untouched.
  *
  * RETURNS:
  *   Some error code.
  */
err_t foo(int someInput, data_t* someOutputP)
{
    err_t err = ERR_NONE;
    bool acquired = false;
    data_t someOutput;


    err = initializeAndAcquireResources(&someOutput);


    if (err != ERR_NONE)
    {
        goto exit;
    }


    err = bar();


    if (err != ERR_NONE)
    {
        goto exit;
    }
    acquired = true;


    /* ... et cetera ... */


exit:
    if (err != ERR_NONE)
    {
        if (acquired)
        {
            freeResources(&someOutput);
        }
    }
    else
    {
        /* Everything succeeded, so commit the value. */
        *someOutputP = someOutput;
    }
    return err;
}

It’s more work to write, but it’s harder for a maintainer to screw up; accidentally adding an early exit-point might cause a resource leak, but it won’t break the contract.

Additionally, leaving output values untouched allows callers to make constructs such as:

data_t data;
initialize(&data);
if (foo(&data) != ERR_NONE)
{
    /* Try a different method. */
    if (baz(&data) != ERR_NONE)
    {
        /* Okay, now we can give up and signal an error. */
    }
}

in which case it’s advantageous if foo is guaranteed not to mutate its argument in error cases.

Summary.

	Pros	Cons
Undefined	By definition, impossible to break the contract.	Encourages sloppiness.
Bogus value	Convenient to callers.	Doesn’t make sense for all data types. Sloppy maintainers can break the contract.
Untouched	Hard to break the contract.	More work to implement.

For functions that output pointers to allocated memory, using a bogus value of NULL usually makes sense. In general, however, I prefer leaving output arguments untouched.

No Comments Yet »

RSS feed for comments on this post.

Name:
Email:	(will never be displayed)
Website:

Comparison of output parameter behaviors

No Comments Yet »

Leave a comment