In my humble opinion, creating Hidden Dependencies might be the worst thing you can do in Software Engineering. Claiming that “Hidden dependencies are bad” may sound trivial, yet they are all around. People know they should avoid them, but sometimes they simply don’t realize that they are creating one. Below is a modified example, based on real-life code I was looking at.
Function Enumeration
Consider the following code. It shows general code that can call functions by enumerated values. Why is this useful? For example, it can be a cumbersome way to implement “Polymorphism for the poor” by allowing a data object to express which function should act on it. Why not use a simple pointer to function? There are cases where the address is not known when the object is being constructed (e.g. an array of objects is built offline and is not part of the linkage process, or one linkage unit (process) makes a call to another (ioctl)). Anyhow, it makes a good example of Hidden Dependencies.
In the code I saw, an enum of functions is defined like this:
typedef enum
{
    FUNCTION_TYPE_APPLE = 0,
    FUNCTION_TYPE_PEAR,
    FUNCTION_TYPE_COCONUT,
    FUNCTION_TYPE_ORANGE
} FUNCTION_TYPES;
And the following function-pointer definition:
typedef int (*P_FUNC)(int);
The following array was used as a function Lookup Table:
P_FUNC functions_array[] = {foo1, foo2, foo3, foo4};
This function was used as a “gateway function”, allowing general code to call functions by their enumerated number:
int gate_call(FUNCTION_TYPES func_type, int arg)
{
    return functions_array[func_type](arg);
}
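To make the “data object expresses which function acts on it” idea concrete, here is a minimal sketch that puts the pieces above together. The bodies of foo1..foo4, the RECORD struct and process_record() are hypothetical and exist only for illustration; the original code did not look exactly like this.

```c
typedef enum
{
    FUNCTION_TYPE_APPLE = 0,
    FUNCTION_TYPE_PEAR,
    FUNCTION_TYPE_COCONUT,
    FUNCTION_TYPE_ORANGE
} FUNCTION_TYPES;

typedef int (*P_FUNC)(int);

/* Hypothetical handlers: the post does not show what foo1..foo4 do. */
static int foo1(int x) { return x + 1; }
static int foo2(int x) { return x + 2; }
static int foo3(int x) { return x + 3; }
static int foo4(int x) { return x + 4; }

/* Positional table: cell N is assumed to match enum value N. */
static P_FUNC functions_array[] = {foo1, foo2, foo3, foo4};

static int gate_call(FUNCTION_TYPES func_type, int arg)
{
    return functions_array[func_type](arg);
}

/* Hypothetical data object: it stores an enumerated value, not an address,
   so it can be built offline or passed between linkage units. */
typedef struct
{
    FUNCTION_TYPES handler;
    int payload;
} RECORD;

/* General code: dispatches on whatever the object asks for. */
static int process_record(const RECORD *rec)
{
    return gate_call(rec->handler, rec->payload);
}
```

With these assumed handlers, a record holding FUNCTION_TYPE_COCONUT and a payload of 10 is routed to foo3 and yields 13.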
What is wrong here?
There is a Hidden Dependency between the definition of the enum FUNCTION_TYPES and the initialization of the Lookup Table array functions_array. The enum definition and the array definition do not necessarily reside in the same file, and even if they do, a new developer may change one without changing the other. Bad things will happen if:
- A new function type is added to FUNCTION_TYPES. If it is added at the beginning (taking over the value 0), it will skew all the functions, leading to a wrong call (e.g. calling gate_call() with FUNCTION_TYPE_APPLE will cause a call to foo2 instead of foo1) and an undefined call when calling with the last enumerated value (e.g. calling with FUNCTION_TYPE_ORANGE will cause a call outside of the array boundaries). Similar phenomena will happen if the new type is added anywhere else in the enum, with partial skewing.
- A function type is removed. If it happens to be the last one, we will dodge the bullet, since the only damage is one redundant array cell that is initialized but never used (the one initialized to foo4). However, removing from any other location will skew all the following types (similar to bullet #1).
- The order of the enumerated types in the enum is changed. This will cause calls to unintended functions (e.g. imagine some nice guy has decided to refactor the enum and neatly order the types alphabetically, without being aware of the array).
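The first scenario in the list above can be reproduced with a small sketch. FUNCTION_TYPE_BANANA is a hypothetical new member, added at the start of the enum (with the "= 0" moved onto it) while the Lookup Table is left untouched. The handler bodies and is_out_of_bounds() are also made up for illustration.

```c
#include <stddef.h>

typedef enum
{
    FUNCTION_TYPE_BANANA = 0,   /* hypothetical new member, added first */
    FUNCTION_TYPE_APPLE,        /* was 0, is now 1 */
    FUNCTION_TYPE_PEAR,         /* was 1, is now 2 */
    FUNCTION_TYPE_COCONUT,      /* was 2, is now 3 */
    FUNCTION_TYPE_ORANGE        /* was 3, is now 4 */
} FUNCTION_TYPES;

typedef int (*P_FUNC)(int);

/* Hypothetical handlers: each returns its own number so calls can be told apart. */
static int foo1(int x) { (void)x; return 1; }
static int foo2(int x) { (void)x; return 2; }
static int foo3(int x) { (void)x; return 3; }
static int foo4(int x) { (void)x; return 4; }

/* The table was NOT updated when the enum changed. */
static P_FUNC functions_array[] = {foo1, foo2, foo3, foo4};

static int gate_call(FUNCTION_TYPES func_type, int arg)
{
    return functions_array[func_type](arg);
}

/* Helper for the demonstration only: nonzero when func_type indexes
   past the end of the table. */
static int is_out_of_bounds(FUNCTION_TYPES func_type)
{
    return (size_t)func_type >=
           sizeof(functions_array) / sizeof(functions_array[0]);
}
```

Here gate_call(FUNCTION_TYPE_APPLE, 0) silently lands in foo2, and is_out_of_bounds(FUNCTION_TYPE_ORANGE) reports that the last enumerated value no longer has a cell at all. The compiler accepts all of it without complaint.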
As things happened in reality, the ownership of the code in our team was partitioned differently than in the original team. It was not long before a modification of the enum indeed caused an unintentional break of this code. The break was one of those really bad ones – everything seemed to work, until one day a specific scenario was triggered (by good fortune), and failed.
The core of the problem
Creating two allegedly separate code segments that contain a Hidden Inter-Dependency is the core of the problem here. Code changes all the time, for many reasons. Often, the different code segments are owned by different people, who make changes at different times. The code described above was an accident waiting to happen, and eventually it did happen.
Fortifications, mitigations, solutions
When you find yourself writing code like the above, what can be done to mitigate or eliminate the risk, assuming you have to keep this structure for some reason? You want the system to help you catch rogue modifications the first time you build. In other words, you want to install provisions that will fail the build in case a violation is committed, or alternatively adapt the code to the new circumstances. If no such provisions exist, try to at least install debug hooks (e.g. asserts) that will catch problems when running in Debug mode/build.
In the example above, you can “lock” each function to a certain enum member by using the following code, if your compiler is C99 compliant:
P_FUNC functions_array[] = { [FUNCTION_TYPE_APPLE]   = foo1,
                             [FUNCTION_TYPE_PEAR]    = foo2,
                             [FUNCTION_TYPE_COCONUT] = foo3,
                             [FUNCTION_TYPE_ORANGE]  = foo4
                           };
This prevents “skewing” of the mapping between enum members and functions, but the Hidden Dependency is not yet removed completely. If someone removes a member from the enum, you are fine: the Lookup Table array initialization will fail to compile (it uses a non-existent enum member) and the problem will be found immediately. However, if anyone adds a member to the enum without adding an initializer to the Lookup Table array, this opens the door to function calls through empty cells of the array (cells without an initializer are zero-initialized, i.e. NULL) or beyond its end.
To solve that, first consider the following two macro definitions:
#define ARR_NUM_ELEMENTS(arr) (sizeof(arr)/sizeof((arr)[0]))
#define C_ASSERT(expression) typedef int dummy_arr_typedef[(expression)?1:-1]
The first macro simply calculates the number of elements in a given array. The second macro is a well-known and very useful trick to assert an expression at build time. Meaning, if the expression is false, the compilation will fail (as opposed to failing at run time, as with the “normal” assert directive). It is done by attempting to ‘typedef’ a new type to be an array with a negative number of elements, in case the expression is false. Since such an array definition is illegal, the compilation will fail. Note that the typedef must not be wrapped in parentheses, because a declaration is not an expression. Note also that this is a simplified version of the macro that cannot be used twice in the same compilation unit (the typedef name would be redefined). I will address the full version in a post dedicated to a group of similar constructs.
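Until then, here is one common way to lift the “only once per compilation unit” limitation. It is a sketch and not necessarily the full version referred to above. It pastes the current line number into the typedef name, so each use declares a different type. The helper macro names and sample_table are made up for this example.

```c
/* Two-level concatenation so that __LINE__ is expanded before pasting. */
#define C_ASSERT_CONCAT_(a, b) a##b
#define C_ASSERT_CONCAT(a, b)  C_ASSERT_CONCAT_(a, b)

/* Each use declares a differently named typedef (c_assert_line_<N>),
   so the macro can appear many times in one compilation unit.
   Assumption: no two uses end up on the same line number, which can
   still happen across different header files. */
#define C_ASSERT(expression) \
    typedef int C_ASSERT_CONCAT(c_assert_line_, __LINE__)[(expression) ? 1 : -1]

#define ARR_NUM_ELEMENTS(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table, used only to have something to check. */
static const int sample_table[] = {10, 20, 30};

C_ASSERT(ARR_NUM_ELEMENTS(sample_table) == 3);
C_ASSERT(sizeof(char) == 1);

/* Changing the 3 above to 4 stops the build with an error about a
   negative array size. With a C11 compiler, the built-in
   _Static_assert(expression, "message") does the same job directly. */
```

The extra level of macro indirection matters: pasting `__LINE__` directly would produce the literal name `c_assert_line___LINE__` every time, and the second use would collide with the first.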
Before we use those macros, we have to slightly supplement the enum definition by adding a “max” member to it. As long as it stays last, it will indicate the number of enum members, whatever happens to the rest of them:
typedef enum
{
    FUNCTION_TYPE_APPLE = 0,
    FUNCTION_TYPE_PEAR,
    FUNCTION_TYPE_COCONUT,
    FUNCTION_TYPE_ORANGE,
    FUNCTION_TYPE_MAX // Always keep last
} FUNCTION_TYPES;
Now, let’s fortify the Lookup Table array definition as follows:
P_FUNC functions_array[] = { [FUNCTION_TYPE_APPLE]   = foo1,
                             [FUNCTION_TYPE_PEAR]    = foo2,
                             [FUNCTION_TYPE_COCONUT] = foo3,
                             [FUNCTION_TYPE_ORANGE]  = foo4
                           };
C_ASSERT(ARR_NUM_ELEMENTS(functions_array) == FUNCTION_TYPE_MAX);
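With this addition, if anybody adds a new member at the end of the enum (just before FUNCTION_TYPE_MAX) without adapting the Lookup Table array initialization, the C_ASSERT will fail. One gap remains: a member inserted in the middle of the enum shifts the later members up, the array grows along with its largest designator, and the size check still passes while the new cell is left as NULL. Apart from that case, everything is locked into place such that any change to one part of the equation without modifying the other part will result in a compilation failure.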
Caution
The correctness of the above solution relies on functions_array[] being declared with unspecified size (empty brackets). This way, the initialization list determines the size of the array. If you explicitly define the size to be FUNCTION_TYPE_MAX instead, the C_ASSERT can never fail, so adding members to the enum without adapting the array initialization silently leaves uninitialized “holes” in the array.
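Where such “holes” cannot be ruled out at build time, a run-time debug hook can at least catch them early. The sketch below assumes a hypothetical FUNCTION_TYPE_BANANA that was added to the enum without a matching initializer. find_first_hole() and gate_call_checked() are illustrative names, not part of the original code.

```c
#include <stddef.h>

typedef enum
{
    FUNCTION_TYPE_APPLE = 0,
    FUNCTION_TYPE_PEAR,
    FUNCTION_TYPE_BANANA,   /* hypothetical new member, no initializer below */
    FUNCTION_TYPE_COCONUT,
    FUNCTION_TYPE_ORANGE,
    FUNCTION_TYPE_MAX       /* always keep last */
} FUNCTION_TYPES;

typedef int (*P_FUNC)(int);

/* Hypothetical handlers: each returns its own number. */
static int foo1(int x) { (void)x; return 1; }
static int foo2(int x) { (void)x; return 2; }
static int foo3(int x) { (void)x; return 3; }
static int foo4(int x) { (void)x; return 4; }

/* Explicit size: cells without an initializer are set to NULL. */
static P_FUNC functions_array[FUNCTION_TYPE_MAX] = {
    [FUNCTION_TYPE_APPLE]   = foo1,
    [FUNCTION_TYPE_PEAR]    = foo2,
    [FUNCTION_TYPE_COCONUT] = foo3,
    [FUNCTION_TYPE_ORANGE]  = foo4
};

/* Returns the index of the first NULL cell, or -1 if the table is full. */
static int find_first_hole(const P_FUNC *table, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
    {
        if (table[i] == NULL)
            return (int)i;
    }
    return -1;
}

/* Checked gateway: returns 0 and stores the result on success,
   -1 if func_type is out of range or maps to a hole. */
static int gate_call_checked(FUNCTION_TYPES func_type, int arg, int *result)
{
    if ((unsigned)func_type >= (unsigned)FUNCTION_TYPE_MAX ||
        functions_array[func_type] == NULL)
        return -1;
    *result = functions_array[func_type](arg);
    return 0;
}
```

In a Debug build, asserting at startup that find_first_hole(functions_array, FUNCTION_TYPE_MAX) returns -1 flags the forgotten initializer on the first run, instead of on the day a rare scenario happens to hit the empty cell.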
Conclusion
It is best to avoid creating dependencies between various entities in our code, as much as possible. When there is no choice, try to make the dependency explicit and constrained. If you end up in a situation in which Hidden Dependencies are hard to avoid, try to install safeguard provisions that will function as gatekeepers. By investing once, you can potentially save many tears later.