Race conditions
There is no such thing as 100% CodeCoverage, but that doesn't mean we shouldn't strive to increase BugFixCoverage. The FlowControllingTest pattern makes it possible to test even race conditions.
Test to Simulate Race conditions
While certain problems with multiple threads and their synchronization are hard to anticipate, such as the deadlocks mentioned earlier, it is sometimes possible and useful to write a test that verifies that various problems with parallel execution are handled correctly.
We faced such a problem when asked to write a startup lock for NetBeans. The goal was to handle the situation in which a user starts the NetBeans IDE a second time: warn the user that another instance of the program is already running, and then exit. This is similar to the behaviour of Mozilla or Open Office. We decided to allocate a socket server and create a file in a well-known location with the port number written to it. Each newly started NetBeans IDE could then verify whether a previously started instance is still active (by reading the port number and trying to communicate with it).
The major problem we had to optimize for was the situation in which the user starts several NetBeans IDE processes at once. This can happen through extra clicks on the desktop icon or by dragging and dropping several files onto the desktop icon of the IDE. Several processes are then started and they compete to lock the user directory. The sequence of one process looks like this:
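The check described above (read the port number from the file, then try to reach the instance behind it) can be sketched as follows. This is only an illustration, not the NetBeans implementation: the class name `InstanceProbe` and the method `isAnotherInstanceAlive` are hypothetical, and the sketch assumes that the port was written with `DataOutputStream.writeInt` and that a successful loopback connection is enough to count as "alive". A real implementation would probably also exchange some handshake so that an unrelated program listening on the same port is not mistaken for the IDE.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;

class InstanceProbe {
    /**
     * Hypothetical helper: returns true if the port stored in the lock file
     * accepts a connection on the loopback interface.
     */
    static boolean isAnotherInstanceAlive(File lockFile) {
        int port;
        try (DataInputStream in = new DataInputStream(new FileInputStream(lockFile))) {
            port = in.readInt();
        } catch (IOException e) {
            // Missing, empty or half-written file: there is no usable port to try.
            return false;
        }
        if (port <= 0 || port > 65535) {
            return false; // not a valid TCP port
        }
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            return true; // somebody is listening on the recorded port
        } catch (IOException e) {
            return false; // nobody answers: the file is probably stale
        }
    }
}
```

Note that an empty or truncated file is reported as "not alive" here. Whether that is the right answer while another process is still in the middle of writing the port is exactly the kind of question a race condition test has to settle.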
    if (lockFile.exists()) {
        // read the port number and connect to it
        if (alive) {
            // exit
            return;
        }
    }
    // otherwise try to create the file yourself
    lockFile.createNewFile();
    DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile));
    ServerSocket server = new ServerSocket(0); // port 0: let the OS pick a free port
    int p = server.getLocalPort();
    os.writeInt(p);
    os.close();

The above code can be interrupted by the system at any time. It is not executed as one atomic operation: the OS can suspend the process and pass control to a competing process that races to perform the same actions.
- What happens when one process creates the file, and another tries to read it meanwhile, before a port number is written to it?
- What if there is a file left from a previous (killed) execution?
- What happens when the check for the file's existence reports that the file is missing, but the file already exists by the time we try to create it?
All these questions have to be asked when one wants to have really good confidence in the application code.
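The last question can be explored in isolation. The following sketch assumes the documented behaviour of `File.createNewFile()`, which checks for existence and creates the file in a single atomic step and returns `false` when the file is already there. It lets several threads race for the same file and counts how many of them believe they created it. The names `LockFileClaim`, `tryClaim` and `raceForLock` are hypothetical and are not part of the NetBeans code.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LockFileClaim {
    /** Returns true only for the caller that actually created the file. */
    static boolean tryClaim(File lockFile) throws IOException {
        // Existence check and creation happen in one atomic step.
        return lockFile.createNewFile();
    }

    /** Starts the given number of threads at once and counts successful claims. */
    static int raceForLock(File lockFile, int contenders) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[contenders];
        for (int i = 0; i < contenders; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // hold every contender until the race begins
                    if (tryClaim(lockFile)) {
                        winners.incrementAndGet();
                    }
                } catch (IOException | InterruptedException e) {
                    // In this sketch a failed attempt simply does not count as a win.
                }
            });
            threads[i].start();
        }
        start.countDown(); // release all contenders together
        for (Thread t : threads) {
            t.join();
        }
        return winners.get();
    }
}
```

On a local file system `raceForLock` should report a single winner when the file does not exist yet and none when it does. This covers only the creation step: the losers still have to cope with a file that exists but may not contain a port number yet, and the `createNewFile` documentation itself warns against treating the method as a complete file-locking protocol.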
Simple FlowControllingTest written manually
To gain the confidence we wanted, we inserted many check points into our locking implementation, so the code became a modified version of the previous snippet:

    enterState(10, block);
    if (lockFile.exists()) {
        enterState(11, block);
        // read the port number and connect to it
        if (alive) {
            // exit
            return;
        }
    }
    // otherwise try to create the file yourself
    enterState(20, block);
    lockFile.createNewFile();
    DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile));
    ServerSocket server = new ServerSocket(0); // port 0: let the OS pick a free port
    enterState(21, block);
    int p = server.getLocalPort();
    enterState(22, block);
    os.writeInt(p);
    enterState(23, block);
    os.close();

The enterState method does nothing in a real production environment, but in a test it can be instructed to block at a specific check point. So we can write a test that starts two threads and instructs one of them to stop at 22, then lets the second one run and observes how it handles the case when the file already exists but the port has not yet been written to it.
This approach worked pretty well. Despite the skeptical opinions we heard when we set out to solve this problem, we got about 90% of the behaviour right before we integrated the first version. Yes, there was still more work to do and there were bugs to be fixed, but because we had really good automated tests for the behaviour we had actually implemented, our amoeba edge was well stiffened and we had enough confidence that we could fix all outstanding problems.
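One possible shape of such an `enterState` method and of a test built on it is sketched below. The original text does not show how `enterState` or `block` were implemented, so everything here is an assumption: the `Block` class with its two latches, the simplified `writeLock` method standing in for the real locking code, and the port value 4242 are all hypothetical. The calling thread plays the role of the second participant and inspects the file while the writer is parked at state 22.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class FlowControl {
    /** Hypothetical control object: names the one state at which a thread should park. */
    static final class Block {
        final int state;
        final CountDownLatch reached = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Block(int state) {
            this.state = state;
        }
    }

    /** No-op unless the test asked for this exact state; then it parks the caller. */
    static void enterState(int state, Block block) throws InterruptedException {
        if (block == null || block.state != state) {
            return; // production path, or a state the test is not interested in
        }
        block.reached.countDown(); // signal the test that the thread is parked here
        block.release.await();     // wait until the test lets the thread continue
    }

    /** Simplified stand-in for the locking code: create the file, then write the port. */
    static void writeLock(File lockFile, int port, Block block)
            throws IOException, InterruptedException {
        enterState(20, block);
        lockFile.createNewFile();
        try (DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile))) {
            enterState(22, block);
            os.writeInt(port);
        }
    }

    /**
     * Parks a writer thread at state 22 and returns the lock file length seen
     * while it is parked, followed by the length after it has finished.
     */
    static long[] observeHalfWrittenLock(File lockFile) throws InterruptedException {
        Block block = new Block(22);
        Thread writer = new Thread(() -> {
            try {
                writeLock(lockFile, 4242, block);
            } catch (IOException | InterruptedException e) {
                throw new IllegalStateException(e);
            }
        });
        writer.start();
        if (!block.reached.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("writer never reached state 22");
        }
        long whileParked = lockFile.length(); // file exists, port not written yet
        block.release.countDown();
        writer.join();
        long afterFinish = lockFile.length(); // the int port has now been written
        return new long[] { whileParked, afterFinish };
    }
}
```

In this sketch the test waits on the `reached` latch rather than sleeping, so the interleaving is deterministic: the observer never looks at the file before the writer is known to be parked. Passing `null` as the block keeps every `enterState` call a no-op, which is how the production path would behave.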