JaroslavTulach: /* Deadlock Test */ - 2023-06-02 14:11:03

Deadlock Test

←Older revision Revision as of 14:11, 2 June 2023
Line 3: Line 3:
==Deadlock Test==
==Deadlock Test==
-
Fighting with deadlocks is a sad destiny of any multithreaded application. The problem field has been under extensive research, because it causes huge problems for every writer of an operating system. Most applications are not as complex as operating systems, but as soon as you allow foreign code to run in your application, you basically have to fight with the same set of problems. In spite of the huge research effort, no simple answer or solution has been found. We know that there are four necessary and sufficient [[deadlock conditions]] for a deadlock to be created: # '''Mutual exclusion condition''' - there has to be a resource (lock, execution queue, etc.) that can be owned by just one thread # '''Non-preemptive scheduling condition''' - a resource that is already assigned cannot be taken away or released by anyone other than its owner # '''Hold and wait condition''' - a thread can wait for a resource indefinitely and can hold it indefinitely # '''Resources can be acquired incrementally''' - one can ask for a new resource (lock, execution queue) while already holding another one. But we do not know what the code that prevents at least one of these conditions from appearing should look like, and we definitely do not know how to do a static analysis over source code to check whether a deadlock can or cannot appear. The basic and in fact very promising advice for a programmer in a language with threads and locks, like Java, is: ''do not hold any lock'' while calling foreign code. By following this rule one eliminates the ''fourth'' condition, and as all four must be satisfied to create a deadlock, we may believe we have found the ultimate solution to deadlocks. But in fact, it is sometimes very hard to satisfy such a restriction. Can the following code deadlock?
+
Fighting with deadlocks is a sad destiny of any multithreaded application. The problem field has been under extensive research, because it causes huge problems for every writer of an operating system. Most applications are not as complex as operating systems, but as soon as you allow foreign code to run in your application, you basically have to fight with the same set of problems. In spite of the huge research effort, no simple answer or solution has been found. We know that there are four necessary and sufficient [[deadlock conditions]] for a deadlock to be created:
 +
* '''Mutual exclusion condition''' - there has to be a resource (lock, execution queue, etc.) that can be owned by just one thread
 +
* '''Non-preemptive scheduling condition''' - a resource that is already assigned cannot be taken away or released by anyone other than its owner
 +
* '''Hold and wait condition''' - a thread can wait for a resource indefinitely and can hold it indefinitely
 +
* '''Resources can be acquired incrementally''' - one can ask for a new resource (lock, execution queue) while already holding another one
 +
 
 +
But we do not know what the code that prevents at least one of these conditions from appearing should look like, and we definitely do not know how to do a static analysis over source code to check whether a deadlock can or cannot appear. The basic and in fact very promising advice for a programmer in a language with threads and locks, like Java, is: ''do not hold any lock'' while calling foreign code. By following this rule one eliminates the ''fourth'' condition, and as all four must be satisfied to create a deadlock, we may believe we have found the ultimate solution to deadlocks. But in fact, it is sometimes very hard to satisfy such a restriction. Can the following code deadlock?
<source lang="java">
<source lang="java">
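// NOTE: the snippet under discussion is not part of this diff excerpt; what
// follows is a minimal sketch reconstructed from the explanation further
// down this page (createLabel and allCreated are the names used there, the
// surrounding test harness is omitted). The method guards its list with its
// own lock, while the JLabel constructor internally needs the AWT tree
// lock - so a thread that already holds the tree lock (e.g. inside a
// getPreferredSize callback) and a thread inside createLabel can deadlock.
private final List<JLabel> allCreated = new ArrayList<JLabel>();

public synchronized JLabel createLabel() {
    JLabel l = new JLabel(); // may block waiting for the AWT tree lock
    allCreated.add(l);
    return l;
}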

JaroslavTulach: /* Advanced usage of Logging */ - 2022-03-26 05:52:19

Advanced usage of Logging

←Older revision Revision as of 05:52, 26 March 2022
Line 55: Line 55:
==Advanced usage of Logging==
==Advanced usage of Logging==
-
The NetBeans project started to introduce tests that collect logged output slowly, and only in the most exposed places, where the failures were often due to overly complicated usage of multiple threads. However, one should remember that every application written in Java is non-deterministic, even if it consciously uses just one thread. There is always the [[Garbage Collect]]or that starts at random moments, depending on conditions external to the tested program, and when dealing with visual components there is at least the AWT event dispatch thread, which can also receive events in a not strictly defined order. As a result, it turned out that the logging support is pretty useful in almost any test. Because of that, NetBeans decided to support it in its NbJUnit extensions. The simplest add-on is the support for collecting messages logged during the run of a test. To get this, one can just override the logLevel method in any subclass of NbTestCase. Let's look at the following example:
+
The NetBeans project started to introduce tests that collect logged output slowly, and only in the most exposed places, where the failures were often due to overly complicated usage of multiple threads. However, one should remember that every application written in Java is non-deterministic, even if it consciously uses just one thread. There is always the [[Garbage Collection]] that starts at random moments, depending on conditions external to the tested program, and when dealing with visual components there is at least the AWT event dispatch thread, which can also receive events in a not strictly defined order. As a result, it turned out that the logging support is pretty useful in almost any test. Because of that, NetBeans decided to support it in its NbJUnit extensions. The simplest add-on is the support for collecting messages logged during the run of a test. To get this, one can just override the logLevel method in any subclass of NbTestCase. Let's look at the following example:
<source lang="java">
<source lang="java">
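// NOTE: the example itself is not part of this diff excerpt. A minimal
// sketch, assuming the NbJUnit API described in the text: any subclass of
// NbTestCase can override logLevel to have all messages logged during the
// test run collected and attached to an eventual failure.
public class LogCollectingTest extends NbTestCase {
    public LogCollectingTest(String name) {
        super(name);
    }

    protected Level logLevel() {
        return Level.ALL; // collect messages of all levels
    }

    public void testMethodOne() {
        // code exercised here has its logged output captured; if the
        // test fails, the log becomes part of the failure report
    }
}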

JaroslavTulach: /* Advanced usage of Logging */ - 2022-03-26 05:51:53

Advanced usage of Logging

←Older revision Revision as of 05:51, 26 March 2022
Line 55: Line 55:
==Advanced usage of Logging==
==Advanced usage of Logging==
-
The NetBeans project started to introduce tests that collect logged output slowly, and only in the most exposed places, where the failures were often due to overly complicated usage of multiple threads. However, one should remember that every application written in Java is non-deterministic, even if it consciously uses just one thread. There is always the [[Garbage Collector]] that starts at random moments, depending on conditions external to the tested program, and when dealing with visual components there is at least the AWT event dispatch thread, which can also receive events in a not strictly defined order. As a result, it turned out that the logging support is pretty useful in almost any test. Because of that, NetBeans decided to support it in its NbJUnit extensions. The simplest add-on is the support for collecting messages logged during the run of a test. To get this, one can just override the logLevel method in any subclass of NbTestCase. Let's look at the following example:
+
The NetBeans project started to introduce tests that collect logged output slowly, and only in the most exposed places, where the failures were often due to overly complicated usage of multiple threads. However, one should remember that every application written in Java is non-deterministic, even if it consciously uses just one thread. There is always the [[Garbage Collect]]or that starts at random moments, depending on conditions external to the tested program, and when dealing with visual components there is at least the AWT event dispatch thread, which can also receive events in a not strictly defined order. As a result, it turned out that the logging support is pretty useful in almost any test. Because of that, NetBeans decided to support it in its NbJUnit extensions. The simplest add-on is the support for collecting messages logged during the run of a test. To get this, one can just override the logLevel method in any subclass of NbTestCase. Let's look at the following example:
<source lang="java">
<source lang="java">

JaroslavTulach at 05:51, 26 March 2022 - 2022-03-26 05:51:37

←Older revision Revision as of 05:51, 26 March 2022
Line 55: Line 55:
==Advanced usage of Logging==
==Advanced usage of Logging==
-
The NetBeans project started to introduce tests that collect logged output slowly, and only in the most exposed places, where the failures were often due to overly complicated usage of multiple threads. However, one should remember that every application written in Java is non-deterministic, even if it consciously uses just one thread. There is always the garbage collector that starts at random moments, depending on conditions external to the tested program, and when dealing with visual components there is at least the AWT event dispatch thread, which can also receive events in a not strictly defined order. As a result, it turned out that the logging support is pretty useful in almost any test. Because of that, NetBeans decided to support it in its NbJUnit extensions. The simplest add-on is the support for collecting messages logged during the run of a test. To get this, one can just override the logLevel method in any subclass of NbTestCase. Let's look at the following example:
+
The NetBeans project started to introduce tests that collect logged output slowly, and only in the most exposed places, where the failures were often due to overly complicated usage of multiple threads. However, one should remember that every application written in Java is non-deterministic, even if it consciously uses just one thread. There is always the [[Garbage Collector]] that starts at random moments, depending on conditions external to the tested program, and when dealing with visual components there is at least the AWT event dispatch thread, which can also receive events in a not strictly defined order. As a result, it turned out that the logging support is pretty useful in almost any test. Because of that, NetBeans decided to support it in its NbJUnit extensions. The simplest add-on is the support for collecting messages logged during the run of a test. To get this, one can just override the logLevel method in any subclass of NbTestCase. Let's look at the following example:
<source lang="java">
<source lang="java">

JaroslavTulach: /* Advanced usage of Logging */ - 2020-06-18 08:39:19

Advanced usage of Logging

←Older revision Revision as of 08:39, 18 June 2020
Line 75: Line 75:
</source>
</source>
-
It says that logging of all message levels shall be collected, and as such, if any of the test methods fails, the reported exception is going to contain the output of all messages logged during execution of the program, in a format like: [name.of.the.logger] THREAD: threadName MSG: The message. Of course the text in the exception message is truncated to be of reasonable length, but the full log report is also available. In the NbTestCase.getWorkDir(), which would usually be /tmp/tests/testLogCollectingTest/testMethodOne, there is a file called testMethodOne.log which contains the last 1MB of logged output. That should be enough for cases when one needs a deeper analysis of some complex logging code, as most of the recently logged information shall be there. It shall also be mentioned that if one is using [http://xtest.netbeans.org xtest] to execute the tests, the generated HTML report includes a copy of the workdir, including the output log file. Sometimes the logged messages can get pretty long, and especially in the case of random failures it may not be completely easy to analyse what is going on in the test. The best strategy we have discovered so far to fight such a situation is to add a fail("Ok"); line into the place in the test where the execution randomly fails. This is going to generate a log file as well, and the two log files - the one from the real failure and the one from the correct run - can then be compared and diffed against each other. Of course it is wise to replace all the @543hac5f messages in the output with something more neutral, as each run is going to have these memory locations different. After eliminating this difference, it is generally possible to get a reasonable diff, understand the difference between those two runs, and find the root cause of the failure. There may be two possible reasons why such an understanding of the differences between log files might not be possible. The first is that there may not be enough logs: the information recorded in the log file is just too coarse to provide meaningful information about what has been happening during the test run. Indeed, the fix is to insert more logging messages into the test code. Using the NbTestCase.logLevel method one can add more logging at various levels like Level.FINE, Level.FINER and Level.FINEST, and in different test cases enable different test messages. In the case of a very suspicious random failure which can potentially influence a critical piece of code, it may even make sense to log information about each line of the code that gets executed, possibly with the arguments sent to various methods and the state of local variables. That shall generate enough information to allow a detailed understanding of the reasons for random failures. On the other hand, there is a hidden catch: the more logging is put into the application, the more its behaviour is going to differ from an execution of the application without the logging enabled. Every call to Logger.log is going to add additional delays, formatting of recorded messages is going to take a while, etc., etc. That is why a failure that might occur in one third of cases may nearly disappear due to all the logging code around. Indeed, such a failure gets harder and harder to repeat and thus to evaluate. One can nearly give up on trying to emulate it in a debugger, or in the worst case, one cannot even face such a failure on one's own computer. However, from time to time the failure appears on someone's machine.
In NetBeans we have a farm of testing machines running tests every day in various configurations, and this really helps to track such rare errors down. Although we may not even face the bug on the developers' workstations, from time to time we get a report from the testing infrastructure with full logging information and we can start to hunt the bug down. The other side of the problem is when there are already too many logging messages and it is hard to tell where they come from. For example, if the test executes a series of steps which result in repeated calls to the same place of the application code, which of course prints the same log messages, one can easily get lost. This is especially hard if the application does some computation in a loop, repeating the same or at least similar messages on and on and on. In such a case the really helpful advice is to put logging messages into the testing code itself. Those messages are the anchor that provides basic orientation in the log output. When comparing the differences between the successful and unsuccessful run, one can first of all locate the test messages and compare the differences between the application messages inserted among them. Basically, the best advice for fighting random failures is to do a lot of logging: capture it in tests and print enough logging messages in the application and also in the test code itself. Then the likelihood of understanding the root cause of the failure increases dramatically. A bit different direction of logging is testing that something really has been logged. For example, when there is a new method in one's API that shall for some reason be overridden by subclasses, it might be desirable to let already existing classes, compiled against a previous version of the class, know about it during runtime: protected boolean overideMePlease() { Logger.getLogger("my.logger").warning("subclasses are supposed to override overideMePlease() method!"); // some default return true; } Of course everyone who is dead set on testing wants to be sure that the warning really gets printed in the right situations. To do that, one can register one's own logger, or in NetBeans prior to 5.0 one's own ErrorManager (like in the [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/util/actions/AsynchronousTest.java?rev=1.1&content-type=text/x-cvsweb-markup AsynchronousTest.java]), and make sure that its warning method is called. However, as this was a way too common pattern in NetBeans, there is a special utility method in '''NbJUnit''' that handles everything itself. So one can just do: CharSequence log = Log.enable("my.logger", Level.ALL); // do the necessary actions if (log.toString().indexOf("warned") == -1) { fail("Warning shall be printed: " + log); } This is just a little utility method, but it helps to easily access and analyze what gets logged from inside the test case. Together with the rest of the support in the org.netbeans.junit.NbTestCase and org.netbeans.junit.Log classes, this should provide good enough support for fighting the unwanted shaking of the amoeba's edge using logging.
+
It says that logging of all message levels shall be collected, and as such, if any of the test methods fails, the reported exception is going to contain the output of all messages logged during execution of the program, in a format like: [name.of.the.logger] THREAD: threadName MSG: The message. Of course the text in the exception message is truncated to be of reasonable length, but the full log report is also available. In the NbTestCase.getWorkDir(), which would usually be /tmp/tests/testLogCollectingTest/testMethodOne, there is a file called testMethodOne.log which contains the last 1MB of logged output. That should be enough for cases when one needs a deeper analysis of some complex logging code, as most of the recently logged information shall be there. It shall also be mentioned that if one is using [http://xtest.netbeans.org xtest] to execute the tests, the generated HTML report includes a copy of the workdir, including the output log file. Sometimes the logged messages can get pretty long, and especially in the case of random failures it may not be completely easy to analyse what is going on in the test. The best strategy we have discovered so far to fight such a situation is to add a fail("Ok"); line into the place in the test where the execution randomly fails. This is going to generate a log file as well, and the two log files - the one from the real failure and the one from the correct run - can then be compared and diffed against each other. Of course it is wise to replace all the @543hac5f messages in the output with something more neutral, as each run is going to have these memory locations different. After eliminating this difference, it is generally possible to get a reasonable diff, understand the difference between those two runs, and find the root cause of the failure. There may be two possible reasons why such an understanding of the differences between log files might not be possible. The first is that there may not be enough logs: the information recorded in the log file is just too coarse to provide meaningful information about what has been happening during the test run. Indeed, the fix is to insert more logging messages into the test code. Using the NbTestCase.logLevel method one can add more logging at various levels like Level.FINE, Level.FINER and Level.FINEST, and in different test cases enable different test messages. In the case of a very suspicious random failure which can potentially influence a critical piece of code, it may even make sense to log information about each line of the code that gets executed, possibly with the arguments sent to various methods and the state of local variables. That shall generate enough information to allow a detailed understanding of the reasons for random failures. On the other hand, there is a hidden catch: the more logging is put into the application, the more its behaviour is going to differ from an execution of the application without the logging enabled. Every call to Logger.log is going to add additional delays, formatting of recorded messages is going to take a while, etc., etc. That is why a failure that might occur in one third of cases may nearly disappear due to all the logging code around. Indeed, such a failure gets harder and harder to repeat and thus to evaluate. One can nearly give up on trying to emulate it in a debugger, or in the worst case, one cannot even face such a failure on one's own computer. However, from time to time the failure appears on someone's machine.
In NetBeans we have a farm of testing machines running tests every day in various configurations, and this really helps to track such rare errors down. Although we may not even face the bug on the developers' workstations, from time to time we get a report from the testing infrastructure with full logging information and we can start to hunt the bug down. The other side of the problem is when there are already too many logging messages and it is hard to tell where they come from. For example, if the test executes a series of steps which result in repeated calls to the same place of the application code, which of course prints the same log messages, one can easily get lost. This is especially hard if the application does some computation in a loop, repeating the same or at least similar messages on and on and on. In such a case the really helpful advice is to put logging messages into the testing code itself. Those messages are the anchor that provides basic orientation in the log output. When comparing the differences between the successful and unsuccessful run, one can first of all locate the test messages and compare the differences between the application messages inserted among them. Basically, the best advice for fighting random failures is to do a lot of logging: capture it in tests and print enough logging messages in the application and also in the test code itself. Then the likelihood of understanding the root cause of the failure increases dramatically. A bit different direction of logging is testing that something really has been logged. For example, when there is a new method in one's API that shall for some reason be overridden by subclasses, it might be desirable to let already existing classes, compiled against a previous version of the class, know about it during runtime:
 +
 
 +
<source lang="java">
 +
protected boolean overideMePlease() {
 +
    Logger.getLogger("my.logger").warning("subclasses are supposed to override overideMePlease() method!");
 +
    // some default
 +
    return true;
 +
}
 +
</source>
 +
 
 +
Of course everyone who is dead set on testing wants to be sure that the warning really gets printed in the right situations. To do that, one can register one's own logger, or in NetBeans prior to 5.0 one's own '''ErrorManager''' (like in the [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/util/actions/AsynchronousTest.java?rev=1.1&content-type=text/x-cvsweb-markup AsynchronousTest.java]), and make sure that its warning method is called. However, as this was a way too common pattern in NetBeans, there is a special utility method in '''NbJUnit''' that handles everything itself. So one can just do:
 +
 
 +
<source lang="java">
 +
CharSequence log = Log.enable("my.logger", Level.ALL);
 +
// do the necessary actions
 +
if (log.toString().indexOf("warned") == -1) {
 +
    fail("Warning shall be printed: " + log);
 +
}
 +
</source>
 +
 
 +
This is just a little utility method, but it helps to easily access and analyze what gets logged from inside the test case. Together with the rest of the support in the org.netbeans.junit.NbTestCase and org.netbeans.junit.Log classes, this should provide good enough support for fighting the unwanted shaking of the amoeba's edge using logging.
==Execution Flow Control using Logging==
==Execution Flow Control using Logging==

JaroslavTulach: /* Execution Flow Control using Logging */ - 2020-06-18 08:33:43

Execution Flow Control using Logging

←Older revision Revision as of 08:33, 18 June 2020
Line 255: Line 255:
<source lang="java">
<source lang="java">
-
public class ParaelTest extends NbTestCase { public ParaelTest(String testName) { super(testName); } protected Level logLevel() { return Level.WARNING; } public void testMain() throws Exception { org.netbeans.junit.Log.controlFlow(Logger.global, null, "THREAD: 1st MSG: cnt: 0" + "THREAD: 2nd MSG: .*0" + "THREAD: 1st MSG: ...: 1" + "THREAD: 2nd MSG: cnt: 1" + "THREAD: 1st MSG: cnt: 2" + "THREAD: 2nd MSG: cnt: 2" + "THREAD: 1st MSG: cnt: 3" + "THREAD: 2nd MSG: cnt: 3" + "THREAD: 1st MSG: cnt: 4" + "THREAD: 2nd MSG: cnt: 4" + "THREAD: 1st MSG: cnt: 5" + "THREAD: 2nd MSG: cnt: 5" + "THREAD: 1st MSG: cnt: 6" + "THREAD: 2nd MSG: cnt: 6" + "THREAD: 1st MSG: cnt: 7" + "THREAD: 2nd MSG: cnt: 7" + "THREAD: 1st MSG: cnt: 8" + "THREAD: 2nd MSG: cnt: 8" + "THREAD: 1st MSG: cnt: 9" + "THREAD: 2nd MSG: cnt: 9", 500 ); Parael.main(null); fail("Ok, just print the logged output"); } }
+
public class ParaelTest extends NbTestCase {
 +
    public ParaelTest(String testName) {
 +
        super(testName);
 +
    }
 +
 
 +
    protected Level logLevel() {
 +
        return Level.WARNING;
 +
    }
 +
 
 +
    public void testMain() throws Exception {
 +
        org.netbeans.junit.Log.controlFlow(Logger.global, null,
 +
            "THREAD: 1st MSG: cnt: 0" +
 +
            "THREAD: 2nd MSG: .*0" + "THREAD: 1st MSG: ...: 1" +
 +
            "THREAD: 2nd MSG: cnt: 1" +
 +
            "THREAD: 1st MSG: cnt: 2" +
 +
            "THREAD: 2nd MSG: cnt: 2" +
 +
            "THREAD: 1st MSG: cnt: 3" +
 +
            "THREAD: 2nd MSG: cnt: 3" +
 +
            "THREAD: 1st MSG: cnt: 4" +
 +
            "THREAD: 2nd MSG: cnt: 4" +
 +
            "THREAD: 1st MSG: cnt: 5" +
 +
            "THREAD: 2nd MSG: cnt: 5" +
 +
            "THREAD: 1st MSG: cnt: 6" +
 +
            "THREAD: 2nd MSG: cnt: 6" +
 +
            "THREAD: 1st MSG: cnt: 7" +
 +
            "THREAD: 2nd MSG: cnt: 7" +
 +
            "THREAD: 1st MSG: cnt: 8" +
 +
            "THREAD: 2nd MSG: cnt: 8" +
 +
            "THREAD: 1st MSG: cnt: 9" +
 +
            "THREAD: 2nd MSG: cnt: 9", 500
 +
        );
 +
        Parael.main(null);
 +
        fail("Ok, just print the logged output");
 +
    }
 +
}
</source>
</source>

JaroslavTulach: /* Testing Race Conditions */ - 2018-10-20 05:14:11

Testing Race Conditions

←Older revision Revision as of 05:14, 20 October 2018
Line 47: Line 47:
==Testing Race Conditions==
==Testing Race Conditions==
-
While certain problems with multiple threads and their synchronization are hard to anticipate, such as the deadlocks mentioned earlier, sometimes it is possible and useful to write a test to verify that various problems with parallel execution are correctly handled. We faced such a problem when we were asked to write a [http://openide.netbeans.org/proposals/arch/cli.html startup lock] for [http://www.netbeans.org NetBeans]. The goal was to handle the situation when a user starts the [http://www.netbeans.org NetBeans] IDE for the second time: warn him that another instance of the program is already running, and then exit. This is similar to the behaviour of Mozilla or Open Office. We decided to allocate a socket server and create a file in a well known location with the port number written to it. Then each newly started [http://www.netbeans.org NetBeans] IDE could verify whether a previously running instance is active or not (by reading the port number and trying to communicate with it). The major problem we had to optimize for was the situation when the user starts more [http://www.netbeans.org NetBeans] IDEs at once. This can happen through extra clicks on the icon on the desktop, or by dragging and dropping more files on the desktop icon of our application. Then more processes are started and they start to compete for the file and its content. The sequence of one process looks like this: if (lockFile.exists ()) { // read the port number and connect to it if (alive) { // exit return; } } // otherwise try to create the file yourself lockFile.createNewFile(); DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile)); SocketServer server = new SocketServer(); int p = server.getLocalPort(); os.writeInt(p); but it can be interrupted by the system at any time, and instead of executing all of this as an atomic operation, the control can be passed to a competing process which does the same actions. What happens when one process creates the file and another tries to read it meanwhile, before a port number is written to it? What if there is a file left over from a previous (killed) execution? What happens when a test for file existence fails, but when trying to create the file it already exists? All these questions have to be asked when one wants to have really good confidence in the application code. In order to get the confidence we wanted, we inserted a lot of check points into our [http://www.netbeans.org/source/browse/core/bootstrap/src/org/netbeans/CLIHandler.java implementation of locking], so the code became a modified version of the previous snippet: enterState(10, block); if (lockFile.exists ()) { enterState(11, block); // read the port number and connect to it if (alive) { // exit return; } } // otherwise try to create the file yourself enterState(20, block); lockFile.createNewFile(); DataOutputStream os = new DataOutputStream(new FileOutputStream(lockFile)); SocketServer server = new SocketServer(); enterState(21, block); int p = server.getLocalPort(); enterState(22, block); os.writeInt(p); enterState(23, block); where the enterState method does nothing in the real production environment, but in a test it can be instructed to block at a specific check point. So we can write a test in which we start two threads, instruct one of them to stop at 22, and then let the second one run and observe how it handles the case when a file already exists but the port is not yet written in.
This approach worked pretty well, and despite the skeptical opinions we heard when we tried to solve this problem, we got about 90% of the behaviour right before we integrated the first version. Yes, there was still more work to do and bugs to be fixed, but because we had really good automated tests for the behaviour we had actually implemented, our [#amoeba amoeba] edge was well stiffened and we had enough confidence that we could fix all outstanding problems.
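For illustration, here is a minimal, self-contained sketch of such a check-point hook (the class and the helper methods around enterState are invented for this sketch; the real [http://www.netbeans.org/source/browse/core/bootstrap/src/org/netbeans/CLIHandler.java CLIHandler] has its own variant): in production enterState does nothing, while a test installs a latch for a chosen state and thus suspends the application thread exactly between two steps of the locking sequence.
<source lang="java">
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

final class CheckPoints {
    // state number -> latch that the test wants the application to block on
    private static final Map<Integer, CountDownLatch> BLOCK =
            new ConcurrentHashMap<Integer, CountDownLatch>();

    /** Called by the test: make check point 'state' block until released. */
    static void blockAt(int state) {
        BLOCK.put(state, new CountDownLatch(1));
    }

    /** Called by the test: let the application continue past 'state'. */
    static void release(int state) {
        CountDownLatch latch = BLOCK.remove(state);
        if (latch != null) {
            latch.countDown();
        }
    }

    /** Called by the application; a no-op unless a test installed a latch. */
    static void enterState(int state, boolean block) {
        CountDownLatch latch = BLOCK.get(state);
        if (block && latch != null) {
            try {
                latch.await();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
</source>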
+
{{:Race conditions}}
==Analyzing Random Failures==
==Analyzing Random Failures==

JaroslavTulach: /* Deadlock Test */ - 2014-10-19 07:43:22

Deadlock Test

←Older revision Revision as of 07:43, 19 October 2014
Line 3: Line 3:
==Deadlock Test==
==Deadlock Test==
-
Fighting with deadlocks is a sad destiny of any multithreaded application. The problem field has been under extensive research, because it causes huge problems for every writer of an operating system. Most applications are not as complex as operating systems, but as soon as you allow foreign code to run in your application, you basically have to fight with the same set of problems. In spite of the huge research effort, no simple answer or solution has been found. We know that there are [http://www.google.com/search?q=four+necessary+condition+for+a+deadlock+to+appear&start=0&start=0&ie=utf-8&oe=utf-8 four necessary and also sufficient conditions] for a deadlock to be created: # '''Mutual exclusion condition''' - there has to be a resource (lock, execution queue, etc.) that can be owned by just one thread # '''Non-preemptive scheduling condition''' - a resource that is already assigned cannot be taken away or released by anyone other than its owner # '''Hold and wait condition''' - a thread can wait for a resource indefinitely and can hold it indefinitely # '''Resources can be acquired incrementally''' - one can ask for a new resource (lock, execution queue) while already holding another one. But we do not know what the code that prevents at least one of these conditions from appearing should look like, and we definitely do not know how to do a static analysis over source code to check whether a deadlock can or cannot appear. The basic and in fact very promising advice for a programmer in a language with threads and locks, like Java, is: ''do not hold any lock'' while calling foreign code. By following this rule one eliminates the ''fourth'' condition, and as all four must be satisfied to create a deadlock, we may believe we have found the ultimate solution to deadlocks. But in fact, it is sometimes very hard to satisfy such a restriction. Can the following code deadlock?
+
Fighting with deadlocks is a sad destiny of any multithreaded application. The problem field has been under extensive research, because it causes huge problems for every writer of an operating system. Most applications are not as complex as operating systems, but as soon as you allow foreign code to run in your application, you basically have to fight with the same set of problems. In spite of the huge research effort, no simple answer or solution has been found. We know that there are four necessary and sufficient [[deadlock conditions]] for a deadlock to be created: # '''Mutual exclusion condition''' - there has to be a resource (lock, execution queue, etc.) that can be owned by just one thread # '''Non-preemptive scheduling condition''' - a resource that is already assigned cannot be taken away or released by anyone other than its owner # '''Hold and wait condition''' - a thread can wait for a resource indefinitely and can hold it indefinitely # '''Resources can be acquired incrementally''' - one can ask for a new resource (lock, execution queue) while already holding another one. But we do not know what the code that prevents at least one of these conditions from appearing should look like, and we definitely do not know how to do a static analysis over source code to check whether a deadlock can or cannot appear. The basic and in fact very promising advice for a programmer in a language with threads and locks, like Java, is: ''do not hold any lock'' while calling foreign code. By following this rule one eliminates the ''fourth'' condition, and as all four must be satisfied to create a deadlock, we may believe we have found the ultimate solution to deadlocks. But in fact, it is sometimes very hard to satisfy such a restriction. Can the following code deadlock?
<source lang="java">
<source lang="java">
Line 43: Line 43:
</source>
</source>
-
The test works with two threads: one creates a component and validates it, which results in a callback to getPreferredSize under the AWT tree lock; at this moment we start another thread and wait a while for it to acquire the createLabel lock. Under the current implementation this blocks in the JLabel constructor, and as soon as our thread continues (after 1000ms) we create the deadlock. There can be a lot of fixes, but the simplest one is very likely to synchronize on the same lock as the JLabel constructor does: public JLabel createLabel () { synchronized (JLabel.getTreeLock ()) { JLabel l = new JLabel (); allCreated.add (l); return l; } } The fix is simple - much simpler than the test - but without the test, we would not fix the shape of our [#amoeba amoeba]. So the time spent writing the test is likely to get paid back. Often the test can be written by using an already existing API, like in our case the getPreferredSize method (for example our [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Starvation37045Test.java?rev=1.3&content-type=text/x-cvsweb-markup test]). Only in special situations does one need to introduce a special method that helps the test to simulate the problem (we used that in our [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Deadlock40766Test.java?rev=1.2.2.2&content-type=text/x-cvsweb-markup howToReproduceDeadlock40766(boolean)] called from [http://www.netbeans.org/source/browse/openide/src/org/openide/text/PositionRef.java.diff?r1=1.53&r2=1.54 PositionRef.java]). Anyway, deadlock tests are pure ''regression tests'' - one writes them when a bug is reported; nobody is going to write them in advance. At the beginning it is much wiser to invest in good design, but as we explained earlier, since there is no really universal theory on how to prevent deadlocks, one should know what to do when a deadlock appears; for that, we suggest that testing is the best way with respect to our [#amoeba amoeba] shape.
+
The test works with two threads: one creates a component and validates it, which results in a callback to getPreferredSize under the AWT tree lock; at this moment we start another thread and wait a while for it to acquire the createLabel lock. Under the current implementation this blocks in the JLabel constructor, and as soon as our thread continues (after 1000ms) we create the deadlock. There can be a lot of fixes, but the simplest one is very likely to synchronize on the same lock as the JLabel constructor does: public JLabel createLabel () { synchronized (JLabel.getTreeLock ()) { JLabel l = new JLabel (); allCreated.add (l); return l; } } The fix is simple - much simpler than the test - but without the test, we would not fix the shape of our [#amoeba amoeba]. So the time spent writing the test is likely to get paid back. Often the test can be written by using an already existing API, like in our case the getPreferredSize method (for example our [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Starvation37045Test.java?rev=1.3&content-type=text/x-cvsweb-markup test]). Only in special situations does one need to introduce a special method that helps the test to simulate the problem (we used that in our [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Deadlock40766Test.java?rev=1.2.2.2&content-type=text/x-cvsweb-markup howToReproduceDeadlock40766(boolean)] called from [http://www.netbeans.org/source/browse/openide/src/org/openide/text/PositionRef.java.diff?r1=1.53&r2=1.54 PositionRef.java]). Anyway, deadlock tests are pure ''regression tests'' - one writes them when a bug is reported; nobody is going to write them in advance. At the beginning it is much wiser to invest in good design, but as we explained earlier, since there is no really universal theory on how to prevent deadlocks, one should know what to do when a deadlock appears; for that, we suggest that testing is the best way with respect to our [#amoeba amoeba] shape.
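The fix from the paragraph above once more, reflowed into a regular snippet for readability:
<source lang="java">
public JLabel createLabel () {
    // take the same lock as the JLabel constructor does, so every thread
    // acquires the tree lock before the createLabel lock
    synchronized (JLabel.getTreeLock ()) {
        JLabel l = new JLabel ();
        allCreated.add (l);
        return l;
    }
}
</source>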
==Testing Race Conditions==
==Testing Race Conditions==

JaroslavTulach: /* Analyzing Random Failures */ - 2013-06-09 07:07:29

Analyzing Random Failures

←Older revision Revision as of 07:07, 9 June 2013
Line 51: Line 51:
==Analyzing Random Failures==
==Analyzing Random Failures==
-
Those 10% of random failures mentioned in the previous part turned, as usual, into more work than just the next 10% of additional tests and a few fixes. They inspired this whole new part, as dealing with failures that happen just from time to time, and usually on a computer that you do not own, requires more sophisticated techniques for tracking them. The problem with parallel execution is that there is really not much help anyone can get if he wants to use it correctly. The methodology is either weak, missing, or just too concentrated on specific cases; the debuggers are not really ready to push the debugged applications to their parallel limits; so in order to really move somewhere, people resort to the oldest solution - to println and logging. The old approach is pretty simple - add log messages into your code, run it a few times, wait until it starts to misbehave, and then try to figure out from the log file what went wrong and fix it. In the case of automated tests a similar approach can be used. Enhance the application code and also the tests with logging, and if the test fails, output all the collected log messages as part of the failure report. We have achieved this by writing our own implementation of ErrorManager (which is a [http://www.netbeans.org NetBeans] class used for logging and error reporting), but one can do this in any test by using java.util.logging and implementing its Handler. The implementation has to be registered at the beginning of the test, and it has to capture all logged messages and, in the case of failure, make them part of the failure message: public class MyTest extends NbTestCase { static { System.setProperty ("org.openide.util.Lookup", "MyTest$Lkp"); } public MyTest (String name) { super (name); } protected void runTest () { ErrManager.messages.clear (); try { super.runTest (); } catch (AssertionFailedError err) { throw new AssertionFailedError (err.getMessage() + " Log:\n" + ErrManager.messages); } } public void testYourTest() throws Exception { // invoke some code ErrorManager.getDefault().log ("Do some logging"); // another code ErrorManager.getDefault().log ("Yet another logging"); } public static final class Lkp extends org.openide.util.lookup.AbstractLookup { private Lkp (org.openide.util.lookup.InstanceContent ic) { super (ic); ic.add (new MyErr ()); } } private static final class ErrManager extends org.openide.ErrorManager { public static final StringBuffer messages = new StringBuffer (); public void log (int severity, String s) { messages.append (s); messages.append ('\n'); } public void notify (int severity, Throwable t) { messages.append (t.getMessage ()); messages.append ('\n'); } } } The logging can be done by the test to mark important sections of its progress, but the main advantage is that your code shall be full of ErrorManager.log or (if you use the standard Java logging) java.util.logging.Logger.log calls. The test then collects messages from all these places and, in the case of failure, provides a complete and detailed (well, depending on how frequent and useful the log messages are) description of the failure, which can then be analyzed and either fixed, or the logging made more detailed to help track the problem down (as we did for example in [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/nodes/ChildrenKeysIssue30907Test.java?rev=1.1.4.2&content-type=text/x-cvsweb-markup ChildrenKeysIssue30907Test.java]).
Sometimes people are reluctant to analyze random failures in tests, treating them as something that does not affect the production code. In fact, it may turn out that the problem is in the test and the code is OK, but relying on this is usually false hope. Without a deeper understanding of the problem, it can be in the application code, and even if it is not reproducible all the time, when it occurs it can have huge consequences. If we want to have enough trust in the behaviour of our application and make its [#amoeba amoeba] shape less amoebic, logging in the application code and logging-friendly tests turn out to be a very useful tool.
+
Those 10% of random failures mentioned in the previous part turned, as usual, into more work than just the next 10% of additional tests and a few fixes. They inspired this whole new part, as dealing with failures that happen just from time to time, and usually on a computer that you do not own, requires more sophisticated techniques for tracking them. The problem with parallel execution is that there is really not much help anyone can get if he wants to use it correctly. The methodology is either weak, missing, or just too concentrated on specific cases; the [[debugger]]s are not really ready to push the debugged applications to their parallel limits; so in order to really move somewhere, people resort to the oldest solution - to println and logging. The old approach is pretty simple - add log messages into your code, run it a few times, wait until it starts to misbehave, and then try to figure out from the log file what went wrong and fix it. In the case of automated tests a similar approach can be used. Enhance the application code and also the tests with logging, and if the test fails, output all the collected log messages as part of the failure report. We have achieved this by writing our own implementation of ErrorManager (which is a [http://www.netbeans.org NetBeans] class used for logging and error reporting), but one can do this in any test by using java.util.logging and implementing its Handler. The implementation has to be registered at the beginning of the test, and it has to capture all logged messages and, in the case of failure, make them part of the failure message: public class MyTest extends NbTestCase { static { System.setProperty ("org.openide.util.Lookup", "MyTest$Lkp"); } public MyTest (String name) { super (name); } protected void runTest () { ErrManager.messages.clear (); try { super.runTest (); } catch (AssertionFailedError err) { throw new AssertionFailedError (err.getMessage() + " Log:\n" + ErrManager.messages); } } public void testYourTest() throws Exception { // invoke some code ErrorManager.getDefault().log ("Do some logging"); // another code ErrorManager.getDefault().log ("Yet another logging"); } public static final class Lkp extends org.openide.util.lookup.AbstractLookup { private Lkp (org.openide.util.lookup.InstanceContent ic) { super (ic); ic.add (new MyErr ()); } } private static final class ErrManager extends org.openide.ErrorManager { public static final StringBuffer messages = new StringBuffer (); public void log (int severity, String s) { messages.append (s); messages.append ('\n'); } public void notify (int severity, Throwable t) { messages.append (t.getMessage ()); messages.append ('\n'); } } } The logging can be done by the test to mark important sections of its progress, but the main advantage is that your code shall be full of ErrorManager.log or (if you use the standard Java logging) java.util.logging.Logger.log calls. The test then collects messages from all these places and, in the case of failure, provides a complete and detailed (well, depending on how frequent and useful the log messages are) description of the failure, which can then be analyzed and either fixed, or the logging made more detailed to help track the problem down (as we did for example in [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/nodes/ChildrenKeysIssue30907Test.java?rev=1.1.4.2&content-type=text/x-cvsweb-markup ChildrenKeysIssue30907Test.java]).
Sometimes people are reluctant to analyze random failures in tests, treating them as something that does not affect the production code. In fact, it may turn out that the problem is in the test and the code is OK, but relying on this is usually false hope. Without a deeper understanding of the problem, it can be in the application code, and even if it is not reproducible all the time, when it occurs it can have huge consequences. If we want to have enough trust in the behaviour of our application and make its [#amoeba amoeba] shape less amoebic, logging in the application code and logging-friendly tests turn out to be a very useful tool.
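The MyTest example from the paragraph above, reflowed into a regular snippet (with one apparent typo fixed: the flattened text registers new MyErr (), while the class it defines is ErrManager):
<source lang="java">
public class MyTest extends NbTestCase {
    static {
        System.setProperty ("org.openide.util.Lookup", "MyTest$Lkp");
    }

    public MyTest (String name) {
        super (name);
    }

    protected void runTest () {
        ErrManager.messages.clear ();
        try {
            super.runTest ();
        } catch (AssertionFailedError err) {
            throw new AssertionFailedError (err.getMessage () + " Log:\n" + ErrManager.messages);
        }
    }

    public void testYourTest() throws Exception {
        // invoke some code
        ErrorManager.getDefault ().log ("Do some logging");
        // another code
        ErrorManager.getDefault ().log ("Yet another logging");
    }

    public static final class Lkp extends org.openide.util.lookup.AbstractLookup {
        private Lkp (org.openide.util.lookup.InstanceContent ic) {
            super (ic);
            ic.add (new ErrManager ()); // the flattened text had 'new MyErr ()'
        }
    }

    private static final class ErrManager extends org.openide.ErrorManager {
        public static final StringBuffer messages = new StringBuffer ();

        public void log (int severity, String s) {
            messages.append (s);
            messages.append ('\n');
        }

        public void notify (int severity, Throwable t) {
            messages.append (t.getMessage ());
            messages.append ('\n');
        }
    }
}
</source>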
==Advanced usage of Logging==
==Advanced usage of Logging==

JaroslavTulach: /* Execution Flow Control using Logging */ - 2012-10-24 14:07:19

Execution Flow Control using Logging

←Older revision Revision as of 14:07, 24 October 2012
Line 79: Line 79:
==Execution Flow Control using Logging==
==Execution Flow Control using Logging==
-
This section is going to build on top of the previous description of logging support, and on the sections discussing testing for race conditions and tests that try to simulate deadlocks. Both of these kinds of tests need some hooks in the code that allow the testing environment to simulate obscure situations and either corrupt some internal data structure or stop a thread in a place that is going to cause a deadlock with another executing thread. The previous parts suggested putting special methods, accessible from the testing code, that allow such execution control: either to have code full of inserted statements like enterState(10, block), or to provide some overridable method like [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Deadlock40766Test.java?rev=1.2.2.2&content-type=text/x-cvsweb-markup howToReproduceDeadlock40766(boolean)] and make it do something insane in the test. This is possible, but actually there is a small improvement which is not easy to find, yet once discovered it seems so natural and easy to use that one just has to wonder why it has not come to mind earlier: instead of putting special methods in the code, one can use logging! Logging? Yes, logging! The beauty of such a solution lies in the fact that logging is, or at least should be, a natural part of every at least somewhat complicated program anyway, so one does not have to obscure the code with enterState and howToReproduceDeadlock40766(boolean); instead one can just use: logger.finest("Enter state 20"); // or logger.finest("reproduceDeadlock40766now"); The first log is fully natural; the second is a bit suspicious because of the name of its message, but still it is not so strange as to be seen as an alien piece of code unrelated to the program. Logging just belongs to programs. Now the testing code can register its own Handler and in its publish method do all the wild things: class Reproduce40766 extends Handler { public void publish(LogRecord rec) { if ("reproduceDeadlock40766now".equals(rec.getMessage())) { // block and let other thread run as shown // in [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Deadlock40766Test.java?rev=1.2.2.2&content-type=text/x-cvsweb-markup Deadlock40766Test.howToReproduceDeadlock40766(boolean)] } } } Logger.getLogger("").addHandler(new Reproduce40766()); So instead of inseminating the application code with special hacks, one can achieve the same by adding a logging handler and analyzing the messages passed to it during execution of the test. The application code stays clean, and the test gets as powerful as the number of messages logged in the application code allows, because each logged message is a chance for the test to influence the behaviour of the application. Taken to an extreme, it shall be possible to fully control the behaviour of a multithreaded program by suspending all threads other than the one that shall be executed.
Imagine the following program: class Parael implements Runnable { public void run() { Random r = new Random(); for (int i = 0; i < 10; i++) { try { Thread.sleep(r.nextInt(100)); } catch (InterruptedException ex) {} Logger.global.log(Level.WARNING, "cnt: {0}", new Integer(i)); } } public static void main(String[] args) throws InterruptedException { Thread t1 = new Thread(new Parael(), "1st"); Thread t2 = new Thread(new Parael(), "2nd"); t1.start(); t2.start(); t1.join(); t2.join(); } } The program runs two threads, each of them counting from one to ten while pausing a random number of milliseconds. Obviously the threads are going to run in parallel and the speed of their counting is going to be random. This can be easily verified by a simple NbTestCase with logging enabled:
+
This section is going to build on top of the previous description of logging support, and on the sections discussing testing for race conditions and tests that try to simulate deadlocks. Both of these kinds of tests need some hooks in the code that allow the testing environment to simulate obscure situations and either corrupt some internal data structure or stop a thread in a place that is going to cause a deadlock with another executing thread. The previous parts suggested putting special methods, accessible from the testing code, that allow such execution control: either to have code full of inserted statements like enterState(10, block), or to provide some overridable method like [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Deadlock40766Test.java?rev=1.2.2.2&content-type=text/x-cvsweb-markup howToReproduceDeadlock40766(boolean)] and make it do something insane in the test. This is possible, but actually there is a small improvement which is not easy to find, yet once discovered it seems so natural and easy to use that one just has to wonder why it has not come to mind earlier: instead of putting special methods in the code, one can use logging! Logging? Yes, logging! The beauty of such a solution lies in the fact that logging is, or at least should be, a natural part of every at least somewhat complicated program anyway, so one does not have to obscure the code with enterState and howToReproduceDeadlock40766(boolean); instead one can just use: logger.finest("Enter state 20"); // or logger.finest("reproduceDeadlock40766now"); The first log is fully natural; the second is a bit suspicious because of the name of its message, but still it is not so strange as to be seen as an alien piece of code unrelated to the program. Logging just belongs to programs. Now the testing code can register its own Handler and in its publish method do all the wild things:
 +
 
 +
<source lang="java">
 +
class Reproduce40766 extends Handler {
 +
    public void publish(LogRecord rec) {
 +
        if ("reproduceDeadlock40766now".equals(rec.getMessage())) {
 +
            // block and let other thread run as shown
 +
            // in [http://www.netbeans.org/source/browse/openide/test/unit/src/org/openide/text/Deadlock40766Test.java?rev=1.2.2.2&content-type=text/x-cvsweb-markup Deadlock40766Test.howToReproduceDeadlock40766(boolean)]
 +
        }
 +
    }
 +
}
 +
 
 +
Logger.getLogger("").addHandler(new Reproduce40766());
 +
</source>
 +
 
 +
So instead of inseminating the application code with special hacks, one can achieve the same by adding a logging handler and analyzing the messages passed to it during execution of the test. The application code stays clean, and the test gets as powerful as the number of messages logged in the application code allows, because each logged message is a chance for the test to influence the behaviour of the application. Taken to an extreme, it shall be possible to fully control the behaviour of a multithreaded program by suspending all threads other than the one that shall be executed. Imagine the following program:
 +
<source lang="java">
 +
class Parael implements Runnable {
 +
    public void run() {
 +
        Random r = new Random();
 +
        for (int i = 0; i < 10; i++) {
 +
            try {
 +
                Thread.sleep(r.nextInt(100));
 +
            } catch (InterruptedException ex) {
 +
            }
 +
            Logger.global.log(Level.WARNING, "cnt: {0}", new Integer(i));
 +
        }
 +
    }
 +
 
 +
    public static void main(String[] args) throws InterruptedException {
 +
        Thread t1 = new Thread(new Parael(), "1st");
 +
        Thread t2 = new Thread(new Parael(), "2nd");
 +
        t1.start();
 +
        t2.start();
 +
        t1.join();
 +
        t2.join();
 +
    }
 +
}
 +
</source>
 +
 
 +
The program runs two threads, each of them counting from one to ten while pausing a random number of milliseconds. Obviously the threads are going to run in parallel and the speed of their counting is going to be random. This can be easily verified by a simple NbTestCase with logging enabled:
<source lang="java">
<source lang="java">
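// NOTE: the test body is not part of this diff excerpt. A minimal sketch,
// assuming the NbTestCase API used elsewhere on this page: capture the
// WARNING-level "cnt: {0}" messages and fail on purpose at the end, so
// the collected log gets printed as part of the failure report.
public class ParaelTest extends NbTestCase {
    public ParaelTest(String testName) {
        super(testName);
    }

    protected Level logLevel() {
        return Level.WARNING;
    }

    public void testMain() throws Exception {
        Parael.main(null);
        fail("Ok, just print the logged output");
    }
}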
Line 221: Line 261:
<source lang="java">
<source lang="java">
-
public class ParaelTest extends NbTestCase { public ParaelTest(String testName) { super(testName); } protected Level logLevel() { return Level.WARNING; } public void testMain() throws Exception { org.netbeans.junit.Log.controlFlow(Logger.global, null, "THREAD: 1st MSG: cnt: 5" + "THREAD: 2nd MSG: cnt: 2" + "THREAD: 1st MSG: cnt: 6", 5000 ); Parael.main(null); fail("Ok, just print the logged output"); } }
+
public class ParaelTest extends NbTestCase {
 +
    public ParaelTest(String testName) {
 +
        super(testName);
 +
    }
 +
 
 +
    protected Level logLevel() {
 +
        return Level.WARNING;
 +
    }
 +
 
 +
    public void testMain() throws Exception {
 +
        org.netbeans.junit.Log.controlFlow(Logger.global, null,
 +
            "THREAD: 1st MSG: cnt: 5" +
 +
            "THREAD: 2nd MSG: cnt: 2" +
 +
            "THREAD: 1st MSG: cnt: 6",
 +
            5000
 +
        );
 +
        Parael.main(null);
 +
        fail("Ok, just print the logged output");
 +
    }
 +
}
</source>
</source>
As can be seen, the log messages inside the application can become a very useful and valuable tool for preventing unwanted moves of the ''[[amoeba]]'s edge''. Not only can such messages be enabled during execution of the application itself, they should even be turned on and captured when the code is executed in the testing environment. Not only does having the log for failed tests help in analyzing their root cause; if needed, the log can also serve as a script that controls the execution flow, and thus become very useful in simulating the extreme situations the application code can get into. Indeed, this style of logging control can be taken even further. One possible direction of enhancement is better log generation, and the second is better replaying of such log files. On the side of log capturing, an interesting thing to know is when a context switch - i.e. a transfer of execution from one thread to another - is happening; for that, a tool like dtrace could be very useful. The interesting implication of knowing about context switches and being able to replay them is that this can in fact turn a parallel, non-deterministic application into a deterministic one (of course only on single-processor machines). Other possible improvements might include the use of AOP or any other bytecode patching techniques to insert logging code everywhere, and thus get info about the application even if it is not fully ready to provide it. Everything is possible, but only the future is going to tell which of these improvements are useful. For now it is enough to say that NetBeans is successfully using the logging, and the control flow based on logging, to fight the ''amoeba effect'' in its APIs.
As can be seen, the log messages inside the application can become a very useful and valuable tool for preventing unwanted moves of the ''[[amoeba]]'s edge''. Not only can such messages be enabled during execution of the application itself, they should even be turned on and captured when the code is executed in the testing environment. Not only does having the log for failed tests help in analyzing their root cause; if needed, the log can also serve as a script that controls the execution flow, and thus become very useful in simulating the extreme situations the application code can get into. Indeed, this style of logging control can be taken even further. One possible direction of enhancement is better log generation, and the second is better replaying of such log files. On the side of log capturing, an interesting thing to know is when a context switch - i.e. a transfer of execution from one thread to another - is happening; for that, a tool like dtrace could be very useful. The interesting implication of knowing about context switches and being able to replay them is that this can in fact turn a parallel, non-deterministic application into a deterministic one (of course only on single-processor machines). Other possible improvements might include the use of AOP or any other bytecode patching techniques to insert logging code everywhere, and thus get info about the application even if it is not fully ready to provide it. Everything is possible, but only the future is going to tell which of these improvements are useful. For now it is enough to say that NetBeans is successfully using the logging, and the control flow based on logging, to fight the ''amoeba effect'' in its APIs.