ImpossibleThreading
From APIDesign
(→Don't Say It is Impossible) |
|||
(28 intermediate revisions not shown.) | |||
Line 1: | Line 1: | ||
- | Another story about problems | + | Another story about problems to explain that something is [[impossible]] is here. This time it touches my own experience with threading (and you don't have to understand [[Impossible|finite state automata]] to read it).. |
== [[NetBeans]] Threading == | == [[NetBeans]] Threading == | ||
Line 5: | Line 5: | ||
Once upon a time, probably slightly after year 2000, [[NetBeans]] had enormous problems with [[deadlock]]s. Not surprisingly. [[Swing]] is single-threaded, but we were running a lot of tasks on background and they were competing for resources (like the [[Swing]] dispatch thread, or their own locks, etc.). My boss asked me to fix this. | Once upon a time, probably slightly after year 2000, [[NetBeans]] had enormous problems with [[deadlock]]s. Not surprisingly. [[Swing]] is single-threaded, but we were running a lot of tasks on background and they were competing for resources (like the [[Swing]] dispatch thread, or their own locks, etc.). My boss asked me to fix this. | ||
- | Yes, [[I]] was [[impossible|the expert]] - I knew about [[deadlock conditions]] and was aware that it is enough to make sure just one of them is not true and we | + | Yes, [[I]] was [[impossible|the expert]] - I knew about [[deadlock conditions]] and was aware that it is enough to make sure just one of them is not true and we would have a [[deadlock]]-free system. Yet I also remembered my lectures from [[MatFyz]] where we were informed that there is no coherent theory to drive development of [[deadlock]]-free system. Especially if you have a system composed from independent [[module]]s. Each of them may be [[deadlock]]-free itself, but when you assemble them together a [[deadlock]] can still appear. |
== Is It [[Impossible]]? == | == Is It [[Impossible]]? == | ||
Line 11: | Line 11: | ||
[[I]] did what experts do. I said: ''"It's [[impossible]]!"'' and explained my reasoning. Looking back and reminding myself of the [[impossible|finite-state automaton story]], it was no surprise my boss didn't listen. I lost my credibility as an expert and he selected somebody else to make [[NetBeans]] [[deadlock]] free! | [[I]] did what experts do. I said: ''"It's [[impossible]]!"'' and explained my reasoning. Looking back and reminding myself of the [[impossible|finite-state automaton story]], it was no surprise my boss didn't listen. I lost my credibility as an expert and he selected somebody else to make [[NetBeans]] [[deadlock]] free! | ||
- | As a result we got a detailed write up describing the state of locking at that time (it was really bad) and suggestions to modify state under write-lock and deliver events under read-lock. For a while it | + | As a result we got a detailed write up describing the state of locking at that time (it was really bad) and suggestions to modify state under write-lock and deliver events under read-lock. For a while it seemed to work OK (it takes a while before people report [[deadlock]]s in new code), but at the end it turned out this style is actually a source of major and hard to solve [[deadlock]]s and long pauses when rendering the UI (because of few global locks preventing rendering while long modification was running). |
+ | |||
+ | Now it is easy to see the whole idea suffered from a typical syndrome of designing a "solution" to something that is [[impossible]] to be solved, but at beginning? My manager and many others treated it as a real cure. | ||
== Seeking the Flaws == | == Seeking the Flaws == | ||
- | + | Since the beginning I knew the problem has no simple solution. Thus I acted as [[impossible|experts do]], I tried to find out why the offered solution is completely stupid. Maybe I was too proud, or more likely I just didn't want to rewrite most of the [[NetBeans]] [[API]]s (and change them incompatibly) to a new threading scheme without guaranteed result - those who follow this web may know I honour [[backward compatibility]] a lot and I don't want to sacrifice it for a fictitious dream. | |
- | As such I seeked for ways to eliminate our [[deadlock]]s - but not thanks to a new and unproven master plan, but with as little changes as possible (e.g. trying to not shake the [[Amoeba]] of [[NetBeans]] needlessly). At the end I learned how to simulate any [[deadlock]] in a unit test (see [[FlowControllingTest]] for details) and | + | As such I seeked for ways to eliminate our [[deadlock]]s - but not thanks to a new and unproven master plan, but with as little changes as possible (e.g. trying to not shake the [[Amoeba]] of [[NetBeans]] needlessly). At the end I learned how to simulate any [[deadlock]] in a unit test (see [[FlowControllingTest]] page for details) and then everything goes easy. Just apply [[wikipedia:TDD|test driven development]]: Have a bug describing a deadlock? Write a test that fails. Fix the code to make the test pass. Repeat! |
+ | |||
+ | As a result the number of [[deadlock]]s in critical areas started to decrease. It took few years, lay-off of my former manager and [[Chapter 11|one chapter]] in [[TheAPIBook]], before it got clear that the threading cannot be fixed by a vision, but rather a hard work. The [[impossible|expert truth]] may eventually reveal, but it takes time. | ||
== Don't Say It is [[Impossible]] == | == Don't Say It is [[Impossible]] == | ||
- | The [[impossible|expert]] | + | The feeling of [[impossible|expert in me]] was right: It is [[impossible]] to fix threading only with a great vision. Even my favourite way to unchain one of the [[deadlock conditions]] - e.g.: [[Chapter 11|never call foreign code while holding a lock]] is not enough as [[Chapter 11]] analyses. Hard work is necessary, but how to sell it? |
+ | |||
+ | Given the few years of struggling I had to go through, I would reply differently hearing the original question again: rather than saying fighting [[deadlock]]s is [[impossible]], I'd say we need to create a process to help our developers fight with [[deadlock]]s properly (e.g. they have to write a test before fixing a [[deadlock]]). The result would be the same and I would go through less suffering. Moreover such answer might have suited my manager more, as he was famous for mixing technical and human factors by saying: ''we have a technical issue, we need somebody to ...." | ||
+ | |||
+ | Should we think twice before claiming something is [[impossible]]? Yeah, sometimes it is hard to explain why something is [[impossible]] and there is always a lot of dummies to offer half-baken solutions. But, explaining [[impossible|impossibility]] is necessary from time to time! Btw. I still have one more topic about [[impossible|imposibility]] to cover - [[TBD|to be continued]]... | ||
+ | |||
+ | == Careful modular design is key == | ||
+ | |||
+ | Contributed by Jeffrey D. Smith, author of the JavaMutex project on SourceForge. | ||
+ | |||
+ | Avoiding [[Deadlock]] in a modularized system requires removing the tightly coupled connections between modules, and replacing with asynchronous communications for requesting services and receiving responses. Each module is a conceptual server thread, awaiting on requests from any number of other asynchronous threads, as well awaiting on responses to its own requests for services from other servers. Deadlock is most commonly caused by holding read-write (exclusive) locks while calling foreign code and that code acquires locks and then calls (directly or indirectly) the originating module. Thus, a circular reference is created and causes a deadly embrace. A partial solution is to rely on read-only (shared) locks as much as possible, but that cannot prevent an arbitrary thread from "hanging" (awaiting indefinitely) on the read-write lock while other threads are concurrently holding the read-only lock. Whenever a thread must await on a resource, only the mutex for that resource is atomically released. Any other held mutex resources remain held while awaiting, thus leading to potential deadlock. | ||
+ | |||
+ | The solution requires a comprehensive design to eliminate holding locks while the thread is accessing other modules. | ||
+ | |||
+ | A potential solution is to restrict direct access to read-only data for foreign modules. When foreign data must be accessed and updated by asynchronous threads, then the processing is isolated to the owning thread (server). Thus, the various module servers encapsulate and isolate the updating of information and won't need to hold any locks for the update. The "client" threads send asynchronous requests to the owning server and await for a reply. Each server simply awaits for service requests, processes each request by computing a result or by sending one or more service requests to the other servers and combines the responses. The combined results are then returned to the original requestor. | ||
+ | |||
+ | The server can delegate a potentially time consuming request to a private worker thread. The server maintains its responsiveness to other service requests by offloading "big" requests to its private pool of worker threads, which operate as a "mini-servers" for the parent server. | ||
+ | |||
+ | The main Java issue faced by modularized systems is that the Java synchronization mechanism supports any number of threads awaiting on a single resource for exclusive control. By encapsulating the Java synchronization mechanism, the inverse design is abstracted: A single thread awaiting on multiple events (logical resources). The key is provided by the JavaMutex EventTokenSet class that is designed for one thread to await on multiple events across multiple threads. Thus, a server can await for multiple events occurring in multiple other threads. | ||
+ | |||
+ | A critical design point is to avoid replacing potential deadlocks with potential infinite cycles of service requests. Server Alpha requests data from Server Beta. Server Beta requests data from Server Alpha, which triggers yet another request to Server Beta and so on. The deadlock is replaced by OutOfMemoryError as event tokens are created for each request. Thus, each service module must be designed to demonstrate a terminus for every possible service request. | ||
+ | |||
+ | The JavaMutex project is a library of 100% pure Java interfaces and classes that provide concurrency tools for locks, intersects, latches, executor services, etc. There is a sample test case TestMultiServer.java that demonstrates a simple modularized system of two servers and a client, each running asynchronously as its own thread. The client issues multiple requests to the servers, and the servers issue further requests amongst themselves. Responses are accumulated asynchronously and replied to the client. All of the processing is free of deadlocks and completely asynchronous. The asynchronous nature of clients and servers is the basic design principle. The design is scalable and only limited by the number of distinct threads (and memory) available in the JVM. | ||
- | + | Trying to retrofit a mature, yet deadlock-prone modularized system is very difficult. The correct initial design is crucial to building a sophisticated modularized system with asynchronous server modules that is free of deadlocks. |
Current revision
Another story about problems to explain that something is impossible is here. This time it touches my own experience with threading (and you don't have to understand finite state automata to read it)..
Contents |
NetBeans Threading
Once upon a time, probably slightly after year 2000, NetBeans had enormous problems with deadlocks. Not surprisingly. Swing is single-threaded, but we were running a lot of tasks on background and they were competing for resources (like the Swing dispatch thread, or their own locks, etc.). My boss asked me to fix this.
Yes, I was the expert - I knew about deadlock conditions and was aware that it is enough to make sure just one of them is not true and we would have a deadlock-free system. Yet I also remembered my lectures from MatFyz where we were informed that there is no coherent theory to drive development of deadlock-free system. Especially if you have a system composed from independent modules. Each of them may be deadlock-free itself, but when you assemble them together a deadlock can still appear.
Is It Impossible?
I did what experts do. I said: "It's impossible!" and explained my reasoning. Looking back and reminding myself of the finite-state automaton story, it was no surprise my boss didn't listen. I lost my credibility as an expert and he selected somebody else to make NetBeans deadlock free!
As a result we got a detailed write up describing the state of locking at that time (it was really bad) and suggestions to modify state under write-lock and deliver events under read-lock. For a while it seemed to work OK (it takes a while before people report deadlocks in new code), but at the end it turned out this style is actually a source of major and hard to solve deadlocks and long pauses when rendering the UI (because of few global locks preventing rendering while long modification was running).
Now it is easy to see the whole idea suffered from a typical syndrome of designing a "solution" to something that is impossible to be solved, but at beginning? My manager and many others treated it as a real cure.
Seeking the Flaws
Since the beginning I knew the problem has no simple solution. Thus I acted as experts do, I tried to find out why the offered solution is completely stupid. Maybe I was too proud, or more likely I just didn't want to rewrite most of the NetBeans APIs (and change them incompatibly) to a new threading scheme without guaranteed result - those who follow this web may know I honour backward compatibility a lot and I don't want to sacrifice it for a fictitious dream.
As such I seeked for ways to eliminate our deadlocks - but not thanks to a new and unproven master plan, but with as little changes as possible (e.g. trying to not shake the Amoeba of NetBeans needlessly). At the end I learned how to simulate any deadlock in a unit test (see FlowControllingTest page for details) and then everything goes easy. Just apply test driven development: Have a bug describing a deadlock? Write a test that fails. Fix the code to make the test pass. Repeat!
As a result the number of deadlocks in critical areas started to decrease. It took few years, lay-off of my former manager and one chapter in TheAPIBook, before it got clear that the threading cannot be fixed by a vision, but rather a hard work. The expert truth may eventually reveal, but it takes time.
Don't Say It is Impossible
The feeling of expert in me was right: It is impossible to fix threading only with a great vision. Even my favourite way to unchain one of the deadlock conditions - e.g.: never call foreign code while holding a lock is not enough as Chapter 11 analyses. Hard work is necessary, but how to sell it?
Given the few years of struggling I had to go through, I would reply differently hearing the original question again: rather than saying fighting deadlocks is impossible, I'd say we need to create a process to help our developers fight with deadlocks properly (e.g. they have to write a test before fixing a deadlock). The result would be the same and I would go through less suffering. Moreover such answer might have suited my manager more, as he was famous for mixing technical and human factors by saying: we have a technical issue, we need somebody to ...."
Should we think twice before claiming something is impossible? Yeah, sometimes it is hard to explain why something is impossible and there is always a lot of dummies to offer half-baken solutions. But, explaining impossibility is necessary from time to time! Btw. I still have one more topic about imposibility to cover - to be continued...
Careful modular design is key
Contributed by Jeffrey D. Smith, author of the JavaMutex project on SourceForge.
Avoiding Deadlock in a modularized system requires removing the tightly coupled connections between modules, and replacing with asynchronous communications for requesting services and receiving responses. Each module is a conceptual server thread, awaiting on requests from any number of other asynchronous threads, as well awaiting on responses to its own requests for services from other servers. Deadlock is most commonly caused by holding read-write (exclusive) locks while calling foreign code and that code acquires locks and then calls (directly or indirectly) the originating module. Thus, a circular reference is created and causes a deadly embrace. A partial solution is to rely on read-only (shared) locks as much as possible, but that cannot prevent an arbitrary thread from "hanging" (awaiting indefinitely) on the read-write lock while other threads are concurrently holding the read-only lock. Whenever a thread must await on a resource, only the mutex for that resource is atomically released. Any other held mutex resources remain held while awaiting, thus leading to potential deadlock.
The solution requires a comprehensive design to eliminate holding locks while the thread is accessing other modules.
A potential solution is to restrict direct access to read-only data for foreign modules. When foreign data must be accessed and updated by asynchronous threads, then the processing is isolated to the owning thread (server). Thus, the various module servers encapsulate and isolate the updating of information and won't need to hold any locks for the update. The "client" threads send asynchronous requests to the owning server and await for a reply. Each server simply awaits for service requests, processes each request by computing a result or by sending one or more service requests to the other servers and combines the responses. The combined results are then returned to the original requestor.
The server can delegate a potentially time consuming request to a private worker thread. The server maintains its responsiveness to other service requests by offloading "big" requests to its private pool of worker threads, which operate as a "mini-servers" for the parent server.
The main Java issue faced by modularized systems is that the Java synchronization mechanism supports any number of threads awaiting on a single resource for exclusive control. By encapsulating the Java synchronization mechanism, the inverse design is abstracted: A single thread awaiting on multiple events (logical resources). The key is provided by the JavaMutex EventTokenSet class that is designed for one thread to await on multiple events across multiple threads. Thus, a server can await for multiple events occurring in multiple other threads.
A critical design point is to avoid replacing potential deadlocks with potential infinite cycles of service requests. Server Alpha requests data from Server Beta. Server Beta requests data from Server Alpha, which triggers yet another request to Server Beta and so on. The deadlock is replaced by OutOfMemoryError as event tokens are created for each request. Thus, each service module must be designed to demonstrate a terminus for every possible service request.
The JavaMutex project is a library of 100% pure Java interfaces and classes that provide concurrency tools for locks, intersects, latches, executor services, etc. There is a sample test case TestMultiServer.java that demonstrates a simple modularized system of two servers and a client, each running asynchronously as its own thread. The client issues multiple requests to the servers, and the servers issue further requests amongst themselves. Responses are accumulated asynchronously and replied to the client. All of the processing is free of deadlocks and completely asynchronous. The asynchronous nature of clients and servers is the basic design principle. The design is scalable and only limited by the number of distinct threads (and memory) available in the JVM.
Trying to retrofit a mature, yet deadlock-prone modularized system is very difficult. The correct initial design is crucial to building a sophisticated modularized system with asynchronous server modules that is free of deadlocks.