TransactionalDataStructure

From APIDesign

Jump to: navigation, search

TransactionalDataStructure is a realization of LockFreeAlgorithm. The primary goal may be different than to (just) avoid locks however. The TransactionalDataStructure pattern can be used to build a reliable data structure that keeps consistency even under multi threaded access and can be easily reason about. To do so the data structure uses two important concepts:

  • immutability (helps to reason about consistency)
  • compare and swap (aka CAS) operation (always selects just a single thread to perform an update)

Let's start with a code sample!

Contents

Example

Let's keep the internal data in a dedicated immutable structure called Data. When performing an update:

  • Read the current state.
  • Compute an update based on the current state
  • Use compare and swap operation to transactionally (hence the similarity with TransactionalMemory) update the state of the object
  • repeat everything again if there was a clash
  • and the current value was modified by some other thread meanwhile


import java.util.concurrent.atomic.AtomicReference;
 
public abstract class Helper {
 
    private static final class Data {
 
        final int value1;
        final int value2;
        final int value3;
 
        Data(int value1, int value2, int value3) {
            this.value1 = value1;
            this.value2 = value2;
            this.value3 = value3;
        }
    }
 
    private final AtomicReference<Data> data = new AtomicReference<>(new Data(0, 0, 0));
 
    protected abstract int combine(int x, int y);
 
    public final void update() {
        Data current;
        while (true) {
            current = data.get();
 
            // the following three operations must be performed in a "transaction" ...
            int v1 = combine(current.value1, current.value2);
            int v2 = combine(current.value2, current.value3);
            int v3 = combine(current.value3, current.value1);
            // either we want to apply them all or none at all!
 
            if (data.compareAndSet(current, new Data(v1, v2, v3))) {
                // CAS checks the value in data is still == to current
                // if so it changes it to new Data and we can return
                return;
            }
            // otherwise try again
        }
    }
}

Neat usage of immutability. The data that are supposed to be consistent are final in a Data class. The mutability is handled all at once with AtomicReference compareAndSet method. Either it fails and the computation runs again or it succeeds and atomically changes the data to newly created and consistent value. No locks involved, hence the reference to LockFreeAlgorithms. Let's call such a pattern TransactionalDataStructure.

Trade CPU for no Locks!

LockFreeAlgorithms avoid synchronization and rather prefer multiple threads to do (and potentially throw away) their work. This kind of trade CPU for no locks is exactly what TransactionalMemory tried to promise. The mantra of "let the computation run and on commit verify" whether all inputs are still the same and only then write down the changed values, is exactly the transactional approach.

Why the TransactionalMemory approaches failed so far? It is easy to do the compare&swap with a single value, it is harder, but possible to do it with two values. With more values, it becomes almost impossible. Software TransactionalMemory doesn't restrict the amount of CAS operations, so it is easy to get to the impractical scale of the problem.

I had to implement this kind CAS-like check few times in NetBeans. For example here: https://github.com/apache/netbeans/blob/c22dbacda230435cec1c5067bf8ae6c359e15d55/platform/openide.util.lookup/src/org/openide/util/lookup/ProxyLookup.java#L434 however my problem wasn't performance (btw. the code still uses synchronized), but two other reasons:

Never call 3rd party code with a lock

The Never call 3rd party code with a lock mantra for framework designers (essential for designing deadlock-free APIs) - to do that you have to read state ahead of doing any computation; if there is a need to perform a callback (to unknown code): make sure no lock is held when the callback returns enter the CAS phase - e.g. check preconditions and update if they hold, just like TransactionalMemory would do

Re-entrancy

The Lookup, ProxyLookup code was so central in NetBeans, that the critical section could appear multiple times on stack and the nested invocations could invalidate the read in input values! Locking doesn't help with re-entrant data structures at all - the only way is to use this transactional memory approach - read values in, do the computation, and CAS or try again. Plus one has to "hope" that the data structure somehow stabilizes and nested calls don't invalidate the outer calls indefinitely (which was working well in the case of NetBeans Lookup library which gradually converged to more and more computed state).

It's Cool to be Transactional!

The term TransactionalMemory isn't massively known these days. The software TransactionalMemory hype is over for years. No surprise Java (EE) developers know nothing about it. However majority of developers understands DB transactions. Thus explaining CAS as a "transaction commit" may be natural to them. Moreover LockFreeAlgorithms are still cool and in! Don't be afraid to use TransactionalDataStructures! They make every code (not just an API) easier to reason about and robust for multithreaded access.

Personal tools
buy