❄️ Volatile

Concurrency · Oct 10, 2020 · 6 min read

The fun part about analyzing race conditions is that anything can happen - so most of your assumptions will probably be wrong.

There are two important properties to guarantee in a multi-threaded environment: Visibility and Atomicity.

The volatile keyword ensures that updates to a variable are propagated predictably to other threads.2 It prevents the compiler from re-ordering instructions (as part of optimisations, for example) and keeps the variable from being cached in registers, hidden from other threads. Volatile targets visibility by guaranteeing that a read returns the most recent value written by any thread.

If you find the inner depths of compilers and concurrency as fascinating as I do, this definition just isn't enough.

The need for volatile, by example

Let's assume that, in the example below, the two methods A and B are run by two different threads.

/* Adapted from "C# 4.0 in a Nutshell", Joseph Albahari, Ben Albahari */
class Foo {
    int answer;
    volatile boolean complete;

    public void A() {
        answer = 123;
        complete = true;
    }

    public void B() {
        if (complete) {
            System.out.println(answer);
        }
    }

    public static void main(String[] args) {
        Foo t = new Foo();
        // Start both threads concurrently so B's read can race A's writes.
        new Thread(t::A).start();
        new Thread(t::B).start();
    }
}

Using volatile for the boolean field complete ensures a happens-before relationship: writes that precede the volatile write cannot be re-ordered to occur after it, and reads that follow the volatile read cannot be re-ordered to occur before it, effectively creating a memory barrier.
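
To make that barrier concrete, here are the same statements annotated with what the Java Memory Model forbids ("release" and "acquire" are the usual informal names for these barrier semantics, not Java keywords):

answer = 123;     // ordinary write
complete = true;  // volatile write ("release"): the write to answer cannot move below this line

if (complete) {   // volatile read ("acquire"): the read of answer cannot move above this line
    System.out.println(answer); // once complete is seen as true, answer is guaranteed to be 123
}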

Re-orderings are a possibility under a weak memory model and, besides being useful for optimisations, they can also cause unpredictable cases, such as the two assignments in A above becoming:

complete = true; // without volatile
answer = 123;

Potentially resulting in the program above seeing complete as true and printing 0 instead of the expected 123, because complete was set right before answer's assignment. Or, equally a visibility failure, in an infinite loop: if B polled complete in a loop, the writes made by the thread running A might never reach it - there is no freshness guarantee. The sketch below illustrates that case.
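
A minimal sketch of that freshness failure (the class and field names are illustrative): without volatile on running, the JIT is free to hoist the read out of the loop, and the worker may spin forever without ever observing the main thread's write.

class SpinLoop {
    // Without volatile, the worker below may never see this flag change.
    static boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) { } // the read of running may be hoisted into a register
            System.out.println("stopped");
        });
        worker.start();
        Thread.sleep(100);
        running = false; // this write may never become visible to the worker
    }
}

Declaring running as volatile forces a fresh read on every iteration, and the loop terminates.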

Taking this into consideration, you might recall having read that a volatile variable is read from the computer's main memory. Combined with the fact that it is no longer cached in registers, are we stumbling upon a clash between performance and concurrency?

Is volatile expensive?

Does that actually mean every volatile access will be read from main memory? Not exactly - this is a common misconception I've seen in nearly a quarter of the volatile definitions out there, and one that previously led me into mistakes.

If volatiles were read from and written to main memory every time, the performance would be very underwhelming. The actual cost depends on the CPU architecture: protocols such as MESI keep cache lines coherent across cores, allowing one cache to obtain the cached line directly from another CPU's cache and reducing the hits to main memory.

The volatile modifier guarantees that any thread that reads a field will see the most recently written value. -- Joshua Bloch in Effective Java 3

Keeping the caches coherent ensures that when two threads read from the same memory address, they never simultaneously read different values.3

This is accomplished by a set of states:

  • Modified: One processor contains the data, but it is dirty - has been modified.
  • Exclusive: One processor contains the data and it hasn't been shared.
  • Shared: Multiple processors have cached this data and it's up to date.
  • Invalid: The entry is invalid and unused.

In simplified terms, if an entry is initially Invalid and a processor requests to read an address, the state transitions to Shared if other caches hold a valid copy, or to Exclusive if none do (the data being fetched from main memory when necessary). If an entry is Shared and a processor requests to write, the state transitions to Modified and the other caches mark their copy of the block as Invalid. I do not intend to cover every combination of events under this protocol, but you can read more about its state machine here.
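
As a rough sketch of those transitions (a simplification for illustration only; a real implementation reacts to bus events and handles write-backs, which are ignored here):

enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class CacheLine {
    MesiState state = MesiState.INVALID;

    // Local read miss: Invalid -> Shared if another cache holds a valid copy, else Exclusive.
    void onLocalRead(boolean otherCacheHasCopy) {
        if (state == MesiState.INVALID) {
            state = otherCacheHasCopy ? MesiState.SHARED : MesiState.EXCLUSIVE;
        }
    }

    // Local write: this cache takes ownership and the line becomes dirty;
    // every other cache holding the line must mark its copy Invalid.
    void onLocalWrite() {
        state = MesiState.MODIFIED;
    }

    // A remote core wrote to this line: our copy is now stale.
    void onRemoteWrite() {
        state = MesiState.INVALID;
    }
}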

Taking coherency into consideration leads us to another important question:

Why do we still need volatile on top of the previously explained cache coherency?

Cache coherency gives us visibility, but it is not sufficient to guarantee instruction ordering. If two cores write to the same line simultaneously, the MESI protocol only guarantees that those writes will happen in some order, not the specific one you might expect.4 Two threads might still read from that same memory address in between, read the exact same values, but not in the order you initially designed.

Some x86 and x64 processors implement a strong memory model where memory access is effectively volatile5; but since they don't guarantee sequential consistency, re-orderings are still allowed.

In addition, the compiler (not the hardware) can still apply optimisations of its own, and volatile also prohibits it from allocating these values in registers: given two reads of the same volatile variable, the first involves a load from memory, and the second cannot skip that load by re-using the value left in a register.
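
A sketch of the difference (the field names are illustrative): the compiler may collapse two plain reads into one, but two volatile reads must both hit the memory hierarchy.

class Reads {
    int plain;
    volatile int shared;

    void example() {
        int a = plain;   // one load...
        int b = plain;   // ...which the compiler may reuse from a register, so b == a
        int x = shared;  // volatile load
        int y = shared;  // must be a second load - another thread may have written in between
    }
}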

CPU registers are special temporary locations that facilitate some CPU operations, while the cache is a high-speed volatile memory, closer to the CPU, that helps reduce main-memory hits.1

Some programmers might assume that, since they're compiling for x86, they don't need certain ordering guarantees for a shared variable. But optimisations happen based on the as-if rule of the language's memory model, not the target hardware.6

Is volatile atomic?

A single volatile read or write is atomic as an individual operation. However, an operation such as i++, even when i is volatile, is decomposed into a read and a write (i = i + 1), and the compound is not atomic, since the thread might be interrupted between the two.
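
A sketch of the lost-update problem this causes (thread and iteration counts are arbitrary): two threads incrementing a volatile counter will very likely lose updates, while java.util.concurrent.atomic.AtomicInteger performs the read-modify-write as one atomic operation.

import java.util.concurrent.atomic.AtomicInteger;

class Counter {
    static volatile int unsafe = 0;
    static final AtomicInteger safe = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafe++;               // read + write: another thread can slip in between
                safe.incrementAndGet(); // one atomic read-modify-write
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // unsafe will very likely print less than 200000; safe always prints 200000
        System.out.println("volatile: " + unsafe + ", atomic: " + safe.get());
    }
}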

concurrency · compilers · cpu

👋 Hey, did you find this content useful?

Let me know on Twitter or edit this page on Github.

Pedro © 2020