This uses HeapBlockDevice for providing the underlying storage block.
Check boundaries that slicingblockdevice do not overlow over to unassigned
blocks.
Reimplement atomic code in inline assembly. This can improve
optimisation, and avoids potential architectural problems with using
LDREX/STREX intrinsics.
API further extended:
* Bitwise operations (fetch_and/fetch_or/fetch_xor)
* fetch_add and fetch_sub (like incr/decr, but returning old value -
aligning with C++11)
* compare_exchange_weak
* Explicit memory order specification
* Basic freestanding template overloads for C++
This gives our existing C implementation essentially all the functionality
needed by C++11.
An actual Atomic<T> template based upon these C functions could follow.