Reimplement atomic code in inline assembly. This can improve
optimisation, and avoids potential architectural problems with using
LDREX/STREX intrinsics.
API further extended:
* Bitwise operations (fetch_and/fetch_or/fetch_xor)
* fetch_add and fetch_sub (like incr/decr, but returning old value -
aligning with C++11)
* compare_exchange_weak
* Explicit memory order specification
* Basic freestanding template overloads for C++
This gives our existing C implementation essentially all the functionality
needed by C++11.
An actual Atomic<T> template based upon these C functions could follow.
Add atomic load and store functions, and add barriers to the existing atomic
functions.
File currently has no explicit barriers - we don't support SMP, so don't
need CPU barriers.
But we do need to worry about compiler barriers - particularly if link time
optimisation is activated so that the compiler can see inside these
functions. The assembler or intrinsics that access PRIMASK for
enter/exit critical act as barriers, but LDREX, STREX and simple
volatile pointer loads and stores do not.