Avoid template ambiguities using type_identity_t.
Previously the compiler would be unable to figure out whether
uint8_t x;
core_util_atomic_store(&x, 0);
should invoke core_util_atomic_store<uint8_t>, matching the pointer
type, or core_util_atomic_store<int>, matching the value, leading to
an ambiguity error.
Templates now select only on the type of the atomic pointer parameter.
Volatile makes no real difference when we're using assembler, or locked
functions, but leaving it off could be more efficient for the basic
loads and stores. So add non-volatile overloads in C++ for them.
Reimplement atomic code in inline assembly. This can improve
optimisation, and avoids potential architectural problems with using
LDREX/STREX intrinsics.
API further extended:
* Bitwise operations (fetch_and/fetch_or/fetch_xor)
* fetch_add and fetch_sub (like incr/decr, but returning old value -
aligning with C++11)
* compare_exchange_weak
* Explicit memory order specification
* Basic freestanding template overloads for C++
This gives our existing C implementation essentially all the functionality
needed by C++11.
An actual Atomic<T> template based upon these C functions could follow.