Add unified syntax directives to make the atomic assembler work when GCC
is building for M23.
GCC actually uses unified syntax when compiling C code, but puts
`.syntax divided` before each piece of inline assembly when targetting
Thumb-1 type devices like M0 and M23 for backwards compatibility. We can
overcome this with our own `.syntax unified`. The command-line option
`-masm-syntax-unified` intended to override this globally has been
broken from GCC 6 to 8.0.
Volatile makes no real difference when we're using assembler, or locked
functions, but leaving it off could be more efficient for the basic
loads and stores. So add non-volatile overloads in C++ for them.
Reimplement atomic code in inline assembly. This can improve
optimisation, and avoids potential architectural problems with using
LDREX/STREX intrinsics.
API further extended:
* Bitwise operations (fetch_and/fetch_or/fetch_xor)
* fetch_add and fetch_sub (like incr/decr, but returning old value -
aligning with C++11)
* compare_exchange_weak
* Explicit memory order specification
* Basic freestanding template overloads for C++
This gives our existing C implementation essentially all the functionality
needed by C++11.
An actual Atomic<T> template based upon these C functions could follow.