In some multithread cases there is possibility that process_oob function
was called after ATHandler was deleted. Fix is to wait if oob processing
is ongoing.
Reimplement atomic code in inline assembly. This can improve
optimisation, and avoids potential architectural problems with using
LDREX/STREX intrinsics.
API further extended:
* Bitwise operations (fetch_and/fetch_or/fetch_xor)
* fetch_add and fetch_sub (like incr/decr, but returning old value -
aligning with C++11)
* compare_exchange_weak
* Explicit memory order specification
* Basic freestanding template overloads for C++
This gives our existing C implementation essentially all the functionality
needed by C++11.
An actual Atomic<T> template based upon these C functions could follow.