Reimplement atomic code in inline assembly. This can improve
optimisation, and avoids potential architectural problems with using
LDREX/STREX intrinsics.
API further extended:
* Bitwise operations (fetch_and/fetch_or/fetch_xor)
* fetch_add and fetch_sub (like incr/decr, but returning old value -
aligning with C++11)
* compare_exchange_weak
* Explicit memory order specification
* Basic freestanding template overloads for C++
This gives our existing C implementation essentially all the functionality
needed by C++11.
An actual Atomic<T> template based upon these C functions could follow.
Adding new modules inside the namespace could be breaking change for existing code base
hence add `using namespace::class` for classes newly added to mbed namespace to maintian
backwards compatibility.
MBED_NO_GLOBAL_USING_DIRECTIVE is added to remove auto-addition of namespace
Macro guard `MBED_NO_GLOBAL_USING_DIRECTIVE` is added around namespace, to avoid
polluting users namespace.