Add a C++ `Atomic<T>` template to make atomics even easier. Basically
compatible with C++11 `std::atomic<T>`, but using the underlying
`core_util_atomic_xxx` functions from mbed_atomic.h, so appropriate for
synchronising with interrupts and optimised for uniprocessor.
One extra piece of functionality beyond the `core_util_atomic_xxx`
functions is the ability to have an arbitrary atomic type - eg a small
structure with 2 `uint16_t`s can be stored in a `uint32_t` container.