Reimplement atomic code in inline assembly. This can improve
optimisation, and avoids potential architectural problems with using
LDREX/STREX intrinsics.
The API is further extended with:
* Bitwise operations (fetch_and/fetch_or/fetch_xor)
* fetch_add and fetch_sub (like incr/decr, but returning the old value,
in line with C++11)
* compare_exchange_weak
* Explicit memory order specification
* Basic freestanding template overloads for C++
This gives our existing C implementation essentially all the functionality
needed by C++11.
An actual Atomic<T> template based upon these C functions could follow.
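
For reference, the C++11 semantics that the new functions align with can be
sketched with the standard std::atomic template. This is only an illustration
written against the standard library, not the C functions added here; the
helper names (take_ticket, set_flag_bit, saturating_increment) are made up for
the example.

```cpp
#include <atomic>
#include <cstdint>

// Standard C++11 atomics, used here only to illustrate the semantics.

// fetch_add returns the value held *before* the addition.
inline uint32_t take_ticket(std::atomic<uint32_t> &counter)
{
    return counter.fetch_add(1, std::memory_order_relaxed);
}

// fetch_or also returns the old value, so the caller can tell whether
// the bit was already set by someone else.
inline bool set_flag_bit(std::atomic<uint32_t> &flags, uint32_t mask)
{
    uint32_t old = flags.fetch_or(mask, std::memory_order_acq_rel);
    return (old & mask) == 0; // true if this call set the bit
}

// compare_exchange_weak may fail spuriously, so it belongs in a retry loop.
// On failure it reloads `expected` with the current value.
inline uint32_t saturating_increment(std::atomic<uint32_t> &value, uint32_t limit)
{
    uint32_t expected = value.load(std::memory_order_relaxed);
    while (expected < limit &&
           !value.compare_exchange_weak(expected, expected + 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
        // retry with the refreshed `expected`
    }
    return expected; // value observed before the increment, or >= limit if saturated
}
```

The explicit memory order arguments show the kind of control the extended API
exposes; the orders chosen here are examples, not recommendations.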
1. For the SHA AC, use an atomic flag to manage its ownership.
(1) The Nuvoton SHA AC doesn't support SHA context save & restore, so a S/W
SHA fallback is already supported. To make the non-blocking 'acquire'
semantics clearer, introduce 'try_acquire' to replace 'acquire'.
(2) No CPU busy-waiting, because 'try_acquire' returns immediately instead
of spinning.
(3) No deadlock, for the same reason: nothing blocks waiting for the SHA AC.
2. For the AES/DES/ECC ACs, switch to mutex-based ownership management.
(1) Rename crypto-misc.c to crypto-misc.cpp to use the C++ SingletonPtr,
which guarantees thread-safe construct-on-first-use of the mutex.
(2) Following the rename, add the 'extern "C"' linkage specifier to
CRYPTO_IRQHandler() to avoid C++ name mangling.
(3) No priority inversion, because the mutex has the osMutexPrioInherit
attribute bit set.
(4) No deadlock, because each of these ACs is locked only for a short
sequence of operations rather than for the whole lifetime of an mbedtls
context.
(5) The double mbedtls_internal_ecp_init() issue has been fixed in the upper
mbedtls layer, so there is no need to change the ECC init/free flow.
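
The two ownership schemes above can be sketched roughly as follows. This is a
minimal illustration using only the C++ standard library, not the code in
crypto-misc.cpp: all names (sha_try_acquire, sha_release, sha_select_backend,
aes_mutex, with_aes_locked) are hypothetical, a function-local static stands in
for SingletonPtr, and std::mutex does not provide the priority inheritance that
osMutexPrioInherit does.

```cpp
#include <atomic>
#include <mutex>

// --- SHA: non-blocking ownership via an atomic flag ---
std::atomic_flag sha_busy = ATOMIC_FLAG_INIT;

// Returns true if the caller now owns the accelerator; never blocks or spins.
bool sha_try_acquire()
{
    return !sha_busy.test_and_set(std::memory_order_acquire);
}

void sha_release()
{
    sha_busy.clear(std::memory_order_release);
}

enum class ShaBackend { Hardware, Software };

// Hypothetical helper: use the accelerator if it is free, else fall back to S/W.
ShaBackend sha_select_backend()
{
    return sha_try_acquire() ? ShaBackend::Hardware : ShaBackend::Software;
}

// --- AES/DES/ECC: blocking ownership via a lazily constructed mutex ---
std::mutex &aes_mutex()
{
    static std::mutex m; // C++11 guarantees thread-safe initialisation on first use
    return m;
}

// Holds the lock for one short sequence of operations only,
// not for the lifetime of a crypto context.
template <typename Op>
void with_aes_locked(Op op)
{
    std::lock_guard<std::mutex> guard(aes_mutex());
    op();
}
```

A caller that gets ShaBackend::Hardware is expected to call sha_release() when
its context is freed; a caller that gets ShaBackend::Software never touches the
flag.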