In gcc4mbed, I have been running with "-Wall -Wextra" and then
disabling a couple of noisy warnings that result. In particular, I
disable the unused-parameter and missing-field-initializers warnings.
The first commonly goes off for implementations of virtual methods or
other overridable functions where not all parameters are required for
every override. I don't find the second warning to be all that useful
anyway since missing structure field initializers will be set to 0
according to the C language specification. The RTOS code uses this
language feature and I see no reason that it shouldn't :)
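As a minimal sketch of the language rule mentioned above (the structure and function names here are hypothetical and not taken from the RTOS code), a partially initialized structure gets zeros in every field that has no initializer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure, for illustration only. */
struct thread_def_sketch {
    int      priority;
    unsigned stack_size;
    void    *stack;
};

/* Only the first field has an initializer; C sets the rest to 0 / NULL.
   This is the pattern that -Wmissing-field-initializers reports. */
struct thread_def_sketch make_partial_def(int priority)
{
    struct thread_def_sketch def = { priority };
    assert(def.stack_size == 0 && def.stack == NULL);
    return def;
}
```

Calling make_partial_def(3) returns a structure with priority 3, stack_size 0 and a NULL stack pointer.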
The following line in USBHAL_KL25Z.cpp would generate a warning in GCC
because of a potential operator precedence issue:
return((USB0->FRMNUML | (USB0->FRMNUMH << 8) & 0x07FF));
This would have been interpreted as:
return((USB0->FRMNUML | ((USB0->FRMNUMH << 8) & 0x07FF)));
since & has higher precedence than |.
I switched it to be:
return((USB0->FRMNUML | (USB0->FRMNUMH << 8)) & 0x07FF);
Since it makes more sense to & with 0x7FF after having merged the lower
and upper bytes together rather than just the upper byte. It should
have resulted in the same value either way.
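The two groupings can be compared with a small host-side sketch. It assumes both frame number registers are 8 bits wide, and the function names are made up for this illustration:

```c
#include <stdint.h>

/* Original grouping: the mask applies only to the shifted high byte. */
uint32_t frame_number_old(uint8_t low, uint8_t high)
{
    return low | (((uint32_t)high << 8) & 0x07FF);
}

/* Revised grouping: merge both bytes first, then mask to 11 bits. */
uint32_t frame_number_new(uint8_t low, uint8_t high)
{
    return (low | ((uint32_t)high << 8)) & 0x07FF;
}
```

Because the low byte never has bits set above bit 7, the mask cannot clear anything in it, so both functions return the same value for every pair of 8-bit inputs.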
The SDFileSystem class contained a few routines which compared a signed
integer loop index to an unsigned integer length/size. I switched the
loop index to be uint32_t as well.
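A minimal sketch of the pattern (the function below is hypothetical and is not code from SDFileSystem): giving the loop index the same unsigned type as the length avoids the signed/unsigned comparison that -Wextra reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums a buffer; the index i has the same type as length,
   so the loop condition compares two uint32_t values. */
uint32_t sum_bytes(const uint8_t *buf, uint32_t length)
{
    uint32_t total = 0;
    assert(buf != NULL || length == 0);
    for (uint32_t i = 0; i < length; i++) {
        total += buf[i];
    }
    return total;
}
```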
From Adam Green, regarding using -fno-delete-null-pointer-checks:
"I would argue that on Cortex-M processors, it is more dangerous to not
have it. The compiler can actually generate incorrect code because it is
making an incorrect assumption (that reads from a NULL pointer will throw
an exception.) The GCC for ARM developers should actually never enable
the delete-null-pointer-checks optimization for Cortex-M processors.
There is a comment in the GCC manual that indicates, "Some targets,
especially embedded ones, disable this option [delete-null-pointer-checks]
at all levels." Not having this flag is pretty risky on the current
versions of GCC_ARM. Just to clarify, this flag doesn't enable an
optimization...it disables an unsafe optimization."
The code in netif_set_ipaddr would read the memory pointed to by its
ipaddr parameter, even if it was NULL on this line:
if ((ip_addr_cmp(ipaddr, &(netif->ip_addr))) == 0) {
On the Cortex-M3, it is typically OK to read from address 0 so this
code will actually compare the reset stack pointer value to the
current value in netif->ip_addr.
Later in the code, this same pointer will be used for a second read:
ip_addr_set(&(netif->ip_addr), ipaddr);
The ip_addr_set call will first check to see if the ipaddr is NULL and
if so, treats it like IP_ADDR_ANY (4 bytes of 0).
/** Safely copy one IP address to another (src may be NULL) */
#define ip_addr_set(dest, src) ((dest)->addr = \
((src) == NULL ? 0 : \
(src)->addr))
The issue here is that when GCC optimizes this code, it assumes that
the first dereference of ipaddr would have thrown an invalid memory
access exception and execution would never make it to this second
dereference. Therefore it optimizes out the NULL check in ip_addr_set.
The -fno-delete-null-pointer-checks flag will disable this optimization and
is a good thing to use with GCC in general on Cortex-M parts. I will
let the mbed guys make that change to their build system.
I have however corrected the code so that the intent of how to handle a
NULL ipaddr is more obvious and gets rid of the potential NULL
dereference.
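The general shape of such a fix can be sketched as follows. This is not the actual change made to netif_set_ipaddr; the type and function names are stand-ins, and the only point is that NULL is resolved to "any" (0) before the pointer is ever dereferenced, so the compiler has no earlier dereference from which to infer that the pointer is non-NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for lwIP's IP address type. */
typedef struct { uint32_t addr; } ip_addr_sketch_t;

/* Returns 1 if the stored address changed, 0 if it was already equal. */
int set_ipaddr_sketch(ip_addr_sketch_t *stored, const ip_addr_sketch_t *ipaddr)
{
    uint32_t new_addr;

    assert(stored != NULL);
    /* Handle NULL explicitly, before any read through ipaddr. */
    new_addr = (ipaddr == NULL) ? 0 : ipaddr->addr;
    if (new_addr == stored->addr) {
        return 0;
    }
    stored->addr = new_addr;
    return 1;
}
```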
By the way, this bug caused connect() to fail in obtaining an
address from DHCP. If I recall correctly from when I first debugged
this issue (late last year), I actually saw the initial value of the
stack pointer being used in the DHCP request as an IP address which
caused it to be rejected.
Peter's and my changes to LPC1768.ld ended up adding the same AHBSRAM0
and AHBSRAM1 section clauses to the script twice. I removed one copy.
I also pulled Peter's define of the ETHMEM_SECTION macro up into the
previous nested #if so that the preprocessor wouldn't spit out a
redefined macro warning.
I verified that building the code clean before and after these changes
still results in the same .bin file but now without warnings and/or
duplicate code.
I started out looking at some UDP receive code that was only able to
handle 3 inbound 550 byte datagrams out of 16 when sent in quick
succession. I stepped through the ethernet driver code and it
seemed to work as expected but it just couldn't queue up more than
3 PBUFs for each burst. It was almost like it was being starved of
CPU cycles. Based on that observation, I looked up the thread
priorities for the receive ethernet thread and found the following
close to the top of the lpc17_emac.c source file:
#define RX_PRIORITY (osPriorityNormal)
This got me to thinking, what is the priority of the tcp thread? It
turns out that it gets its priority from the following line in
lwipopts.h:
#define TCPIP_THREAD_PRIO 1
Interesting! What priority is 1? It turns out that it corresponds
to osPriorityAboveNormal. This means that while the tcp thread is
handling one packet that has been posted to its mailbox from the
ethernet receive thread, the receive thread is starved from processing
any more inbound ethernet packets.
What happens if we set TCPIP_THREAD_PRIO to osPriorityNormal? Crash!
The ethernet driver ends up crashing in lpc_low_level_input() when
it tries to set p->len on a NULL p pointer. The p pointer ended up
being NULL because an earlier call to pbuf_alloc() in lpc_rx_queue()
failed its allocation (I will have more to say about this failed
allocation later since that is caused by yet another bug). I pulled a
fix from http://lpcware.com/content/bugtrackerissue/lpc17xx-mac-bugs to
remedy this issue. When the pbuf allocation fails, the fix discards the
inbound packet and puts its pbuf back into the rx queue for reuse.
This means we never end up with a NULL pointer in that queue to
dereference and crash on.
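The idea behind that fix can be sketched on the host like this. It is not the lpcware code; every name below is hypothetical, and the allocator is passed in only so that a failure can be simulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUF_SIZE_SKETCH 64  /* hypothetical buffer size */

typedef struct { unsigned len; unsigned char data[BUF_SIZE_SKETCH]; } buf_sketch_t;

/* One receive slot; the hardware would fill slot->buf. */
typedef struct { buf_sketch_t *buf; } rx_slot_sketch_t;

/* Allocator that always fails, used to simulate an exhausted heap. */
void *failing_alloc(size_t n) { (void)n; return NULL; }

/* Hands the received buffer to the caller only if a replacement can be
   allocated. Otherwise the packet is dropped and the old buffer stays
   in the slot, so the slot never holds a NULL pointer. */
buf_sketch_t *rx_take_sketch(rx_slot_sketch_t *slot, unsigned len,
                             void *(*alloc_fn)(size_t))
{
    buf_sketch_t *fresh;
    buf_sketch_t *received;

    assert(slot != NULL && slot->buf != NULL);
    fresh = alloc_fn(sizeof *fresh);
    if (fresh == NULL) {
        return NULL;            /* drop packet, reuse the same buffer */
    }
    received = slot->buf;
    received->len = len;
    slot->buf = fresh;          /* slot is re-armed with the new buffer */
    return received;
}
```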
With that bug fixed, the application would just appear to hang after
receiving and processing a few datagrams. I could place breakpoints in
the packet_rx() thread function and found that it was being signalled
by the ethernet ISR but it was always failing to allocate new PBUFs,
which is what led to our previous crash. This means that the new
crash prevention code was just discarding every packet that arrived.
Why are these allocations failing? In my opinion, this was the most
interesting bug to track down. Is there a memory leak somewhere in
the code which maybe only triggers in low memory situations? I
figured the easiest way to determine that would be to learn a bit
about the format of the lwIP heap from which the PBUF was failing to
be allocated. I started by just stepping into the failing lwIP memory
allocator, mem_malloc(). The loop which searches the free list starts
with this code:
for (ptr = (mem_size_t)((u8_t *)lfree - ram);
This loop didn't even go through one iteration and when I looked at the
initial ptr value it contained a really large value. It turns out that
lfree was actually lower than ram. At this point I figured that lfree
had probably been corrupted during a free operation after one of the
heap allocations had been underflowed/overflowed to cause the metadata
for an allocation to be corrupted. As I started thinking about how to
track that kind of bug down, I noticed that the ram variable might be
too large (0x20080a68). I restarted the debugger and looked at the
initial value. It was at a nice even address (0x2007c000) and
certainly nothing like what I saw when the allocations were failing.
This global variable shouldn't change at all during the execution of
the program. I placed a memory access watchpoint on this ram variable
and it fired very quickly inside of the rt_mbx_send() function. The
ram variable was being changed by this line in rt_mbx_send():
p_MCB->msg[p_MCB->first] = p_msg;
What the what? Why does writing to the mailbox queue overwrite the
ram global variable? Let's start by looking at the data structure used
in the lwIP port to target RTX (defined in sys_arch.h):
// === MAIL BOX ===
typedef struct {
osMessageQId id;
osMessageQDef_t def;
uint32_t queue[MB_SIZE];
} sys_mbox_t;
Compare that to the utility macro that RTX defines to help set up one of
these mailboxes with its queue:
#define osMessageQDef(name, queue_sz, type) \
uint32_t os_messageQ_q_##name[4+(queue_sz)]; \
osMessageQDef_t os_messageQ_def_##name = \
{ (queue_sz), (os_messageQ_q_##name) }
Note the 4+(queue_sz) used in the definition of the message queue
array. What a hack! The RTX OS requires an extra 16 bytes to contain
its OS_MCB header and this is how it adds it in. Obviously the
sys_mbox_t structure used in the lwIP OS targeting code doesn't have
this. Without it, the RTX mailbox routines end up scribbling on
memory following the structure in memory. Adding 4 to the size of the
queue array in that structure fixes the memory allocation failure that I
was seeing and now the network stack can handle between 7 and 10
datagrams within a burst.
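A sketch of the corrected layout, using stand-in types so that it compiles without the RTX headers. The queue depth and all names ending in _sketch are assumptions for illustration; the 4 extra words correspond to the 16 byte header described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_SIZE_SKETCH   8   /* hypothetical queue depth */
#define MCB_HEADER_WORDS 4   /* 16 byte header expected in front of the messages */

/* Stand-in for osMessageQDef_t. */
typedef struct {
    uint32_t queue_sz;
    void    *pool;
} msgq_def_sketch_t;

/* Mailbox with room for the header as well as the messages. */
typedef struct {
    void              *id;   /* stand-in for osMessageQId */
    msgq_def_sketch_t  def;
    uint32_t           queue[MCB_HEADER_WORDS + MB_SIZE_SKETCH];
} mbox_sketch_t;

/* Bytes needed behind the pool pointer: header plus one word per message. */
size_t mbox_bytes_needed(unsigned queue_sz)
{
    return (MCB_HEADER_WORDS + (size_t)queue_sz) * sizeof(uint32_t);
}

/* Sanity check that the array in the structure is large enough. */
void check_mbox_layout(void)
{
    assert(sizeof(((mbox_sketch_t *)0)->queue) >= mbox_bytes_needed(MB_SIZE_SKETCH));
}
```

With queue[MB_SIZE_SKETCH] instead, the array would be 16 bytes short of mbox_bytes_needed(MB_SIZE_SKETCH), which matches the overrun described above.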
The phy_speed_100mbs, phy_full_duplex, and phy_link_active fields of
PHY_STATUS_TYPE are 1 bit wide but lpc_phy_init() attempted to
initialize them to a value of 2. I switched the initializations to
be 0 instead and it still generated the same .bin image.
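A small sketch of why the image did not change, assuming the fields are unsigned bit-fields (the structure below is a stand-in, not the real PHY_STATUS_TYPE): storing 2 in a 1 bit wide unsigned field keeps only the lowest bit, which is 0.

```c
#include <assert.h>

/* Stand-in for the PHY status structure; field types are an assumption. */
typedef struct {
    unsigned speed_100mbs : 1;
    unsigned full_duplex  : 1;
    unsigned link_active  : 1;
} phy_status_sketch_t;

/* Stores value in a 1 bit field and reads it back. */
unsigned store_in_one_bit(unsigned value)
{
    phy_status_sketch_t s = { 0, 0, 0 };
    s.link_active = value;   /* only the lowest bit survives */
    return s.link_active;
}

/* Initializing with 2 is the same as initializing with 0. */
void check_truncation(void)
{
    assert(store_in_one_bit(2) == store_in_one_bit(0));
}
```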
The first was a potential out of range index read in dhcp_handle_ack().
The (n < DNS_MAX_SERVERS) check should occur first. There is also a
documented lwIP bug for this issue here:
http://savannah.nongnu.org/bugs/?36170
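The ordering issue can be sketched as follows. The array, its size and the function name are hypothetical stand-ins rather than the lwIP code; the point is that && evaluates left to right and short-circuits, so the array is only read once the index is known to be in range.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SERVERS_SKETCH 2   /* stand-in for DNS_MAX_SERVERS */

/* Counts the leading entries flagged as present without ever
   reading past the end of the array. */
size_t count_given(const unsigned char given[MAX_SERVERS_SKETCH])
{
    size_t n = 0;

    assert(given != NULL);
    /* Range check first; given[n] is not evaluated when n is out of range. */
    while (n < MAX_SERVERS_SKETCH && given[n]) {
        n++;
    }
    return n;
}
```

With the operands swapped, the final iteration would read given[MAX_SERVERS_SKETCH] before noticing that n is out of range.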
In dhcp_bind() there is no need to perform the NULL check in
ip_addr_isany() for &gw_addr. Just check (gw_addr.addr == IPADDR_ANY)
instead.
I refactored the chaddr[] copy in dhcp_create_msg() to first copy all
of the valid bytes in hwaddr and then pad the rest of the bytes with 0.
Before it used to check on every destination byte if it should copy or
pad. GCC originally complained about an index out of range read from
the hwaddr[] array even though it was protected by a conditional
operator. The refactor makes the intent a bit clearer and saves the
extra comparison per loop iteration. It also stops GCC from
complaining :)
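The copy-then-pad structure looks roughly like this. It is a sketch rather than the code in dhcp_create_msg(); the names are made up, and the 16 byte size is the length of the chaddr field in a DHCP message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHADDR_LEN_SKETCH 16   /* length of chaddr in a DHCP message */

/* Copies the valid hardware address bytes, then zero pads the rest. */
void fill_chaddr_sketch(uint8_t chaddr[CHADDR_LEN_SKETCH],
                        const uint8_t *hwaddr, size_t hwaddr_len)
{
    size_t i;

    assert(chaddr != NULL && hwaddr != NULL);
    if (hwaddr_len > CHADDR_LEN_SKETCH) {
        hwaddr_len = CHADDR_LEN_SKETCH;   /* never write past chaddr */
    }
    for (i = 0; i < hwaddr_len; i++) {
        chaddr[i] = hwaddr[i];            /* first loop: copy only */
    }
    for (; i < CHADDR_LEN_SKETCH; i++) {
        chaddr[i] = 0;                    /* second loop: pad only */
    }
}
```

Neither loop body contains a conditional, and hwaddr is never indexed at or beyond hwaddr_len.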
GCC will issue a warning when the ip_addr_isany() macro is used on
a pointer which can never be NULL since the macro's NULL check will
always be false:
#define ip_addr_isany(addr1) ((addr1) == NULL || \
(addr1)->addr == IPADDR_ANY)
In these cases, it is probably clearer to just perform the
x.addr == IPADDR_ANY check inline.
The dn variable in lpc_low_level_output() was originally defined as a
u32_t but it is later compared to the s32_t return value from
lpc_tx_ready(). Since it is initialized to pbuf_clean() which returns
a u8_t, a s32_t type can safely hold the initial value and remains
consistent with the signed lpc_tx_ready() comparison.
I also modified writtenLen in TCPSocketConnection::send_all() and
readLen in TCPSocketConnection::receive_all() to be of type int instead
of size_t. This is more consistent with their usage within these
methods (they accumulate int ret values and are compared to the int
length value) and their use as signed integer return values.
The original script assigned memory ranges to USB_RAM and ETH_RAM but
it never placed any section data in those regions. I added clauses
towards the bottom of the script to place data that the programmer
has marked for the AHBSRAM0 and AHBSRAM1 sections into these regions
of RAM. Previously the data destined for these sections was being
placed in the lower 32K RAM bank and overflowing it at link time.
I also added a few Image$$ linker symbols to mimic those used by the
online compiler. I have had samples in the past which took advantage
of these to display static memory statistics for each SRAM region.
I also changed LENGTH=0x7F38 to LENGTH=(32K - 0xC8) to make it more
consistent with the sizing of the other regions in this script which
use human readable K sizing information. The 0xC8 subtraction reflects
the starting offset of 0xC8 for this region.
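The equivalence of the two spellings is easy to confirm with a few lines of C. The macro and function names below are made up for this check, and K is taken to mean 1024 bytes as it does in GNU ld scripts.

```c
#include <assert.h>
#include <stdint.h>

#define KIB_SKETCH           1024u   /* "K" in the linker script */
#define REGION_OFFSET_SKETCH 0xC8u   /* starting offset of the region */

/* Length of the region written in the new, more readable form. */
uint32_t region_length(void)
{
    return 32u * KIB_SKETCH - REGION_OFFSET_SKETCH;
}

/* The new form must equal the old hard-coded value. */
void check_region_length(void)
{
    assert(region_length() == 0x7F38u);
}
```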