Commit Graph

4 Commits (2e32c71b50171aa1bc165d5e7b3281ab3e62ecd2)

Author SHA1 Message Date
Adam Green 7dddd9e578 Asm versions of netstack memcpy() and lwip_standard_chksum()
For tests such as TCPEchoServer
(http://mbed.org/users/emilmont/notebook/networking-libraries-benchmark/)
this change showed a 28% improvement (14Mbps to 18Mbps) when the echo
test was modified to instead use 1K data buffers.

I targeted these two functions based on manual profiling samples which
showed that a great deal of time was being spent in these two functions
when the network stack was being slammed with UDP packets.
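For reference, lwIP exposes hooks for substituting the copy and checksum
routines, so a replacement like this is typically wired in through
lwipopts.h; the names below are illustrative only and the asm bodies are
not shown:
    /* lwipopts.h -- illustrative hook-up only; asm_lwip_chksum() and
     * asm_netstack_memcpy() are hypothetical names for the asm routines. */
    u16_t  asm_lwip_chksum(const void *dataptr, int len);
    void  *asm_netstack_memcpy(void *dst, const void *src, size_t len);

    #define LWIP_CHKSUM           asm_lwip_chksum
    #define MEMCPY(dst, src, len) asm_netstack_memcpy((dst), (src), (len))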
2013-08-30 06:09:16 -07:00
Adam Green 657e6df203 Updates to network code to improve performance/robustness
I started out looking at some UDP receive code that was only able to
handle 3 inbound 550-byte datagrams out of 16 when sent in quick
succession.  I stepped through the ethernet driver code and it
seemed to work as expected but it just couldn't queue up more than
3 PBUFs for each burst.  It was almost like it was being starved of
CPU cycles.  Based on that observation, I looked up the thread
priorities for the receive ethernet thread and found the following
close to the top of the lpc17_emac.c source file:
    #define RX_PRIORITY   (osPriorityNormal)
This got me to thinking, what is the priority of the tcp thread?  It
turns out that it gets its priority from the following line in
lwipopts.h:
    #define TCPIP_THREAD_PRIO           1
Interesting!  What priority is 1?  It turns out that it corresponds
to osPriorityAboveNormal.  This means that while the tcp thread is
handling one packet that has been posted to its mailbox from the
ethernet receive thread, the receive thread is starved from processing
any more inbound ethernet packets.
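The mapping comes from the CMSIS-RTOS priority enum in cmsis_os.h, which
assigns roughly these numeric values:
    // cmsis_os.h (RTX) -- relevant osPriority values
    osPriorityBelowNormal  = -1,
    osPriorityNormal       =  0,   // RX_PRIORITY
    osPriorityAboveNormal  = +1,   // TCPIP_THREAD_PRIO == 1 lands here
    osPriorityHigh         = +2,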

What happens if we set TCPIP_THREAD_PRIO to osPriorityNormal?  Crash!
The ethernet driver ends up crashing in lpc_low_level_input() when
it tries to set p->len on a NULL p pointer.  The p pointer ended up
being NULL because an earlier call to pbuf_alloc() in lpc_rx_queue()
failed its allocation (I will have more to say about this failed
allocation later since that is caused by yet another bug).  I pulled a
fix from http://lpcware.com/content/bugtrackerissue/lpc17xx-mac-bugs to
remedy this issue.  When the pbuf allocation fails, the fix discards
the inbound packet and puts the existing pbuf back into the rx queue.
This means we never end up with a NULL pointer in that queue to
dereference and crash on.
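The shape of that fix, sketched with illustrative names (the actual
lpc17_emac.c code differs in detail):
    /* In the rx-queue refill path: if a fresh pbuf cannot be allocated,
     * drop the inbound frame and put the pbuf that already holds it back
     * on the rx queue, so the queue never contains a NULL entry for
     * lpc_low_level_input() to dereference.                             */
    struct pbuf *p = pbuf_alloc(PBUF_RAW, max_frame_len, PBUF_RAM);
    if (p == NULL) {
        requeue_rx_pbuf(emac, old_p);   /* hypothetical helper */
        return 0;                       /* frame is discarded  */
    }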

With that bug fixed, the application would just appear to hang after
receiving and processing a few datagrams.  I placed breakpoints in
the packet_rx() thread function and found that it was being signalled
by the ethernet ISR but it was always failing to allocate new PBUFs,
which is what led to our previous crash.  This means that the new
crash prevention code was just discarding every packet that arrived.

Why are these allocations failing?  In my opinion, this was the most
interesting bug to track down.  Is there a memory leak somewhere in
the code which maybe only triggers in low memory situations?  I
figured the easiest way to determine that would be to learn a bit
about the format of the lwIP heap from which the PBUF was failing to
be allocated.  I started by just stepping into the failing lwIP memory
allocator, mem_malloc().  The loop which searches the free list starts
with this code:
    for (ptr = (mem_size_t)((u8_t *)lfree - ram);
This loop didn't even go through one iteration and when I looked at the
initial ptr value it contained a really large value.  It turns out that
lfree was actually lower than ram.  At this point I figured that lfree
had probably been corrupted during a free operation after one of the
heap allocations had been underflowed/overflowed to cause the metadata
for an allocation to be corrupted.  As I started thinking about how to
track that kind of bug down, I noticed that the ram variable might be
too large (0x20080a68).  I restarted the debugger and looked at the
initial value.  It was at a nice even address (0x2007c000) and
certainly nothing like what I saw when the allocations were failing.
This global variable shouldn't change at all during the execution of
the program.  I placed a memory access watchpoint on this ram variable
and it fired very quickly inside of the rt_mbx_send() function.  The
ram variable was being changed by this line in rt_mbx_send():
    p_MCB->msg[p_MCB->first] = p_msg;
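For context, ram and lfree are module-level pointers into the lwIP heap,
declared in lwIP's mem.c roughly like this, which is why (lfree - ram)
should always be a small, non-negative offset into the heap:
    /* lwIP mem.c (paraphrased) */
    struct mem {
      mem_size_t next;   /* offset of the next block in the heap  */
      mem_size_t prev;   /* offset of the previous block          */
      u8_t       used;   /* non-zero if this block is allocated   */
    };
    static u8_t       *ram;    /* aligned base address of the heap */
    static struct mem *lfree;  /* lowest free block, always >= ram */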

What the what?  Why does writing to the mailbox queue overwrite the
ram global variable?  Let's start by looking at the data structure used
in the lwIP port to target RTX (defined in sys_arch.h):
// === MAIL BOX ===

typedef struct {
    osMessageQId    id;
    osMessageQDef_t def;
    uint32_t        queue[MB_SIZE];
} sys_mbox_t;

Compare that to the utility macro that RTX defines to help set up one
of these mailboxes with its queue:
    #define osMessageQDef(name, queue_sz, type)   \
    uint32_t os_messageQ_q_##name[4+(queue_sz)]; \
    osMessageQDef_t os_messageQ_def_##name = \
    { (queue_sz), (os_messageQ_q_##name) }
Note the 4+(queue_sz) used in the definition of the message queue
array.  What a hack!  The RTX OS requires an extra 16 bytes to contain
its OS_MCB header and this is how it adds it in.  Obviously the
sys_mbox_t structure used in the lwIP OS targeting code doesn't have
this.  Without it, the RTX mailbox routines end up scribbling on
memory following the structure.  Adding 4 to the queue array size in
that structure fixes the memory allocation failure that I was seeing,
and now the network stack can handle between 7 and 10 datagrams within
a burst.
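The fix amounts to reserving those four extra words in the port's own
structure, along these lines:
    // sys_arch.h -- leave room for RTX's 4-word OS_MCB header ahead of
    // the MB_SIZE message slots, mirroring what osMessageQDef() allocates
    typedef struct {
        osMessageQId    id;
        osMessageQDef_t def;
        uint32_t        queue[4 + MB_SIZE];
    } sys_mbox_t;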
2013-08-15 03:08:47 -07:00
Adam Green 6e95a5ecde Removed extra ALIGNED macro define 2013-08-14 20:17:53 -07:00
Emilio Monti 5c20760685 Initial commit of the mbed libraries and tools 2013-02-18 15:32:11 +00:00