Updates to network code to improve performance/robustness

I started out looking at some UDP receive code that was only able to
handle 3 inbound 550 byte datagrams out of 16 when sent in quick
succession.  I stepped through the ethernet driver code and it
seemed to work as expected but it just couldn't queue up more than
3 PBUFs for each burst.  It was almost like it was being starved of
CPU cycles.  Based on that observation, I looked up the thread
priorities for the receive ethernet thread and found the following
close to the top of the lpc17_emac.c source file:
    #define RX_PRIORITY   (osPriorityNormal)
This got me to thinking, what is the priority of the tcp thead?  It
turns out that it gets its priority from the following line in
lwipopts.h:
    #define TCPIP_THREAD_PRIO           1
Interesting!  What priority is 1?  It turns out that it corresponds
to osPriorityAboveNormal.  This means that while the tcp thread is
handling one packet that has been posted to its mailbox from the
ethernet receive thread, the receive thread is starved from processing
any more inbound ethernet packets.

What happens if we set TCP_IP_THREAD_PRIO to osPriorityNormal?  Crash!
The ethernet driver ends up crashing in lpc_low_level_input() when
it tries to set p->len on a NULL p pointer.  The p pointer ended up
being NULL because an earlier call to pbuf_alloc() in lpc_rx_queue()
failed its allocation (I will have more to say about this failed
allocation later since that is caused by yet another bug).  I pulled a
fix from http://lpcware.com/content/bugtrackerissue/lpc17xx-mac-bugs to
remedy this issue.  When the pbuf allocation fails, it discards the
inbound packet in the pbuf and just puts it back into the rx queue.
This means we never end up with a NULL pointer in that queue to
dereference and crash on.

With that bug fixed, the application would just appear to hang after
receiving and processing a few datagrams.  I could place breakpoints in
the packet_rx() thread function and found that it was being signalled
by the ethernet ISR but it was always failing to allocate new PBUFs,
which is what led to our previous crash.  This means that the new
crash prevention code was just discarding every packet that arrived.

Why are these allocations failing?  In my opinion, this was the most
interesting bug to track down.  Is there a memory leak somewhere in
the code which maybe only triggers in low memory situations?  I
figured the easiest way to determine that would be to learn a bit
about the format of the lwIP heap from which the PBUF was failing to
be allocated.  I started by just stepping into the failing lwIP memory
allocator, mem_malloc().  The loop which search the free list starts
with this code:
    for (ptr = (mem_size_t)((u8_t *)lfree - ram);
This loop didn't even go through one iteration and when I looked at the
initial ptr value it contained a really large value.  It turns out that
lfree was actually lower than ram.  At this point I figured that lfree
had probably been corrupted during a free operation after one of the
heap allocations had been underflowed/overflowed to cause the metadata
for an allocation to be corrupted.  As I started thinking about how to
track that kind of bug down, I noticed that the ram variable might be
too large (0x20080a68).  I restarted the debugger and looked at the
initial value.  It was at a nice even address (0x2007c000) and
certainly nothing like what I saw when the allocations were failing.
This global variable shouldn't change at all during the execution of
the program.  I placed a memory access watchpoint on this ram variable
and it fired very quickly inside of the rt_mbx_send() function.  The
ram variable was being changed by this line in rt_mbx_send():
    p_MCB->msg[p_MCB->first] = p_msg;

What the what?  Why does writing to the mailbox queue overwrite the
ram global variable?  Let's start by looking at the data structure used
in the lwIP port to target RTX (defined in sys_arch.h):
// === MAIL BOX ===

typedef struct {
    osMessageQId    id;
    osMessageQDef_t def;
    uint32_t        queue[MB_SIZE];
} sys_mbox_t;

Compare that to the utility macro that RTX defines to help setup one of
these mailboxes with queue:
    #define osMessageQDef(name, queue_sz, type)   \
    uint32_t os_messageQ_q_##name[4+(queue_sz)]; \
    osMessageQDef_t os_messageQ_def_##name = \
    { (queue_sz), (os_messageQ_q_##name) }
Note the 4+(queue_sz) used in the definition of the message queue
array.  What a hack!  The RTX OS requires an extra 16 bytes to contain
its OS_MCB header and this is how it adds it in.  Obviously the
sys_mbox_t structure used in the lwIP OS targetting code doesn't have
this.  Without it, the RTX mailbox routines end up scribbling on
memory following the structure in memory.  Adding 4 in that structure
fixes the memory allocation failure that I was seeing and now the network
stack can handle between 7 and 10 datagrams within a burst.
pull/32/head
Adam Green 2013-08-14 22:24:57 -07:00
parent 9d846d9902
commit 657e6df203
3 changed files with 29 additions and 11 deletions

View File

@ -424,10 +424,31 @@ static struct pbuf *lpc_low_level_input(struct netif *netif)
p = lpc_enetif->rxb[idx];
p->len = (u16_t) length;
/* Free pbuf from desriptor */
/* Free pbuf from descriptor */
lpc_enetif->rxb[idx] = NULL;
lpc_enetif->rx_free_descs++;
/* Attempt to queue new buffer(s) */
if (lpc_rx_queue(lpc_enetif->netif) == 0) {
/* Drop the frame due to OOM. */
LINK_STATS_INC(link.drop);
/* Re-queue the pbuf for receive */
lpc_rxqueue_pbuf(lpc_enetif, p);
LWIP_DEBUGF(UDP_LPC_EMAC | LWIP_DBG_TRACE,
("lpc_low_level_input: Packet index %d dropped for OOM\n",
idx));
#ifdef LOCK_RX_THREAD
#if NO_SYS == 0
sys_mutex_unlock(&lpc_enetif->TXLockMutex);
#endif
#endif
return NULL;
}
LWIP_DEBUGF(UDP_LPC_EMAC | LWIP_DBG_TRACE,
("lpc_low_level_input: Packet received: %p, size %d (index=%d)\n",
p, length, idx));
@ -435,9 +456,6 @@ static struct pbuf *lpc_low_level_input(struct netif *netif)
/* Save size */
p->tot_len = (u16_t) length;
LINK_STATS_INC(link.recv);
/* Queue new buffer(s) */
lpc_rx_queue(lpc_enetif->netif);
}
}

View File

@ -51,7 +51,7 @@ typedef struct {
osMessageQId id;
osMessageQDef_t def;
#ifdef CMSIS_OS_RTX
uint32_t queue[MB_SIZE];
uint32_t queue[4+MB_SIZE]; /* The +4 is required for RTX OS_MCB overhead. */
#endif
} sys_mbox_t;

View File

@ -29,14 +29,14 @@
#define LWIP_RAW 0
#define TCPIP_MBOX_SIZE 6
#define DEFAULT_TCP_RECVMBOX_SIZE 6
#define DEFAULT_UDP_RECVMBOX_SIZE 6
#define DEFAULT_RAW_RECVMBOX_SIZE 6
#define DEFAULT_ACCEPTMBOX_SIZE 6
#define TCPIP_MBOX_SIZE 8
#define DEFAULT_TCP_RECVMBOX_SIZE 8
#define DEFAULT_UDP_RECVMBOX_SIZE 8
#define DEFAULT_RAW_RECVMBOX_SIZE 8
#define DEFAULT_ACCEPTMBOX_SIZE 8
#define TCPIP_THREAD_STACKSIZE 1024
#define TCPIP_THREAD_PRIO 1
#define TCPIP_THREAD_PRIO (osPriorityNormal)
#define DEFAULT_THREAD_STACKSIZE 512