Commit Graph

4 Commits (2e32c71b50171aa1bc165d5e7b3281ab3e62ecd2)

Author SHA1 Message Date
Adam Green 7dddd9e578 Asm versions of netstack memcpy() and lwip_standard_chksum()
For tests such as TCPEchoServer
(http://mbed.org/users/emilmont/notebook/networking-libraries-benchmark/)
this change showed a 28% improvement (14Mbps to 18Mbps) when the echo
test was modified to instead use 1K data buffers.

I targeted these two functions based on manual profiling samples which
showed that a great deal of time was being spent in these two functions
when the network stack was being slammed with UDP packets.
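For reference, lwIP exposes hooks for substituting the copy and checksum
routines, so a replacement like this is typically wired in through
lwipopts.h; the names below are illustrative only and the asm bodies are
not shown:
    /* lwipopts.h -- illustrative hook-up only; asm_lwip_chksum() and
     * asm_netstack_memcpy() are hypothetical names for the asm routines. */
    u16_t  asm_lwip_chksum(const void *dataptr, int len);
    void  *asm_netstack_memcpy(void *dst, const void *src, size_t len);

    #define LWIP_CHKSUM           asm_lwip_chksum
    #define MEMCPY(dst, src, len) asm_netstack_memcpy((dst), (src), (len))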
2013-08-30 06:09:16 -07:00
Adam Green 657e6df203 Updates to network code to improve performance/robustness
I started out looking at some UDP receive code that was only able to
handle 3 inbound 550-byte datagrams out of 16 when sent in quick
succession.  I stepped through the ethernet driver code and it
seemed to work as expected but it just couldn't queue up more than
3 PBUFs for each burst.  It was almost like it was being starved of
CPU cycles.  Based on that observation, I looked up the thread
priorities for the receive ethernet thread and found the following
close to the top of the lpc17_emac.c source file:
    #define RX_PRIORITY   (osPriorityNormal)
This got me to thinking, what is the priority of the tcp thread?  It
turns out that it gets its priority from the following line in
lwipopts.h:
    #define TCPIP_THREAD_PRIO           1
Interesting!  What priority is 1?  It turns out that it corresponds
to osPriorityAboveNormal.  This means that while the tcp thread is
handling one packet that has been posted to its mailbox from the
ethernet receive thread, the receive thread is starved from processing
any more inbound ethernet packets.
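The mapping comes from the CMSIS-RTOS priority enum in cmsis_os.h, which
assigns roughly these numeric values:
    // cmsis_os.h (RTX) -- relevant osPriority values
    osPriorityBelowNormal  = -1,
    osPriorityNormal       =  0,   // RX_PRIORITY
    osPriorityAboveNormal  = +1,   // TCPIP_THREAD_PRIO == 1 lands here
    osPriorityHigh         = +2,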

What happens if we set TCPIP_THREAD_PRIO to osPriorityNormal?  Crash!
The ethernet driver ends up crashing in lpc_low_level_input() when
it tries to set p->len on a NULL p pointer.  The p pointer ended up
being NULL because an earlier call to pbuf_alloc() in lpc_rx_queue()
failed its allocation (I will have more to say about this failed
allocation later since that is caused by yet another bug).  I pulled a
fix from http://lpcware.com/content/bugtrackerissue/lpc17xx-mac-bugs to
remedy this issue.  When the pbuf allocation fails, the fix discards
the inbound packet and puts the existing pbuf back into the rx queue.
This means we never end up with a NULL pointer in that queue to
dereference and crash on.
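The shape of that fix, sketched with illustrative names (the actual
lpc17_emac.c code differs in detail):
    /* In the rx-queue refill path: if a fresh pbuf cannot be allocated,
     * drop the inbound frame and put the pbuf that already holds it back
     * on the rx queue, so the queue never contains a NULL entry for
     * lpc_low_level_input() to dereference.                             */
    struct pbuf *p = pbuf_alloc(PBUF_RAW, max_frame_len, PBUF_RAM);
    if (p == NULL) {
        requeue_rx_pbuf(emac, old_p);   /* hypothetical helper */
        return 0;                       /* frame is discarded  */
    }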

With that bug fixed, the application would just appear to hang after
receiving and processing a few datagrams.  I placed breakpoints in
the packet_rx() thread function and found that it was being signalled
by the ethernet ISR but it was always failing to allocate new PBUFs,
which is what led to our previous crash.  This means that the new
crash prevention code was just discarding every packet that arrived.

Why are these allocations failing?  In my opinion, this was the most
interesting bug to track down.  Is there a memory leak somewhere in
the code which maybe only triggers in low memory situations?  I
figured the easiest way to determine that would be to learn a bit
about the format of the lwIP heap from which the PBUF was failing to
be allocated.  I started by just stepping into the failing lwIP memory
allocator, mem_malloc().  The loop which searches the free list starts
with this code:
    for (ptr = (mem_size_t)((u8_t *)lfree - ram);
This loop didn't even go through one iteration and when I looked at the
initial ptr value it contained a really large value.  It turns out that
lfree was actually lower than ram.  At this point I figured that lfree
had probably been corrupted during a free operation after one of the
heap allocations had been underflowed/overflowed to cause the metadata
for an allocation to be corrupted.  As I started thinking about how to
track that kind of bug down, I noticed that the ram variable might be
too large (0x20080a68).  I restarted the debugger and looked at the
initial value.  It was at a nice even address (0x2007c000) and
certainly nothing like what I saw when the allocations were failing.
This global variable shouldn't change at all during the execution of
the program.  I placed a memory access watchpoint on this ram variable
and it fired very quickly inside of the rt_mbx_send() function.  The
ram variable was being changed by this line in rt_mbx_send():
    p_MCB->msg[p_MCB->first] = p_msg;
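For context, ram and lfree are module-level pointers into the lwIP heap,
declared in lwIP's mem.c roughly like this, which is why (lfree - ram)
should always be a small, non-negative offset into the heap:
    /* lwIP mem.c (paraphrased) */
    struct mem {
      mem_size_t next;   /* offset of the next block in the heap  */
      mem_size_t prev;   /* offset of the previous block          */
      u8_t       used;   /* non-zero if this block is allocated   */
    };
    static u8_t       *ram;    /* aligned base address of the heap */
    static struct mem *lfree;  /* lowest free block, always >= ram */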

What the what?  Why does writing to the mailbox queue overwrite the
ram global variable?  Let's start by looking at the data structure used
in the lwIP port to target RTX (defined in sys_arch.h):
// === MAIL BOX ===

typedef struct {
    osMessageQId    id;
    osMessageQDef_t def;
    uint32_t        queue[MB_SIZE];
} sys_mbox_t;

Compare that to the utility macro that RTX defines to help set up one
of these mailboxes with its queue:
    #define osMessageQDef(name, queue_sz, type)   \
    uint32_t os_messageQ_q_##name[4+(queue_sz)]; \
    osMessageQDef_t os_messageQ_def_##name = \
    { (queue_sz), (os_messageQ_q_##name) }
Note the 4+(queue_sz) used in the definition of the message queue
array.  What a hack!  The RTX OS requires an extra 16 bytes to contain
its OS_MCB header and this is how it adds it in.  Obviously the
sys_mbox_t structure used in the lwIP OS targeting code doesn't have
this.  Without it, the RTX mailbox routines end up scribbling on
memory following the structure.  Adding 4 to the queue array size in
that structure fixes the memory allocation failure that I was seeing,
and now the network stack can handle between 7 and 10 datagrams within
a burst.
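The fix amounts to reserving those four extra words in the port's own
structure, along these lines:
    // sys_arch.h -- leave room for RTX's 4-word OS_MCB header ahead of
    // the MB_SIZE message slots, mirroring what osMessageQDef() allocates
    typedef struct {
        osMessageQId    id;
        osMessageQDef_t def;
        uint32_t        queue[4 + MB_SIZE];
    } sys_mbox_t;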
2013-08-15 03:08:47 -07:00
Adam Green 6e95a5ecde Removed extra ALIGNED macro define 2013-08-14 20:17:53 -07:00
Emilio Monti 5c20760685 Initial commit of the mbed libraries and tools 2013-02-18 15:32:11 +00:00