lwip tcp_tw_pcbs list problem in tcp_slowtmr()

2019-07-14 09:33发布

lwip tcp_tw_pcbs list problem in tcp_slowtmr()

I have been having a problem in the tcp_slowtmr() function in tcp.c.  I have been using the raw api for a quite a while to implement TCP servers listening on several different ports.  I have not really had any problems so far.  Recently I have also implemented TCP client connection which quickly open up a TCP client connection read/write some data and then close the connection.  This sequence repeats itself over and over connecting to several different remote TCP servers.  Things seem to work fairly well until I start introducing some error conditions like extending remote server resonse times or reducing my timeouts waiting for data to be received.  This causes my code to timeout and close the TCP connection (probably while some responses may come back in later).  I have also tried disconnecting some of my remote servers network connections so that the initial client connection attempts will fail.  I am just mainly trying to do some general stress testing with normal conditions that may occur when deploying the application.



The problem that I am having occurs when adding these additional stress tests, and possibly with normal conditions after an extended period of time.  I have not nailed down the exact cause as of yet.  I finally get into a lock up condition when calling the tcp_slowtmr() function.  The lockup occurs cycling through the code lines highlighted below from the tcp_slowtmr() function:



  /* Steps through all of the TIME-WAIT PCBs. */

prev = NULL;

  pcb = tcp_tw_pcbs;

  while (pcb != NULL) {

    LWIP_ASSERT("tcp_slowtmr: TIME-WAIT pcb->state == TIME-WAIT", pcb->state == TIME_WAIT);

    pcb_remove = 0;



    /* Check if this PCB has stayed long enough in TIME-WAIT */

    if ((u32_t)(tcp_ticks - pcb->tmr) > 2 * TCP_MSL / TCP_SLOW_INTERVAL) {

      ++pcb_remove;

    }

   





    /* If the PCB should be removed, do it. */

    if (pcb_remove) {

      struct tcp_pcb *pcb2;

      tcp_pcb_purge(pcb);

      /* Remove PCB from tcp_tw_pcbs list. */

      if (prev != NULL) {

        LWIP_ASSERT("tcp_slowtmr: middle tcp != tcp_tw_pcbs", pcb != tcp_tw_pcbs);

        prev->next = pcb->next;

      } else {

        /* This PCB was the first. */

        LWIP_ASSERT("tcp_slowtmr: first pcb == tcp_tw_pcbs", tcp_tw_pcbs == pcb);

        tcp_tw_pcbs = pcb->next;

      }

      pcb2 = pcb;

      pcb = pcb->next;

      memp_free(MEMP_TCP_PCB, pcb2);

    } else {

      prev = pcb;

      pcb = pcb->next;

    }

  }



The problem occurs when the first item on the tcp_tw_pcbs list points back to itself:  pcb->next == pcb, so the code never exits the while loop.  The tcp_ticks value never changes in this part of the code so pcb_remove is never set > 0 either.



This is a single threaded application and only the standard interrupt handling function are being used.  This is an application running on an LPC4350 using the LPCOpen library from NXP with lwip v1.4.1.



It seems like the problem is created if I start calling tcp_close() to close my client connections.  If I use tcp_abort() instead then I don’t seem to have to problem – It does however cause undesirable sequences in wireshark.



What is the recommended sequence to close a tcp client session using the raw api?



Any suggestions as to what I may be doing wrong here or could this possibly be a bug that has been seen before in lwip?



Thanks,
Greg Dunn