HIPL

Merge lp:~martin-lp/hipl/hipl_exp_backoff into lp:hipl

hipl_exp_backoff
Merge into trunk

Proposed by David Martin on 2012-03-07

Status:	Merged
Merged at revision:	6320
Proposed branch:	lp:~martin-lp/hipl/hipl_exp_backoff
Merge into:	lp:hipl
Diff against target:	392 lines (+157/-39), 8 files modified:
	hipd/hadb.c (+1/-0)
	hipd/hipd.c (+98/-7)
	hipd/hipd.h (+7/-9)
	hipd/input.c (+5/-2)
	hipd/maintenance.c (+40/-19)
	hipd/maintenance.h (+1/-0)
	hipd/output.c (+3/-1)
	lib/core/state.h (+2/-1)
To merge this branch:	bzr merge lp:~martin-lp/hipl/hipl_exp_backoff

Reviewer	Date Requested	Status
René Hummen		Approve on 2012-03-08
Diego Biurrun		Approve on 2012-03-08
Miika Komu	2012-03-07	Approve on 2012-03-08
Review via email: mp+96420@code.launchpad.net

Description of the change

This branch introduces an exponential backoff mechanism for retransmissions. Previously, retransmissions used a static timeout of 10 seconds before being sent out again, which is an eternity, for example in mobility scenarios. In this branch, the first retransmission is triggered after 100 ms, and the timeout is doubled for every consecutive retransmission in classic backoff fashion (100 ms, 200 ms, 400 ms, ..., up to 12.8 seconds). Once the doubled timeout would exceed 15 seconds, the retransmission is abandoned.
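The schedule can be reproduced from the constants this branch adds to hipd/hipd.h (HIP_RETRANSMIT_BACKOFF_MIN of 100 ms, HIP_RETRANSMIT_BACKOFF_MAX of 15 s). The following is a minimal standalone sketch, not hipd code: backoff_schedule() is a hypothetical helper that mimics the doubling in update_retrans_backoff() and ignores the HIP_RETRANSMIT_MAX count of 10, which these constants never reach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as defined in hipd/hipd.h by this branch. */
#define HIP_RETRANSMIT_BACKOFF_MIN (100 * 1000)   /* microseconds */
#define HIP_RETRANSMIT_BACKOFF_MAX (15 * 1000000) /* microseconds */

/*
 * Hypothetical helper (not part of hipd): writes the backoff, in
 * microseconds, that is waited before each retransmission into `out` and
 * returns the number of retransmissions. Like update_retrans_backoff(), it
 * doubles the backoff after every retransmission and stops once the doubled
 * value exceeds HIP_RETRANSMIT_BACKOFF_MAX.
 */
static int backoff_schedule(uint64_t *out, int capacity)
{
    uint64_t backoff = HIP_RETRANSMIT_BACKOFF_MIN; /* initial value, as set in output.c */
    int      n       = 0;

    assert(out != NULL && capacity > 0);
    while (backoff <= HIP_RETRANSMIT_BACKOFF_MAX && n < capacity) {
        out[n++] = backoff;
        backoff <<= 1; /* classic exponential backoff: double every time */
    }
    return n;
}
```

Given room for at least 8 entries, it returns 8 backoffs, from 100 ms up to 12.8 s; the next doubling (25.6 s) exceeds the maximum, so the retransmission is cleared. In total, retransmissions of one message span roughly 25.5 s after it was first buffered.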

To make this possible, some changes to the hipd select() loop are required.
- Previously, hipd used a fixed timeout of 1 second on the select() call. This means that when one second passed without any activity on the sockets, the maintenance routine was called. Part of the maintenance is the check for retransmissions. The problem: a retransmission could be sent out no earlier than after this 1 second had passed.
- The select() timeout is now set dynamically. Every time a retransmission is buffered, sent out, or cleared, the timeout is updated with hip_update_select_timeout(), newly introduced in hipd.c.
- The timeout is set to the lowest backoff of all buffered retransmissions. It defaults to 1 second as before and never exceeds it.
- Maintenance is supposed to be performed once per second. To avoid running it too often and disturbing heartbeats, the time of the last maintenance call is stored; when select() times out, maintenance is only called if at least 1 second has passed, and otherwise only the retransmission scan (hip_scan_retransmissions()) is run.
- Even though the select() call now at times uses shorter timeouts, the previous hipd behaviour is unchanged: maintenance is still called only once per second; only retransmissions are sent out earlier than before.
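The two decisions the new loop makes can be isolated in a small standalone sketch: how the select() timeout is derived from the shortest pending backoff, and whether a select() timeout should trigger full maintenance or only a retransmission scan. The names usec_diff(), select_timeout_for() and maintenance_due() are hypothetical. usec_diff() merely stands in for HIPL's calc_timeval_diff(), assuming that function returns the difference in microseconds, as its use in the diff suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

#define HIP_SELECT_TIMEOUT_USEC (1 * 1000000) /* as in hipd/hipd.h */

/* Hypothetical stand-in for calc_timeval_diff(): microseconds from a to b. */
static int64_t usec_diff(const struct timeval *a, const struct timeval *b)
{
    return (int64_t) (b->tv_sec - a->tv_sec) * 1000000 +
           (int64_t) (b->tv_usec - a->tv_usec);
}

/* Turn the shortest backoff of all pending retransmissions into a select()
 * timeout, never exceeding the 1 second default. hip_update_select_timeout()
 * gets the same cap by starting its minimum search at that value. */
static struct timeval select_timeout_for(uint64_t shortest_backoff)
{
    struct timeval tv;

    if (shortest_backoff > HIP_SELECT_TIMEOUT_USEC) {
        shortest_backoff = HIP_SELECT_TIMEOUT_USEC;
    }
    tv.tv_sec  = (time_t) (shortest_backoff / 1000000);
    tv.tv_usec = (suseconds_t) (shortest_backoff % 1000000);
    return tv;
}

/* After select() timed out: run full maintenance only if more than
 * HIP_SELECT_TIMEOUT_USEC elapsed since the last run; otherwise the caller
 * should only scan retransmissions. */
static bool maintenance_due(const struct timeval *last_maintenance,
                            const struct timeval *now)
{
    assert(last_maintenance != NULL && now != NULL);
    return usec_diff(last_maintenance, now) > HIP_SELECT_TIMEOUT_USEC;
}
```

A loop built on these helpers has to copy the computed timeout into a fresh struct timeval before every select() call, because select() may modify its timeout argument; that is why the diff introduces the local `struct timeval timeout = select_timeout;` in each iteration.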

I have tested these changes on virtual machines with simulated packet dropping and in our testbed, and they seem to work fine.

To test this locally, you can simulate packet drops with this patch:

=== modified file 'hipd/output.c'
--- a/hipd/output.c 2012-02-23 15:10:37 +0000
+++ b/hipd/output.c 2012-02-24 09:26:03 +0000
@@ -87,8 +87,8 @@
/* Set to 1 if you want to simulate lost output packet */
#define HIP_SIMULATE_PACKET_LOSS 1
/* Packet loss probability in percents */
-#define HIP_SIMULATE_PACKET_LOSS_PROBABILITY 0
-#define HIP_SIMULATE_PACKET_IS_LOST() (random() < ((uint64_t) HIP_SIMULATE_PACKET_LOSS_PROBABILITY * RAND_MAX) / 100)
+#define HIP_SIMULATE_PACKET_LOSS_PROBABILITY 85
+#define HIP_SIMULATE_PACKET_IS_LOST() ((uint64_t) random() < ((uint64_t) HIP_SIMULATE_PACKET_LOSS_PROBABILITY * RAND_MAX) / 100)
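To see what the patched macro does, the sketch below drives it in a loop and measures the fraction of simulated sends it drops. simulated_loss_rate() is a hypothetical helper, not part of hipd. It assumes random() returns values in [0, RAND_MAX], which holds on glibc (POSIX only guarantees 0 to 2^31 - 1 for random()). The (uint64_t) cast added by the patch presumably silences a signed/unsigned comparison warning; it does not change the result, since random() never returns a negative value.

```c
#define _XOPEN_SOURCE 700 /* expose random()/srandom() under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same definitions as in the patch above: 85 means 85 % of packets are lost. */
#define HIP_SIMULATE_PACKET_LOSS_PROBABILITY 85
#define HIP_SIMULATE_PACKET_IS_LOST() \
    ((uint64_t) random() < ((uint64_t) HIP_SIMULATE_PACKET_LOSS_PROBABILITY * RAND_MAX) / 100)

/* Hypothetical helper: fraction of `trials` simulated sends the macro drops. */
static double simulated_loss_rate(unsigned int seed, int trials)
{
    int lost = 0;

    assert(trials > 0);
    srandom(seed);
    for (int i = 0; i < trials; i++) {
        if (HIP_SIMULATE_PACKET_IS_LOST()) {
            lost++;
        }
    }
    return (double) lost / trials;
}
```

Over 100000 trials the measured rate should land close to 0.85, so only about one packet in seven gets through, which exercises several backoff steps per message.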

Revision history for this message

Miika Komu (miika-iki) wrote on 2012-03-08:

I am glad that somebody implemented this. Seems ok to me, but we had better wait for an additional review. Thanks!

review: Approve

Diego Biurrun (diego-biurrun) wrote on 2012-03-08:

review approve

On Wed, Mar 07, 2012 at 06:05:28PM +0000, David Martin wrote:
> David Martin has proposed merging lp:~martin-lp/hipl/hipl_exp_backoff into lp:hipl.
>
> Requested reviews:
> HIPL core team (hipl-core)

No real comments from me, looks sane overall. If this is well-tested,
I trust you it will be fine.

> --- hipd/maintenance.c 2012-02-15 17:37:10 +0000
> +++ hipd/maintenance.c 2012-03-07 18:04:31 +0000
> @@ -87,6 +86,31 @@
> static struct hip_ll *maintenance_functions;
>
> /**
> + * Update the retransmission backoff of the given retransmission.
> + * The backoff will simply be doubled and in case the maximum is exceeded the
> + * retransmissions are disabled.

in case the maximum is exceeded retransmissions are

> +static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
> +{
> + retrans->current_backoff = retrans->current_backoff << 1;
> + if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
> + HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
> + " retransmission.\n");

retransmissionS I think.

Diego

review: Approve

René Hummen (rene-hummen) wrote on 2012-03-08:

Thanks for clarifying.

review: Approve

lp:~martin-lp/hipl/hipl_exp_backoff updated on 2012-03-08

6304. By David Martin on 2012-03-08: Fix typo in update_retrans_backoff() doxygen documentation.

David Martin (martin-lp) wrote on 2012-03-08:

Hi,

On Thu, Mar 8, 2012 at 7:00 PM, Diego Biurrun <email address hidden> wrote:
> review approve
>
> On Wed, Mar 07, 2012 at 06:05:28PM +0000, David Martin wrote:
>> David Martin has proposed merging lp:~martin-lp/hipl/hipl_exp_backoff into lp:hipl.
>
> No real comments from me, looks sane overall. If this is well-tested,
> I trust you it will be fine.

Tested it on the N900, netbook and VMs. Did not have a look at the
more exotic HIPL scenarios with shotgunning or relays and what other
stuff there is though. Fingers crossed those will be fine I guess.

>> --- hipd/maintenance.c 2012-02-15 17:37:10 +0000
>> +++ hipd/maintenance.c 2012-03-07 18:04:31 +0000
>> @@ -87,6 +86,31 @@
>> static struct hip_ll *maintenance_functions;
>>
>> /**
>> + * Update the retransmission backoff of the given retransmission.
>> + * The backoff will simply be doubled and in case the maximum is exceeded the
>> + * retransmissions are disabled.
>
> in case the maximum is exceeded retransmissions are

Fixed in revision 6304.

>> +static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
>> +{
>> + retrans->current_backoff = retrans->current_backoff << 1;
>> + if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
>> + HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
>> + " retransmission.\n");
>
> retransmissionS I think.

I had the plural before but decided to change it. Directly quoted from
the commit log:

> "Stopping retransmissions." sounds like no more retransmissions will be
> sent at all, but it only refers to this specific one. "Stopping
> retransmission." makes it more clear.

Diego Biurrun (diego-biurrun) wrote on 2012-03-08:

On Thu, Mar 08, 2012 at 06:27:30PM +0000, David Martin wrote:
> On Thu, Mar 8, 2012 at 7:00 PM, Diego Biurrun <email address hidden> wrote:
> >> --- hipd/maintenance.c 2012-02-15 17:37:10 +0000
> >> +++ hipd/maintenance.c 2012-03-07 18:04:31 +0000
> >> @@ -87,6 +86,31 @@
> >> +static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
> >> +{
> >> + retrans->current_backoff = retrans->current_backoff << 1;
> >> + if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
> >> + HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
> >> + " retransmission.\n");
> >
> > retransmissionS I think.
>
> I had the plural before but decided to change it. Directly quoted from
> the commit log:
>
> > "Stopping retransmissions." sounds like no more retransmissions will be
> > sent at all, but it only refers to this specific one. "Stopping
> > retransmission." makes it more clear.

OK, then keep your version, the singular was intended.

Diego

Preview Diff

 === modified file 'hipd/hadb.c'
 --- hipd/hadb.c	2012-03-01 14:06:24 +0000
 +++ hipd/hadb.c	2012-03-08 18:19:32 +0000
@@ -837,6 +837,7 @@
      for (i = 0; i < HIP_RETRANSMIT_QUEUE_SIZE; i++) {
          free(ha->hip_msg_retrans[i].buf);
+     }
++    hip_update_select_timeout();
      if (ha->peer_pub) {
          switch (hip_get_host_id_algo(ha->peer_pub)) {
          case HIP_HI_RSA:
 === modified file 'hipd/hipd.c'
 --- hipd/hipd.c	2012-02-15 17:37:10 +0000
 +++ hipd/hipd.c	2012-03-08 18:19:32 +0000
@@ -60,6 +60,7 @@
  #include "lib/core/util.h"
  #include "config.h"
  #include "accessor.h"
++#include "hadb.h"
  #include "hip_socket.h"
  #include "init.h"
  #include "maintenance.h"
@@ -74,6 +75,12 @@
   *  nf_ipsec for this purpose). */
  struct rtnl_handle hip_nl_route;
++/* The timeout for the select call in the main loop. */
++static struct timeval select_timeout;
++/* Shortest backoff (in microseconds) of all retransmissions of all HAs.
++ * Used to determine the required select timeout. */
++static uint64_t shortest_backoff;
++
  /**
   * print hipd usage instructions on stderr
   */
@@ -170,14 +177,74 @@
+ }
  /**
++ * Determine the lowest retransmission backoff of all retransmissions in the
++ * given host association.
++ *
++ * Note: The lowest retransmission backoff will be written to the static
++ *       variable shortest_backoff. The caller is responsible for initializing
++ *       this variable as the actual retransmission backoffs are compared
++ *       against it.
++ *
++ * @param hadb The hadb state from which the lowest retransmission backoff will
++ *             be determined.
++ * @param opaq UNUSED.
++ *
++ * @return Always 0. Not void because hip_for_each_ha() requires a return value.
++ */
++static int get_shortest_retrans_backoff(struct hip_hadb_state *hadb, UNUSED void *opaq)
++{
++    for (unsigned int i = 0; i < HIP_RETRANSMIT_QUEUE_SIZE; i++) {
++        struct hip_msg_retrans *retrans = &hadb->hip_msg_retrans[i];
++
++        if (shortest_backoff == HIP_RETRANSMIT_BACKOFF_MIN) {
++            break;
++        } else if (retrans->count > 0 && retrans->current_backoff < shortest_backoff) {
++            shortest_backoff = retrans->current_backoff;
++        }
++    }
++
++    return 0;
++}
++
++/**
++ * Update the select timeout with respect to the currently outstanding
++ * retransmissions. If there are no retransmissions the timeout will be
++ * set to the HIP_SELECT_TIMEOUT default value. Else it will be set to the
++ * minimum backoff of all retransmissions.
++ *
++ * @return  0 on success
++ *         -1 on error
++ */
++int hip_update_select_timeout(void)
++{
++    uint64_t last_backoff = shortest_backoff;
++
++    shortest_backoff = HIP_SELECT_TIMEOUT_USEC;
++    if (hip_for_each_ha(get_shortest_retrans_backoff, NULL) < 0) {
++        HIP_ERROR("Failed to determine shortest retransmission backoff.\n");
++        return -1;
++    }
++
++    if (shortest_backoff != last_backoff) {
++        select_timeout.tv_sec  = shortest_backoff / 1000000;
++        select_timeout.tv_usec = shortest_backoff % 1000000;
++        HIP_DEBUG("select() timeout set to %" PRIu64 "ms.\n",
++                  shortest_backoff / 1000);
++    }
++
++    return 0;
++}
++
++/**
   * Daemon "main" function.
   * @param flags startup flags
   * @return      0 on success, negative error code otherwise
   */
  int hipd_main(uint64_t flags)
+ {
--    int                       highest_descriptor = 0, err = 0;
--    struct timeval            timeout;
++    int                       highest_descriptor        = 0, err = 0;
++    bool                      maintenance_was_performed = true;
++    struct timeval            last_maintenance          = { 0 };
      fd_set                    read_fdset;
      struct hip_packet_context ctx = { 0 };
@@ -259,14 +326,20 @@
      hip_perf_stop_benchmark(perf_set, PERF_STARTUP);
      hip_perf_write_benchmark(perf_set, PERF_STARTUP);
  #endif
++
++    select_timeout.tv_sec  = HIP_SELECT_TIMEOUT;
++    select_timeout.tv_usec = 0;
++
      while (hipd_get_state() != HIPD_STATE_CLOSED) {
++        /* The select() call modifies the provided timeout struct timeval.
++         * This variable indirection makes sure that the correct timeout value
++         * is used in every loop iteration. */
++        struct timeval timeout = select_timeout;
++
          hip_prepare_fd_set(&read_fdset);
          hipfw_sock = hip_user_sock;
--        timeout.tv_sec  = HIP_SELECT_TIMEOUT;
--        timeout.tv_usec = 0;
--
  #ifdef CONFIG_HIP_FIREWALL
          if (hipfw_status < 0) {
              hipfw_addr.sin6_family = AF_INET6;
@@ -287,6 +360,11 @@
+         }
  #endif
++        if (maintenance_was_performed) {
++            gettimeofday(&last_maintenance, NULL);
++            maintenance_was_performed = false;
++        }
++
          err = select(highest_descriptor + 1, &read_fdset, NULL, NULL, &timeout);
          if (err < 0) {
@@ -294,13 +372,26 @@
              goto to_maintenance;
          } else if (err == 0) {
              /* idle cycle - select() timeout */
--            goto to_maintenance;
++            struct timeval now;
++
++            /* Maintenance is supposed to be called every HIP_SELECT_TIMEOUT
++             * seconds. Retransmissions have a higher frequency. Therefore call
++             * the retransmission scan after every select timeout and maintenance
++             * only when HIP_SELECT_TIMEOUT seconds have passed. */
++            gettimeofday(&now, NULL);
++            if (calc_timeval_diff(&last_maintenance, &now) > HIP_SELECT_TIMEOUT_USEC) {
++                goto to_maintenance;
++            } else {
++                hip_scan_retransmissions();
++                continue;
++            }
+         }
          hip_run_socket_handles(&read_fdset, &ctx);
  to_maintenance:
--        err = hip_periodic_maintenance();
++        err                       = hip_periodic_maintenance();
++        maintenance_was_performed = true;
          if (err) {
              HIP_ERROR("Error (%d) ignoring. %s\n", err,
                        ((errno) ? strerror(errno) : ""));
 === modified file 'hipd/hipd.h'
 --- hipd/hipd.h	2012-03-01 14:06:24 +0000
 +++ hipd/hipd.h	2012-03-08 18:19:32 +0000
@@ -38,15 +38,12 @@
  #define HIP_HIT_DEV "dummy0"
--#define HIP_SELECT_TIMEOUT        1
--#define HIP_RETRANSMIT_MAX        5
--#define HIP_RETRANSMIT_INTERVAL   1 /* seconds */
--/* the interval with which the hadb entries are checked for retransmissions */
--#define HIP_RETRANSMIT_INIT \
--    (HIP_RETRANSMIT_INTERVAL / HIP_SELECT_TIMEOUT)
--/* wait about n seconds before retransmitting.
-- * the actual time is between n and n + RETRANSMIT_INIT seconds */
--#define HIP_RETRANSMIT_WAIT 10
++#define HIP_SELECT_TIMEOUT 1
++/* The select timeout in microseconds. */
++#define HIP_SELECT_TIMEOUT_USEC (HIP_SELECT_TIMEOUT * 1000000)
++#define HIP_RETRANSMIT_MAX        10
++#define HIP_RETRANSMIT_BACKOFF_MIN (100 * 1000) /* microseconds */
++#define HIP_RETRANSMIT_BACKOFF_MAX (15 * 1000000) /* microseconds */
  #define HIP_R1_PRECREATE_INTERVAL 60 * 60 /* seconds */
  #define HIP_R1_PRECREATE_INIT (HIP_R1_PRECREATE_INTERVAL / HIP_SELECT_TIMEOUT)
@@ -80,6 +77,7 @@
  /* Functions for handling outgoing packets. */
  int hip_sendto_firewall(const struct hip_common *msg);
++int hip_update_select_timeout(void);
  int hipd_parse_cmdline_opts(int argc, char *argv[], uint64_t * flags);
  int hipd_main(uint64_t flags);
 === modified file 'hipd/input.c'
 --- hipd/input.c	2012-03-01 14:06:24 +0000
 +++ hipd/input.c	2012-03-08 18:19:32 +0000
@@ -1886,8 +1886,10 @@
+ }
  /**
-- * Clear the given retransmission.
-- * i.e. set the remaining retransmissions to zero and zero the buffer.
++ * Clear the given retransmission and update the select() timeout.
++ * i.e. set the remaining retransmissions to zero, zero the buffer and
++ * update the select() timeout as there is one retransmission less to be
++ * considered.
+  *
   * @param retrans The retransmission to be cleared.
   */
@@ -1901,6 +1903,7 @@
      if (retrans->buf) {
          memset(retrans->buf, 0, hip_get_msg_total_len(retrans->buf));
+     }
++    hip_update_select_timeout();
+ }
  /**
 === modified file 'hipd/maintenance.c'
 --- hipd/maintenance.c	2012-02-15 17:37:10 +0000
 +++ hipd/maintenance.c	2012-03-08 18:19:32 +0000
@@ -77,7 +77,6 @@
  struct sockaddr_in6 hipfw_addr = { 0 };
  int                 hipfw_sock = 0;
--static float retrans_counter    = HIP_RETRANSMIT_INIT;
  static float precreate_counter  = HIP_R1_PRECREATE_INIT;
  static int   force_exit_counter = FORCE_EXIT_COUNTER_START;
@@ -87,6 +86,31 @@
  static struct hip_ll *maintenance_functions;
  /**
++ * Update the retransmission backoff of the given retransmission.
++ * The backoff will simply be doubled and in case the maximum is exceeded
++ * retransmissions are disabled.
++ *
++ * @param retrans The retransmission to be updated.
++ */
++static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
++{
++    if (!retrans) {
++        return;
++    }
++
++    retrans->current_backoff = retrans->current_backoff << 1;
++    if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
++        HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
++                  " retransmission.\n");
++        hip_clear_retransmission(retrans);
++        return;
++    }
++
++    HIP_DEBUG("Retransmission timeout set to %" PRIu64 "ms.\n",
++              retrans->current_backoff / 1000);
++}
++
++/**
   * an iterator to handle packet retransmission for a given host association
+  *
   * @param entry the host association which to handle
@@ -98,15 +122,15 @@
+ {
      int                     err = 0, i = 0;
      struct hip_msg_retrans *retrans;
--    time_t                 *now = current_time;
++    struct timeval         *now = current_time;
      for (i = 0; i < HIP_RETRANSMIT_QUEUE_SIZE; i++) {
          retrans = &entry->hip_msg_retrans[(entry->next_retrans_slot + i) %
                                            HIP_RETRANSMIT_QUEUE_SIZE];
--        /* check if the last transmission was at least RETRANSMIT_WAIT seconds ago */
--        if (*now - HIP_RETRANSMIT_WAIT > retrans->last_transmit) {
--            if (retrans->count > 0) {
++        if (retrans->count > 0) {
++            if (calc_timeval_diff(&retrans->last_transmit, now) >
++                retrans->current_backoff) {
                  /* @todo: verify that this works over slow ADSL line */
                  if (hip_send_pkt(&retrans->saddr,
                                   &retrans->daddr,
@@ -128,10 +152,12 @@
+                 }
                  retrans->count--;
--                time(&retrans->last_transmit);
--            } else if (hip_get_msg_type(retrans->buf)) {
--                hip_clear_retransmission(retrans);
++                gettimeofday(&retrans->last_transmit, NULL);
++                update_retrans_backoff(retrans);
++                hip_update_select_timeout();
+             }
++        } else if (hip_get_msg_type(retrans->buf)) {
++            hip_clear_retransmission(retrans);
+         }
+     }
@@ -143,10 +169,10 @@
+  *
   * @return zero on success or negative on failure
   */
--static int scan_retransmissions(void)
++int hip_scan_retransmissions(void)
+ {
--    time_t current_time;
--    time(&current_time);
++    struct timeval current_time;
++    gettimeofday(&current_time, NULL);
      if (hip_for_each_ha(handle_retransmissions, &current_time)) {
          return -1;
@@ -261,14 +287,9 @@
       * in closing or closed state, delete them */
      hip_for_each_ha(hip_purge_closing_ha, NULL);
--    if (retrans_counter < 0) {
--        if (scan_retransmissions()) {
--            HIP_ERROR("Retransmission scan failed.\n");
--            return -1;
--        }
--        retrans_counter = HIP_RETRANSMIT_INIT;
--    } else {
--        retrans_counter--;
++    if (hip_scan_retransmissions()) {
++        HIP_ERROR("Retransmission scan failed.\n");
++        return -1;
+     }
      if (precreate_counter < 0) {
 === modified file 'hipd/maintenance.h'
 --- hipd/maintenance.h	2012-02-15 17:37:10 +0000
 +++ hipd/maintenance.h	2012-03-08 18:19:32 +0000
@@ -39,6 +39,7 @@
                                  const uint16_t priority);
  int hip_unregister_maint_function(int (*maint_function)(void));
  void hip_uninit_maint_functions(void);
++int hip_scan_retransmissions(void);
  int hip_periodic_maintenance(void);
  int hipfw_is_alive(void);
 === modified file 'hipd/output.c'
 --- hipd/output.c	2012-03-01 14:06:24 +0000
 +++ hipd/output.c	2012-03-08 18:19:32 +0000
@@ -1145,9 +1145,11 @@
      ipv6_addr_copy(&retrans->saddr, src_addr);
      ipv6_addr_copy(&retrans->daddr, peer_addr);
      retrans->count = HIP_RETRANSMIT_MAX;
--    time(&retrans->last_transmit);
++    gettimeofday(&retrans->last_transmit, NULL);
++    retrans->current_backoff = HIP_RETRANSMIT_BACKOFF_MIN;
      entry->next_retrans_slot = (entry->next_retrans_slot + 1) % HIP_RETRANSMIT_QUEUE_SIZE;
++    hip_update_select_timeout();
      return 0;
+ }
 === modified file 'lib/core/state.h'
 --- lib/core/state.h	2012-02-17 10:45:47 +0000
 +++ lib/core/state.h	2012-03-08 18:19:32 +0000
@@ -109,7 +109,8 @@
   */
  struct hip_msg_retrans {
      int                count;
--    time_t             last_transmit;
++    uint64_t           current_backoff;
++    struct timeval     last_transmit;
      struct in6_addr    saddr;
      struct in6_addr    daddr;
      struct hip_common *buf;