Merge lp:~martin-lp/hipl/hipl_exp_backoff into lp:hipl

Proposed by David Martin
Status: Merged
Merged at revision: 6320
Proposed branch: lp:~martin-lp/hipl/hipl_exp_backoff
Merge into: lp:hipl
Diff against target: 392 lines (+157/-39)
8 files modified
hipd/hadb.c (+1/-0)
hipd/hipd.c (+98/-7)
hipd/hipd.h (+7/-9)
hipd/input.c (+5/-2)
hipd/maintenance.c (+40/-19)
hipd/maintenance.h (+1/-0)
hipd/output.c (+3/-1)
lib/core/state.h (+2/-1)
To merge this branch: bzr merge lp:~martin-lp/hipl/hipl_exp_backoff
Reviewer Review Type Date Requested Status
René Hummen Approve
Diego Biurrun Approve
Miika Komu Approve
Review via email: mp+96420@code.launchpad.net

Description of the change

This branch introduces an exponential backoff mechanism for retransmissions. Previously retransmissions had static timeouts of 10 seconds before they are sent out again which is an eternity for example in mobility scenarios. In this branch the first retransmission is triggered after 100ms. Timeout for every consecutive retransmission is doubled in classic backoff fashion (100ms, 200ms, 400ms, ..., up to 12seconds).

To make this possible some changes to the hipd select() loop are required.
- before hipd had a timeout of 1 second on the select() call. This means when one second passes without any outgoing or incoming transmissions on the sockets the maintenance routine is called. Part of the maintenance is the check for retransmission. The problem: any retransmission can at the shortest be sent out after this 1 second passes.
- the select() timeout is now set dynamically. Everytime a retransmissions is buffered, sent out or cleared the timeout is updated with the newly introduced hip_update_select_timeout() in hipd.c.
- the timeout value is set to the lowest backoff value of all buffered retransmissions and by default or at maximum to 1 second as before
- maintenance is supposed to be performed every second. To avoid running it too often and screwing up heartbeats the time of the last maintenace call is stored and maintenance is only called when at least 1 second has passed
- even though the select() call has at times shorter timeout values the previous hipd behaviour is not changed: maintenance is only called every second, only the retransmissions are sent out earlier than before.

I have tested these changes on virtual machines with the simulated packet dropping and in our testbed and it seems to work fine.

To test this locally you can simulate packet drops with this patch:

=== modified file 'hipd/output.c'
--- a/hipd/output.c 2012-02-23 15:10:37 +0000
+++ b/hipd/output.c 2012-02-24 09:26:03 +0000
@@ -87,8 +87,8 @@
 /* Set to 1 if you want to simulate lost output packet */
 #define HIP_SIMULATE_PACKET_LOSS 1
 /* Packet loss probability in percents */
-#define HIP_SIMULATE_PACKET_LOSS_PROBABILITY 0
-#define HIP_SIMULATE_PACKET_IS_LOST() (random() < ((uint64_t) HIP_SIMULATE_PACKET_LOSS_PROBABILITY * RAND_MAX) / 100)
+#define HIP_SIMULATE_PACKET_LOSS_PROBABILITY 85
+#define HIP_SIMULATE_PACKET_IS_LOST() ((uint64_t) random() < ((uint64_t) HIP_SIMULATE_PACKET_LOSS_PROBABILITY * RAND_MAX) / 100)

To post a comment you must log in.
Revision history for this message
Miika Komu (miika-iki) wrote :

I am glad that somebody implemented this. Seems ok to me but we better wait for an additional review. Thanks!

review: Approve
Revision history for this message
Diego Biurrun (diego-biurrun) wrote :

 review approve

On Wed, Mar 07, 2012 at 06:05:28PM +0000, David Martin wrote:
> David Martin has proposed merging lp:~martin-lp/hipl/hipl_exp_backoff into lp:hipl.
>
> Requested reviews:
> HIPL core team (hipl-core)

No real comments from me, looks sane overall. If this is well-tested,
I trust you it will be fine.

> --- hipd/maintenance.c 2012-02-15 17:37:10 +0000
> +++ hipd/maintenance.c 2012-03-07 18:04:31 +0000
> @@ -87,6 +86,31 @@
> static struct hip_ll *maintenance_functions;
>
> /**
> + * Update the retransmission backoff of the given retransmission.
> + * The backoff will simply be doubled and in case the maximum is exceeded the
> + * retransmissions are disabled.

in case the maximum is exceeded retransmissions are

> +static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
> +{
> + retrans->current_backoff = retrans->current_backoff << 1;
> + if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
> + HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
> + " retransmission.\n");

retransmissionS I think.

Diego

review: Approve
Revision history for this message
René Hummen (rene-hummen) wrote :

Thanks for clarifying.

review: Approve
lp:~martin-lp/hipl/hipl_exp_backoff updated
6304. By David Martin

Fix typo in update_retrans_backoff() doxygen documentation.

Revision history for this message
David Martin (martin-lp) wrote :

Hi,

On Thu, Mar 8, 2012 at 7:00 PM, Diego Biurrun <email address hidden> wrote:
> review approve
>
> On Wed, Mar 07, 2012 at 06:05:28PM +0000, David Martin wrote:
>> David Martin has proposed merging lp:~martin-lp/hipl/hipl_exp_backoff into lp:hipl.
>
> No real comments from me, looks sane overall. If this is well-tested,
> I trust you it will be fine.

Tested it on the N900, netbook and VMs. Did not have a look at the
more exotic HIPL scenarios with shotgunning or relays and what other
stuff there is though. Fingers crossed those will be fine I guess.

>> --- hipd/maintenance.c 2012-02-15 17:37:10 +0000
>> +++ hipd/maintenance.c 2012-03-07 18:04:31 +0000
>> @@ -87,6 +86,31 @@
>> static struct hip_ll *maintenance_functions;
>>
>> /**
>> + * Update the retransmission backoff of the given retransmission.
>> + * The backoff will simply be doubled and in case the maximum is exceeded the
>> + * retransmissions are disabled.
>
> in case the maximum is exceeded retransmissions are

Fixed in revision 6304.

>> +static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
>> +{
>> + retrans->current_backoff = retrans->current_backoff << 1;
>> + if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
>> + HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
>> + " retransmission.\n");
>
> retransmissionS I think.

I had the plural before but decided to change it. Directly quoted from
the commit log:

> "Stopping retransmissions." sounds like no more retransmissions will be
> sent at all but it only refers to this specific one. "Stopping "
> retransmission." makes it more clear.

Revision history for this message
Diego Biurrun (diego-biurrun) wrote :

On Thu, Mar 08, 2012 at 06:27:30PM +0000, David Martin wrote:
> On Thu, Mar 8, 2012 at 7:00 PM, Diego Biurrun <email address hidden> wrote:
> >> --- hipd/maintenance.c 2012-02-15 17:37:10 +0000
> >> +++ hipd/maintenance.c 2012-03-07 18:04:31 +0000
> >> @@ -87,6 +86,31 @@
> >> +static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
> >> +{
> >> + retrans->current_backoff = retrans->current_backoff << 1;
> >> + if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
> >> + HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
> >> + " retransmission.\n");
> >
> > retransmissionS I think.
>
> I had the plural before but decided to change it. Directly quoted from
> the commit log:
>
> > "Stopping retransmissions." sounds like no more retransmissions will be
> > sent at all but it only refers to this specific one. "Stopping "
> > retransmission." makes it more clear.

OK, then keep your version, the singular was intended.

Diego

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'hipd/hadb.c'
2--- hipd/hadb.c 2012-03-01 14:06:24 +0000
3+++ hipd/hadb.c 2012-03-08 18:19:32 +0000
4@@ -837,6 +837,7 @@
5 for (i = 0; i < HIP_RETRANSMIT_QUEUE_SIZE; i++) {
6 free(ha->hip_msg_retrans[i].buf);
7 }
8+ hip_update_select_timeout();
9 if (ha->peer_pub) {
10 switch (hip_get_host_id_algo(ha->peer_pub)) {
11 case HIP_HI_RSA:
12
13=== modified file 'hipd/hipd.c'
14--- hipd/hipd.c 2012-02-15 17:37:10 +0000
15+++ hipd/hipd.c 2012-03-08 18:19:32 +0000
16@@ -60,6 +60,7 @@
17 #include "lib/core/util.h"
18 #include "config.h"
19 #include "accessor.h"
20+#include "hadb.h"
21 #include "hip_socket.h"
22 #include "init.h"
23 #include "maintenance.h"
24@@ -74,6 +75,12 @@
25 * nf_ipsec for this purpose). */
26 struct rtnl_handle hip_nl_route;
27
28+/* The timeout for the select call in the main loop. */
29+static struct timeval select_timeout;
30+/* Shortest backoff (in microseconds) of all retransmissions of all HAs.
31+ * Used to determine the required select timeout. */
32+static uint64_t shortest_backoff;
33+
34 /**
35 * print hipd usage instructions on stderr
36 */
37@@ -170,14 +177,74 @@
38 }
39
40 /**
41+ * Determine the lowest retransmission backoff of all retransmissions in the
42+ * given host association.
43+ *
44+ * Note: The lowest retransmission backoff will be written to the static
45+ * variable shortest_backoff. The caller is responsible for initializing
46+ * this variable as the actual retransmission backoffs are compared
47+ * against it.
48+ *
49+ * @param hadb The hadb state from which the lowest retransmission backoff will
50+ * be determined.
51+ * @param opaq UNUSED.
52+ *
53+ * @return Always 0. Not void because hip_for_each_ha() requires a return value.
54+ */
55+static int get_shortest_retrans_backoff(struct hip_hadb_state *hadb, UNUSED void *opaq)
56+{
57+ for (unsigned int i = 0; i < HIP_RETRANSMIT_QUEUE_SIZE; i++) {
58+ struct hip_msg_retrans *retrans = &hadb->hip_msg_retrans[i];
59+
60+ if (shortest_backoff == HIP_RETRANSMIT_BACKOFF_MIN) {
61+ break;
62+ } else if (retrans->count > 0 && retrans->current_backoff < shortest_backoff) {
63+ shortest_backoff = retrans->current_backoff;
64+ }
65+ }
66+
67+ return 0;
68+}
69+
70+/**
71+ * Update the select timeout with respect to the currently outstanding
72+ * retransmissions. If there are no retransmissions the timeout will be
73+ * set to the HIP_SELECT_TIMEOUT default value. Else it will be set to the
74+ * minimum backoff of all retransmissions.
75+ *
76+ * @return 0 on success
77+ * -1 on error
78+ */
79+int hip_update_select_timeout(void)
80+{
81+ uint64_t last_backoff = shortest_backoff;
82+
83+ shortest_backoff = HIP_SELECT_TIMEOUT_USEC;
84+ if (hip_for_each_ha(get_shortest_retrans_backoff, NULL) < 0) {
85+ HIP_ERROR("Failed to determine shortest retransmission backoff.\n");
86+ return -1;
87+ }
88+
89+ if (shortest_backoff != last_backoff) {
90+ select_timeout.tv_sec = shortest_backoff / 1000000;
91+ select_timeout.tv_usec = shortest_backoff % 1000000;
92+ HIP_DEBUG("select() timeout set to %" PRIu64 "ms.\n",
93+ shortest_backoff / 1000);
94+ }
95+
96+ return 0;
97+}
98+
99+/**
100 * Daemon "main" function.
101 * @param flags startup flags
102 * @return 0 on success, negative error code otherwise
103 */
104 int hipd_main(uint64_t flags)
105 {
106- int highest_descriptor = 0, err = 0;
107- struct timeval timeout;
108+ int highest_descriptor = 0, err = 0;
109+ bool maintenance_was_performed = true;
110+ struct timeval last_maintenance = { 0 };
111 fd_set read_fdset;
112 struct hip_packet_context ctx = { 0 };
113
114@@ -259,14 +326,20 @@
115 hip_perf_stop_benchmark(perf_set, PERF_STARTUP);
116 hip_perf_write_benchmark(perf_set, PERF_STARTUP);
117 #endif
118+
119+ select_timeout.tv_sec = HIP_SELECT_TIMEOUT;
120+ select_timeout.tv_usec = 0;
121+
122 while (hipd_get_state() != HIPD_STATE_CLOSED) {
123+ /* The select() call modifies the provided timeout struct timeval.
124+ * This variable indirection makes sure that the correct timeout value
125+ * is used in every loop iteration. */
126+ struct timeval timeout = select_timeout;
127+
128 hip_prepare_fd_set(&read_fdset);
129
130 hipfw_sock = hip_user_sock;
131
132- timeout.tv_sec = HIP_SELECT_TIMEOUT;
133- timeout.tv_usec = 0;
134-
135 #ifdef CONFIG_HIP_FIREWALL
136 if (hipfw_status < 0) {
137 hipfw_addr.sin6_family = AF_INET6;
138@@ -287,6 +360,11 @@
139 }
140 #endif
141
142+ if (maintenance_was_performed) {
143+ gettimeofday(&last_maintenance, NULL);
144+ maintenance_was_performed = false;
145+ }
146+
147 err = select(highest_descriptor + 1, &read_fdset, NULL, NULL, &timeout);
148
149 if (err < 0) {
150@@ -294,13 +372,26 @@
151 goto to_maintenance;
152 } else if (err == 0) {
153 /* idle cycle - select() timeout */
154- goto to_maintenance;
155+ struct timeval now;
156+
157+ /* Maintenance is supposed to be called every HIP_SELECT_TIMEOUT
158+ * seconds. Retransmissions have a higher frequency. Therefore call
159+ * the retransmission scan after every select timeout and maintenance
160+ * only when HIP_SELECT_TIMEOUT seconds have passed. */
161+ gettimeofday(&now, NULL);
162+ if (calc_timeval_diff(&last_maintenance, &now) > HIP_SELECT_TIMEOUT_USEC) {
163+ goto to_maintenance;
164+ } else {
165+ hip_scan_retransmissions();
166+ continue;
167+ }
168 }
169
170 hip_run_socket_handles(&read_fdset, &ctx);
171
172 to_maintenance:
173- err = hip_periodic_maintenance();
174+ err = hip_periodic_maintenance();
175+ maintenance_was_performed = true;
176 if (err) {
177 HIP_ERROR("Error (%d) ignoring. %s\n", err,
178 ((errno) ? strerror(errno) : ""));
179
180=== modified file 'hipd/hipd.h'
181--- hipd/hipd.h 2012-03-01 14:06:24 +0000
182+++ hipd/hipd.h 2012-03-08 18:19:32 +0000
183@@ -38,15 +38,12 @@
184
185 #define HIP_HIT_DEV "dummy0"
186
187-#define HIP_SELECT_TIMEOUT 1
188-#define HIP_RETRANSMIT_MAX 5
189-#define HIP_RETRANSMIT_INTERVAL 1 /* seconds */
190-/* the interval with which the hadb entries are checked for retransmissions */
191-#define HIP_RETRANSMIT_INIT \
192- (HIP_RETRANSMIT_INTERVAL / HIP_SELECT_TIMEOUT)
193-/* wait about n seconds before retransmitting.
194- * the actual time is between n and n + RETRANSMIT_INIT seconds */
195-#define HIP_RETRANSMIT_WAIT 10
196+#define HIP_SELECT_TIMEOUT 1
197+/* The select timeout in microseconds. */
198+#define HIP_SELECT_TIMEOUT_USEC (HIP_SELECT_TIMEOUT * 1000000)
199+#define HIP_RETRANSMIT_MAX 10
200+#define HIP_RETRANSMIT_BACKOFF_MIN (100 * 1000) /* microseconds */
201+#define HIP_RETRANSMIT_BACKOFF_MAX (15 * 1000000) /* microseconds */
202
203 #define HIP_R1_PRECREATE_INTERVAL 60 * 60 /* seconds */
204 #define HIP_R1_PRECREATE_INIT (HIP_R1_PRECREATE_INTERVAL / HIP_SELECT_TIMEOUT)
205@@ -80,6 +77,7 @@
206 /* Functions for handling outgoing packets. */
207 int hip_sendto_firewall(const struct hip_common *msg);
208
209+int hip_update_select_timeout(void);
210 int hipd_parse_cmdline_opts(int argc, char *argv[], uint64_t * flags);
211 int hipd_main(uint64_t flags);
212
213
214=== modified file 'hipd/input.c'
215--- hipd/input.c 2012-03-01 14:06:24 +0000
216+++ hipd/input.c 2012-03-08 18:19:32 +0000
217@@ -1886,8 +1886,10 @@
218 }
219
220 /**
221- * Clear the given retransmission.
222- * i.e. set the remaining retransmissions to zero and zero the buffer.
223+ * Clear the given retransmission and update the select() timeout.
224+ * i.e. set the remaining retransmissions to zero, zero the buffer and
225+ * update the select() timeout as there is one retransmission less to be
226+ * considered.
227 *
228 * @param retrans The retransmission to be cleared.
229 */
230@@ -1901,6 +1903,7 @@
231 if (retrans->buf) {
232 memset(retrans->buf, 0, hip_get_msg_total_len(retrans->buf));
233 }
234+ hip_update_select_timeout();
235 }
236
237 /**
238
239=== modified file 'hipd/maintenance.c'
240--- hipd/maintenance.c 2012-02-15 17:37:10 +0000
241+++ hipd/maintenance.c 2012-03-08 18:19:32 +0000
242@@ -77,7 +77,6 @@
243 struct sockaddr_in6 hipfw_addr = { 0 };
244 int hipfw_sock = 0;
245
246-static float retrans_counter = HIP_RETRANSMIT_INIT;
247 static float precreate_counter = HIP_R1_PRECREATE_INIT;
248 static int force_exit_counter = FORCE_EXIT_COUNTER_START;
249
250@@ -87,6 +86,31 @@
251 static struct hip_ll *maintenance_functions;
252
253 /**
254+ * Update the retransmission backoff of the given retransmission.
255+ * The backoff will simply be doubled and in case the maximum is exceeded
256+ * retransmissions are disabled.
257+ *
258+ * @param retrans The retransmission to be updated.
259+ */
260+static void update_retrans_backoff(struct hip_msg_retrans *const retrans)
261+{
262+ if (!retrans) {
263+ return;
264+ }
265+
266+ retrans->current_backoff = retrans->current_backoff << 1;
267+ if (retrans->current_backoff > HIP_RETRANSMIT_BACKOFF_MAX) {
268+ HIP_DEBUG("Maximum retransmission backoff reached. Stopping"
269+ " retransmission.\n");
270+ hip_clear_retransmission(retrans);
271+ return;
272+ }
273+
274+ HIP_DEBUG("Retransmission timeout set to %" PRIu64 "ms.\n",
275+ retrans->current_backoff / 1000);
276+}
277+
278+/**
279 * an iterator to handle packet retransmission for a given host association
280 *
281 * @param entry the host association which to handle
282@@ -98,15 +122,15 @@
283 {
284 int err = 0, i = 0;
285 struct hip_msg_retrans *retrans;
286- time_t *now = current_time;
287+ struct timeval *now = current_time;
288
289 for (i = 0; i < HIP_RETRANSMIT_QUEUE_SIZE; i++) {
290 retrans = &entry->hip_msg_retrans[(entry->next_retrans_slot + i) %
291 HIP_RETRANSMIT_QUEUE_SIZE];
292
293- /* check if the last transmission was at least RETRANSMIT_WAIT seconds ago */
294- if (*now - HIP_RETRANSMIT_WAIT > retrans->last_transmit) {
295- if (retrans->count > 0) {
296+ if (retrans->count > 0) {
297+ if (calc_timeval_diff(&retrans->last_transmit, now) >
298+ retrans->current_backoff) {
299 /* @todo: verify that this works over slow ADSL line */
300 if (hip_send_pkt(&retrans->saddr,
301 &retrans->daddr,
302@@ -128,10 +152,12 @@
303 }
304
305 retrans->count--;
306- time(&retrans->last_transmit);
307- } else if (hip_get_msg_type(retrans->buf)) {
308- hip_clear_retransmission(retrans);
309+ gettimeofday(&retrans->last_transmit, NULL);
310+ update_retrans_backoff(retrans);
311+ hip_update_select_timeout();
312 }
313+ } else if (hip_get_msg_type(retrans->buf)) {
314+ hip_clear_retransmission(retrans);
315 }
316 }
317
318@@ -143,10 +169,10 @@
319 *
320 * @return zero on success or negative on failure
321 */
322-static int scan_retransmissions(void)
323+int hip_scan_retransmissions(void)
324 {
325- time_t current_time;
326- time(&current_time);
327+ struct timeval current_time;
328+ gettimeofday(&current_time, NULL);
329
330 if (hip_for_each_ha(handle_retransmissions, &current_time)) {
331 return -1;
332@@ -261,14 +287,9 @@
333 * in closing or closed state, delete them */
334 hip_for_each_ha(hip_purge_closing_ha, NULL);
335
336- if (retrans_counter < 0) {
337- if (scan_retransmissions()) {
338- HIP_ERROR("Retransmission scan failed.\n");
339- return -1;
340- }
341- retrans_counter = HIP_RETRANSMIT_INIT;
342- } else {
343- retrans_counter--;
344+ if (hip_scan_retransmissions()) {
345+ HIP_ERROR("Retransmission scan failed.\n");
346+ return -1;
347 }
348
349 if (precreate_counter < 0) {
350
351=== modified file 'hipd/maintenance.h'
352--- hipd/maintenance.h 2012-02-15 17:37:10 +0000
353+++ hipd/maintenance.h 2012-03-08 18:19:32 +0000
354@@ -39,6 +39,7 @@
355 const uint16_t priority);
356 int hip_unregister_maint_function(int (*maint_function)(void));
357 void hip_uninit_maint_functions(void);
358+int hip_scan_retransmissions(void);
359 int hip_periodic_maintenance(void);
360 int hipfw_is_alive(void);
361
362
363=== modified file 'hipd/output.c'
364--- hipd/output.c 2012-03-01 14:06:24 +0000
365+++ hipd/output.c 2012-03-08 18:19:32 +0000
366@@ -1145,9 +1145,11 @@
367 ipv6_addr_copy(&retrans->saddr, src_addr);
368 ipv6_addr_copy(&retrans->daddr, peer_addr);
369 retrans->count = HIP_RETRANSMIT_MAX;
370- time(&retrans->last_transmit);
371+ gettimeofday(&retrans->last_transmit, NULL);
372+ retrans->current_backoff = HIP_RETRANSMIT_BACKOFF_MIN;
373
374 entry->next_retrans_slot = (entry->next_retrans_slot + 1) % HIP_RETRANSMIT_QUEUE_SIZE;
375+ hip_update_select_timeout();
376
377 return 0;
378 }
379
380=== modified file 'lib/core/state.h'
381--- lib/core/state.h 2012-02-17 10:45:47 +0000
382+++ lib/core/state.h 2012-03-08 18:19:32 +0000
383@@ -109,7 +109,8 @@
384 */
385 struct hip_msg_retrans {
386 int count;
387- time_t last_transmit;
388+ uint64_t current_backoff;
389+ struct timeval last_transmit;
390 struct in6_addr saddr;
391 struct in6_addr daddr;
392 struct hip_common *buf;

Subscribers

People subscribed via source and target branches

to all changes: