Comment 2 for bug 615371

Kevin McDermott (bigkevmcd) wrote :

There was a bit of discussion about this at the time in the IRC channel...

Aug 06 15:06:33 <andreas>>--bigkevmcd: so /proc/net/dev is rolling over? OR, the interface was reset (brought down and up again, something that resets the number of bytes sent/received)
Aug 06 15:06:41 <bigkevmcd>>andreas: /proc/net/dev is rolling over
Aug 06 15:06:51 <bigkevmcd>>andreas: so, we need to know the rollover point, and delta it accordingly
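For context, the counters that roll over are the per-interface byte totals in /proc/net/dev. The sketch below reads them and makes some assumptions. It assumes the usual layout: two header lines, then one `iface: ...` line per interface. Receive bytes are taken as the first field after the colon and transmit bytes as the ninth. The function name `parse_proc_net_dev` and the sample text are hypothetical illustrations, not Landscape code.

```python
def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content.

    Assumes two header lines, then lines of the form
    "iface: rx_bytes rx_packets ... (8 rx fields) tx_bytes tx_packets ...".
    """
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the colon first, then on whitespace for the numeric fields.
        name, data = line.split(":", 1)
        fields = data.split()
        rx_bytes = int(fields[0])  # first receive field: bytes
        tx_bytes = int(fields[8])  # first transmit field: bytes
        counters[name.strip()] = (rx_bytes, tx_bytes)
    return counters


# Hypothetical sample in the /proc/net/dev layout.
SAMPLE = (
    "Inter-|   Receive                            |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast"
    "|bytes    packets errs drop fifo colls carrier compressed\n"
    "    lo:  123456  100 0 0 0 0 0 0  123456  100 0 0 0 0 0 0\n"
    "  eth0: 4000000000 3000 0 0 0 0 0 0 2500000000 2000 0 0 0 0 0 0\n"
)

print(parse_proc_net_dev(SAMPLE))
# {'lo': (123456, 123456), 'eth0': (4000000000, 2500000000)}
```

The client would take deltas between two such readings. If a counter is only 32 bits wide, eth0's receive total above is already close to wrapping at 2**32 (4294967296).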
Aug 06 15:07:04 <andreas>>--bigkevmcd: I suspect that's arch dependent
Aug 06 15:07:16 <andreas>>--we could check how mrtg does it :)
Aug 06 15:07:18 <bigkevmcd>>andreas: what arch are the machines?
Aug 06 15:07:32 <andreas>>--we only support two
Aug 06 15:07:57 <andreas>>--Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
Aug 06 15:07:58 <andreas>>--7.82GB RAM
Aug 06 15:07:58 <andreas>>--that's one
Aug 06 15:08:05 <andreas>>--Ubuntu 10.04.1 LTS (lucid)
Aug 06 15:08:09 <andreas>>--doesn't say the arch in the info page
Aug 06 15:08:12 <bigkevmcd>>pah
Aug 06 15:08:40 <andreas>>--on the "plus" side, the hardware page does list a lot of network interfaces
Aug 06 15:09:14 <bigkevmcd>>andreas: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=199054
Aug 06 15:09:23 <bigkevmcd>>andreas: that bug tells a story :-)
Aug 06 15:11:11 <jkakar>>---bigkevmcd: Bumping to bigint sounds like the right fix.
Aug 06 15:11:45 <bigkevmcd>>jkakar: the fix is up for review
Aug 06 15:11:51 <jkakar>>---bigkevmcd: Cool, checking it out.
Aug 06 15:13:40 <jkakar>>---+ data_point = api.create_traffic("eth0", now, -2814141641, -377168823)
Aug 06 15:13:51 <jkakar>>---bigkevmcd: ^^ Uhm. Those aren't big numbers are they?
Aug 06 15:14:10 <bigkevmcd>>jkakar: they're big enough to trigger the DataError
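Here is why these particular deltas trip a DataError. The log does not name the column type, so this assumes it was a signed 32-bit integer, such as PostgreSQL's `integer`, whose range is -2147483648 to 2147483647. Under that assumption, -2814141641 falls outside the range, while -377168823 does not. Both fit easily in a 64-bit `bigint`. The helper `fits` below is illustrative only.

```python
# Ranges of signed 32-bit and 64-bit integer columns
# (e.g. PostgreSQL "integer" and "bigint").
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def fits(value, lo, hi):
    """Return True if value can be stored in a column with range [lo, hi]."""
    return lo <= value <= hi


# The deltas used in the test case under review.
deltas = [-2814141641, -377168823]

for d in deltas:
    print(d, "int32:", fits(d, INT32_MIN, INT32_MAX),
          "int64:", fits(d, INT64_MIN, INT64_MAX))
# -2814141641 int32: False int64: True
# -377168823 int32: True int64: True
```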
Aug 06 15:14:23 <jkakar>>---bigkevmcd: Shouldn't they be positive?
Aug 06 15:14:28 <bigkevmcd>>jkakar: they're deltas
Aug 06 15:14:37 <bigkevmcd>>jkakar: i.e. the difference between one data point, and another
Aug 06 15:14:41 <jkakar>>---bigkevmcd: Ah, right.
Aug 06 15:14:48 <jkakar>>---I know what a delta is. ;)
Aug 06 15:14:54 <bigkevmcd>>jkakar: phew ;-)
Aug 06 15:15:11 <bigkevmcd>>jkakar: there's a client side fix needed, but I think it needs a kernel fix
Aug 06 15:15:33 <bigkevmcd>>jkakar: or rather, we can client-side fix, or wait for the kernel fix
Aug 06 15:16:53 <jkakar>>---bigkevmcd: Ah, hmm. So the server-side fix will unwedge clients, right?
Aug 06 15:16:58 <bigkevmcd>>jkakar: precisely
Aug 06 15:17:07 <bigkevmcd>>jkakar: see the debian bug report above
Aug 06 15:17:17 <bigkevmcd>>jkakar: apparently there's a kernel bug, which makes rollover more likely
Aug 06 15:17:35 <bigkevmcd>>jkakar: the data comes from the kernel in 32bits, which will cause us to see larger deltas
Aug 06 15:17:49 <jkakar>>---bigkevmcd: Wow, that bug was opened in 2003. :/
Aug 06 15:17:57 <_mup_>>Branch lp:~bigkevmcd/landscape/bug-614346-network-data-overflow: approved
Aug 06 15:17:57 <bigkevmcd>>jkakar: note the last date...
Aug 06 15:17:59 <jkakar>>---bigkevmcd: I see.
Aug 06 15:18:00 <jkakar>>---bigkevmcd: Yep.
Aug 06 15:18:14 <jkakar>>---bigkevmcd: We should fix the client if we can.
Aug 06 15:18:22 <andreas>>--so the client still has a rollover bug
Aug 06 15:18:25 <bigkevmcd>>yeah
Aug 06 15:18:39 <bigkevmcd>>the client-side fix is a bit more interesting
Aug 06 15:18:49 <bigkevmcd>>even if we do fix it, it's not clear when we'd actually see the fix working
Aug 06 15:18:59 <bigkevmcd>>if the data from kernel -> user space is wrong
Aug 06 15:19:36 <jkakar>>---Yeah, that's what I was thinking, How do we know when we need to "fix" the data and when we don't, IOW.
Aug 06 15:19:43 <bigkevmcd>>yeah
Aug 06 15:19:46 <bigkevmcd>>that's what worries me
Aug 06 15:19:53 <bigkevmcd>>I guess we'd have to check the kernel version
Aug 06 15:20:07 <bigkevmcd>>which is icky at best :-)
Aug 06 15:20:58 <andreas>>--we could skip a data point
Aug 06 15:21:10 <bigkevmcd>>andreas: am running the tests for production before committing
Aug 06 15:21:18 <bigkevmcd>>andreas: you'll see the commit in about 5 mins :-)
Aug 06 15:21:27 <andreas>>--bigkevmcd: two reviews already? :)
Aug 06 15:21:38 <bigkevmcd>>andreas: wedged computers focus the mind ;-)
Aug 06 15:23:06 <bigkevmcd>>I'm not sure what to do about the client
Aug 06 15:23:44 <bigkevmcd>>kernel fix, or no kernel fix, rollovers are a problem on the client
Aug 06 15:24:34 <bigkevmcd>>I guess we need to see if new_value < old_value and then subtract old_value from MAX_INT and add that to the delta
Aug 06 15:25:09 <bigkevmcd>>or rather, add that to new_value, and that's the new delta
Aug 06 15:25:40 <andreas>>--yes, if MAX_INT is known and reliable, that's the fix
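Below is a sketch of the wrap-corrected delta discussed above. It assumes the counter is an unsigned 32-bit value that wraps from 2**32 - 1 back to 0, which is exactly the "MAX_INT is known and reliable" assumption. It also assumes at most one wrap happened between samples.

If MAX_INT is the largest value the counter can hold, the formula (MAX_INT - old_value) + new_value comes out one short. The step from MAX_INT back to 0 is itself one unit, so the sketch adds 1. The result equals (new_value - old_value) mod 2**32.

Note that an interface reset, mentioned at the start of the log, also makes new_value < old_value. This function cannot tell the two cases apart. `counter_delta` is a hypothetical name.

```python
# Assumption: counters are unsigned 32-bit, so they wrap at 2**32.
COUNTER_MODULUS = 2**32


def counter_delta(old_value, new_value, modulus=COUNTER_MODULUS):
    """Delta between two readings of a wrapping counter.

    Assumes at most one wrap between the readings. If new_value is
    smaller than old_value, the counter is taken to have wrapped.
    """
    if new_value >= old_value:
        return new_value - old_value
    max_value = modulus - 1
    # (max_value - old_value) steps to reach the maximum,
    # 1 step to wrap around to 0, then new_value further steps.
    return (max_value - old_value) + 1 + new_value


print(counter_delta(100, 250))        # 150, no wrap
print(counter_delta(2**32 - 10, 5))   # 15: 9 to the max, 1 to wrap, 5 after
```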
Aug 06 15:25:49 <andreas>>--or we could skip a value, no?
Aug 06 15:25:54 <andreas>>--if the delta is negative, skip it
Aug 06 15:26:03 <andreas>>--but store the previous value nonetheless
Aug 06 15:26:09 <andreas>>--so the next delta should be fine
Aug 06 15:27:27 <bigkevmcd>>andreas: wouldn't that result in delta jumps?
Aug 06 15:28:03 <andreas>>--if there is traffic, like a file being transferred
Aug 06 15:28:05 <andreas>>--and the rollover occurs
Aug 06 15:28:13 <andreas>>--if not treated, that will be a jump
Aug 06 15:28:23 <andreas>>--but we don't send, or ignore, that single value
Aug 06 15:28:30 <andreas>>--there is no rollover for the next data point
Aug 06 15:28:39 <andreas>>--the traffic stays more or less the same
Aug 06 15:29:05 <andreas>>--say maxint = 10 and we get these values: 1,3,9,1,3,9
Aug 06 15:29:07 *>--zaid_h is now known as zaid_h_afk
Aug 06 15:29:12 <allenap>>--bigkevmcd: Do you have a few minutes?
Aug 06 15:29:12 <andreas>>--from 9 to 1 there is a rollover
Aug 06 15:29:26 <andreas>>--we send: 2, 6, none, 2, 6
Aug 06 15:29:31 <bigkevmcd>>allenap: sure
Aug 06 15:29:47 <andreas>>--from 9 to 1 the real value would be 3 I think
Aug 06 15:29:56 <andreas>>--or 2
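The sketch below works through andreas's worked example. The skip approach drops any negative delta but still stores the new reading, so the delta that follows is correct again. For 1,3,9,1,3,9 this yields 2, 6, none, 2, 6, as stated in the log.

The "3 or 2" question depends on what "maxint = 10" means:
- If the counter holds 0..9 and wraps to 0, then 9 to 1 is 2.
- If it can reach 10 before wrapping, then 9 to 1 is 3.

Both function names here are hypothetical.

```python
def skip_rollover_deltas(readings):
    """Deltas between consecutive readings, with None for any negative one.

    The previous reading is always updated, so the delta after a skipped
    rollover is computed from the post-rollover value and is valid again.
    """
    deltas = []
    previous = None
    for value in readings:
        if previous is not None:
            delta = value - previous
            deltas.append(delta if delta >= 0 else None)
        previous = value  # store the reading even when its delta is skipped
    return deltas


def wrapped_delta(old_value, new_value, modulus):
    """Delta for a counter that counts 0 .. modulus - 1 and then wraps to 0."""
    return (new_value - old_value) % modulus


print(skip_rollover_deltas([1, 3, 9, 1, 3, 9]))  # [2, 6, None, 2, 6]
print(wrapped_delta(9, 1, modulus=10))           # 2: counter holds 0..9
print(wrapped_delta(9, 1, modulus=11))           # 3: counter holds 0..10
```

The trade-off: skipping avoids needing to know the wrap point, but it silently drops the traffic in that one interval (2 units here). The wrap correction keeps that traffic, but only if the modulus is right.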