Code review comment for lp:~jameinel/bzr/2.0-490228-gc-segfault

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
> Review: Approve
> I would be curious to know how long you and other people involved have spent on this bug
> and compare that to the time necessary to get a test infrastructure good enough to
> write a test reproducing this bug...
>

Once I had a traceback, and did a line-by-line analysis of what was
going on, maybe 20 min?

Rewriting the code to be testable is at least a days work for me,
depending on how much I want to cheat.

The main problems are:

1) Data injection. I'd have to figure out how to generate a structure
that is ready to expose the bug. Going further, how do you do it without
generating a brittle test?

2) Measuring the error. The specific error is that we are *reading* from
an invalid address. This hasn't been failing in *lots* of circumstances,
because reading "N+1" is often not a segfault. For example, I think most
allocators reserve a whole page for the process. So allocating a 20 byte
string will have the OS give you an 8kB page to work with, and malloc
will then use the rest of the 8172 bytes for other malloc requests.

However, accessing the 21st byte will not be a segfault. You have to
access the 8193rd byte to get the OS to say "hey, you don't have access
to that page, go home."

If it was writing to an invalid address, then we could check the memory
array and say "yep, X+1 doesn't have any data in it anymore, great."

3) Alternatively, we refactor the whole thing so that memory access goes
through a callback function. And then inject our own callback function
to test that we aren't accessing an bad section of memory. However, that
means that *every* access in this inner loop needs to go through a
virtual function abstraction, which would probably hurt its performance
significantly. And the gc code is definitely one of the performance
sensitive areas of our codebase...

> I'm more and more convinced that each time someone says: "I don't have time to write a test"
> he is just lying to himself :-(
>
> Given that you will likely be the one writing this test in the end anyway, I'm certainly
> not throwing stones as I realize that we got this code without tests to start with.
> So, let's land this patch :-D

If you have any hints as to a good way to test this, I'm all ears.

>
> Do you believe that this bug is present upstream or that it was introduced during
> the adaptation to pyrex ?

It is a bug from our adaptation of the code. They don't ever insert new
records into the hash map. It has to do with how groupcompress blocks work.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksmcc0ACgkQJdeBCYSNAAOeMACfSECIKn0xNU+n4Jd+A3MTRAss
iVkAoMYH6BrfrRMmSfTUogEF28XkGgWa
=6hJc
-----END PGP SIGNATURE-----

« Back to merge proposal