Merge lp:~gz/bzr/trivial_fork_error_block into lp:bzr

Proposed by Martin Packman
Status: Rejected
Rejected by: Martin Packman
Proposed branch: lp:~gz/bzr/trivial_fork_error_block
Merge into: lp:bzr
Diff against target: 42 lines (+14/-5)
1 file modified
bzrlib/tests/__init__.py (+14/-5)
To merge this branch: bzr merge lp:~gz/bzr/trivial_fork_error_block
Reviewer Review Type Date Requested Status
Vincent Ladeuil Needs Information
Review via email: mp+81479@code.launchpad.net

Description of the change

Second attempt at keeping last-ditch error logging from forked selftest children separated from each other. For the background, see (merged) proposal from earlier:

<https://code.launchpad.net/~gz/bzr/trivial_fork_error_block/+merge/81372>

I'm not crazy about this change, it adds a lot of complexity that's not comprehensively testable. However, even if unlikely, a child process getting a signal that caused the tail of traceback not be displayed at all would be annoying.

To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

I'm not sure I understand the rationale, did you encounter a real-life case where this is required ? babune killing the master process ?

Also, I'm not sure it applied here but mixing buffered and not buffered accesses is generally error-prone and may cause data loss no ? Should you add a .flush() before trying to report the exception ?

That being said, if I my fears are without basis and you have a real use case here, this should be landed, but a comment explaining it would help.

review: Needs Information
Revision history for this message
Vincent Ladeuil (vila) wrote :

/me blinks

My review above was made directly from launchpad so I missed your comments on your previous attempt that I just read in my backlogged mail.

So, bufsize=1.

But then, we shouldn't see *lines* mixed, but *chars* mixed. What is going on here ?

review: Needs Information
Revision history for this message
Martin Packman (gz) wrote :

> So, bufsize=1.
>
> But then, we shouldn't see *lines* mixed, but *chars* mixed. What is going on
> here ?

In Python, 1 is a magic value used to imply _IOLBUF (on the grounds no one want to do per-character buffering).

> I'm not sure I understand the rationale, did you encounter a real-life case
> where this is required ? babune killing the master process ?

The problem is the same as before, miraculously the same random failure happened on the natty buildbot for the revision intended to fix this, showing right away that it hadn't worked:

<http://babune.ladeuil.net:24842/job/selftest-chroot-natty/285/testReport/junit/bzrlib.tests.test_selftest/TestParallelFork/test_error_in_child_during_fork/>

> Also, I'm not sure it applied here but mixing buffered and not buffered
> accesses is generally error-prone and may cause data loss no ? Should you add
> a .flush() before trying to report the exception ?

We don't know if the stream has been created or used at this point, and any remaining content in buffers will just be discarded on exit. For last ditch logging like this, that doesn't matter too much as we'd already be losing that information, this way we just make sure the traceback does get out.

Again, this mostly just helps development of the selftest framework itself, problems in tests will be caught and reported at a much higher level than this.

Revision history for this message
Martin Packman (gz) wrote :

Looking at it again, I think this extra change isn't helpful either, see the original merge proposal again for more rationale.

Revision history for this message
Vincent Ladeuil (vila) wrote :

> > But then, we shouldn't see *lines* mixed, but *chars* mixed. What is going
> on
> > here ?
>
> In Python, 1 is a magic value used to imply _IOLBUF

Ha, bah, I knew that one day...

> (on the grounds no one want to do per-character buffering).

Hey, what if I want to output half chars ? ;)

> The problem is the same as before, miraculously the same random failure
> happened on the natty buildbot for the revision intended to fix this, showing
> right away that it hadn't worked:

This bug is really nice with you :)

>
> <http://babune.ladeuil.net:24842/job/selftest-chroot-natty/285/testReport/juni
> t/bzrlib.tests.test_selftest/TestParallelFork/test_error_in_child_during_fork/
> >

> We don't know if the stream has been created or used at this point, and any
> remaining content in buffers will just be discarded on exit. For last ditch
> logging like this, that doesn't matter too much as we'd already be losing that
> information, this way we just make sure the traceback does get out.

But why handling EINTR was really the intriguing part for me.

>
> Again, this mostly just helps development of the selftest framework itself,
> problems in tests will be caught and reported at a much higher level than
> this.

Right. But having discussing this test failure with you back in Orlando, I wonder if you're not digging too much on this front.

If we can't get a reliable way to address the issue here, it may be better to just relax the test itself and keep this part of the code as simple as possible while documenting the potential issue no ?

review: Needs Information

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/tests/__init__.py'
2--- bzrlib/tests/__init__.py 2011-11-07 10:14:38 +0000
3+++ bzrlib/tests/__init__.py 2011-11-07 18:10:38 +0000
4@@ -3503,24 +3503,33 @@
5 pid = os.fork()
6 if pid == 0:
7 try:
8- stream = os.fdopen(c2pwrite, 'wb', 1)
9- workaround_zealous_crypto_random()
10 os.close(c2pread)
11 # Leave stderr and stdout open so we can see test noise
12 # Close stdin so that the child goes away if it decides to
13 # read from stdin (otherwise its a roulette to see what
14 # child actually gets keystrokes for pdb etc).
15 sys.stdin.close()
16+ workaround_zealous_crypto_random()
17+ stream = os.fdopen(c2pwrite, 'wb', 1)
18 subunit_result = AutoTimingTestResultDecorator(
19 SubUnitBzrProtocolClient(stream))
20 process_suite.run(subunit_result)
21 except:
22 # Try and report traceback on stream, but exit with error even
23 # if stream couldn't be created or something else goes wrong.
24- # The traceback is formatted to a string and written in one go
25- # to avoid interleaving lines from multiple failing children.
26+ # The traceback is formatted to a string and written with the
27+ # low-level api to avoid line buffering interleaving output
28+ # from multiple failing children, unless a signal forces that.
29 try:
30- stream.write(traceback.format_exc())
31+ tb_text = traceback.format_exc()
32+ while tb_text:
33+ try:
34+ bytes_written = os.write(c2pwrite, tb_text)
35+ except EnvironmentError, e:
36+ if e.errno != errno.EINTR:
37+ raise
38+ else:
39+ tb_text = tb_text[bytes_written:]
40 finally:
41 os._exit(1)
42 os._exit(0)