Code review comment for lp:~jameinel/launchpad/lp-serve-child-hangup

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 3/3/2011 4:43 PM, Gavin Panella wrote:
> Review: Approve
> Cool :)
>
>
> [1]
>
> + def _get_child_connect_timeout(self):
> + """signal.alarm only supports 1s granularity.
> +
> + We have to make sure we don't ever send 0, which would not generate an
> + alarm.
> + """
> + timeout = int(self._child_connect_timeout)
> + if timeout <= 0:
> + timeout = 1
> + return timeout
>
> I can't see where _child_connect_timeout might be set to something
> other than an int, so you could remove this and prevent problems
> another way:
>
> signal.alarm(max(1, self._child_connect_timeout))
>
> But I don't really have a strong argument why you ought to change it
> other than it's more code to read, more stuff to brain-load when
> coming in to maintain this code.

It used to be a float in the code, and I often set it to sub second in
the test suite, etc. I was just being cautious, because setting it to
0.1s caused it to get set to 0, which *unsets* alarms rather than
setting them.

>
>
> [2]
>
> + # Even though it timed out, we still cleanup the temp dir
> + self.assertFalse(os.path.exists(tempdir))
>
> I think this only happens because SIGALRM was neutered. If no handler
> was installed I doubt the temporary directory would be cleared
> out. Perhaps there needs to be a handler for SIGALRM anyway, that
> raises a custom exception perhaps, or signal.default_int_handler, or
> whatever :)
>

It used to be true that the signal was trapped. If we don't trap it at
all, it kills the test suite. I used to install a signal handler (when I
wasn't using signal.alarm). I can take out that assertion, as the parent
process will handle cleaning up the child. All I *really* care about is
that the child doesn't get hung indefinitely trying to open fifos that
will never open.

>
> [3]
>
> + # We can't actually run this test, because by default 'bzr serve
> + # --inet' does not flush after each message.
>
> Out of interest, why doesn't it flush? Can this test be fixed and
> enabled later? Consider removing it if it's just going to bit-rot.
>

At the moment, there is a layering issue. The part that writes the data
only gets a single 'write' callable. Rather than getting both 'write'
and 'flush'. We don't want to flush all the time, just in-between
messages. So that any buffering is used appropriately.

As it stands, I can't really test that blocking mode works correctly.

>
> [4]
>
> + # The master process should clean up after the now deceased child.
> + self.failIfExists(path)
>
> If, for some reason, the master believes the child has failed, but
> there's just been a misunderstanding and the child is really still
> running, could this mean that both the master and the child try to
> clean up the fifos directory? _cleanup_fifos() in the child code
> doesn't swallow errors, so this might fail.
>
> Consider using shutil instead:
>
> shutil.rmtree(path, ignore_errors=True)
>

We use os.waitpid() to tell that the child has exited. It will be pretty
hard for that to give a false positive. And if the file handles get
deleted from under the child, it is going to crash anyway, because the
remote process won't be able to successfully connect to the child.

We can do it differently if you think it is wise. I'm not sure if it
buys much.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk1vu8YACgkQJdeBCYSNAANS/wCgtNJW4eS1lm9LRBGOAJRtoIID
eMUAn2JNzNiWd4hzG5BHUAaBh1CRkxIT
=WItB
-----END PGP SIGNATURE-----

« Back to merge proposal