Comment 2 for bug 613675

Barry Warsaw (barry) wrote :

If you have both Python 2.6 and 2.7 installed, run the test script and watch
as it hangs in p1.communicate(). If you trace this through, it's actually
hanging in a call to os.waitpid() on p1 after its stdin has been closed. If
you comment out p2 (i.e. run only one process), you get no hang. If you use
.terminate() calls, you get no hangs. If you serialize the calls instead of
parallelizing them, it does not hang.
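
As a rough illustration of the serialized variant mentioned above, here is a minimal sketch. It is not the actual test script: the child program `CHILD` and the function `run_serialized` are hypothetical stand-ins, with the child simply reading stdin until EOF, much like the loop in py_compile. Each child is started and fully reaped before the next one is created, so no two stdin pipes are open at once.

```python
import subprocess
import sys

# Hypothetical stand-in for the byte-compiling child: read stdin until EOF.
CHILD = "import sys\nfor line in sys.stdin:\n    pass\n"


def run_serialized(count=2):
    """Start, feed (nothing), and reap each child before starting the next."""
    codes = []
    for _ in range(count):
        proc = subprocess.Popen([sys.executable, "-c", CHILD],
                                stdin=subprocess.PIPE)
        # communicate() closes the child's stdin and then waits for it.
        proc.communicate()
        codes.append(proc.returncode)
    return codes
```

On current Python versions the parallel form does not hang either, so this only shows the shape of the workaround, not the bug itself.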

You'll notice that the test doesn't actually write anything to either subprocess.
From observation, this happens when the python-pkg-resources package gets
installed.

So what it looks like to me is that if we have more than one subprocess, the
first one doesn't recognize its stdin getting closed and so it never gets an
EOF to break out of the 'while True' loop in py_compile. I'm sure it's caused
fundamentally by some blocking IO, but I haven't quite worked out in my mind
where that's happening. I'll continue to debug what I can today.
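
One mechanism that would produce this symptom is sketched below. This is an assumption, not something confirmed in this comment: a reader on a pipe only sees EOF once every copy of the write end is closed, so if a second process holds an inherited copy of the first process's stdin write end, closing it in the parent is not enough. The sketch uses `os.dup()` in a single process to stand in for that inherited copy. The names `eof_visible` and `demo` are made up for illustration.

```python
import os
import select


def eof_visible(read_fd, timeout=0.2):
    """Return True if a read on read_fd would report EOF right now."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        # Nothing to read and no EOF yet: a blocking read would hang here.
        return False
    return os.read(read_fd, 1) == b""


def demo():
    read_fd, write_fd = os.pipe()
    # Stand-in (assumption) for a copy of the write end leaked elsewhere.
    leaked_fd = os.dup(write_fd)
    os.close(write_fd)               # the "parent" closes its own copy
    before = eof_visible(read_fd)    # the leaked copy is still open
    os.close(leaked_fd)
    after = eof_visible(read_fd)     # all write ends are now closed
    os.close(read_fd)
    return before, after
```

Calling `demo()` reports whether EOF was visible before and after the last copy of the write end was closed.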

I've considered other approaches, including putting each subprocess
communication in its own thread, but I didn't want to go down that radical a
rewrite and the switch to terminate() *seems* to work. I've tested this on a
Maverick system with the Python 2.7 stack enabled and as far as I can tell,
all packages are getting installed and byte-compiled correctly.

Doko tells me in IRC that we still need to support Python 2.5 in Debian, so
we can't use .terminate(), which was only added in Python 2.6. :(
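
For what it's worth, a fallback along the following lines might cover interpreters where Popen has no .terminate(). This is a sketch only, not the fix in the PPA package, and it is POSIX-only. `terminate_compat` and `spawn_stdin_reader` are hypothetical names introduced here.

```python
import os
import signal
import subprocess
import sys


def terminate_compat(proc):
    """Send SIGTERM to proc, with or without Popen.terminate()."""
    if hasattr(proc, "terminate"):
        proc.terminate()
    else:
        # Popen.terminate() is missing before 2.6, but the pid is available.
        os.kill(proc.pid, signal.SIGTERM)


def spawn_stdin_reader():
    """Hypothetical child that blocks reading stdin until EOF."""
    return subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        stdin=subprocess.PIPE)
```

Typical use would be `proc = spawn_stdin_reader()`, then `terminate_compat(proc)` followed by `proc.wait()` to reap the child.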

I marked this as a critical bug and have a package in my PPA with the fix.
Without this, I basically DDoS'd the Launchpad build farm because every
package just freezes the builder. I can reproduce this in my sbuilds,
e.g. when building net-snmp on a Python 2.7 enabled system.