I've now completed 2 successful full deployments without any errors.
On top of that, I've done 3 runs of terminate-machine --force (all workers) / deploy.py (same code, so safe), also without errors.
@Andy: I'm now convinced the patch does the right thing. The failures I saw on Friday were a different kind of race, one I introduced myself by stupidly not using delete=False.
With delete=True (the default), it succeeded "often" only because the rename completed before the OS actually removed the tmp file.
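
For reference, here is a minimal sketch of the write-to-temp-then-rename pattern with delete=False. It is not the actual patch: it assumes the temp file comes from Python's tempfile.NamedTemporaryFile, and the helper name atomic_write and its arguments are made up for illustration.

```python
import os
import tempfile


def atomic_write(path, data):
    """Hypothetical helper: write `data` to `path` via a temp file plus rename."""
    # Create the temp file in the target directory so the rename stays on
    # one filesystem.
    directory = os.path.dirname(os.path.abspath(path))

    # delete=False: closing the file must NOT unlink it, because we still
    # need to rename it into place afterwards.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the contents to disk before renaming
        tmp_name = tmp.name

    try:
        # Replaces any existing file at `path` in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Don't leave a stray temp file behind if the rename fails.
        os.unlink(tmp_name)
        raise
```

With delete=True the temp file is removed as soon as it is closed, so whether the rename still finds it depends on ordering; with delete=False our own code is the only thing that moves or removes it.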