Merge lp:~bac/launchpad/lxc_cleanup into lp:launchpad

Proposed by Brad Crittenden
Status: Merged
Approved by: Brad Crittenden
Approved revision: no longer in the source branch.
Merged at revision: 15045
Proposed branch: lp:~bac/launchpad/lxc_cleanup
Merge into: lp:launchpad
Diff against target: 0 lines
To merge this branch: bzr merge lp:~bac/launchpad/lxc_cleanup
Reviewer Review Type Date Requested Status
Gary Poster (community) Approve
Review via email: mp+100155@code.launchpad.net

Commit message

Have utilities/setuplxc.py create /usr/local/bin/launchpad-lxc-cleanup for use by buildbot slaves to freshen up the environment before starting.

Description of the change

Create a script /usr/local/bin/launchpad-lxc-cleanup that will be
invoked by the buildbot slave at startup to clean up any cruft
possibly left by previous lxc ephemeral runs.

Here is the state before such a run

/home/bac> ls -ld /tmp/lxc*
drwxrwxrwt 4 root root 160 Mar 30 07:43 /tmp/lxc-lp-s5Yv3XH/
/home/bac> ls -ld /var/lib/lxc/lptests-temp*
drwxrwxrwt 1 root root 160 Mar 30 07:43 /var/lib/lxc/lptests-temp-0PMxSrb/
/home/bac> mount |grep lxc
none on /tmp/lxc-lp-s5Yv3XH type tmpfs (rw)
none on /var/lib/lxc/lptests-temp-0PMxSrb type overlayfs (rw,upperdir=/tmp/lxc-lp-s5Yv3XH,lowerdir=/var/lib/lxc/lptests)
none on /var/lib/lxc/lptests-temp-0PMxSrb/ephemeralbind type tmpfs (rw)

Now run the cleanup script.
/home/bac> sudo /usr/local/bin/launchpad-lxc-cleanup

And here we see the directories are gone and no containers are
running.

/home/bac> ls -ld /tmp/lxc*
ls: cannot access /tmp/lxc*: No such file or directory
/home/bac> ls -ld /var/lib/lxc/lptests-temp*
ls: cannot access /var/lib/lxc/lptests-temp*: No such file or directory
/home/bac> mount |grep lxc

A follow-on branch for the public branch of lpbuildbot will put in the
step to have this script invoked.

To post a comment you must log in.
Revision history for this message
Gary Poster (gary) wrote :

Thank you Brad. Very nice. Your demo in the initial description did not also show that the process was killed, but you told me that this was accomplished as well, as I see in the code. Nice changes to how we write the sudoers file.

You could make the string raw, so that you didn't have to escape \"\"\" so many times. The only downside I see is that you'd have to do something clever about the first line, to make sure it has the shebang in it. Your call.

Maybe adding a comment about the fact that the unpleasant string parsing that you have to do in getPID mirrors what the approach done in lxc-list would be appropriate, to slightly reduce shock and surprise for someone reviewing this in the future.

The apparmor stuff (147-155 of diff) is supposed to be unnecessary now. I'd prefer to have that proved before we rip it out. Since you touched that code, I'm tempted to ask you to do something about it...but I'll forego, and add a separate card on the kanban board for the future.

You mentioned that you would send the unit tests for this code, so that we could have them available when we can divide things up better in lpsetup. I don't see that yet. Please make sure you do so. I'd hate to lose that work.

Thanks again!

review: Approve
Revision history for this message
Robert Collins (lifeless) wrote :

Hi, there is already a script to delete containers; I think you should
run that script rather than inventing our own - its an abstraction
barrier. (E.g. this script should just identify the lxc containers to
remove and invoke lxc-destroy for each lxc container it no longer
wants.).

Also, this should be in lp-dev-utils as it is not part of LP itself.

Thanks,
Rob

Revision history for this message
Gary Poster (gary) wrote :

@Rob

Thank you for the comments.

First, this is not for the same purpose as lxc-destroy. This is entirely about build stability in the face of exceptional circumstances.

lxc-destroy is used to destroy standard lxc containers. This branch's code is to clean up ephemeral containers that were not properly shut down. lxc-destroy is not designed for such a purpose, and will fail, because it knows nothing about umounting the necessary directories, and moreover knows nothing about conditionally stopping containers before it shuts things down. lxc-start-ephemeral is the only lxc code that should need to know how to kill an ephemeral container in the non-exceptional case. In exceptional circumstances, such as when lxc-start-ephemeral is kill -9'd, you will be left with a mess that needs to cleaned up. We have seen this sort of situation when a parallel test run went very wrong and buildbot killed the testr process after a timeout. In such a case, the host is still polluted with a running ephemeral lxc and all the various mounts it uses. We use the script Brad wrote here when buildbot first starts a run, to make sure that everything is clean before we start a new build. This should not happen, but in rare cases it has, and handling this case smoothly adds to system reliability.

Second, this script is currently written by setuplxc, so from the LP tree's perspective there is no new file, only more in setuplxc. As with previous discussions of this issue, we entirely agree that all of the lxc setup code should move out of the LP tree, and moreover we look forward to being able to have a saner division of files, rather than a single behemoth setuplxc.py. However, we already have another project that is pursuing this, and other related fixes, in slack time: lpsetup. I don't see the purpose in a temporary move of setuplxc to a new home when we're actively pursuing an entirely different solution.

Revision history for this message
Robert Collins (lifeless) wrote :

On Sat, Mar 31, 2012 at 7:40 AM, Gary Poster <email address hidden> wrote:
> @Rob
>
> Thank you for the comments.
>
> First, this is not for the same purpose as lxc-destroy.  This is entirely about build stability in the face of exceptional circumstances.

...

On the exceptional circumstances case - I consider that a bug in lxc
if it cannot handle shutting down a container in unusual state;
lxc-ephemeral does indeed setup some special mounts, but there is no
reason that once setup lxc cannot know about them. In fact, it should,
via the same mechanism other container specific mounts are known
about.

Its fine if in the short term you want to maintain this part in LP,
but it shouldn't be the long term case: the reason lxc-ephemeral was
pushed into lxc is that container *management* is an LXC problem, not
an LP problem.

I missed that this was a delta on setuplxc - if I had spotted that I
wouldn't have mentioned it being in the wrong tree, as I knew you were
working on that separately. I will say though, now that I see that,
that writing a script out from another script is very meta. Maybe it
would be simpler to have it as a regular file, have the juju charm for
slaves pull that file into the right place and make it setuid; for the
datacentre environment then the sysadmins can audit the file and use
puppet or similar to deploy it. If

I suspect we will get told that this cannot be run as-is in the
datacentre. My recommendation would be to make the script a regular
file in lp-dev-utils, and use the juju charm to get it in the right
place with the right permissions for buildbot slaves. (And separately
give it to the sysadmins for our DC server).

-Rob

Revision history for this message
Gary Poster (gary) wrote :

For the lxc discussion, I just filed a bug: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/969604 . The solution will need more than just the usual umounting mechanism, for reasons I discuss there. lxc-stop would also need to grow some globbing functionality in order for us not to have to write a script.

In terms of lxcsetup writing the script, that has to do with the simple configuration mechanisms that we built into the generic buildbot slave charm: you can specify a PPA to be installed, extra packages to be installed, and a file to be downloaded and run as root. That's pretty much it. It might be nice to add more features to the charm, but this has been sufficient, and I don't want to return to charm writing right now, particularly given that it won't be used in the DC. FWIW, when we have lpsetup, it will be a package, and then things can be divided up nicely, with files actually separated out as one would expect, and the charm can specify the PPA and package.

In terms of the data center, gnuoy simply isn't using setuplxc, per the RT discussion. For this clean-up script, we will give him the script to install, as he's already had to install two others that setuplxc creates.

Preview Diff

Empty