Merge lp:~ev/apport/cache_extracted_debs into lp:apport
| Status: | Merged |
|---|---|
| Merged at revision: | 2275 |
| Proposed branch: | lp:~ev/apport/cache_extracted_debs |
| Merge into: | lp:apport |
| Diff against target: | 193 lines (+76/-9), 5 files modified: apport/packaging.py (+4/-1), backends/packaging-apt-dpkg.py (+8/-5), bin/apport-retrace (+13/-3), man/apport-retrace.1 (+8/-0), test/test_backend_apt_dpkg.py (+43/-0) |
| To merge this branch: | bzr merge lp:~ev/apport/cache_extracted_debs |
| Related bugs: | |

| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Martin Pitt | | 2012-04-10 | Approve on 2012-04-11 |
Description of the Change
As discussed on IRC, here's a branch to cache already extracted debs. I tried filtering these out at the Cache() level, looking at package.
So the solution I've come up with is to mark the time before we did a fetch, and only process debs that have been downloaded since by checking their ctime.
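Roughly, the idea amounts to the following sketch (not the code from the branch; the archive path, the sandbox path, and the dpkg -x call are assumptions about how the sandbox gets populated):

    import os
    import subprocess
    import time

    archive_dir = '/var/cache/apt/archives'  # assumed location of the downloaded .debs
    rootdir = '/tmp/sandbox'                 # assumed sandbox root

    fetch_start = time.time()  # mark the time before the fetch
    # ... apt downloads the .debs it needs into archive_dir here ...

    for name in os.listdir(archive_dir):
        if not name.endswith('.deb'):
            continue
        deb = os.path.join(archive_dir, name)
        # only unpack debs that showed up after the mark; anything older was
        # already downloaded (and therefore unpacked) by a previous run
        if os.stat(deb).st_ctime >= fetch_start:
            subprocess.check_call(['dpkg', '-x', deb, rootdir])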
I've made this the non-default -k, --keep-unpacked option. If we cannot squeeze the upload in before final freeze, or if you aren't quite happy with this living in trunk, I can always run it from a branch as you suggested.
Thanks!
Martin Pitt (pitti) wrote:
Thanks for working on this!
I would like to clean this up a little though, so that it fits better into the existing code:
9 + verbose=False, cache_dir=None, keep_unpacked=
How about adding a sandbox_dir=None instead? If you specify it, it uses an existing one and just unpacks the debs that it actually downloaded in this run. If it is None, it uses a temporary directory. This works exactly like the cache_dir argument. Please document the new argument in the docstring, and update the function/docstring in apport/packaging.py (the abstract base class).
Filtering by time should work OK in most circumstances, but I wonder if apt.Cache() does not have a more elegant way to ask "which debs did you actually download"?
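For what it's worth, the fetcher object itself can answer that question after a run: its item list records which files were actually retrieved. A minimal sketch with python-apt's apt_pkg.Acquire (the URI and destination directory are made up for illustration):

    import os
    import apt_pkg

    apt_pkg.init()
    fetcher = apt_pkg.Acquire()

    destdir = '/tmp/archives'  # made-up download location
    if not os.path.isdir(destdir):
        os.makedirs(destdir)

    # queue one illustrative download; apport-retrace would queue the .debs it needs
    item = apt_pkg.AcquireFile(
        fetcher,
        'http://archive.ubuntu.com/ubuntu/pool/main/h/hello/hello_2.7-2_amd64.deb',
        destdir=destdir)
    fetcher.run()

    # the fetcher knows what it actually downloaded in this run,
    # so no ctime comparison against the archive directory is needed
    downloaded = [i.destfile for i in fetcher.items if i.complete]
    print(downloaded)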
42 + optparser.
Likewise, I think a --sandbox-dir argument would be a bit clearer and behave similarly to --cache. --keep-unpacked sounds too much like a boolean option, so specifying a directory with it looks strange. Please document this in man/apport-retrace.1 as well.
This should also help to simplify the patch a bit, you won't need a separate "keep" variable, etc.
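For illustration, wiring up such an option with optparse might look roughly like this (the option name and help text follow this discussion; the exact wording in the final diff may differ):

    import optparse

    optparser = optparse.OptionParser('%prog [options]')
    optparser.add_option('--sandbox-dir', metavar='DIR',
                         help='Directory for unpacked packages. Future runs will assume that '
                              'any already downloaded package is also extracted to this sandbox.')
    options, args = optparser.parse_args()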
I'm quite happy to have that functionality in trunk with those changes. I originally thought you wanted to make this much more complex.
At some point we need to teach crash-digger to purge its caches and sandboxes regularly. In particular Contents.gz ages rather quickly, and the permanent sandboxes will grow pretty much indefinitely (and pile up a lot of cruft which might even get in the way). But that's a separate problem indeed.
On Tue, Apr 10, 2012 at 6:14 PM, Martin Pitt <email address hidden> wrote:
> How about adding a sandbox_dir=None instead? If you specify it, it uses an existing one and just unpacks the debs that it actually downloaded in this run. If it is None, it uses a temporary directory. This works exactly like the cache_dir argument. Please document the new argument in the docstring, and update the function/docstring in apport/packaging.py (the abstract base class).
So you would specify this and rootdir to be the same thing?
I agree that the name is better.
Martin Pitt (pitti) wrote:
> So you would specify this and rootdir to be the same thing?
Argh, sorry for that; I should have looked more closely when I reviewed this. Indeed we already have rootdir, so there is no need for a new sandbox_dir argument in install_packages(). So for install_packages() we need an explicit boolean option which will only unpack the packages that it downloaded ("permanent_rootdir").
I would like to keep the --sandbox-dir option for apport-retrace though, IMHO that looks much nicer.
54 + try:
55 + os.makedirs(
56 + except OSError:
57 + pass
Please let's rather use "if not os.path.isdir()... os.makedirs()", to catch errors resulting from EPERM, etc. I know the existing code in install_packages() uses this pattern as well once, but other places in Apport get this right. Let's fix it while we are at it.
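In other words, something along these lines (the directory name is made up):

    import os

    archive_dir = '/tmp/sandbox/var/cache/apt/archives'  # made-up path

    # only create the directory when it is missing; a real failure such as
    # EPERM/EACCES still raises instead of being swallowed by a blanket
    # "try: os.makedirs(...) except OSError: pass"
    if not os.path.isdir(archive_dir):
        os.makedirs(archive_dir)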
77 + archives = os.path.
Please let's get this from Dir::Cache:
80 + cached = os.path.
Instead of line 77 and this, could you not just call os.path.
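A sketch of reading the archive location from apt's configuration instead of hard-coding the path (python-apt's apt_pkg.config; the key name is the standard apt.conf one):

    import apt_pkg

    apt_pkg.init_config()
    # resolves Dir::Cache::archives relative to Dir and Dir::Cache,
    # typically /var/cache/apt/archives/, honouring any local overrides
    archives = apt_pkg.config.find_dir('Dir::Cache::archives')
    print(archives)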
95 + help=_('Cache directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.'))
I'd drop the "Cache" here, same in the manpage.
Finally this could do with a test case, something like this:
- First call with a package or two, starting from an empty sandbox
- Check that packages are installed
- Second call with an extra package
- Check that both the old and the new packages are still installed
- Make it impossible to download new packages (damaging apt/sources.list perhaps?)
- Ensure that a third call with an already installed package succeeds
- Ensure that a fourth call with a new package fails with a FetchFailedException
Thanks!
On Wed, Apr 11, 2012 at 8:38 AM, Martin Pitt <email address hidden> wrote:
> Argh, sorry for that; I should have looked more closely when I reviewed this. Indeed we already have rootdir, so there is no need for a new sandbox_dir argument in install_packages(). So for install_packages() we need an explicit boolean option which will only unpack the packages that it downloaded ("permanent_rootdir").
Done as permanent_rootdir.
> I would like to keep the --sandbox-dir option for apport-retrace though, IMHO that looks much nicer.
Agreed.
> 54 + try:
> 55 + os.makedirs(
> 56 + except OSError:
> 57 + pass
>
> Please let's rather use "if not os.path.isdir()... os.makedirs()", to catch errors resulting from EPERM, etc. I know the existing code in install_packages() uses this pattern as well once, but other places in Apport get this right. Let's fix it while we are at it.
Indeed, at the time I was just trying to keep consistent and hadn't
considered -EPERM. Fixed.
> 77 + archives = os.path.
>
> Please let's get this from Dir::Cache:
>
> 80 + cached = os.path.
>
> Instead of line 77 and this, could you not just call os.path.
Indeed. I think this was the result of copying code around as the
functionality was changing. Fixed.
> 95 + help=_('Cache directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.'))
>
> I'd drop the "Cache" here, same in the manpage.
Fixed.
> Finally this could do with a test case, something like this:
>
> - First call with a package or two, starting from an empty sandbox
> - Check that packages are installed
> - Second call with an extra package
> - Check that both the old and the new packages are still installed
> - Make it impossible to download new packages (damaging apt/sources.list perhaps?)
> - Ensure that a third call with an already installed package succeeds
> - Ensure that a fourth call with a new package fails with a FetchFailedException
Done with Acquire:
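For reference, a test along the lines outlined above might look roughly like this. It is only a sketch: the install_packages() signature, the use of an unreachable Acquire proxy to break downloads, and the FetchFailedException type are assumptions based on this thread, not the test that was actually committed.

    import imp
    import os
    import shutil
    import tempfile
    import unittest

    import apt_pkg
    from apt.cache import FetchFailedException

    # load the apt/dpkg backend directly from the tree (hypothetical path)
    impl = imp.load_source('packaging_impl', 'backends/packaging-apt-dpkg.py')


    class TestPermanentSandbox(unittest.TestCase):
        def test_install_packages_permanent_sandbox(self):
            rootdir = tempfile.mkdtemp()
            self.addCleanup(shutil.rmtree, rootdir)

            # first call with a couple of packages, starting from an empty sandbox
            impl.install_packages(rootdir, None, 'Ubuntu 12.04',
                                  [('coreutils', None), ('tzdata', None)],
                                  permanent_rootdir=True)
            self.assertTrue(os.path.exists(os.path.join(rootdir, 'bin/cat')))

            # second call with an extra package; earlier files must survive
            impl.install_packages(rootdir, None, 'Ubuntu 12.04',
                                  [('libc6', None)], permanent_rootdir=True)
            self.assertTrue(os.path.exists(os.path.join(rootdir, 'bin/cat')))

            # make it impossible to download anything new, e.g. via a bogus proxy
            apt_pkg.config.set('Acquire::http::Proxy', 'http://nonexistent')

            # an already unpacked package must still succeed without a fetch
            impl.install_packages(rootdir, None, 'Ubuntu 12.04',
                                  [('coreutils', None)], permanent_rootdir=True)

            # a genuinely new package must now fail to fetch
            self.assertRaises(FetchFailedException, impl.install_packages,
                              rootdir, None, 'Ubuntu 12.04',
                              [('aspell-doc', None)], permanent_rootdir=True)


    if __name__ == '__main__':
        unittest.main()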
Martin Pitt (pitti) wrote:
Perfect, thanks! Merged with slight whitespace fix (I keep an empty line after single-line docstrings).


And the timings against my previous merge:
ubuntu@server-8149:~$ ./judge/judge './previous_changes' './with_changes'
n   mean       sd      min      max      cmd
50  22647.1ms  1468.2  20855.4  26954.3  ./previous_changes
50  8572.3ms   562.3   7958.8   11131.5  ./with_changes
difference is significant at 95.0% confidence (p=0.000): -14074.836ms (-62.1%)
based on these samples, suggested sample size is n>=1 to have a 3518.71ms confidence interval
ubuntu@server-8149:~$