Merge lp:~ev/apport/cache_extracted_debs into lp:~apport-hackers/apport/trunk
- cache_extracted_debs
- Merge into trunk
Status: | Merged |
---|---|
Merged at revision: | 2275 |
Proposed branch: | lp:~ev/apport/cache_extracted_debs |
Merge into: | lp:~apport-hackers/apport/trunk |
Diff against target: |
193 lines (+76/-9) 5 files modified
apport/packaging.py (+4/-1) backends/packaging-apt-dpkg.py (+8/-5) bin/apport-retrace (+13/-3) man/apport-retrace.1 (+8/-0) test/test_backend_apt_dpkg.py (+43/-0) |
To merge this branch: | bzr merge lp:~ev/apport/cache_extracted_debs |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Martin Pitt (community) | Approve | ||
Review via email: mp+101400@code.launchpad.net |
Commit message
Description of the change
As discussed on IRC, here's a branch to cache already extracted debs. I tried filtering these out at the Cache() level, looking at package.
So the solution I've come up with is to mark the time before we did a fetch, and only process debs that have been downloaded since by checking their ctime.
I've made this the non-default -k, --keep-unpackaged option. If we cannot squeeze the upload in before final freeze, or you aren't quite happy with this living in trunk, I can always run it from a branch as you suggested.
Thanks!
Martin Pitt (pitti) wrote : | # |
Thanks for working on this!
I would like to clean this up a little though, so that it fits better into the existing code:
9 + verbose=False, cache_dir=None, keep_unpacked=
How about adding a sandbox_dir=None instead? If you specify it, it uses an existing one and just unpacks the debs that it actually downloaded in this run. If it is None, it uses a temporary directory. This works exactly like the cache_dir argument. Please document the new argument in the docstring, and update the function/docstring in apport/packaging.py (the abstract base class).
Filtering by time should work OK in most circumstances, but I wonder if apt.Cache() does not have a more elegant way to ask "which debs did you actually download"?
42 + optparser.
Likewise, I think a --sandbox-dir argument would be a bit clearer, and behave similar to --cache. --keep-unpacked sounds too much like a boolean option, specifying a directory looks strange. Please document this in man/apport-
This should also help to simplify the patch a bit, you won't need a separate "keep" variable, etc.
I'm quite happy to have that functionality in trunk with those changes. I originally thought you wanted to make this much more complex.
At some point we need to teach crash-digger to purge its caches and sandboxes regularly. In particular Contents.gz ages rather quickly, and the permanent sandboxes will grow pretty much indefinitely (and pile up a lot of cruft which might even get in the way). But that's a separate problem indeed.
On Tue, Apr 10, 2012 at 6:14 PM, Martin Pitt <email address hidden> wrote:
> How about adding a sandbox_dir=None instead? If you specify it, it uses an existing one and just unpacks the debs that it actually downloaded in this run. If it is None, it uses a temporary directory. This works exactly like the cache_dir argument. Please document the new argument in the docstring, and update the function/docstring in apport/packaging.py (the abstract base class).
So you would specify this and rootdir to be the same thing?
I agree that the name is better.
Martin Pitt (pitti) wrote : | # |
> So you would specify this and rootdir to be the same thing?
Argh, sorry for that; I should have looked more closely when I reviewed that. Indeed we already have rootdir, so no need for a new sandbox_dir argument in install_packages(). So for install_packages() we indeed either need an explicit boolean option which will only unpack packages that it downloaded ("permanent_
I would like to keep the --sandbox-dir option for apport-retrace though, IMHO that looks much nicer.
54 + try:
55 + os.makedirs(
56 + except OSError:
57 + pass
Please let's rather use "if not os.path.isdir()... os.makedirs()", to catch errors resulting from EPERM, etc. I know the existing code in install_packages() uses this pattern as well once, but other places in Apport get this right. Let's fix it while we are at it.
77 + archives = os.path.
Please let's get this from Dir::Cache:
80 + cached = os.path.
Instead of line 77 and this, could you not just call os.path.
95 + help=_('Cache directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.'))
I'd drop the "Cache" here, same in the manpage.
Finally this could do with a test case, something like this:
- First call with a package or two, starting from an empty sandbox
- Check that packages are installed
- Second call with an extra package
- Check that both the old and the new packages are still installed
- Make it impossible to download new packages (damaging apt/sources.list perhaps?)
- Ensure that a third call with an already installed package succeeds
- Ensure that a fourth call with a new package fails with a FetchFailedExce
Thanks!
On Wed, Apr 11, 2012 at 8:38 AM, Martin Pitt <email address hidden> wrote:
> Argh, sorry for that; I should have looked more closely when I reviewed that. Indeed we already have rootdir, so no need for a new sandbox_dir argument in install_packages(). So for install_packages() we indeed either need an explicit boolean option which will only unpack packages that it downloaded ("permanent_
Done as permanent_rootdir.
> I would like to keep the --sandbox-dir option for apport-retrace though, IMHO that looks much nicer.
Agreed.
> 54 + try:
> 55 + os.makedirs(
> 56 + except OSError:
> 57 + pass
>
> Please let's rather use "if not os.path.isdir()... os.makedirs()", to catch errors resulting from EPERM, etc. I know the existing code in install_packages() uses this pattern as well once, but other places in Apport get this right. Let's fix it while we are at it.
Indeed, at the time I was just trying to keep consistent and hadn't
considered -EPERM. Fixed.
> 77 + archives = os.path.
>
> Please let's get this from Dir::Cache:
>
> 80 + cached = os.path.
>
> Instead of line 77 and this, could you not just call os.path.
Indeed. I think this was the result of copying code around as the
functionality was changing. Fixed.
> 95 + help=_('Cache directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.'))
>
> I'd drop the "Cache" here, same in the manpage.
Fixed.
> Finally this could do with a test case, something like this:
>
> - First call with a package or two, starting from an empty sandbox
> - Check that packages are installed
> - Second call with an extra package
> - Check that both the old and the new packages are still installed
> - Make it impossible to download new packages (damaging apt/sources.list perhaps?)
> - Ensure that a third call with an already installed package succeeds
> - Ensure that a fourth call with a new package fails with a FetchFailedExce
Done with Acquire:
Martin Pitt (pitti) wrote : | # |
Perfect, thanks! Merged with slight whitespace fix (I keep an empty line after single-line docstrings).
Preview Diff
1 | === modified file 'apport/packaging.py' |
2 | --- apport/packaging.py 2012-03-01 10:06:25 +0000 |
3 | +++ apport/packaging.py 2012-04-11 11:30:31 +0000 |
4 | @@ -172,7 +172,7 @@ |
5 | raise NotImplementedError('this method must be implemented by a concrete subclass') |
6 | |
7 | def install_packages(self, rootdir, configdir, release, packages, |
8 | - verbose=False, cache_dir=None): |
9 | + verbose=False, cache_dir=None, permanent_rootdir=False): |
10 | '''Install packages into a sandbox (for apport-retrace). |
11 | |
12 | In order to work without any special permissions and without touching |
13 | @@ -194,6 +194,9 @@ |
14 | If cache_dir is given, then the downloaded packages will be stored |
15 | there, to speed up subsequent retraces. |
16 | |
17 | + If permanent_rootdir is True, then the sandbox created from the |
18 | + downloaded packages will be reused, to speed up subsequent retraces. |
19 | + |
20 | Return a string with outdated packages, or None if all packages were |
21 | installed. |
22 | |
23 | |
24 | === modified file 'backends/packaging-apt-dpkg.py' |
25 | --- backends/packaging-apt-dpkg.py 2012-04-10 12:39:32 +0000 |
26 | +++ backends/packaging-apt-dpkg.py 2012-04-11 11:30:31 +0000 |
27 | @@ -463,7 +463,7 @@ |
28 | return (installed, outdated) |
29 | |
30 | def install_packages(self, rootdir, configdir, release, packages, |
31 | - verbose=False, cache_dir=None): |
32 | + verbose=False, cache_dir=None, permanent_rootdir=False): |
33 | '''Install packages into a sandbox (for apport-retrace). |
34 | |
35 | In order to work without any special permissions and without touching |
36 | @@ -485,6 +485,9 @@ |
37 | If cache_dir is given, then the downloaded packages will be stored |
38 | there, to speed up subsequent retraces. |
39 | |
40 | + If permanent_rootdir is True, then the sandbox created from the |
41 | + downloaded packages will be reused, to speed up subsequent retraces. |
42 | + |
43 | Return a string with outdated packages, or None if all packages were |
44 | installed. |
45 | |
46 | @@ -506,10 +509,8 @@ |
47 | aptroot = os.path.join(cache_dir, release, 'apt') |
48 | else: |
49 | aptroot = os.path.join(cache_dir, 'system', 'apt') |
50 | - try: |
51 | + if not os.path.isdir(aptroot): |
52 | os.makedirs(aptroot) |
53 | - except OSError: |
54 | - pass |
55 | else: |
56 | tmp_aptroot = True |
57 | aptroot = tempfile.mkdtemp() |
58 | @@ -561,6 +562,7 @@ |
59 | for p in real_pkgs: |
60 | c[p].mark_install(False, False) |
61 | |
62 | + last_written = time.time() |
63 | # fetch packages |
64 | fetcher = apt.apt_pkg.Acquire(fetchProgress) |
65 | try: |
66 | @@ -573,7 +575,8 @@ |
67 | if verbose: |
68 | print('Extracting downloaded debs...') |
69 | for i in fetcher.items: |
70 | - subprocess.check_call(['dpkg', '-x', i.destfile, rootdir]) |
71 | + if not permanent_rootdir or os.path.getctime(i.destfile) > last_written: |
72 | + subprocess.check_call(['dpkg', '-x', i.destfile, rootdir]) |
73 | real_pkgs.remove(os.path.basename(i.destfile).split('_', 1)[0]) |
74 | |
75 | if tmp_aptroot: |
76 | |
77 | === modified file 'bin/apport-retrace' |
78 | --- bin/apport-retrace 2012-02-29 11:13:45 +0000 |
79 | +++ bin/apport-retrace 2012-04-11 11:30:31 +0000 |
80 | @@ -59,6 +59,8 @@ |
81 | help=_('Prepend timestamps to log messages, for batch operation')) |
82 | optparser.add_option('-C', '--cache', metavar='DIR', |
83 | help=_('Cache directory for packages downloaded in the sandbox')) |
84 | + optparser.add_option('--sandbox-dir', metavar='DIR', |
85 | + help=_('Directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.')) |
86 | optparser.add_option('-p', '--extra-package', action='append', default=[], |
87 | help=_('Install an extra package into the sandbox (can be specified multiple times)')) |
88 | optparser.add_option('--auth', |
89 | @@ -336,8 +338,15 @@ |
90 | sandbox = None |
91 | outdated_msg = None |
92 | if options.sandbox: |
93 | - sandbox = tempfile.mkdtemp() |
94 | - atexit.register(shutil.rmtree, sandbox) |
95 | + if options.sandbox_dir: |
96 | + sandbox = os.path.abspath(options.sandbox_dir) |
97 | + if not os.path.isdir(sandbox): |
98 | + os.makedirs(sandbox) |
99 | + permanent_rootdir = True |
100 | + else: |
101 | + sandbox = tempfile.mkdtemp() |
102 | + atexit.register(shutil.rmtree, sandbox) |
103 | + permanent_rootdir = False |
104 | |
105 | pkgs = needed_packages(report) |
106 | for p in options.extra_package: |
107 | @@ -355,7 +364,8 @@ |
108 | |
109 | try: |
110 | outdated_msg = apport.packaging.install_packages(sandbox, sandbox_arg, |
111 | - report['DistroRelease'], pkgs, options.verbose, options.cache) |
112 | + report['DistroRelease'], pkgs, options.verbose, options.cache, |
113 | + permanent_rootdir) |
114 | except SystemError as e: |
115 | sys.stderr.write(str(e) + '\n') |
116 | sys.exit(1) |
117 | |
118 | === modified file 'man/apport-retrace.1' |
119 | --- man/apport-retrace.1 2011-08-25 05:18:52 +0000 |
120 | +++ man/apport-retrace.1 2012-04-11 11:30:31 +0000 |
121 | @@ -123,6 +123,14 @@ |
122 | recommended. |
123 | |
124 | .TP |
125 | +.B \-\-sandbox\-dir=\fIDIR |
126 | +Permanent directory for the sandbox of extracted packages. If not specified all |
127 | +cached packages will have to be re-extracted at each run of |
128 | +.B apport\-retrace\fR. |
129 | +If you use sandbox mode regularly, using a permanent cache directory is highly |
130 | +recommended. |
131 | + |
132 | +.TP |
133 | .B \-h, \-\-help |
134 | Print a short help that documents all options. |
135 | |
136 | |
137 | === modified file 'test/test_backend_apt_dpkg.py' |
138 | --- test/test_backend_apt_dpkg.py 2012-04-03 11:52:41 +0000 |
139 | +++ test/test_backend_apt_dpkg.py 2012-04-11 11:30:31 +0000 |
140 | @@ -1,4 +1,6 @@ |
141 | import unittest, gzip, imp, subprocess, tempfile, shutil, os, os.path, time |
142 | +import apt_pkg |
143 | +import glob |
144 | |
145 | if os.environ.get('APPORT_TEST_LOCAL'): |
146 | impl = imp.load_source('', 'backends/packaging-apt-dpkg.py').impl |
147 | @@ -497,6 +499,47 @@ |
148 | self.assertTrue('nosuchdistro' in str(e), str(e)) |
149 | self.assertTrue('index files failed to download' in str(e)) |
150 | |
151 | + @unittest.skipUnless(_has_default_route(), 'online test') |
152 | + def test_install_packages_permanent_sandbox(self): |
153 | + '''install_packages() with a permanent sandbox''' |
154 | + self._setup_foonux_config() |
155 | + zonetab = os.path.join(self.rootdir, 'usr/share/zoneinfo/zone.tab') |
156 | + |
157 | + impl.install_packages(self.rootdir, self.configdir, 'Foonux 1.2', |
158 | + [('tzdata', None)], False, self.cachedir, permanent_rootdir=True) |
159 | + |
160 | + # This will now be using a Cache with our rootdir. |
161 | + archives = apt_pkg.Config.FindDir('Dir::Cache::archives') |
162 | + tzdata = glob.glob(os.path.join(archives, 'tzdata*.deb')) |
163 | + if not tzdata: |
164 | + self.fail('tzdata was not downloaded') |
165 | + tzdata_written = os.path.getctime(tzdata[0]) |
166 | + zonetab_written = os.path.getctime(zonetab) |
167 | + |
168 | + impl.install_packages(self.rootdir, self.configdir, 'Foonux 1.2', |
169 | + [('coreutils', None), ('tzdata', None)], False, self.cachedir, |
170 | + permanent_rootdir=True) |
171 | + |
172 | + if not glob.glob(os.path.join(archives, 'coreutils*.deb')): |
173 | + self.fail('coreutils was not downloaded.') |
174 | + self.assertEqual(os.path.getctime(tzdata[0]), tzdata_written, |
175 | + 'tzdata downloaded twice.') |
176 | + self.assertEqual(zonetab_written, os.path.getctime(zonetab), |
177 | + 'zonetab written twice.') |
178 | + self.assertTrue(os.path.exists(os.path.join(self.rootdir, |
179 | + 'usr/bin/stat'))) |
180 | + |
181 | + # Prevent packages from downloading. |
182 | + apt_pkg.Config.set('Acquire::http::Proxy', 'http://nonexistent') |
183 | + self.assertRaises(SystemExit, impl.install_packages, self.rootdir, |
184 | + self.configdir, 'Foonux 1.2', [('libc6', None)], False, |
185 | + self.cachedir, permanent_rootdir=True) |
186 | + # These packages exist, so attempting to install them should not fail. |
187 | + impl.install_packages(self.rootdir, self.configdir, 'Foonux 1.2', |
188 | + [('coreutils', None), ('tzdata', None)], False, self.cachedir, |
189 | + permanent_rootdir=True) |
190 | + apt_pkg.Config.set('Acquire::http::Proxy', '') |
191 | + |
192 | def _setup_foonux_config(self): |
193 | '''Set up directories and configuration for install_packages()''' |
194 |
And the timings against my previous merge:
ubuntu@ server- 8149:~$ ./judge/judge './previous_ changes' './with_changes' v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^v^ v^v^v^
v^v^v^v^
n mean sd min max cmd
-14074. 836ms -62.1% p=0.000
difference is significant at 95.0% confidence (p=0.000):
ubuntu@ server- 8149:~$
50 22647.1ms 1468.2 20855.4 26954.3 ./previous_changes
50 8572.3ms 562.3 7958.8 11131.5 ./with_changes
based on these samples, suggested sample size is n>=1 to have a 3518.71ms confidence interval