Merge lp:~ev/apport/cache_extracted_debs into lp:~apport-hackers/apport/trunk

Proposed by Evan
Status: Merged
Merged at revision: 2275
Proposed branch: lp:~ev/apport/cache_extracted_debs
Merge into: lp:~apport-hackers/apport/trunk
Diff against target: 193 lines (+76/-9)
5 files modified
apport/packaging.py (+4/-1)
backends/packaging-apt-dpkg.py (+8/-5)
bin/apport-retrace (+13/-3)
man/apport-retrace.1 (+8/-0)
test/test_backend_apt_dpkg.py (+43/-0)
To merge this branch: bzr merge lp:~ev/apport/cache_extracted_debs
Reviewer Review Type Date Requested Status
Martin Pitt (community) Approve
Review via email: mp+101400@code.launchpad.net

Description of the change

As discussed on IRC, here's a branch to cache already extracted debs. I tried filtering these out at the Cache() level, looking at package.candidate.uri. Unfortunately, this doesn't exactly map to the resulting filename, as it doesn't take epochs into account.

So the solution I've come up with is to mark the time before we did a fetch, and only process debs that have been downloaded since by checking their ctime.

I've made this the non-default -k, --keep-unpackaged option. If we cannot squeeze the upload in before final freeze, or you aren't quite happy with this living in trunk, I can always run it from a branch as you suggested.

Thanks!

To post a comment you must log in.
Revision history for this message
Evan (ev) wrote :

And the timings against my previous merge:

ubuntu@server-8149:~$ ./judge/judge './previous_changes' './with_changes'
v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^v^

    n mean sd min max cmd
       50 22647.1ms 1468.2 20855.4 26954.3 ./previous_changes
          50 8572.3ms 562.3 7958.8 11131.5 ./with_changes
          -14074.836ms -62.1% p=0.000
          difference is significant at 95.0% confidence (p=0.000):
          based on these samples, suggested sample size is n>=1 to have a 3518.71ms confidence interval
          ubuntu@server-8149:~$

Revision history for this message
Martin Pitt (pitti) wrote :

Thanks for working on this!

I would like to clean this up a little though, so that it fits better into the existing code:

9 + verbose=False, cache_dir=None, keep_unpacked=False):

How about adding a sandbox_dir=None instead? If you specify it, it uses an existing one and just unpacks the debs that it actually downloaded in this run. If it is None, it uses a temporary directory. This works exactly like the cache_dir argument. Please document the new argument in the docstring, and update the function/docstring in apport/packaging.py (the abstract base class).

Filtering by time should work OK in most circumstances, but I wonder if apt.Cache() does not have a more elegant way to ask "which debs did you actually download"?

42 + optparser.add_option('-k', '--keep-unpacked', metavar='DIR',

Likewise, I think a --sandbox-dir argument would be a bit clearer, and behave similar to --cache. --keep-unpacked sounds too much like a boolean option, specifying a directory looks strange. Please document this in man/apport-retrace.1.

This should also help to simplify the patch a bit, you won't need a separate "keep" variable, etc.

I'm quite happy to have that functionality in trunk with those changes. I originally thought you wanted to make this much more complex.

At some point we need to teach crash-digger to purge its caches and sandboxes regularly. In particular Contents.gz ages rather quickly, and the permanent sandboxes will grow pretty much indefinitely (and pile up a lot of cruft which might even get in the way). But that's a separate problem indeed.

review: Needs Fixing
lp:~ev/apport/cache_extracted_debs updated
2279. By Evan

Move to a --sandbox-dir option, as suggested by Martin. Document.

Revision history for this message
Evan (ev) wrote :

On Tue, Apr 10, 2012 at 6:14 PM, Martin Pitt <email address hidden> wrote:
> How about adding a sandbox_dir=None instead? If you specify it, it uses an existing one and just unpacks the debs that it actually downloaded in this run. If it is None, it uses a temporary directory. This works exactly like the cache_dir argument. Please document the new argument in the docstring, and update the function/docstring in apport/packaging.py (the abstract base class).

So you would specify this and rootdir to be the same thing?

I agree that the name is better.

Revision history for this message
Martin Pitt (pitti) wrote :

> So you would specify this and rootdir to be the same thing?

Argh, sorry for that; I should have looked more closely when I reviewed that. Indeed we already have rootdir, so no need for a new sandbox_dir argument in install_packages(). So for install_packages() we indeed either need an explicit boolean option which will only unpack packages that it downloaded ("permanent_rootdir=False" perhaps?), or it gets implied by checking if the rootdir is nonempty.

I would like to keep the --sandbox-dir option for apport-retrace though, IMHO that looks much nicer.

54 + try:
55 + os.makedirs(sandbox_dir)
56 + except OSError:
57 + pass

Please let's rather use "if not os.path.isdir()... os.makedirs()", to catch errors resulting from EPERM, etc. I know the existing code in install_packages() uses this pattern as well once, but other places in Apport get this right. Let's fix it while we are at it.

77 + archives = os.path.join(aptroot, 'var/cache/apt/archives')

Please let's get this from Dir::Cache::archives instead of hardcoding the path. But I wonder why we need this at all?

80 + cached = os.path.join(archives, os.path.basename(i.destfile))

Instead of line 77 and this, could you not just call os.path.getctime(i.destfile)? It should be the very same.

95 + help=_('Cache directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.'))

I'd drop the "Cache" here, same in the manpage.

Finally this could do with a test case, something like this:

 - First call with a package or two, starting from an empty sandbox
 - Check that packages are installed
 - Second call with an extra package
 - Check that both the old and the new packages are still installed
 - Make it impossible to download new packages (damaging apt/sources.list perhaps?)
 - Ensure that a third call with an already installed package succeeds
 - Ensure that a fourth call with a new package fails with a FetchFailedException

Thanks!

review: Needs Fixing
lp:~ev/apport/cache_extracted_debs updated
2280. By Evan

Use a boolean option to specify that we have a permanent root directory in install packages, rather than passing a redundant directory as the final argument.

2281. By Evan

Accidentally inverted test.

Revision history for this message
Evan (ev) wrote :

On Wed, Apr 11, 2012 at 8:38 AM, Martin Pitt <email address hidden> wrote:
> Argh, sorry for that; I should have looked more closely when I reviewed that. Indeed we already have rootdir, so no need for a new sandbox_dir argument in install_packages(). So for install_packages() we indeed either need an explicit boolean option which will only unpack packages that it downloaded ("permanent_rootdir=False" perhaps?), or it gets implied by checking if the rootdir is nonempty.

Done as permanent_rootdir.

> I would like to keep the --sandbox-dir option for apport-retrace though, IMHO that looks much nicer.

Agreed.

> 54      + try:
> 55      + os.makedirs(sandbox_dir)
> 56      + except OSError:
> 57      + pass
>
> Please let's rather use "if not os.path.isdir()... os.makedirs()", to catch errors resulting from EPERM, etc. I know the existing code in install_packages() uses this pattern as well once, but other places in Apport get this right. Let's fix it while we are at it.

Indeed, at the time I was just trying to keep consistent and hadn't
considered -EPERM. Fixed.

> 77      + archives = os.path.join(aptroot, 'var/cache/apt/archives')
>
> Please let's get this from Dir::Cache::archives instead of hardcoding the path. But I wonder why we need this at all?
>
> 80      + cached = os.path.join(archives, os.path.basename(i.destfile))
>
> Instead of line 77 and this, could you not just call os.path.getctime(i.destfile)? It should be the very same.

Indeed. I think this was the result of copying code around as the
functionality was changing. Fixed.

> 95      + help=_('Cache directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.'))
>
> I'd drop the "Cache" here, same in the manpage.

Fixed.

> Finally this could do with a test case, something like this:
>
>  - First call with a package or two, starting from an empty sandbox
>  - Check that packages are installed
>  - Second call with an extra package
>  - Check that both the old and the new packages are still installed
>  - Make it impossible to download new packages (damaging apt/sources.list perhaps?)
>  - Ensure that a third call with an already installed package succeeds
>  - Ensure that a fourth call with a new package fails with a FetchFailedException

Done with Acquire::http::Proxy.

lp:~ev/apport/cache_extracted_debs updated
2282. By Evan

Test for permanent sandbox (--sandbox-dir).

Revision history for this message
Martin Pitt (pitti) wrote :

Perfect, thanks! Merged with slight whitespace fix (I keep an empty line after single-line docstrings).

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'apport/packaging.py'
2--- apport/packaging.py 2012-03-01 10:06:25 +0000
3+++ apport/packaging.py 2012-04-11 11:30:31 +0000
4@@ -172,7 +172,7 @@
5 raise NotImplementedError('this method must be implemented by a concrete subclass')
6
7 def install_packages(self, rootdir, configdir, release, packages,
8- verbose=False, cache_dir=None):
9+ verbose=False, cache_dir=None, permanent_rootdir=False):
10 '''Install packages into a sandbox (for apport-retrace).
11
12 In order to work without any special permissions and without touching
13@@ -194,6 +194,9 @@
14 If cache_dir is given, then the downloaded packages will be stored
15 there, to speed up subsequent retraces.
16
17+ If permanent_rootdir is True, then the sandbox created from the
18+ downloaded packages will be reused, to speed up subsequent retraces.
19+
20 Return a string with outdated packages, or None if all packages were
21 installed.
22
23
24=== modified file 'backends/packaging-apt-dpkg.py'
25--- backends/packaging-apt-dpkg.py 2012-04-10 12:39:32 +0000
26+++ backends/packaging-apt-dpkg.py 2012-04-11 11:30:31 +0000
27@@ -463,7 +463,7 @@
28 return (installed, outdated)
29
30 def install_packages(self, rootdir, configdir, release, packages,
31- verbose=False, cache_dir=None):
32+ verbose=False, cache_dir=None, permanent_rootdir=False):
33 '''Install packages into a sandbox (for apport-retrace).
34
35 In order to work without any special permissions and without touching
36@@ -485,6 +485,9 @@
37 If cache_dir is given, then the downloaded packages will be stored
38 there, to speed up subsequent retraces.
39
40+ If permanent_rootdir is True, then the sandbox created from the
41+ downloaded packages will be reused, to speed up subsequent retraces.
42+
43 Return a string with outdated packages, or None if all packages were
44 installed.
45
46@@ -506,10 +509,8 @@
47 aptroot = os.path.join(cache_dir, release, 'apt')
48 else:
49 aptroot = os.path.join(cache_dir, 'system', 'apt')
50- try:
51+ if not os.path.isdir(aptroot):
52 os.makedirs(aptroot)
53- except OSError:
54- pass
55 else:
56 tmp_aptroot = True
57 aptroot = tempfile.mkdtemp()
58@@ -561,6 +562,7 @@
59 for p in real_pkgs:
60 c[p].mark_install(False, False)
61
62+ last_written = time.time()
63 # fetch packages
64 fetcher = apt.apt_pkg.Acquire(fetchProgress)
65 try:
66@@ -573,7 +575,8 @@
67 if verbose:
68 print('Extracting downloaded debs...')
69 for i in fetcher.items:
70- subprocess.check_call(['dpkg', '-x', i.destfile, rootdir])
71+ if not permanent_rootdir or os.path.getctime(i.destfile) > last_written:
72+ subprocess.check_call(['dpkg', '-x', i.destfile, rootdir])
73 real_pkgs.remove(os.path.basename(i.destfile).split('_', 1)[0])
74
75 if tmp_aptroot:
76
77=== modified file 'bin/apport-retrace'
78--- bin/apport-retrace 2012-02-29 11:13:45 +0000
79+++ bin/apport-retrace 2012-04-11 11:30:31 +0000
80@@ -59,6 +59,8 @@
81 help=_('Prepend timestamps to log messages, for batch operation'))
82 optparser.add_option('-C', '--cache', metavar='DIR',
83 help=_('Cache directory for packages downloaded in the sandbox'))
84+ optparser.add_option('--sandbox-dir', metavar='DIR',
85+ help=_('Directory for unpacked packages. Future runs will assume that any already downloaded package is also extracted to this sandbox.'))
86 optparser.add_option('-p', '--extra-package', action='append', default=[],
87 help=_('Install an extra package into the sandbox (can be specified multiple times)'))
88 optparser.add_option('--auth',
89@@ -336,8 +338,15 @@
90 sandbox = None
91 outdated_msg = None
92 if options.sandbox:
93- sandbox = tempfile.mkdtemp()
94- atexit.register(shutil.rmtree, sandbox)
95+ if options.sandbox_dir:
96+ sandbox = os.path.abspath(options.sandbox_dir)
97+ if not os.path.isdir(sandbox):
98+ os.makedirs(sandbox)
99+ permanent_rootdir = True
100+ else:
101+ sandbox = tempfile.mkdtemp()
102+ atexit.register(shutil.rmtree, sandbox)
103+ permanent_rootdir = False
104
105 pkgs = needed_packages(report)
106 for p in options.extra_package:
107@@ -355,7 +364,8 @@
108
109 try:
110 outdated_msg = apport.packaging.install_packages(sandbox, sandbox_arg,
111- report['DistroRelease'], pkgs, options.verbose, options.cache)
112+ report['DistroRelease'], pkgs, options.verbose, options.cache,
113+ permanent_rootdir)
114 except SystemError as e:
115 sys.stderr.write(str(e) + '\n')
116 sys.exit(1)
117
118=== modified file 'man/apport-retrace.1'
119--- man/apport-retrace.1 2011-08-25 05:18:52 +0000
120+++ man/apport-retrace.1 2012-04-11 11:30:31 +0000
121@@ -123,6 +123,14 @@
122 recommended.
123
124 .TP
125+.B \-\-sandbox\-dir=\fIDIR
126+Permanent directory for the sandbox of extracted packages. If not specified all
127+cached packages will have to be re-extracted at each run of
128+.B apport\-retrace\fR.
129+If you use sandbox mode regularly, using a permanent cache directory is highly
130+recommended.
131+
132+.TP
133 .B \-h, \-\-help
134 Print a short help that documents all options.
135
136
137=== modified file 'test/test_backend_apt_dpkg.py'
138--- test/test_backend_apt_dpkg.py 2012-04-03 11:52:41 +0000
139+++ test/test_backend_apt_dpkg.py 2012-04-11 11:30:31 +0000
140@@ -1,4 +1,6 @@
141 import unittest, gzip, imp, subprocess, tempfile, shutil, os, os.path, time
142+import apt_pkg
143+import glob
144
145 if os.environ.get('APPORT_TEST_LOCAL'):
146 impl = imp.load_source('', 'backends/packaging-apt-dpkg.py').impl
147@@ -497,6 +499,47 @@
148 self.assertTrue('nosuchdistro' in str(e), str(e))
149 self.assertTrue('index files failed to download' in str(e))
150
151+ @unittest.skipUnless(_has_default_route(), 'online test')
152+ def test_install_packages_permanent_sandbox(self):
153+ '''install_packages() with a permanent sandbox'''
154+ self._setup_foonux_config()
155+ zonetab = os.path.join(self.rootdir, 'usr/share/zoneinfo/zone.tab')
156+
157+ impl.install_packages(self.rootdir, self.configdir, 'Foonux 1.2',
158+ [('tzdata', None)], False, self.cachedir, permanent_rootdir=True)
159+
160+ # This will now be using a Cache with our rootdir.
161+ archives = apt_pkg.Config.FindDir('Dir::Cache::archives')
162+ tzdata = glob.glob(os.path.join(archives, 'tzdata*.deb'))
163+ if not tzdata:
164+ self.fail('tzdata was not downloaded')
165+ tzdata_written = os.path.getctime(tzdata[0])
166+ zonetab_written = os.path.getctime(zonetab)
167+
168+ impl.install_packages(self.rootdir, self.configdir, 'Foonux 1.2',
169+ [('coreutils', None), ('tzdata', None)], False, self.cachedir,
170+ permanent_rootdir=True)
171+
172+ if not glob.glob(os.path.join(archives, 'coreutils*.deb')):
173+ self.fail('coreutils was not downloaded.')
174+ self.assertEqual(os.path.getctime(tzdata[0]), tzdata_written,
175+ 'tzdata downloaded twice.')
176+ self.assertEqual(zonetab_written, os.path.getctime(zonetab),
177+ 'zonetab written twice.')
178+ self.assertTrue(os.path.exists(os.path.join(self.rootdir,
179+ 'usr/bin/stat')))
180+
181+ # Prevent packages from downloading.
182+ apt_pkg.Config.set('Acquire::http::Proxy', 'http://nonexistent')
183+ self.assertRaises(SystemExit, impl.install_packages, self.rootdir,
184+ self.configdir, 'Foonux 1.2', [('libc6', None)], False,
185+ self.cachedir, permanent_rootdir=True)
186+ # These packages exist, so attempting to install them should not fail.
187+ impl.install_packages(self.rootdir, self.configdir, 'Foonux 1.2',
188+ [('coreutils', None), ('tzdata', None)], False, self.cachedir,
189+ permanent_rootdir=True)
190+ apt_pkg.Config.set('Acquire::http::Proxy', '')
191+
192 def _setup_foonux_config(self):
193 '''Set up directories and configuration for install_packages()'''
194

Subscribers

People subscribed via source and target branches