Merge ~cjwatson/launchpad:archive-file-history into launchpad:master

Proposed by Colin Watson
Status: Needs review
Proposed branch: ~cjwatson/launchpad:archive-file-history
Merge into: launchpad:master
Diff against target: 766 lines (+242/-172)
5 files modified
lib/lp/archivepublisher/publishing.py (+43/-22)
lib/lp/archivepublisher/tests/test_publisher.py (+88/-55)
lib/lp/soyuz/interfaces/archivefile.py (+12/-9)
lib/lp/soyuz/model/archivefile.py (+37/-47)
lib/lp/soyuz/tests/test_archivefile.py (+62/-39)
Reviewer: Launchpad code reviewers
Review status: Pending
Review via email: mp+373971@code.launchpad.net

Commit message

Turn ArchiveFile into a history table

Add date_created and date_superseded columns, and adjust the
publisher to match.

LP: #1765933

Description of the change

The main complexity here is in the changed publisher logic, especially for reprieving: the situation where file contents that were scheduled for deletion become live again, which is particularly common for empty files. We previously handled this by simply clearing scheduled_deletion_date on the old ArchiveFile row, but that strategy no longer works now that we want to maintain history: instead, we need to create new rows in such cases. As a result of these logic changes, we no longer need the only_condemned=True option in ArchiveFileSet.getByArchive.
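
To make that flow easier to follow than the diff alone, here is a minimal, self-contained sketch of the reprieve decision. plan_reprieves, db_files and current_files are illustrative stand-ins for the ArchiveFile rows and Release file entries handled in _updateByHash, not code from this branch:

    from collections import defaultdict

    def plan_reprieves(db_files, current_files):
        """Return the set of (path, sha256) keys that need reprieve rows.

        :param db_files: iterable of (path, sha256, scheduled_deletion_date)
            tuples standing in for ArchiveFile rows.
        :param current_files: dict mapping each path to the sha256 currently
            listed for it in the Release file.
        """
        uncondemned = set()
        reprieved = defaultdict(list)
        for path, sha256, scheduled_deletion_date in db_files:
            key = (path, sha256)
            if scheduled_deletion_date is None:
                # This content is already live for this path.
                uncondemned.add(key)
            elif current_files.get(path) == sha256:
                # Condemned, but the path has this content again: a
                # candidate for reprieve.
                reprieved[key].append(scheduled_deletion_date)
        # If a live row already exists for the same path and content, the
        # condemned rows for that key don't need reprieving after all.
        for key in uncondemned:
            reprieved.pop(key, None)
        return set(reprieved)

In the real publisher, unscheduleDeletion is then called on the reprieved rows, which with this branch creates fresh ArchiveFile rows rather than clearing scheduled_deletion_date on the old ones.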

I think the publisher tests are somewhat clearer now, since they explicitly check creation dates and so make the chain of events more obvious.

ArchiveFile.date_superseded is arguably redundant with ArchiveFile.scheduled_deletion_date, but I think keeping both is a good idea for clarity, and in case we ever change the stay of execution in future.

We'll need to backfill the new columns. I'll probably make a separate branch with a garbo job for this: my plan is to set date_created to some arbitrary value (probably just the epoch, so that it's clear that it's arbitrary), and to set date_superseded to scheduled_deletion_date minus the stay of execution for rows that have a scheduled_deletion_date.
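
For illustration only (the real garbo job will live in a separate branch), the date_superseded half of that plan could look roughly like this; reusing BY_HASH_STAY_OF_EXECUTION from the publisher is an assumption here, not something this branch does:

    # Illustrative sketch only, not the actual garbo job.
    from lp.archivepublisher.publishing import BY_HASH_STAY_OF_EXECUTION
    from lp.services.database.interfaces import IMasterStore
    from lp.soyuz.model.archivefile import ArchiveFile

    def backfill_date_superseded():
        """Derive date_superseded from scheduled_deletion_date where set."""
        IMasterStore(ArchiveFile).execute("""
            UPDATE ArchiveFile
            SET date_superseded = scheduled_deletion_date - interval '%d days'
            WHERE date_superseded IS NULL
                AND scheduled_deletion_date IS NOT NULL
            """ % BY_HASH_STAY_OF_EXECUTION)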

This is essentially the same as https://code.launchpad.net/~cjwatson/launchpad/archive-file-history/+merge/343752, converted to git and rebased on master. https://code.launchpad.net/~cjwatson/launchpad/+git/launchpad/+merge/373970 has the corresponding DB patch.

Colin Watson (cjwatson) wrote:

On second thoughts, we probably shouldn't backfill date_created; NULL is just as good a way to indicate an unknown creation date as the epoch, probably better. Backfilling date_superseded to scheduled_deletion_date minus the stay of execution where applicable does still seem to make sense though, and would allow historical queries to be close enough in most cases (by taking the row with the highest ID that hadn't been superseded at the requested time).
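
Concretely, the kind of historical lookup that backfill would support might look like the following; get_file_at is a hypothetical helper sketched purely for illustration, not part of this branch:

    # Hypothetical helper: find the ArchiveFile row that was most likely
    # live for this archive/path at a given time, relying on date_superseded
    # having been backfilled as described above.
    from storm.expr import Desc, Or

    from lp.services.database.interfaces import IStore
    from lp.soyuz.model.archivefile import ArchiveFile

    def get_file_at(archive, path, when):
        return IStore(ArchiveFile).find(
            ArchiveFile,
            ArchiveFile.archive == archive,
            ArchiveFile.path == path,
            # Not yet superseded at the requested time (or never superseded).
            Or(ArchiveFile.date_superseded == None,
               ArchiveFile.date_superseded > when),
            ).order_by(Desc(ArchiveFile.id)).first()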

Revision history for this message
William Grant (wgrant) wrote:

I found the new _updateByHash structure somewhat difficult to understand, and tried a different approach in https://code.launchpad.net/~wgrant/launchpad/+git/launchpad/+merge/390811. Not nearly landable, but I think the flow is much clearer, and I'd be interested in your thoughts.

Unmerged commits

178ada8... by Colin Watson on 2018-04-21

Turn ArchiveFile into a history table

Add date_created and date_superseded columns, and adjust the
publisher to match.

LP: #1765933

Preview Diff

1diff --git a/lib/lp/archivepublisher/publishing.py b/lib/lp/archivepublisher/publishing.py
2index 5908ccc..4735b89 100644
3--- a/lib/lp/archivepublisher/publishing.py
4+++ b/lib/lp/archivepublisher/publishing.py
5@@ -1074,8 +1074,7 @@ class Publisher(object):
6 assert path.startswith("dists/")
7 return path[len("dists/"):]
8
9- # Gather information on entries in the current Release file, and
10- # make sure nothing there is condemned.
11+ # Gather information on entries in the current Release file.
12 current_files = {}
13 for current_entry in (
14 release_data["SHA256"] + extra_data.get("SHA256", [])):
15@@ -1084,33 +1083,54 @@ class Publisher(object):
16 real_path = os.path.join(suite_dir, real_name)
17 current_files[path] = (
18 int(current_entry["size"]), current_entry["sha256"], real_path)
19+
20+ # Gather information on entries currently in the database. Ensure
21+ # that we know about all the relevant by-hash directory trees before
22+ # doing any removals so that we can prune them properly later, and
23+ # work out which condemned files should be reprieved due to the
24+ # paths in question having their previous content again.
25+ reprieved_files = defaultdict(list)
26 uncondemned_files = set()
27 for db_file in archive_file_set.getByArchive(
28- self.archive, container=container, only_condemned=True,
29- eager_load=True):
30- stripped_path = strip_dists(db_file.path)
31- if stripped_path in current_files:
32- current_sha256 = current_files[stripped_path][1]
33- if db_file.library_file.content.sha256 == current_sha256:
34- uncondemned_files.add(db_file)
35- if uncondemned_files:
36- for container, path, sha256 in archive_file_set.unscheduleDeletion(
37- uncondemned_files):
38+ self.archive, container=container, eager_load=True):
39+ by_hashes.registerChild(os.path.dirname(strip_dists(db_file.path)))
40+ file_key = (db_file.path, db_file.library_file.content.sha256)
41+ if db_file.scheduled_deletion_date is None:
42+ uncondemned_files.add(file_key)
43+ else:
44+ stripped_path = strip_dists(db_file.path)
45+ if stripped_path in current_files:
46+ current_sha256 = current_files[stripped_path][1]
47+ if db_file.library_file.content.sha256 == current_sha256:
48+ reprieved_files[file_key].append(db_file)
49+
50+ # We may already have uncondemned entries with the same path and
51+ # content as condemned entries that we were about to reprieve; if
52+ # so, there's no need to reprieve them.
53+ for file_key in uncondemned_files:
54+ reprieved_files.pop(file_key, None)
55+
56+ # Make sure nothing in the current Release file is condemned.
57+ if reprieved_files:
58+ reprieved_files_flat = set(
59+ chain.from_iterable(reprieved_files.values()))
60+ archive_file_set.unscheduleDeletion(reprieved_files_flat)
61+ for db_file in reprieved_files_flat:
62 self.log.debug(
63 "by-hash: Unscheduled %s for %s in %s for deletion" % (
64- sha256, path, container))
65+ db_file.library_file.content.sha256, db_file.path,
66+ db_file.container))
67
68 # Remove any condemned files from the database whose stay of
69 # execution has elapsed. We ensure that we know about all the
70 # relevant by-hash directory trees before doing any removals so that
71 # we can prune them properly later.
72- for db_file in archive_file_set.getByArchive(
73- self.archive, container=container):
74- by_hashes.registerChild(os.path.dirname(strip_dists(db_file.path)))
75 for container, path, sha256 in archive_file_set.reap(
76 self.archive, container=container):
77- self.log.debug(
78- "by-hash: Deleted %s for %s in %s" % (sha256, path, container))
79+ if (path, sha256) not in uncondemned_files:
80+ self.log.debug(
81+ "by-hash: Deleted %s for %s in %s" %
82+ (sha256, path, container))
83
84 # Ensure that all files recorded in the database are in by-hash.
85 db_files = archive_file_set.getByArchive(
86@@ -1131,12 +1151,13 @@ class Publisher(object):
87 if db_file.library_file.content.sha256 != current_sha256:
88 condemned_files.add(db_file)
89 if condemned_files:
90- for container, path, sha256 in archive_file_set.scheduleDeletion(
91- condemned_files,
92- timedelta(days=BY_HASH_STAY_OF_EXECUTION)):
93+ archive_file_set.scheduleDeletion(
94+ condemned_files, timedelta(days=BY_HASH_STAY_OF_EXECUTION))
95+ for db_file in condemned_files:
96 self.log.debug(
97 "by-hash: Scheduled %s for %s in %s for deletion" % (
98- sha256, path, container))
99+ db_file.library_file.content.sha256, db_file.path,
100+ db_file.container))
101
102 # Ensure that all the current index files are in by-hash and have
103 # corresponding database entries.
104diff --git a/lib/lp/archivepublisher/tests/test_publisher.py b/lib/lp/archivepublisher/tests/test_publisher.py
105index c60eccc..5308e12 100644
106--- a/lib/lp/archivepublisher/tests/test_publisher.py
107+++ b/lib/lp/archivepublisher/tests/test_publisher.py
108@@ -21,8 +21,6 @@ from fnmatch import fnmatch
109 from functools import partial
110 import gzip
111 import hashlib
112-from itertools import product
113-from operator import attrgetter
114 import os
115 import shutil
116 import stat
117@@ -47,7 +45,6 @@ from testtools.matchers import (
118 LessThan,
119 Matcher,
120 MatchesDict,
121- MatchesListwise,
122 MatchesSetwise,
123 MatchesStructure,
124 Not,
125@@ -2649,12 +2646,12 @@ class TestUpdateByHash(TestPublisherBase):
126 publisher.D_writeReleaseFiles(False)
127
128 @classmethod
129- def _makeScheduledDeletionDateMatcher(cls, condemned_at):
130- if condemned_at is None:
131+ def _makeScheduledDeletionDateMatcher(cls, superseded_at):
132+ if superseded_at is None:
133 return Is(None)
134 else:
135 return Equals(
136- condemned_at + timedelta(days=BY_HASH_STAY_OF_EXECUTION))
137+ superseded_at + timedelta(days=BY_HASH_STAY_OF_EXECUTION))
138
139 def assertHasSuiteFiles(self, patterns, *properties):
140 def is_interesting(path):
141@@ -2668,11 +2665,13 @@ class TestUpdateByHash(TestPublisherBase):
142 self.ubuntutest.main_archive)
143 if is_interesting(archive_file.path)]
144 matchers = []
145- for path, condemned_at in properties:
146+ for path, created_at, superseded_at in properties:
147 matchers.append(MatchesStructure(
148 path=Equals('dists/breezy-autotest/%s' % path),
149+ date_created=Equals(created_at),
150+ date_superseded=Equals(superseded_at),
151 scheduled_deletion_date=self._makeScheduledDeletionDateMatcher(
152- condemned_at)))
153+ superseded_at)))
154 self.assertThat(files, MatchesSetwise(*matchers))
155
156 def test_disabled(self):
157@@ -2822,7 +2821,8 @@ class TestUpdateByHash(TestPublisherBase):
158 flush_database_caches()
159 self.assertHasSuiteFiles(
160 ('Contents-*', 'Release'),
161- ('Contents-i386', None), ('Release', None))
162+ ('Contents-i386', self.times[0], None),
163+ ('Release', self.times[0], None))
164 releases = [get_release_contents()]
165 self.assertThat(
166 suite_path('by-hash'),
167@@ -2836,8 +2836,10 @@ class TestUpdateByHash(TestPublisherBase):
168 flush_database_caches()
169 self.assertHasSuiteFiles(
170 ('Contents-*', 'Release'),
171- ('Contents-i386', None), ('Contents-hppa', None),
172- ('Release', self.times[1]), ('Release', None))
173+ ('Contents-i386', self.times[0], None),
174+ ('Contents-hppa', self.times[1], None),
175+ ('Release', self.times[0], self.times[1]),
176+ ('Release', self.times[1], None))
177 releases.append(get_release_contents())
178 self.assertThat(
179 suite_path('by-hash'),
180@@ -2850,9 +2852,11 @@ class TestUpdateByHash(TestPublisherBase):
181 flush_database_caches()
182 self.assertHasSuiteFiles(
183 ('Contents-*', 'Release'),
184- ('Contents-i386', self.times[2]), ('Contents-hppa', None),
185- ('Release', self.times[1]), ('Release', self.times[2]),
186- ('Release', None))
187+ ('Contents-i386', self.times[0], self.times[2]),
188+ ('Contents-hppa', self.times[1], None),
189+ ('Release', self.times[0], self.times[1]),
190+ ('Release', self.times[1], self.times[2]),
191+ ('Release', self.times[2], None))
192 releases.append(get_release_contents())
193 self.assertThat(
194 suite_path('by-hash'),
195@@ -2864,9 +2868,12 @@ class TestUpdateByHash(TestPublisherBase):
196 flush_database_caches()
197 self.assertHasSuiteFiles(
198 ('Contents-*', 'Release'),
199- ('Contents-i386', self.times[2]), ('Contents-hppa', None),
200- ('Release', self.times[1]), ('Release', self.times[2]),
201- ('Release', self.times[3]), ('Release', None))
202+ ('Contents-i386', self.times[0], self.times[2]),
203+ ('Contents-hppa', self.times[1], None),
204+ ('Release', self.times[0], self.times[1]),
205+ ('Release', self.times[1], self.times[2]),
206+ ('Release', self.times[2], self.times[3]),
207+ ('Release', self.times[3], None))
208 releases.append(get_release_contents())
209 self.assertThat(
210 suite_path('by-hash'),
211@@ -2885,9 +2892,10 @@ class TestUpdateByHash(TestPublisherBase):
212 flush_database_caches()
213 self.assertHasSuiteFiles(
214 ('Contents-*', 'Release'),
215- ('Contents-hppa', self.times[4]),
216- ('Release', self.times[3]), ('Release', self.times[4]),
217- ('Release', None))
218+ ('Contents-hppa', self.times[1], self.times[4]),
219+ ('Release', self.times[2], self.times[3]),
220+ ('Release', self.times[3], self.times[4]),
221+ ('Release', self.times[4], None))
222 releases.append(get_release_contents())
223 self.assertThat(
224 suite_path('by-hash'),
225@@ -2904,7 +2912,8 @@ class TestUpdateByHash(TestPublisherBase):
226 flush_database_caches()
227 self.assertHasSuiteFiles(
228 ('Contents-*', 'Release'),
229- ('Release', self.times[5]), ('Release', None))
230+ ('Release', self.times[4], self.times[5]),
231+ ('Release', self.times[5], None))
232 releases.append(get_release_contents())
233 self.assertThat(suite_path('by-hash'), ByHashHasContents(releases[4:]))
234
235@@ -2931,9 +2940,13 @@ class TestUpdateByHash(TestPublisherBase):
236 for name in ('Release', 'Sources'):
237 with open(suite_path('main', 'source', name), 'rb') as f:
238 main_contents.add(f.read())
239+ self.assertHasSuiteFiles(
240+ ('main/source/Sources',),
241+ ('main/source/Sources', self.times[0], None))
242
243 # Add a source package so that Sources is non-empty.
244 pub_source = self.getPubSource(filecontent='Source: foo\n')
245+ self.advanceTime(delta=timedelta(hours=1))
246 self.runSteps(publisher, step_a=True, step_c=True, step_d=True)
247 transaction.commit()
248 with open(suite_path('main', 'source', 'Sources'), 'rb') as f:
249@@ -2942,28 +2955,42 @@ class TestUpdateByHash(TestPublisherBase):
250 self.assertThat(
251 suite_path('main', 'source', 'by-hash'),
252 ByHashHasContents(main_contents))
253-
254- # Make the empty Sources file ready to prune.
255- self.advanceTime(
256- delta=timedelta(days=BY_HASH_STAY_OF_EXECUTION, hours=1))
257+ self.assertHasSuiteFiles(
258+ ('main/source/Sources',),
259+ ('main/source/Sources', self.times[0], self.times[1]),
260+ ('main/source/Sources', self.times[1], None))
261
262 # Delete the source package so that Sources is empty again. The
263- # empty file is reprieved and the non-empty one is condemned.
264+ # empty file is reprieved (by creating a new ArchiveFile referring
265+ # to it) and the non-empty one is condemned.
266 pub_source.requestDeletion(self.ubuntutest.owner)
267+ self.advanceTime(delta=timedelta(hours=1))
268 self.runSteps(publisher, step_a=True, step_c=True, step_d=True)
269 transaction.commit()
270 self.assertThat(
271 suite_path('main', 'source', 'by-hash'),
272 ByHashHasContents(main_contents))
273- archive_files = getUtility(IArchiveFileSet).getByArchive(
274- self.ubuntutest.main_archive,
275- path='dists/breezy-autotest/main/source/Sources')
276+ self.assertHasSuiteFiles(
277+ ('main/source/Sources',),
278+ ('main/source/Sources', self.times[0], self.times[1]),
279+ ('main/source/Sources', self.times[1], self.times[2]),
280+ ('main/source/Sources', self.times[2], None))
281+
282+ # Make the first empty Sources file ready to prune. This doesn't
283+ # change the set of files on disk, because there's still a newer
284+ # reference to the empty file.
285+ self.advanceTime(
286+ absolute=self.times[1] + timedelta(
287+ days=BY_HASH_STAY_OF_EXECUTION, minutes=30))
288+ self.runSteps(publisher, step_a=True, step_c=True, step_d=True)
289+ transaction.commit()
290 self.assertThat(
291- sorted(archive_files, key=attrgetter('id')),
292- MatchesListwise([
293- MatchesStructure(scheduled_deletion_date=Is(None)),
294- MatchesStructure(scheduled_deletion_date=Not(Is(None))),
295- ]))
296+ suite_path('main', 'source', 'by-hash'),
297+ ByHashHasContents(main_contents))
298+ self.assertHasSuiteFiles(
299+ ('main/source/Sources',),
300+ ('main/source/Sources', self.times[1], self.times[2]),
301+ ('main/source/Sources', self.times[2], None))
302
303 def setUpPruneableSuite(self):
304 self.setUpMockTime()
305@@ -2992,14 +3019,18 @@ class TestUpdateByHash(TestPublisherBase):
306 # We have two condemned sets of index files and one uncondemned set.
307 # main/source/Release contains a small enough amount of information
308 # that it doesn't change.
309- expected_suite_files = (
310- list(product(
311- ('main/source/Sources.gz', 'main/source/Sources.bz2',
312- 'Release'),
313- (self.times[1], self.times[2], None))) +
314- [('main/source/Release', None)])
315 self.assertHasSuiteFiles(
316- ('main/source/*', 'Release'), *expected_suite_files)
317+ ('main/source/*', 'Release'),
318+ ('main/source/Sources.gz', self.times[0], self.times[1]),
319+ ('main/source/Sources.gz', self.times[1], self.times[2]),
320+ ('main/source/Sources.gz', self.times[2], None),
321+ ('main/source/Sources.bz2', self.times[0], self.times[1]),
322+ ('main/source/Sources.bz2', self.times[1], self.times[2]),
323+ ('main/source/Sources.bz2', self.times[2], None),
324+ ('main/source/Release', self.times[0], None),
325+ ('Release', self.times[0], self.times[1]),
326+ ('Release', self.times[1], self.times[2]),
327+ ('Release', self.times[2], None))
328 self.assertThat(suite_path('by-hash'), ByHashHasContents(top_contents))
329 self.assertThat(
330 suite_path('main', 'source', 'by-hash'),
331@@ -3032,14 +3063,15 @@ class TestUpdateByHash(TestPublisherBase):
332 self.assertEqual(set(), publisher.dirty_pockets)
333 # The condemned index files are removed, and no new Release file is
334 # generated.
335- expected_suite_files = (
336- list(product(
337- ('main/source/Sources.gz', 'main/source/Sources.bz2'),
338- (self.times[2], None))) +
339- [('main/source/Release', None),
340- ('Release', self.times[2]), ('Release', None)])
341 self.assertHasSuiteFiles(
342- ('main/source/*', 'Release'), *expected_suite_files)
343+ ('main/source/*', 'Release'),
344+ ('main/source/Sources.gz', self.times[1], self.times[2]),
345+ ('main/source/Sources.gz', self.times[2], None),
346+ ('main/source/Sources.bz2', self.times[1], self.times[2]),
347+ ('main/source/Sources.bz2', self.times[2], None),
348+ ('main/source/Release', self.times[0], None),
349+ ('Release', self.times[1], self.times[2]),
350+ ('Release', self.times[2], None))
351 self.assertThat(suite_path('by-hash'), ByHashHasContents(top_contents))
352 self.assertThat(
353 suite_path('main', 'source', 'by-hash'),
354@@ -3066,14 +3098,15 @@ class TestUpdateByHash(TestPublisherBase):
355 self.assertEqual(release_mtime, os.stat(release_path).st_mtime)
356 # The condemned index files are removed, and no new Release file is
357 # generated.
358- expected_suite_files = (
359- list(product(
360- ('main/source/Sources.gz', 'main/source/Sources.bz2'),
361- (self.times[2], None))) +
362- [('main/source/Release', None),
363- ('Release', self.times[2]), ('Release', None)])
364 self.assertHasSuiteFiles(
365- ('main/source/*', 'Release'), *expected_suite_files)
366+ ('main/source/*', 'Release'),
367+ ('main/source/Sources.gz', self.times[1], self.times[2]),
368+ ('main/source/Sources.gz', self.times[2], None),
369+ ('main/source/Sources.bz2', self.times[1], self.times[2]),
370+ ('main/source/Sources.bz2', self.times[2], None),
371+ ('main/source/Release', self.times[0], None),
372+ ('Release', self.times[1], self.times[2]),
373+ ('Release', self.times[2], None))
374 self.assertThat(suite_path('by-hash'), ByHashHasContents(top_contents))
375 self.assertThat(
376 suite_path('main', 'source', 'by-hash'),
377diff --git a/lib/lp/soyuz/interfaces/archivefile.py b/lib/lp/soyuz/interfaces/archivefile.py
378index 005a599..e81419f 100644
379--- a/lib/lp/soyuz/interfaces/archivefile.py
380+++ b/lib/lp/soyuz/interfaces/archivefile.py
381@@ -1,4 +1,4 @@
382-# Copyright 2016 Canonical Ltd. This software is licensed under the
383+# Copyright 2016-2018 Canonical Ltd. This software is licensed under the
384 # GNU Affero General Public License version 3 (see the file LICENSE).
385
386 """Interface for a file in an archive."""
387@@ -49,6 +49,16 @@ class IArchiveFile(Interface):
388 title=_("The index file in the librarian."),
389 schema=ILibraryFileAlias, required=True, readonly=True)
390
391+ date_created = Datetime(
392+ title=_("The date when this file was created."),
393+ # XXX cjwatson 2018-04-17: Should be required=True, but we need to
394+ # backfill existing rows first.
395+ required=False, readonly=False)
396+
397+ date_superseded = Datetime(
398+ title=_("The date when this file was scheduled for future deletion."),
399+ required=False, readonly=False)
400+
401 scheduled_deletion_date = Datetime(
402 title=_("The date when this file should stop being published."),
403 required=False, readonly=False)
404@@ -79,15 +89,12 @@ class IArchiveFileSet(Interface):
405 :param content_type: The MIME type of the file.
406 """
407
408- def getByArchive(archive, container=None, path=None, only_condemned=False,
409- eager_load=False):
410+ def getByArchive(archive, container=None, path=None, eager_load=False):
411 """Get files in an archive.
412
413 :param archive: Return files in this `IArchive`.
414 :param container: Return only files with this container.
415 :param path: Return only files with this path.
416- :param only_condemned: If True, return only files with a
417- scheduled_deletion_date set.
418 :param eager_load: If True, preload related `LibraryFileAlias` and
419 `LibraryFileContent` rows.
420 :return: An iterable of matched files.
421@@ -99,8 +106,6 @@ class IArchiveFileSet(Interface):
422 :param archive_files: The `IArchiveFile`s to schedule for deletion.
423 :param stay_of_execution: A `timedelta`; schedule files for deletion
424 this amount of time in the future.
425- :return: An iterable of (container, path, sha256) for files that
426- were scheduled for deletion.
427 """
428
429 def unscheduleDeletion(archive_files):
430@@ -110,8 +115,6 @@ class IArchiveFileSet(Interface):
431 identical to a version that was previously condemned.
432
433 :param archive_files: The `IArchiveFile`s to unschedule for deletion.
434- :return: An iterable of (container, path, sha256) for files that
435- were unscheduled for deletion.
436 """
437
438 def getContainersToReap(archive, container_prefix=None):
439diff --git a/lib/lp/soyuz/model/archivefile.py b/lib/lp/soyuz/model/archivefile.py
440index 50d66ba..530fe66 100644
441--- a/lib/lp/soyuz/model/archivefile.py
442+++ b/lib/lp/soyuz/model/archivefile.py
443@@ -14,7 +14,6 @@ __all__ = [
444 import os.path
445
446 import pytz
447-from storm.databases.postgres import Returning
448 from storm.locals import (
449 And,
450 DateTime,
451@@ -26,7 +25,10 @@ from storm.locals import (
452 from zope.component import getUtility
453 from zope.interface import implementer
454
455-from lp.services.database.bulk import load_related
456+from lp.services.database.bulk import (
457+ create,
458+ load_related,
459+ )
460 from lp.services.database.constants import UTC_NOW
461 from lp.services.database.decoratedresultset import DecoratedResultSet
462 from lp.services.database.interfaces import (
463@@ -34,7 +36,6 @@ from lp.services.database.interfaces import (
464 IStore,
465 )
466 from lp.services.database.sqlbase import convert_storm_clause_to_string
467-from lp.services.database.stormexpr import BulkUpdate
468 from lp.services.librarian.interfaces import ILibraryFileAliasSet
469 from lp.services.librarian.model import (
470 LibraryFileAlias,
471@@ -46,6 +47,15 @@ from lp.soyuz.interfaces.archivefile import (
472 )
473
474
475+def _now():
476+ """Get the current transaction timestamp.
477+
478+ Tests can override this with a Storm expression or a `datetime` to
479+ simulate time changes.
480+ """
481+ return UTC_NOW
482+
483+
484 @implementer(IArchiveFile)
485 class ArchiveFile(Storm):
486 """See `IArchiveFile`."""
487@@ -64,6 +74,14 @@ class ArchiveFile(Storm):
488 library_file_id = Int(name='library_file', allow_none=False)
489 library_file = Reference(library_file_id, 'LibraryFileAlias.id')
490
491+ date_created = DateTime(
492+ # XXX cjwatson 2018-04-17: Should be allow_none=False, but we need
493+ # to backfill existing rows first.
494+ name='date_created', tzinfo=pytz.UTC, allow_none=True)
495+
496+ date_superseded = DateTime(
497+ name='date_superseded', tzinfo=pytz.UTC, allow_none=True)
498+
499 scheduled_deletion_date = DateTime(
500 name='scheduled_deletion_date', tzinfo=pytz.UTC, allow_none=True)
501
502@@ -74,18 +92,11 @@ class ArchiveFile(Storm):
503 self.container = container
504 self.path = path
505 self.library_file = library_file
506+ self.date_created = _now()
507+ self.date_superseded = None
508 self.scheduled_deletion_date = None
509
510
511-def _now():
512- """Get the current transaction timestamp.
513-
514- Tests can override this with a Storm expression or a `datetime` to
515- simulate time changes.
516- """
517- return UTC_NOW
518-
519-
520 @implementer(IArchiveFileSet)
521 class ArchiveFileSet:
522 """See `IArchiveFileSet`."""
523@@ -106,8 +117,7 @@ class ArchiveFileSet:
524 return cls.new(archive, container, path, library_file)
525
526 @staticmethod
527- def getByArchive(archive, container=None, path=None, only_condemned=False,
528- eager_load=False):
529+ def getByArchive(archive, container=None, path=None, eager_load=False):
530 """See `IArchiveFileSet`."""
531 clauses = [ArchiveFile.archive == archive]
532 # XXX cjwatson 2016-03-15: We'll need some more sophisticated way to
533@@ -116,8 +126,6 @@ class ArchiveFileSet:
534 clauses.append(ArchiveFile.container == container)
535 if path is not None:
536 clauses.append(ArchiveFile.path == path)
537- if only_condemned:
538- clauses.append(ArchiveFile.scheduled_deletion_date != None)
539 archive_files = IStore(ArchiveFile).find(ArchiveFile, *clauses)
540
541 def eager_load(rows):
542@@ -132,41 +140,23 @@ class ArchiveFileSet:
543 @staticmethod
544 def scheduleDeletion(archive_files, stay_of_execution):
545 """See `IArchiveFileSet`."""
546- clauses = [
547- ArchiveFile.id.is_in(
548- set(archive_file.id for archive_file in archive_files)),
549- ArchiveFile.library_file == LibraryFileAlias.id,
550- LibraryFileAlias.content == LibraryFileContent.id,
551- ]
552- new_date = _now() + stay_of_execution
553- return_columns = [
554- ArchiveFile.container, ArchiveFile.path, LibraryFileContent.sha256]
555- return list(IMasterStore(ArchiveFile).execute(Returning(
556- BulkUpdate(
557- {ArchiveFile.scheduled_deletion_date: new_date},
558- table=ArchiveFile,
559- values=[LibraryFileAlias, LibraryFileContent],
560- where=And(*clauses)),
561- columns=return_columns)))
562+ rows = IMasterStore(ArchiveFile).find(
563+ ArchiveFile, ArchiveFile.id.is_in(
564+ set(archive_file.id for archive_file in archive_files)))
565+ rows.set(
566+ date_superseded=_now(),
567+ scheduled_deletion_date=_now() + stay_of_execution)
568
569 @staticmethod
570 def unscheduleDeletion(archive_files):
571 """See `IArchiveFileSet`."""
572- clauses = [
573- ArchiveFile.id.is_in(
574- set(archive_file.id for archive_file in archive_files)),
575- ArchiveFile.library_file == LibraryFileAlias.id,
576- LibraryFileAlias.content == LibraryFileContent.id,
577- ]
578- return_columns = [
579- ArchiveFile.container, ArchiveFile.path, LibraryFileContent.sha256]
580- return list(IMasterStore(ArchiveFile).execute(Returning(
581- BulkUpdate(
582- {ArchiveFile.scheduled_deletion_date: None},
583- table=ArchiveFile,
584- values=[LibraryFileAlias, LibraryFileContent],
585- where=And(*clauses)),
586- columns=return_columns)))
587+ create(
588+ (ArchiveFile.archive, ArchiveFile.container, ArchiveFile.path,
589+ ArchiveFile.library_file, ArchiveFile.date_created,
590+ ArchiveFile.date_superseded, ArchiveFile.scheduled_deletion_date),
591+ [(archive_file.archive, archive_file.container, archive_file.path,
592+ archive_file.library_file, _now(), None, None)
593+ for archive_file in archive_files])
594
595 @staticmethod
596 def getContainersToReap(archive, container_prefix=None):
597diff --git a/lib/lp/soyuz/tests/test_archivefile.py b/lib/lp/soyuz/tests/test_archivefile.py
598index 039fec6..d1dea99 100644
599--- a/lib/lp/soyuz/tests/test_archivefile.py
600+++ b/lib/lp/soyuz/tests/test_archivefile.py
601@@ -1,4 +1,4 @@
602-# Copyright 2016 Canonical Ltd. This software is licensed under the
603+# Copyright 2016-2018 Canonical Ltd. This software is licensed under the
604 # GNU Affero General Public License version 3 (see the file LICENSE).
605
606 """ArchiveFile tests."""
607@@ -11,6 +11,13 @@ from datetime import timedelta
608 import os
609
610 from storm.store import Store
611+from testtools.matchers import (
612+ AfterPreprocessing,
613+ Equals,
614+ Is,
615+ MatchesSetwise,
616+ MatchesStructure,
617+ )
618 import transaction
619 from zope.component import getUtility
620 from zope.security.proxy import removeSecurityProxy
621@@ -25,6 +32,14 @@ from lp.testing import TestCaseWithFactory
622 from lp.testing.layers import LaunchpadZopelessLayer
623
624
625+def read_library_file(library_file):
626+ library_file.open()
627+ try:
628+ return library_file.read()
629+ finally:
630+ library_file.close()
631+
632+
633 class TestArchiveFile(TestCaseWithFactory):
634
635 layer = LaunchpadZopelessLayer
636@@ -34,11 +49,15 @@ class TestArchiveFile(TestCaseWithFactory):
637 library_file = self.factory.makeLibraryFileAlias()
638 archive_file = getUtility(IArchiveFileSet).new(
639 archive, "foo", "dists/foo", library_file)
640- self.assertEqual(archive, archive_file.archive)
641- self.assertEqual("foo", archive_file.container)
642- self.assertEqual("dists/foo", archive_file.path)
643- self.assertEqual(library_file, archive_file.library_file)
644- self.assertIsNone(archive_file.scheduled_deletion_date)
645+ self.assertThat(archive_file, MatchesStructure(
646+ archive=Equals(archive),
647+ container=Equals("foo"),
648+ path=Equals("dists/foo"),
649+ library_file=Equals(library_file),
650+ date_created=Equals(
651+ get_transaction_timestamp(Store.of(archive_file))),
652+ date_superseded=Is(None),
653+ scheduled_deletion_date=Is(None)))
654
655 def test_newFromFile(self):
656 root = self.makeTemporaryDirectory()
657@@ -48,24 +67,24 @@ class TestArchiveFile(TestCaseWithFactory):
658 with open(os.path.join(root, "dists/foo"), "rb") as f:
659 archive_file = getUtility(IArchiveFileSet).newFromFile(
660 archive, "foo", "dists/foo", f, 4, "text/plain")
661+ now = get_transaction_timestamp(Store.of(archive_file))
662 transaction.commit()
663- self.assertEqual(archive, archive_file.archive)
664- self.assertEqual("foo", archive_file.container)
665- self.assertEqual("dists/foo", archive_file.path)
666- archive_file.library_file.open()
667- try:
668- self.assertEqual("abc\n", archive_file.library_file.read())
669- finally:
670- archive_file.library_file.close()
671- self.assertIsNone(archive_file.scheduled_deletion_date)
672+ self.assertThat(archive_file, MatchesStructure(
673+ archive=Equals(archive),
674+ container=Equals("foo"),
675+ path=Equals("dists/foo"),
676+ library_file=AfterPreprocessing(
677+ read_library_file, Equals("abc\n")),
678+ date_created=Equals(now),
679+ date_superseded=Is(None),
680+ scheduled_deletion_date=Is(None)))
681
682 def test_getByArchive(self):
683 archives = [self.factory.makeArchive(), self.factory.makeArchive()]
684 archive_files = []
685- now = get_transaction_timestamp(Store.of(archives[0]))
686 for archive in archives:
687 archive_files.append(self.factory.makeArchiveFile(
688- archive=archive, scheduled_deletion_date=now))
689+ archive=archive))
690 archive_files.append(self.factory.makeArchiveFile(
691 archive=archive, container="foo"))
692 archive_file_set = getUtility(IArchiveFileSet)
693@@ -83,9 +102,6 @@ class TestArchiveFile(TestCaseWithFactory):
694 self.assertContentEqual(
695 [], archive_file_set.getByArchive(archives[0], path="other"))
696 self.assertContentEqual(
697- [archive_files[0]],
698- archive_file_set.getByArchive(archives[0], only_condemned=True))
699- self.assertContentEqual(
700 archive_files[2:], archive_file_set.getByArchive(archives[1]))
701 self.assertContentEqual(
702 [archive_files[3]],
703@@ -98,19 +114,11 @@ class TestArchiveFile(TestCaseWithFactory):
704 archives[1], path=archive_files[3].path))
705 self.assertContentEqual(
706 [], archive_file_set.getByArchive(archives[1], path="other"))
707- self.assertContentEqual(
708- [archive_files[2]],
709- archive_file_set.getByArchive(archives[1], only_condemned=True))
710
711 def test_scheduleDeletion(self):
712 archive_files = [self.factory.makeArchiveFile() for _ in range(3)]
713- expected_rows = [
714- (archive_file.container, archive_file.path,
715- archive_file.library_file.content.sha256)
716- for archive_file in archive_files[:2]]
717- rows = getUtility(IArchiveFileSet).scheduleDeletion(
718+ getUtility(IArchiveFileSet).scheduleDeletion(
719 archive_files[:2], timedelta(days=1))
720- self.assertContentEqual(expected_rows, rows)
721 flush_database_caches()
722 tomorrow = (
723 get_transaction_timestamp(Store.of(archive_files[0])) +
724@@ -124,17 +132,32 @@ class TestArchiveFile(TestCaseWithFactory):
725 now = get_transaction_timestamp(Store.of(archive_files[0]))
726 for archive_file in archive_files:
727 removeSecurityProxy(archive_file).scheduled_deletion_date = now
728- expected_rows = [
729- (archive_file.container, archive_file.path,
730- archive_file.library_file.content.sha256)
731- for archive_file in archive_files[:2]]
732- rows = getUtility(IArchiveFileSet).unscheduleDeletion(
733- archive_files[:2])
734- self.assertContentEqual(expected_rows, rows)
735+ archive_file_set = getUtility(IArchiveFileSet)
736+ archive_file_set.unscheduleDeletion(archive_files[:2])
737 flush_database_caches()
738- self.assertIsNone(archive_files[0].scheduled_deletion_date)
739- self.assertIsNone(archive_files[1].scheduled_deletion_date)
740- self.assertEqual(now, archive_files[2].scheduled_deletion_date)
741+ self.assertThat(
742+ archive_file_set.getByArchive(
743+ archive_files[0].archive,
744+ container=archive_files[0].container,
745+ path=archive_files[0].path),
746+ MatchesSetwise(
747+ MatchesStructure(scheduled_deletion_date=Equals(now)),
748+ MatchesStructure(scheduled_deletion_date=Is(None))))
749+ self.assertThat(
750+ archive_file_set.getByArchive(
751+ archive_files[1].archive,
752+ container=archive_files[1].container,
753+ path=archive_files[1].path),
754+ MatchesSetwise(
755+ MatchesStructure(scheduled_deletion_date=Equals(now)),
756+ MatchesStructure(scheduled_deletion_date=Is(None))))
757+ self.assertThat(
758+ archive_file_set.getByArchive(
759+ archive_files[2].archive,
760+ container=archive_files[2].container,
761+ path=archive_files[2].path),
762+ MatchesSetwise(
763+ MatchesStructure(scheduled_deletion_date=Equals(now))))
764
765 def test_getContainersToReap(self):
766 archive = self.factory.makeArchive()
