Merge lp:~cprov/launchpad/bug-430552-unblock-death-row into lp:launchpad/db-devel

Proposed by Celso Providelo
Status: Merged
Merged at revision: not available
Proposed branch: lp:~cprov/launchpad/bug-430552-unblock-death-row
Merge into: lp:launchpad/db-devel
Diff against target: 261 lines
4 files modified
lib/lp/archivepublisher/deathrow.py (+10/-5)
lib/lp/archivepublisher/diskpool.py (+10/-4)
lib/lp/archivepublisher/tests/test_deathrow.py (+155/-0)
lib/lp/soyuz/interfaces/publishing.py (+10/-0)
To merge this branch: bzr merge lp:~cprov/launchpad/bug-430552-unblock-death-row
Reviewer Review Type Date Requested Status
Brad Crittenden (community) release-critical Approve
Michael Nelson (community) code Approve
Review via email: mp+12346@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Celso Providelo (cprov) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

= Summary =

This branch fixes https://bugs.edge.launchpad.net/bugs/430552.

The is a issue currently blocking `process-death-row` (p-d-r) in
production and increasing its backlog of removal-candidates, since
it's not able to remove files from pool/

After an investigation we discovered that p-d-r is failing due
because it trying to remove files that do not exist anymore.

The missing files (symlinks, in fact) could be gone by the following
reasons:

 * Someone butter-fingered the ubuntu archive disk trying to fix
   something else (only remotely possible, there are many broken
   files)

 * The last effective run of p-d-r was aborted during the DB update
   (quite possible, disk operations and updates are not atomic)

== Proposed fix ==

The fix for this problem is to deal with 'missing symlinks' as we deal
with 'missing files'.

It will be an *attempt* to remove what was already removed, somehow,
so should carry on and update the DB records as if we have
successfully removed the files from disk.

== Tests ==

./bin/test -vv -t TestDeathRow -t deathrow.txt

== Demo and Q/A ==

The best! Code is cowboyed in production and dealing with 200k records
backlog (ETA 5h).

= Launchpad lint =

Checking for conflicts. and issues in doctests and templates.
Running jslint, xmllint, pyflakes, and pylint.
Using normal rules.

Linting changed files:
  lib/lp/archivepublisher/deathrow.py
  lib/lp/soyuz/interfaces/publishing.py
  lib/lp/archivepublisher/diskpool.py
  lib/lp/archivepublisher/tests/test_deathrow.py
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkq7ZvgACgkQ7KBXuXyZSjDw9ACgrP9P0ceJ38CB75+hX8GkmP0/
kogAnjmB5M46LfyxrUWwvNwVZSgMlQ9U
=xq9C
-----END PGP SIGNATURE-----

Revision history for this message
Michael Nelson (michael.nelson) wrote :
Download full text (12.7 KiB)

Great work Celso - you make complex things easy to understand :)
r=me

> = Summary =
>
> This branch fixes https://bugs.edge.launchpad.net/bugs/430552.
>
> The is a issue currently blocking `process-death-row` (p-d-r) in
> production and increasing its backlog of removal-candidates, since
> it's not able to remove files from pool/
>
> After an investigation we discovered that p-d-r is failing due
> because it trying to remove files that do not exist anymore.
>
> The missing files (symlinks, in fact) could be gone by the following
> reasons:
>
> * Someone butter-fingered the ubuntu archive disk trying to fix
> something else (only remotely possible, there are many broken
> files)
>
> * The last effective run of p-d-r was aborted during the DB update
> (quite possible, disk operations and updates are not atomic)
>

Thanks for the detailed explanation - I'd been wondering about this
issue while seeing you and Julian chatting about it.

It's not possible that this could have happened while converting the
privacy of ppas? I guess you would already know that - just a thought.

>
> == Proposed fix ==
>
> The fix for this problem is to deal with 'missing symlinks' as we deal
> with 'missing files'.
>
> It will be an *attempt* to remove what was already removed, somehow,
> so should carry on and update the DB records as if we have
> successfully removed the files from disk.

Great.

>
> == Tests ==
>
> ./bin/test -vv -t TestDeathRow -t deathrow.txt
>
> == Demo and Q/A ==
>
> The best! Code is cowboyed in production and dealing with 200k records
> backlog (ETA 5h).

ouch. It can't even be tested on df? Can't we mess with df and delete
some symlinks that p-d-r expects?

>
> = Launchpad lint =
>
> Checking for conflicts. and issues in doctests and templates.
> Running jslint, xmllint, pyflakes, and pylint.
> Using normal rules.
>
> Linting changed files:
> lib/lp/archivepublisher/deathrow.py
> lib/lp/soyuz/interfaces/publishing.py
> lib/lp/archivepublisher/diskpool.py
> lib/lp/archivepublisher/tests/test_deathrow.py

> === modified file 'lib/lp/archivepublisher/deathrow.py'
> --- lib/lp/archivepublisher/deathrow.py 2009-06-24 23:28:16 +0000
> +++ lib/lp/archivepublisher/deathrow.py 2009-09-24 12:54:03 +0000
> @@ -23,7 +23,8 @@
> from lp.soyuz.interfaces.publishing import (
> ISecureBinaryPackagePublishingHistory,
> ISecureSourcePackagePublishingHistory)
> -from canonical.launchpad.interfaces import NotInPool
> +from lp.soyuz.interfaces.publishing import (
> + MissingSymlinkInPool, NotInPool)
>
>
> def getDeathRow(archive, log, pool_root_override):
> @@ -292,12 +293,16 @@
> try:
> bytes += self._removeFile(
> component_name, source_name, file_name)
> - except NotInPool:
> + except NotInPool, info:
> # It's safe for us to let this slide because it means that
> # the file is already gone.
> - self.logger.debug(
> - "File for removing %s %s/%s is not in pool, skipping" %
> - (component_name, source_name, file_name))
> + self.logger....

review: Approve (code)
Revision history for this message
Celso Providelo (cprov) wrote :
Download full text (14.1 KiB)

On Thu, Sep 24, 2009 at 10:51 AM, Michael Nelson
<email address hidden> wrote:
> Review: Approve code
> Great work Celso - you make complex things easy to understand :)
> r=me

Aha! thanks Michael, you always make me few important.

>> = Summary =
>>
>> This branch fixes https://bugs.edge.launchpad.net/bugs/430552.
>>
>> The is a issue currently blocking `process-death-row` (p-d-r) in
>> production and increasing its backlog of removal-candidates, since
>> it's not able to remove files from pool/
>>
>> After an investigation we discovered that p-d-r is failing due
>> because it trying to remove files that do not exist anymore.
>>
>> The missing files (symlinks, in fact) could be gone by the following
>> reasons:
>>
>>  * Someone butter-fingered the ubuntu archive disk trying to fix
>>    something else (only remotely possible, there are many broken
>>    files)
>>
>>  * The last effective run of p-d-r was aborted during the DB update
>>    (quite possible, disk operations and updates are not atomic)
>>
>
> Thanks for the detailed explanation - I'd been wondering about this
> issue while seeing you and Julian chatting about it.
>
> It's not possible that this could have happened while converting the
> privacy of ppas? I guess you would already know that - just a thought.

No I don't think so ... it's certainly some slip in the death-row
workflow, most likely to be the second scenario, when it removed the
symlinks from disk but failed to update the DB.

We have to change the code to make those actions atomic, that will
probably help us to also reduce the transaction time by doing isolated
batches of work.

Future ...

>>
>> == Proposed fix ==
>>
>> The fix for this problem is to deal with 'missing symlinks' as we deal
>> with 'missing files'.
>>
>> It will be an *attempt* to remove what was already removed, somehow,
>> so should carry on and update the DB records as if we have
>> successfully removed the files from disk.
>
> Great.
>
>>
>> == Tests ==
>>
>> ./bin/test -vv -t TestDeathRow -t deathrow.txt
>>
>> == Demo and Q/A ==
>>
>> The best! Code is cowboyed in production and dealing with 200k records
>> backlog (ETA 5h).
>
> ouch. It can't even be tested on df? Can't we mess with df and delete
> some symlinks that p-d-r expects?

Yes, but it would be as artificial as the test-case, we can't reproduce the
huge backlog we have in prod neither 'God knows how many' missing symlinks
in pool. If it wasn't that urgent and complex I would have invested time to
tweak DF.

>>
>> = Launchpad lint =
>>
>> Checking for conflicts. and issues in doctests and templates.
>> Running jslint, xmllint, pyflakes, and pylint.
>> Using normal rules.
>>
>> Linting changed files:
>>   lib/lp/archivepublisher/deathrow.py
>>   lib/lp/soyuz/interfaces/publishing.py
>>   lib/lp/archivepublisher/diskpool.py
>>   lib/lp/archivepublisher/tests/test_deathrow.py
>
>> === modified file 'lib/lp/archivepublisher/deathrow.py'
>> --- lib/lp/archivepublisher/deathrow.py       2009-06-24 23:28:16 +0000
>> +++ lib/lp/archivepublisher/deathrow.py       2009-09-24 12:54:03 +0000
>> @@ -23,7 +23,8 @@
>>  from lp.soyuz.interfaces.publishing import (
>>      ISecureBinaryPackagePu...

Revision history for this message
Brad Crittenden (bac) :
review: Approve (release-critical)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/lp/archivepublisher/deathrow.py'
--- lib/lp/archivepublisher/deathrow.py 2009-06-24 23:28:16 +0000
+++ lib/lp/archivepublisher/deathrow.py 2009-09-24 16:51:13 +0000
@@ -23,7 +23,8 @@
23from lp.soyuz.interfaces.publishing import (23from lp.soyuz.interfaces.publishing import (
24 ISecureBinaryPackagePublishingHistory,24 ISecureBinaryPackagePublishingHistory,
25 ISecureSourcePackagePublishingHistory)25 ISecureSourcePackagePublishingHistory)
26from canonical.launchpad.interfaces import NotInPool26from lp.soyuz.interfaces.publishing import (
27 MissingSymlinkInPool, NotInPool)
2728
2829
29def getDeathRow(archive, log, pool_root_override):30def getDeathRow(archive, log, pool_root_override):
@@ -292,12 +293,16 @@
292 try:293 try:
293 bytes += self._removeFile(294 bytes += self._removeFile(
294 component_name, source_name, file_name)295 component_name, source_name, file_name)
295 except NotInPool:296 except NotInPool, info:
296 # It's safe for us to let this slide because it means that297 # It's safe for us to let this slide because it means that
297 # the file is already gone.298 # the file is already gone.
298 self.logger.debug(299 self.logger.debug(str(info))
299 "File for removing %s %s/%s is not in pool, skipping" %300 except MissingSymlinkInPool, info:
300 (component_name, source_name, file_name))301 # This one is a little more worrying, because an expected
302 # symlink has vanished from the pool/ (could be a code
303 # mistake) but there is nothing we can do about it at this
304 # point.
305 self.logger.warn(str(info))
301306
302 self.logger.info("Total bytes freed: %s" % bytes)307 self.logger.info("Total bytes freed: %s" % bytes)
303308
304309
=== modified file 'lib/lp/archivepublisher/diskpool.py'
--- lib/lp/archivepublisher/diskpool.py 2009-06-24 23:28:16 +0000
+++ lib/lp/archivepublisher/diskpool.py 2009-09-24 16:51:13 +0000
@@ -8,8 +8,8 @@
88
9from lp.archivepublisher import HARDCODED_COMPONENT_ORDER9from lp.archivepublisher import HARDCODED_COMPONENT_ORDER
10from canonical.cachedproperty import cachedproperty10from canonical.cachedproperty import cachedproperty
11from canonical.launchpad.interfaces import (11from lp.soyuz.interfaces.publishing import (
12 NotInPool, PoolFileOverwriteError)12 MissingSymlinkInPool, NotInPool, PoolFileOverwriteError)
13from canonical.librarian.utils import copy_and_close, sha1_from_path13from canonical.librarian.utils import copy_and_close, sha1_from_path
1414
1515
@@ -229,7 +229,10 @@
229 3) Remove the main file and there are symlinks left.229 3) Remove the main file and there are symlinks left.
230 """230 """
231 if not self.file_component:231 if not self.file_component:
232 raise NotInPool()232 raise NotInPool(
233 "File for removing %s %s/%s is not in pool, skipping." %
234 (component, self.source, self.filename))
235
233236
234 # Okay, it's there, if it's a symlink then we need to remove237 # Okay, it's there, if it's a symlink then we need to remove
235 # it simply.238 # it simply.
@@ -242,7 +245,10 @@
242 assert os.path.islink(link_path)245 assert os.path.islink(link_path)
243 return self._reallyRemove(component)246 return self._reallyRemove(component)
244247
245 assert component == self.file_component248 if component != self.file_component:
249 raise MissingSymlinkInPool(
250 "Symlink for %s/%s in %s is missing, skipping." %
251 (self.source, self.filename, component))
246252
247 # It's not a symlink, this means we need to check whether we253 # It's not a symlink, this means we need to check whether we
248 # have symlinks or not.254 # have symlinks or not.
249255
=== added file 'lib/lp/archivepublisher/tests/test_deathrow.py'
--- lib/lp/archivepublisher/tests/test_deathrow.py 1970-01-01 00:00:00 +0000
+++ lib/lp/archivepublisher/tests/test_deathrow.py 2009-09-24 16:51:13 +0000
@@ -0,0 +1,155 @@
1# Copyright 2009 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Tests for deathrow class."""
5
6__metaclass__ = type
7
8
9import os
10import shutil
11import tempfile
12import unittest
13
14from zope.component import getUtility
15
16from canonical.testing import LaunchpadZopelessLayer
17from canonical.launchpad.scripts.logger import BufferLogger
18
19from lp.archivepublisher.deathrow import DeathRow
20from lp.archivepublisher.diskpool import DiskPool
21from lp.registry.interfaces.distribution import IDistributionSet
22from lp.soyuz.interfaces.component import IComponentSet
23from lp.soyuz.model.publishing import SourcePackagePublishingHistory
24from lp.soyuz.tests.test_publishing import SoyuzTestPublisher
25from lp.testing import TestCase
26
27
28class TestDeathRow(TestCase):
29
30 layer = LaunchpadZopelessLayer
31
32 def getTestPublisher(self, distroseries):
33 """Return an `SoyuzTestPublisher`instance."""
34 stp = SoyuzTestPublisher()
35 stp.addFakeChroots(distroseries)
36 stp.setUpDefaultDistroSeries(distroseries)
37 return stp
38
39 def getDeathRow(self, archive):
40 """Return an `DeathRow` for the given archive.
41
42 Created the temporary 'pool' and 'temp' directories and register
43 a 'cleanup' to purge them after the test runs.
44 """
45 pool_path = tempfile.mkdtemp('-pool')
46 temp_path = tempfile.mkdtemp('-pool-tmp')
47 def clean_pool(pool_path, temp_path):
48 shutil.rmtree(pool_path)
49 shutil.rmtree(temp_path)
50 self.addCleanup(clean_pool, pool_path, temp_path)
51
52 logger = BufferLogger()
53 diskpool = DiskPool(pool_path, temp_path, logger)
54 return DeathRow(archive, diskpool, logger)
55
56 def getDiskPoolPath(self, pub_file, diskpool):
57 """Return the absolute path to a published file in the disk pool/."""
58 return diskpool.pathFor(
59 pub_file.componentname.encode('utf-8'),
60 pub_file.sourcepackagename.encode('utf8'),
61 pub_file.libraryfilealiasfilename.encode('utf-8')
62 )
63
64 def assertIsFile(self, path):
65 """Assert the path exists and is a regular file."""
66 self.assertTrue(
67 os.path.exists(path),
68 "File %s does not exist" % os.path.basename(path))
69 self.assertFalse(
70 os.path.islink(path),
71 "File %s is a symbolic link" % os.path.basename(path))
72
73 def assertIsLink(self, path):
74 """Assert the path exists and is a symbolic link."""
75 self.assertTrue(
76 os.path.exists(path),
77 "File %s does not exist" % os.path.basename(path))
78 self.assertTrue(
79 os.path.islink(path),
80 "File %s is a not symbolic link" % os.path.basename(path))
81
82 def assertDoesNotExist(self, path):
83 """Assert the path does not exit."""
84 self.assertFalse(
85 os.path.exists(path),
86 "File %s exists" % os.path.basename(path))
87
88 def test_MissingSymLinkInPool(self):
89 # When a publication is promoted from 'universe' to 'main' and
90 # the symbolic links expected in 'universe' are not present,
91 # a `MissingSymlinkInPool` error is generated and immediately
92 # ignored by the `DeathRow` processor. Even in this adverse
93 # circumstances the database record (removal candidate) is
94 # updated to match the disk status.
95
96 # Setup an `SoyuzTestPublisher` and a `DeathRow` instance.
97 ubuntu = getUtility(IDistributionSet).getByName('ubuntu')
98 hoary = ubuntu.getSeries('hoary')
99 stp = self.getTestPublisher(hoary)
100 deathrow = self.getDeathRow(hoary.main_archive)
101
102 # Create a source publication with a since file (DSC) in
103 # 'universe' and promote it to 'main'.
104 source_universe = stp.getPubSource(component='universe')
105 secure_record = source_universe.changeOverride(
106 new_component=getUtility(IComponentSet)['main'])
107 source_main = SourcePackagePublishingHistory.get(
108 secure_record.id)
109 test_publications = (source_universe, source_main)
110
111 # Commit for exposing the just-created librarian files.
112 self.layer.commit()
113
114 # Publish the testing publication on disk, the file for the
115 # 'universe' component will be a symbolic link to the one
116 # in 'main'.
117 for pub in test_publications:
118 pub.publish(deathrow.diskpool, deathrow.logger)
119 [main_dsc_path] = [
120 self.getDiskPoolPath(pub_file, deathrow.diskpool)
121 for pub_file in source_main.files]
122 [universe_dsc_path] = [
123 self.getDiskPoolPath(pub_file, deathrow.diskpool)
124 for pub_file in source_universe.files]
125 self.assertIsFile(main_dsc_path)
126 self.assertIsLink(universe_dsc_path)
127
128 # Remove the symbolic link to emulate MissingSymlinkInPool scenario.
129 os.remove(universe_dsc_path)
130
131 # Remove the testing publications.
132 for pub in test_publications:
133 pub.requestObsolescence()
134
135 # Commit for exposing the just-created removal candidates.
136 self.layer.commit()
137
138 # Due to the MissingSymlinkInPool scenario, it takes 2 iteration to
139 # remove both references to the shared file in pool/.
140 deathrow.reap()
141 deathrow.reap()
142
143 for pub in test_publications:
144 self.assertTrue(
145 pub.dateremoved is not None,
146 '%s (%s) is not marked as removed.'
147 % (pub.displayname, pub.component.name))
148
149 self.assertDoesNotExist(main_dsc_path)
150 self.assertDoesNotExist(universe_dsc_path)
151
152
153def test_suite():
154 return unittest.TestLoader().loadTestsFromName(__name__)
155
0156
=== modified file 'lib/lp/soyuz/interfaces/publishing.py'
--- lib/lp/soyuz/interfaces/publishing.py 2009-09-15 09:24:15 +0000
+++ lib/lp/soyuz/interfaces/publishing.py 2009-09-24 16:51:13 +0000
@@ -18,6 +18,7 @@
18 'ISecureSourcePackagePublishingHistory',18 'ISecureSourcePackagePublishingHistory',
19 'ISourcePackageFilePublishing',19 'ISourcePackageFilePublishing',
20 'ISourcePackagePublishingHistory',20 'ISourcePackagePublishingHistory',
21 'MissingSymlinkInPool',
21 'NotInPool',22 'NotInPool',
22 'PackagePublishingPriority',23 'PackagePublishingPriority',
23 'PackagePublishingStatus',24 'PackagePublishingStatus',
@@ -60,6 +61,15 @@
60 requires manual intervention in the archive.61 requires manual intervention in the archive.
61 """62 """
6263
64class MissingSymlinkInPool(Exception):
65 """Raised when there is a missing symlink in pool.
66
67 This condition is ignored, similarly to what we do for `NotInPool`,
68 since the pool entry requested to be removed is not there anymore.
69
70 The corresponding record is marked as removed and the process
71 continues.
72 """
6373
64class PackagePublishingStatus(DBEnumeratedType):74class PackagePublishingStatus(DBEnumeratedType):
65 """Package Publishing Status75 """Package Publishing Status

Subscribers

People subscribed via source and target branches

to status/vote changes: