Merge lp:~michael.nelson/launchpad/982938-dont-load-all-tokens-for-all-ppas-into-memory into lp:launchpad

Proposed by Michael Nelson on 2012-05-30
Status: Merged
Approved by: Aaron Bentley on 2012-05-30
Approved revision: no longer in the source branch.
Merged at revision: 15335
Proposed branch: lp:~michael.nelson/launchpad/982938-dont-load-all-tokens-for-all-ppas-into-memory
Merge into: lp:launchpad
Diff against target: 137 lines (+13/-36)
4 files modified
lib/lp/archivepublisher/htaccess.py (+6/-4)
lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py (+4/-27)
lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py (+1/-1)
lib/lp/archivepublisher/tests/test_htaccess.py (+2/-4)
To merge this branch: bzr merge lp:~michael.nelson/launchpad/982938-dont-load-all-tokens-for-all-ppas-into-memory
Reviewer Review Type Date Requested Status
Aaron Bentley (community) 2012-05-30 Approve on 2012-05-30
Review via email: mp+107980@code.launchpad.net

Commit Message

Don't load all tokens for all affected apps into memory when updating htpasswd.

Description of the Change

Summary
=======
This branch aims to reduce the load/memory consumption on germanium while generating htpasswd files.

Proposed fix
============

Use a generator to return a storm result set of tokens for each PPA, rather than
loading all tokens for all PPAs into a dict.

It will result in a larger query count. jml is working on a script that will generate data to be able to verify whether this branch does indeed help.

Pre-implementation notes
========================
Referring to https://bugs.launchpad.net/launchpad/+bug/982938/comments/6
12:27 < noodles> jml: if you think the first of those points (in my comment) is very likely to be causing performance issues, I could start coding that straight away...
12:27 < noodles> (given the numbers achuni mentioned above, it would seem likely, but only the measurements will tell).
12:27 < jml> as you say, only measurements will tell :)
12:28 < jml> noodles: I reckon code it up straight away. :)

Implementation details
======================
Sheesh - this is the quick cover letter? :P

LOC Rationale
=============
Yay - 12 lines of credit.

Tests
=====
./bin/test -vvct TestPPAHtaccessTokenGeneration
./bin/test -vvct TestHtpasswdGeneration

Demo and Q/A
============
If this script runs on staging, we could QA by comparing output before and after running it.

Lint
====
Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py

To post a comment you must log in.
Aaron Bentley (abentley) :
review: Approve
Aaron Bentley (abentley) wrote :

A nice simplification!

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/lp/archivepublisher/htaccess.py'
2--- lib/lp/archivepublisher/htaccess.py 2011-12-19 23:38:16 +0000
3+++ lib/lp/archivepublisher/htaccess.py 2012-05-30 17:10:23 +0000
4@@ -23,6 +23,7 @@
5 from zope.component import getUtility
6
7 from lp.soyuz.interfaces.archiveauthtoken import IArchiveAuthTokenSet
8+from lp.soyuz.model.archiveauthtoken import ArchiveAuthToken
9
10
11 HTACCESS_TEMPLATE = """
12@@ -79,13 +80,14 @@
13 assert archive.private, "Archive %r must be private" % archive
14
15 if tokens is None:
16- tokens = getUtility(IArchiveAuthTokenSet).getByArchive(archive)
17+ tokens = getUtility(IArchiveAuthTokenSet).getByArchive(
18+ archive).order_by(ArchiveAuthToken.id)
19
20 # The first .htpasswd entry is the buildd_secret.
21 yield (BUILDD_USER_NAME, archive.buildd_secret, BUILDD_USER_NAME[:2])
22
23 # Iterate over tokens and write the appropriate htpasswd
24- # entries for them. Use a consistent sort order so that the
25- # generated file can be compared to an existing one later.
26- for token in sorted(tokens, key=attrgetter("id")):
27+ # entries for them. For consistent sort order, the tokens
28+ # should be ordered by id.
29+ for token in tokens:
30 yield (token.person.name, token.token, token.person.name[:2])
31
32=== modified file 'lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py'
33--- lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py 2012-05-10 21:44:21 +0000
34+++ lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py 2012-05-30 17:10:23 +0000
35@@ -79,11 +79,10 @@
36 write_htaccess(htaccess_filename, pub_config.htaccessroot)
37 self.logger.debug("Created .htaccess for %s" % ppa.displayname)
38
39- def generateHtpasswd(self, ppa, tokens):
40+ def generateHtpasswd(self, ppa):
41 """Generate a htpasswd file for `ppa`s `tokens`.
42
43 :param ppa: The context PPA (an `IArchive`).
44- :param tokens: A iterable containing `IArchiveAuthToken`s.
45 :return: The filename of the htpasswd file that was generated.
46 """
47 # Create a temporary file that will be a new .htpasswd.
48@@ -94,7 +93,7 @@
49 os.close(fd)
50
51 write_htpasswd(
52- temp_filename, htpasswd_credentials_for_archive(ppa, tokens))
53+ temp_filename, htpasswd_credentials_for_archive(ppa))
54
55 return temp_filename
56
57@@ -248,23 +247,6 @@
58
59 return new_ppa_tokens
60
61- def _getValidTokensForPPAs(self, ppas):
62- """Returns a dict keyed by archive with all the tokens for each."""
63- affected_ppas_with_tokens = dict([
64- (ppa, []) for ppa in ppas])
65-
66- store = IStore(ArchiveAuthToken)
67- affected_ppa_tokens = store.find(
68- ArchiveAuthToken,
69- ArchiveAuthToken.date_deactivated == None,
70- ArchiveAuthToken.archive_id.is_in(
71- [ppa.id for ppa in ppas]))
72-
73- for token in affected_ppa_tokens:
74- affected_ppas_with_tokens[token.archive].append(token)
75-
76- return affected_ppas_with_tokens
77-
78 def getNewPrivatePPAs(self):
79 """Return the recently created private PPAs."""
80 # Avoid circular import.
81@@ -296,12 +278,7 @@
82
83 affected_ppas.update(self.getNewPrivatePPAs())
84
85- affected_ppas_with_tokens = {}
86- if affected_ppas:
87- affected_ppas_with_tokens = self._getValidTokensForPPAs(
88- affected_ppas)
89-
90- for ppa, valid_tokens in affected_ppas_with_tokens.iteritems():
91+ for ppa in affected_ppas:
92 # If this PPA is blacklisted, do not touch it's htaccess/pwd
93 # files.
94 blacklisted_ppa_names_for_owner = self.blacklist.get(
95@@ -322,7 +299,7 @@
96 continue
97
98 self.ensureHtaccess(ppa)
99- temp_htpasswd = self.generateHtpasswd(ppa, valid_tokens)
100+ temp_htpasswd = self.generateHtpasswd(ppa)
101 if not self.replaceUpdatedHtpasswd(ppa, temp_htpasswd):
102 os.remove(temp_htpasswd)
103
104
105=== modified file 'lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py'
106--- lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py 2012-05-10 21:44:21 +0000
107+++ lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py 2012-05-30 17:10:23 +0000
108@@ -126,7 +126,7 @@
109
110 # Generate the passwd file.
111 script = self.getScript()
112- filename = script.generateHtpasswd(self.ppa, tokens)
113+ filename = script.generateHtpasswd(self.ppa)
114
115 # It should be a temp file in the same directory as the intended
116 # target file when it's renamed, so that os.rename() won't
117
118=== modified file 'lib/lp/archivepublisher/tests/test_htaccess.py'
119--- lib/lp/archivepublisher/tests/test_htaccess.py 2012-01-01 02:58:52 +0000
120+++ lib/lp/archivepublisher/tests/test_htaccess.py 2012-05-30 17:10:23 +0000
121@@ -103,15 +103,13 @@
122 def test_credentials_for_archive(self):
123 # ArchiveAuthTokens for an archive are returned by
124 # credentials_for_archive.
125- # Make some subscriptions and tokens.
126 self.ppa.buildd_secret = "geheim"
127 name12 = getUtility(IPersonSet).getByName("name12")
128 name16 = getUtility(IPersonSet).getByName("name16")
129 self.ppa.newSubscription(name12, self.ppa.owner)
130 self.ppa.newSubscription(name16, self.ppa.owner)
131- tokens = []
132- tokens.append(self.ppa.newAuthToken(name12))
133- tokens.append(self.ppa.newAuthToken(name16))
134+ first_created_token = self.ppa.newAuthToken(name16)
135+ tokens = [self.ppa.newAuthToken(name12), first_created_token]
136
137 credentials = list(htpasswd_credentials_for_archive(self.ppa, tokens))
138