Merge ~ilasc/launchpad:git-repack-cron-script into launchpad:master

Proposed by Ioana Lasc
Status: Merged
Approved by: Ioana Lasc
Approved revision: a078850beec0a6bee402dfc649f259481bb58708
Merge reported by: Otto Co-Pilot
Merged at revision: not available
Proposed branch: ~ilasc/launchpad:git-repack-cron-script
Merge into: launchpad:master
Diff against target: 340 lines (+289/-0)
6 files modified
cronscripts/repack_git_repositories.py (+33/-0)
database/schema/security.cfg (+1/-0)
lib/lp/code/model/gitrepository.py (+1/-0)
lib/lp/code/scripts/repackgitrepository.py (+97/-0)
lib/lp/code/scripts/tests/test_repack_git_repositories.py (+154/-0)
lib/lp/services/config/schema-lazr.conf (+3/-0)
Reviewer Review Type Date Requested Status
Colin Watson (community) Approve
Review via email: mp+398963@code.launchpad.net

Commit message

Add cron script for git repack

To post a comment you must log in.
8e67df7... by Ioana Lasc

Organize imports

Revision history for this message
Ioana Lasc (ilasc) wrote :

The unit test (test_request_git_repack) doesn't seem to trigger the script, this is really a copy/edit/debug and attempt to understand how we trigger daily builds, but not getting far on this.

Revision history for this message
Colin Watson (cjwatson) :
7ce4087... by Ioana Lasc

Move repackgitrepository to lp/code/scripts

51ca1f4... by Ioana Lasc

Cleanup unnecessary git-repack user

Revision history for this message
Ioana Lasc (ilasc) wrote :

Thanks Colin, MP is now ready for review.

Revision history for this message
Colin Watson (cjwatson) wrote :

Looks like a good first attempt, thanks!

This is potentially quite operationally expensive, so in this review I've paid special attention to things that I think could make it generally kinder and gentler to our infrastructure; the "Needs Fixing" is mainly for those.

review: Needs Fixing
Revision history for this message
Colin Watson (cjwatson) :
3020f70... by Ioana Lasc

Add TunableLoop to git repack automation

Revision history for this message
Ioana Lasc (ilasc) wrote :

Thanks Colin, excellent suggestions as usual!

This is now working with a TunableLoop but needs more work: the TunableLoop doesn't hook into the fake turnip thread for the calls to gitHostingClient and it is oops-ing with: https://pastebin.canonical.com/p/6RKM3kfkTT/
I'm working on getting the LaunchpadCronScript and TunableLoop to interact with Fake Turnip Thread in unit test instead of trying to contact the real application.

In terms of the permissions added to the branchscanner user, this is how it crashed if SELECT rights are not granted to the user:https://pastebin.canonical.com/p/qVwRKjztzM/
And it appears I've gone overboard with permissions as UPDATE is not necessary, so will be taking that out.

I didn't miss your last comment about adding at least another test for the case where we have more than single repository qualifying for repack. As soon as I get the Unit Tests to play nicely with the FakeTurnipApplication will be adding that in too.

82b632e... by Ioana Lasc

Set default timeout function for RepackTunableLoop

5f84608... by Ioana Lasc

Add unit tests for RepackTunableLoop

Revision history for this message
Ioana Lasc (ilasc) wrote :

MP now ready for review.

Revision history for this message
Colin Watson (cjwatson) wrote :

Thanks for the updates. Various details follow; at least the index needs to be sorted out before landing this.

review: Approve
a078850... by Ioana Lasc

Address review comments for repack cron job

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cronscripts/repack_git_repositories.py b/cronscripts/repack_git_repositories.py
2new file mode 100755
3index 0000000..e553616
4--- /dev/null
5+++ b/cronscripts/repack_git_repositories.py
6@@ -0,0 +1,33 @@
7+#!/usr/bin/python2 -S
8+#
9+# Copyright 2021 Canonical Ltd. This software is licensed under the
10+# GNU Affero General Public License version 3 (see the file LICENSE).
11+
12+import _pythonpath
13+
14+from lp.code.scripts.repackgitrepository import RepackTunableLoop
15+from lp.services.config import config
16+from lp.services.scripts.base import LaunchpadCronScript
17+from lp.services.timeout import set_default_timeout_function
18+
19+
20+class RepackGitRepositories(LaunchpadCronScript):
21+
22+ def add_my_options(self):
23+ self.parser.add_option(
24+ "--dry-run", action="store_true",
25+ dest="dry_run", default=False,
26+ help="Reports which repositories would be repacked without "
27+ "actually repacking the repositories.")
28+
29+ def main(self):
30+ set_default_timeout_function(
31+ lambda: config.repack_git_repositories.timeout)
32+ updater = RepackTunableLoop(self.logger, self.options.dry_run)
33+ updater.run()
34+
35+
36+if __name__ == '__main__':
37+ script = RepackGitRepositories(
38+ 'repack_git_repositories', dbuser='branchscanner')
39+ script.lock_and_run()
40diff --git a/database/schema/security.cfg b/database/schema/security.cfg
41index 2f4ffdf..2911da2 100644
42--- a/database/schema/security.cfg
43+++ b/database/schema/security.cfg
44@@ -699,6 +699,7 @@ public.bugsubscriptionfilterimportance = SELECT
45 public.bugsubscriptionfilterstatus = SELECT
46 public.bugsubscriptionfiltertag = SELECT
47 public.bugtag = SELECT
48+public.codeimport = SELECT
49 public.codereviewmessage = SELECT
50 public.codereviewvote = SELECT
51 public.diff = SELECT, INSERT, DELETE
52diff --git a/lib/lp/code/model/gitrepository.py b/lib/lp/code/model/gitrepository.py
53index 589dcbf..429b502 100644
54--- a/lib/lp/code/model/gitrepository.py
55+++ b/lib/lp/code/model/gitrepository.py
56@@ -489,6 +489,7 @@ class GitRepository(StormBase, WebhookTargetMixin, GitIdentityMixin):
57
58 def repackRepository(self):
59 getUtility(IGitHostingClient).repackRepository(self.getInternalPath())
60+ self.date_last_repacked = UTC_NOW
61
62 @property
63 def namespace(self):
64diff --git a/lib/lp/code/scripts/repackgitrepository.py b/lib/lp/code/scripts/repackgitrepository.py
65new file mode 100644
66index 0000000..0c25ff6
67--- /dev/null
68+++ b/lib/lp/code/scripts/repackgitrepository.py
69@@ -0,0 +1,97 @@
70+# Copyright 2021 Canonical Ltd. This software is licensed under the
71+# GNU Affero General Public License version 3 (see the file LICENSE).
72+
73+"""Functions used with the repack Git repositories script."""
74+
75+__metaclass__ = type
76+
77+from psycopg2.extensions import TransactionRollbackError
78+import six
79+from storm.expr import (
80+ And,
81+ Or,
82+ )
83+import transaction
84+
85+from lp.code.enums import GitRepositoryStatus
86+from lp.code.errors import CannotRepackRepository
87+from lp.code.model.gitrepository import GitRepository
88+from lp.services.config import config
89+from lp.services.database.interfaces import IStore
90+from lp.services.looptuner import (
91+ LoopTuner,
92+ TunableLoop,
93+ )
94+
95+
96+class RepackTunableLoop(TunableLoop):
97+ tuner_class = LoopTuner
98+ maximum_chunk_size = 5
99+
100+ def __init__(self, log, dry_run, abort_time=None):
101+ super(RepackTunableLoop, self).__init__(log, abort_time)
102+ self.dry_run = dry_run
103+ self.start_at = 0
104+ self.logger = log
105+ self.store = IStore(GitRepository)
106+
107+ def findRepackCandidates(self):
108+ repos = self.store.find(
109+ GitRepository,
110+ (Or(
111+ GitRepository.loose_object_count >=
112+ config.codehosting.loose_objects_threshold,
113+ GitRepository.status == GitRepositoryStatus.AVAILABLE,
114+ GitRepository.pack_count >=
115+ config.codehosting.packs_threshold
116+ ),
117+ And(GitRepository.id > self.start_at))).order_by(GitRepository.id)
118+ return repos
119+
120+ def isDone(self):
121+ return self.findRepackCandidates().is_empty()
122+
123+ def __call__(self, chunk_size):
124+ repackable_repos = list(self.findRepackCandidates()[:chunk_size])
125+ counter = 0
126+ for repo in repackable_repos:
127+ try:
128+ if self.dry_run:
129+ print ('Would repack %s' % repo.identity)
130+ else:
131+ self.logger.info(
132+ 'Requesting automatic git repository repack for %s.'
133+ % repo.identity)
134+ repo.repackRepository()
135+ counter += 1
136+ except CannotRepackRepository as e:
137+ self.logger.error(
138+ 'An error occurred while requesting repository repack %s'
139+ % e.message)
140+ continue
141+ except TransactionRollbackError as error:
142+ self.logger.error(
143+ 'An error occurred while requesting repository repack %s'
144+ % six.text_type(error))
145+ if transaction is not None:
146+ transaction.abort()
147+ continue
148+
149+ if self.dry_run:
150+ print(
151+ 'Reporting %d automatic git repository repacks '
152+ 'would have been requested as part of this run '
153+ 'out of the %d qualifying for repack.'
154+ % (counter, len(repackable_repos)))
155+ else:
156+ self.logger.info(
157+ 'Requested %d automatic git repository repacks '
158+ 'out of the %d qualifying for repack.'
159+ % (counter, len(repackable_repos)))
160+
161+ self.start_at = repackable_repos[-1].id
162+
163+ if not self.dry_run:
164+ transaction.commit()
165+ else:
166+ transaction.abort()
167diff --git a/lib/lp/code/scripts/tests/test_repack_git_repositories.py b/lib/lp/code/scripts/tests/test_repack_git_repositories.py
168new file mode 100644
169index 0000000..59585cd
170--- /dev/null
171+++ b/lib/lp/code/scripts/tests/test_repack_git_repositories.py
172@@ -0,0 +1,154 @@
173+# Copyright 2021 Canonical Ltd. This software is licensed under the
174+# GNU Affero General Public License version 3 (see the file LICENSE).
175+
176+"""Test the repack_git_repositories script."""
177+import threading
178+from wsgiref.simple_server import (
179+ make_server,
180+ WSGIRequestHandler,
181+ )
182+
183+import transaction
184+from zope.security.proxy import removeSecurityProxy
185+
186+from lp.services.config import config
187+from lp.services.config.fixture import (
188+ ConfigFixture,
189+ ConfigUseFixture,
190+ )
191+from lp.testing import TestCaseWithFactory
192+from lp.testing.layers import ZopelessAppServerLayer
193+from lp.testing.script import run_script
194+
195+
196+class SilentWSGIRequestHandler(WSGIRequestHandler):
197+ """A request handler that doesn't log requests."""
198+
199+ def log_message(self, fmt, *args):
200+ pass
201+
202+
203+class FakeTurnipApplication:
204+ """A WSGI application that provides a fake turnip endpoint."""
205+
206+ def __init__(self):
207+ self.contents = []
208+
209+ def __call__(self, environ, start_response):
210+ self.contents.append(environ['PATH_INFO'])
211+ start_response('200 OK', [('Content-Type', 'text/plain')])
212+ return [b'']
213+
214+
215+class FakeTurnipServer(threading.Thread):
216+ """Thread that runs a fake turnip server."""
217+
218+ def __init__(self):
219+ super(FakeTurnipServer, self).__init__()
220+ self.name = 'FakeTurnipServer'
221+ self.app = FakeTurnipApplication()
222+ self.server = make_server(
223+ 'localhost', 0, self.app, handler_class=SilentWSGIRequestHandler)
224+
225+ def run(self):
226+ self.server.serve_forever()
227+
228+ def getURL(self):
229+ host, port = self.server.server_address
230+ return 'http://%s:%d/' % (host, port)
231+
232+ def stop(self):
233+ self.server.shutdown()
234+
235+
236+class TestRequestGitRepack(TestCaseWithFactory):
237+
238+ layer = ZopelessAppServerLayer
239+
240+ def setUp(self):
241+ super(TestRequestGitRepack, self).setUp()
242+
243+ def runScript_no_Turnip(self):
244+ transaction.commit()
245+
246+ (ret, out, err) = run_script('cronscripts/repack_git_repositories.py')
247+ self.assertIn(
248+ 'An error occurred while requesting repository repack',
249+ err)
250+ self.assertIn(
251+ 'Failed to repack Git repository 1', err)
252+ self.assertIn(
253+ 'Requested 0 automatic git repository '
254+ 'repacks out of the 1 qualifying for repack.', err)
255+ transaction.commit()
256+
257+ def runScript_with_Turnip(self):
258+ transaction.commit()
259+ (ret, out, err) = run_script('cronscripts/repack_git_repositories.py')
260+ self.assertIn(
261+ 'Requested 1 automatic git repository repacks '
262+ 'out of the 1 qualifying for repack.', err)
263+ transaction.commit()
264+
265+ def makeTurnipServer(self):
266+ self.turnip_server = FakeTurnipServer()
267+ config_name = self.factory.getUniqueString()
268+ config_fixture = self.useFixture(ConfigFixture(
269+ config_name, config.instance_name))
270+ setting_lines = [
271+ '[codehosting]',
272+ 'internal_git_api_endpoint: %s' % self.turnip_server.getURL(),
273+ ]
274+ config_fixture.add_section('\n' + '\n'.join(setting_lines))
275+ self.useFixture(ConfigUseFixture(config_name))
276+ self.turnip_server.start()
277+ self.addCleanup(self.turnip_server.stop)
278+ return self.turnip_server
279+
280+ def test_auto_repack_without_Turnip(self):
281+ repo = self.factory.makeGitRepository()
282+ repo = removeSecurityProxy(repo)
283+ repo.loose_object_count = 7000
284+ repo.pack_count = 43
285+
286+ # Do not start the fake turnip server here
287+ # to test if the RequestGitRepack will catch and
288+ # log correctly the failure to establish
289+ # a connection to Turnip
290+ self.runScript_no_Turnip()
291+ self.assertIsNone(repo.date_last_repacked)
292+
293+ def test_auto_repack_with_Turnip_one_repo(self):
294+ # Test repack works when only one repository
295+ # qualifies for a repack
296+ repo = self.factory.makeGitRepository()
297+ repo = removeSecurityProxy(repo)
298+ repo.loose_object_count = 7000
299+ repo.pack_count = 43
300+ transaction.commit()
301+
302+ self.makeTurnipServer()
303+
304+ self.runScript_with_Turnip()
305+
306+ self.assertIsNotNone(repo.date_last_repacked)
307+
308+ def test_auto_repack_with_Turnip_multiple_repos(self):
309+ # Test repack works when 10 repositories
310+ # qualifies for a repack
311+ repo = []
312+ for i in range(10):
313+ repo.append(self.factory.makeGitRepository())
314+ repo[i] = removeSecurityProxy(repo[i])
315+ repo[i].loose_object_count = 7000
316+ repo[i].pack_count = 43
317+ transaction.commit()
318+
319+ self.makeTurnipServer()
320+
321+ self.runScript_with_Turnip()
322+
323+ for i in range(10):
324+ self.assertIsNotNone(repo[i].date_last_repacked)
325+ self.assertEqual("/repo/%s/repack" % repo[i].getInternalPath(),
326+ self.turnip_server.app.contents[i])
327diff --git a/lib/lp/services/config/schema-lazr.conf b/lib/lp/services/config/schema-lazr.conf
328index fe28a8c..c676504 100644
329--- a/lib/lp/services/config/schema-lazr.conf
330+++ b/lib/lp/services/config/schema-lazr.conf
331@@ -1495,6 +1495,9 @@ dbuser: request-daily-builds
332 # datatype: integer
333 timeout: 60
334
335+[repack_git_repositories]
336+# datatype: integer
337+timeout: 15
338
339 [revisionkarma]
340 # The database user which will be used by this process.

Subscribers

People subscribed via source and target branches

to status/vote changes: