Merge ~ilasc/launchpad:git-repack-cron-script into launchpad:master

Proposed by Ioana Lasc
Status: Merged
Approved by: Ioana Lasc
Approved revision: a078850beec0a6bee402dfc649f259481bb58708
Merge reported by: Otto Co-Pilot
Merged at revision: not available
Proposed branch: ~ilasc/launchpad:git-repack-cron-script
Merge into: launchpad:master
Diff against target: 340 lines (+289/-0)
6 files modified
cronscripts/repack_git_repositories.py (+33/-0)
database/schema/security.cfg (+1/-0)
lib/lp/code/model/gitrepository.py (+1/-0)
lib/lp/code/scripts/repackgitrepository.py (+97/-0)
lib/lp/code/scripts/tests/test_repack_git_repositories.py (+154/-0)
lib/lp/services/config/schema-lazr.conf (+3/-0)
Reviewer Review Type Date Requested Status
Colin Watson (community) Approve
Review via email: mp+398963@code.launchpad.net

Commit message

Add cron script for git repack

To post a comment you must log in.
8e67df7... by Ioana Lasc

Organize imports

Revision history for this message
Ioana Lasc (ilasc) wrote :

The unit test (test_request_git_repack) doesn't seem to trigger the script, this is really a copy/edit/debug and attempt to understand how we trigger daily builds, but not getting far on this.

Revision history for this message
Colin Watson (cjwatson) :
7ce4087... by Ioana Lasc

Move repackgitrepository to lp/code/scripts

51ca1f4... by Ioana Lasc

Cleanup unnecessary git-repack user

Revision history for this message
Ioana Lasc (ilasc) wrote :

Thanks Colin, MP is now ready for review.

Revision history for this message
Colin Watson (cjwatson) wrote :

Looks like a good first attempt, thanks!

This is potentially quite operationally expensive, so in this review I've paid special attention to things that I think could make it generally kinder and gentler to our infrastructure; the "Needs Fixing" is mainly for those.

review: Needs Fixing
Revision history for this message
Colin Watson (cjwatson) :
3020f70... by Ioana Lasc

Add TunableLoop to git repack automation

Revision history for this message
Ioana Lasc (ilasc) wrote :

Thanks Colin, excellent suggestions as usual!

This is now working with a TunableLoop but needs more work: the TunableLoop doesn't hook into the fake turnip thread for the calls to gitHostingClient and it is oops-ing with: https://pastebin.canonical.com/p/6RKM3kfkTT/
I'm working on getting the LaunchpadCronScript and TunableLoop to interact with Fake Turnip Thread in unit test instead of trying to contact the real application.

In terms of the permissions added to the branchscanner user, this is how it crashed if SELECT rights are not granted to the user:https://pastebin.canonical.com/p/qVwRKjztzM/
And it appears I've gone overboard with permissions as UPDATE is not necessary, so will be taking that out.

I didn't miss your last comment about adding at least another test for the case where we have more than single repository qualifying for repack. As soon as I get the Unit Tests to play nicely with the FakeTurnipApplication will be adding that in too.

82b632e... by Ioana Lasc

Set default timeout function for RepackTunableLoop

5f84608... by Ioana Lasc

Add unit tests for RepackTunableLoop

Revision history for this message
Ioana Lasc (ilasc) wrote :

MP now ready for review.

Revision history for this message
Colin Watson (cjwatson) wrote :

Thanks for the updates. Various details follow; at least the index needs to be sorted out before landing this.

review: Approve
a078850... by Ioana Lasc

Address review comments for repack cron job

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
diff --git a/cronscripts/repack_git_repositories.py b/cronscripts/repack_git_repositories.py
0new file mode 1007550new file mode 100755
index 0000000..e553616
--- /dev/null
+++ b/cronscripts/repack_git_repositories.py
@@ -0,0 +1,33 @@
1#!/usr/bin/python2 -S
2#
3# Copyright 2021 Canonical Ltd. This software is licensed under the
4# GNU Affero General Public License version 3 (see the file LICENSE).
5
6import _pythonpath
7
8from lp.code.scripts.repackgitrepository import RepackTunableLoop
9from lp.services.config import config
10from lp.services.scripts.base import LaunchpadCronScript
11from lp.services.timeout import set_default_timeout_function
12
13
14class RepackGitRepositories(LaunchpadCronScript):
15
16 def add_my_options(self):
17 self.parser.add_option(
18 "--dry-run", action="store_true",
19 dest="dry_run", default=False,
20 help="Reports which repositories would be repacked without "
21 "actually repacking the repositories.")
22
23 def main(self):
24 set_default_timeout_function(
25 lambda: config.repack_git_repositories.timeout)
26 updater = RepackTunableLoop(self.logger, self.options.dry_run)
27 updater.run()
28
29
30if __name__ == '__main__':
31 script = RepackGitRepositories(
32 'repack_git_repositories', dbuser='branchscanner')
33 script.lock_and_run()
diff --git a/database/schema/security.cfg b/database/schema/security.cfg
index 2f4ffdf..2911da2 100644
--- a/database/schema/security.cfg
+++ b/database/schema/security.cfg
@@ -699,6 +699,7 @@ public.bugsubscriptionfilterimportance = SELECT
699public.bugsubscriptionfilterstatus = SELECT699public.bugsubscriptionfilterstatus = SELECT
700public.bugsubscriptionfiltertag = SELECT700public.bugsubscriptionfiltertag = SELECT
701public.bugtag = SELECT701public.bugtag = SELECT
702public.codeimport = SELECT
702public.codereviewmessage = SELECT703public.codereviewmessage = SELECT
703public.codereviewvote = SELECT704public.codereviewvote = SELECT
704public.diff = SELECT, INSERT, DELETE705public.diff = SELECT, INSERT, DELETE
diff --git a/lib/lp/code/model/gitrepository.py b/lib/lp/code/model/gitrepository.py
index 589dcbf..429b502 100644
--- a/lib/lp/code/model/gitrepository.py
+++ b/lib/lp/code/model/gitrepository.py
@@ -489,6 +489,7 @@ class GitRepository(StormBase, WebhookTargetMixin, GitIdentityMixin):
489489
490 def repackRepository(self):490 def repackRepository(self):
491 getUtility(IGitHostingClient).repackRepository(self.getInternalPath())491 getUtility(IGitHostingClient).repackRepository(self.getInternalPath())
492 self.date_last_repacked = UTC_NOW
492493
493 @property494 @property
494 def namespace(self):495 def namespace(self):
diff --git a/lib/lp/code/scripts/repackgitrepository.py b/lib/lp/code/scripts/repackgitrepository.py
495new file mode 100644496new file mode 100644
index 0000000..0c25ff6
--- /dev/null
+++ b/lib/lp/code/scripts/repackgitrepository.py
@@ -0,0 +1,97 @@
1# Copyright 2021 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Functions used with the repack Git repositories script."""
5
6__metaclass__ = type
7
8from psycopg2.extensions import TransactionRollbackError
9import six
10from storm.expr import (
11 And,
12 Or,
13 )
14import transaction
15
16from lp.code.enums import GitRepositoryStatus
17from lp.code.errors import CannotRepackRepository
18from lp.code.model.gitrepository import GitRepository
19from lp.services.config import config
20from lp.services.database.interfaces import IStore
21from lp.services.looptuner import (
22 LoopTuner,
23 TunableLoop,
24 )
25
26
27class RepackTunableLoop(TunableLoop):
28 tuner_class = LoopTuner
29 maximum_chunk_size = 5
30
31 def __init__(self, log, dry_run, abort_time=None):
32 super(RepackTunableLoop, self).__init__(log, abort_time)
33 self.dry_run = dry_run
34 self.start_at = 0
35 self.logger = log
36 self.store = IStore(GitRepository)
37
38 def findRepackCandidates(self):
39 repos = self.store.find(
40 GitRepository,
41 (Or(
42 GitRepository.loose_object_count >=
43 config.codehosting.loose_objects_threshold,
44 GitRepository.status == GitRepositoryStatus.AVAILABLE,
45 GitRepository.pack_count >=
46 config.codehosting.packs_threshold
47 ),
48 And(GitRepository.id > self.start_at))).order_by(GitRepository.id)
49 return repos
50
51 def isDone(self):
52 return self.findRepackCandidates().is_empty()
53
54 def __call__(self, chunk_size):
55 repackable_repos = list(self.findRepackCandidates()[:chunk_size])
56 counter = 0
57 for repo in repackable_repos:
58 try:
59 if self.dry_run:
60 print ('Would repack %s' % repo.identity)
61 else:
62 self.logger.info(
63 'Requesting automatic git repository repack for %s.'
64 % repo.identity)
65 repo.repackRepository()
66 counter += 1
67 except CannotRepackRepository as e:
68 self.logger.error(
69 'An error occurred while requesting repository repack %s'
70 % e.message)
71 continue
72 except TransactionRollbackError as error:
73 self.logger.error(
74 'An error occurred while requesting repository repack %s'
75 % six.text_type(error))
76 if transaction is not None:
77 transaction.abort()
78 continue
79
80 if self.dry_run:
81 print(
82 'Reporting %d automatic git repository repacks '
83 'would have been requested as part of this run '
84 'out of the %d qualifying for repack.'
85 % (counter, len(repackable_repos)))
86 else:
87 self.logger.info(
88 'Requested %d automatic git repository repacks '
89 'out of the %d qualifying for repack.'
90 % (counter, len(repackable_repos)))
91
92 self.start_at = repackable_repos[-1].id
93
94 if not self.dry_run:
95 transaction.commit()
96 else:
97 transaction.abort()
diff --git a/lib/lp/code/scripts/tests/test_repack_git_repositories.py b/lib/lp/code/scripts/tests/test_repack_git_repositories.py
0new file mode 10064498new file mode 100644
index 0000000..59585cd
--- /dev/null
+++ b/lib/lp/code/scripts/tests/test_repack_git_repositories.py
@@ -0,0 +1,154 @@
1# Copyright 2021 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Test the repack_git_repositories script."""
5import threading
6from wsgiref.simple_server import (
7 make_server,
8 WSGIRequestHandler,
9 )
10
11import transaction
12from zope.security.proxy import removeSecurityProxy
13
14from lp.services.config import config
15from lp.services.config.fixture import (
16 ConfigFixture,
17 ConfigUseFixture,
18 )
19from lp.testing import TestCaseWithFactory
20from lp.testing.layers import ZopelessAppServerLayer
21from lp.testing.script import run_script
22
23
24class SilentWSGIRequestHandler(WSGIRequestHandler):
25 """A request handler that doesn't log requests."""
26
27 def log_message(self, fmt, *args):
28 pass
29
30
31class FakeTurnipApplication:
32 """A WSGI application that provides a fake turnip endpoint."""
33
34 def __init__(self):
35 self.contents = []
36
37 def __call__(self, environ, start_response):
38 self.contents.append(environ['PATH_INFO'])
39 start_response('200 OK', [('Content-Type', 'text/plain')])
40 return [b'']
41
42
43class FakeTurnipServer(threading.Thread):
44 """Thread that runs a fake turnip server."""
45
46 def __init__(self):
47 super(FakeTurnipServer, self).__init__()
48 self.name = 'FakeTurnipServer'
49 self.app = FakeTurnipApplication()
50 self.server = make_server(
51 'localhost', 0, self.app, handler_class=SilentWSGIRequestHandler)
52
53 def run(self):
54 self.server.serve_forever()
55
56 def getURL(self):
57 host, port = self.server.server_address
58 return 'http://%s:%d/' % (host, port)
59
60 def stop(self):
61 self.server.shutdown()
62
63
64class TestRequestGitRepack(TestCaseWithFactory):
65
66 layer = ZopelessAppServerLayer
67
68 def setUp(self):
69 super(TestRequestGitRepack, self).setUp()
70
71 def runScript_no_Turnip(self):
72 transaction.commit()
73
74 (ret, out, err) = run_script('cronscripts/repack_git_repositories.py')
75 self.assertIn(
76 'An error occurred while requesting repository repack',
77 err)
78 self.assertIn(
79 'Failed to repack Git repository 1', err)
80 self.assertIn(
81 'Requested 0 automatic git repository '
82 'repacks out of the 1 qualifying for repack.', err)
83 transaction.commit()
84
85 def runScript_with_Turnip(self):
86 transaction.commit()
87 (ret, out, err) = run_script('cronscripts/repack_git_repositories.py')
88 self.assertIn(
89 'Requested 1 automatic git repository repacks '
90 'out of the 1 qualifying for repack.', err)
91 transaction.commit()
92
93 def makeTurnipServer(self):
94 self.turnip_server = FakeTurnipServer()
95 config_name = self.factory.getUniqueString()
96 config_fixture = self.useFixture(ConfigFixture(
97 config_name, config.instance_name))
98 setting_lines = [
99 '[codehosting]',
100 'internal_git_api_endpoint: %s' % self.turnip_server.getURL(),
101 ]
102 config_fixture.add_section('\n' + '\n'.join(setting_lines))
103 self.useFixture(ConfigUseFixture(config_name))
104 self.turnip_server.start()
105 self.addCleanup(self.turnip_server.stop)
106 return self.turnip_server
107
108 def test_auto_repack_without_Turnip(self):
109 repo = self.factory.makeGitRepository()
110 repo = removeSecurityProxy(repo)
111 repo.loose_object_count = 7000
112 repo.pack_count = 43
113
114 # Do not start the fake turnip server here
115 # to test if the RequestGitRepack will catch and
116 # log correctly the failure to establish
117 # a connection to Turnip
118 self.runScript_no_Turnip()
119 self.assertIsNone(repo.date_last_repacked)
120
121 def test_auto_repack_with_Turnip_one_repo(self):
122 # Test repack works when only one repository
123 # qualifies for a repack
124 repo = self.factory.makeGitRepository()
125 repo = removeSecurityProxy(repo)
126 repo.loose_object_count = 7000
127 repo.pack_count = 43
128 transaction.commit()
129
130 self.makeTurnipServer()
131
132 self.runScript_with_Turnip()
133
134 self.assertIsNotNone(repo.date_last_repacked)
135
136 def test_auto_repack_with_Turnip_multiple_repos(self):
137 # Test repack works when 10 repositories
138 # qualifies for a repack
139 repo = []
140 for i in range(10):
141 repo.append(self.factory.makeGitRepository())
142 repo[i] = removeSecurityProxy(repo[i])
143 repo[i].loose_object_count = 7000
144 repo[i].pack_count = 43
145 transaction.commit()
146
147 self.makeTurnipServer()
148
149 self.runScript_with_Turnip()
150
151 for i in range(10):
152 self.assertIsNotNone(repo[i].date_last_repacked)
153 self.assertEqual("/repo/%s/repack" % repo[i].getInternalPath(),
154 self.turnip_server.app.contents[i])
diff --git a/lib/lp/services/config/schema-lazr.conf b/lib/lp/services/config/schema-lazr.conf
index fe28a8c..c676504 100644
--- a/lib/lp/services/config/schema-lazr.conf
+++ b/lib/lp/services/config/schema-lazr.conf
@@ -1495,6 +1495,9 @@ dbuser: request-daily-builds
1495# datatype: integer1495# datatype: integer
1496timeout: 601496timeout: 60
14971497
1498[repack_git_repositories]
1499# datatype: integer
1500timeout: 15
14981501
1499[revisionkarma]1502[revisionkarma]
1500# The database user which will be used by this process.1503# The database user which will be used by this process.

Subscribers

People subscribed via source and target branches

to status/vote changes: