Merge ~pappacena/turnip:paginated-check-refs-permissions into turnip:master

Proposed by Thiago F. Pappacena
Status: Rejected
Rejected by: Thiago F. Pappacena
Proposed branch: ~pappacena/turnip:paginated-check-refs-permissions
Merge into: turnip:master
Diff against target: 172 lines (+77/-12)
4 files modified
requirements.txt (+1/-0)
setup.py (+1/-0)
turnip/pack/hooks/hook.py (+25/-9)
turnip/pack/tests/test_hooks.py (+50/-3)
Reviewer Review Type Date Requested Status
Colin Watson (community) Needs Information
Review via email: mp+384593@code.launchpad.net

Commit message

Adding better error reporting and paginating the requests to check_ref_permissions on hooks, to avoid connection timeout when a user git pushes dozens of refs at once.

Description of the change

Ideally, when a user pushes a lot of branches at once, we should parallelize (in a thread pool executor, for example) the permissions checks in batches (instead of checking one batch after another).

But such change would require a bit more refactoring to create different sockets, so this goes in a future MP.

To post a comment you must log in.
Revision history for this message
Thiago F. Pappacena (pappacena) wrote :

This branch uses mock package for python 2. So, it depends on https://code.launchpad.net/~pappacena/turnip/+git/turnip-dependencies/+merge/384592.

Revision history for this message
Colin Watson (cjwatson) wrote :

(Please attach bugs to your merge proposals where appropriate; it's handy to have things interlinked. The method described in https://help.launchpad.net/Code/Git#Linking_to_bugs is easiest, but you can also link bugs from the merge proposal web UI after the fact.)

I'm not convinced that this is solving the right problem here. My instinct is that checking a couple of thousand refs shouldn't cause a timeout on the appserver side. Now, my instinct could be wrong, but if so we ought to be able to find an OOPS corresponding to the timeout; but when I went hunting for Andy's problem on 2020-05-26, I could find the push attempt (which took over half an hour), but it looked as though the checkRefPermissions request never made it as far as the appserver. That suggests that the problem is a somewhat different one.

So, I had a look at how the netstring protocol is implemented between turnip.pack.hooks.hook and turnip.pack.hookrpc. I found two separate problems:

 * netstring_send calls sock.send, but it doesn't handle the case of short sends. sock.send is perfectly entitled to send fewer bytes than you asked it to send, and in that case it expects you to call it again to attempt delivery of the remaining data. This could account for the problem all by itself: we'd see the described symptoms if sock.send only sent part of the netstring and then hookrpc waited forever (or at least until a timeout somewhere) waiting to read the rest of it.

 * hookrpc uses twisted.protocols.basic.NetstringReceiver, which imposes a maximum length of 99999 on netstrings. I think we should consider either removing this limit entirely (perhaps by overriding _checkStringSize) or setting it to something a couple of orders of magnitude larger. Our netstring receiver isn't on a significant security boundary, and we've already read the full list of ref paths into memory anyway by the time we get to this point. 99999 bytes may well not be enough for the JSON encoding of 2000 ref paths.

I also spotted one problem in the implementation of GitRepository.checkRefPermissions in Launchpad: it doesn't preload grantees of GitGranteeType.PERSON, so if there are many matching rules with grants for different teams of which the user is a member, it will incur on the order of one query per relevant grantee. This is bounded above by the number of teams of which the user is a member, so it's unlikely to be a timeout-level problem in practice, but it might be worth cleaning up.

I have a strong suspicion that just fixing the two netstring-related bugs will solve the problem without needing to do any complicated pagination, and I think we should try that first. If we do need to resort to pagination, then the page size can almost certainly be very much larger: I'd start at more like 250 than 25, and I wouldn't be surprised if we could go quite a bit higher still. With larger page sizes, there'll be many fewer round trips involved, and there should be no need to resort to the considerable extra complexity of parallel checks.

review: Needs Information
Revision history for this message
Thiago F. Pappacena (pappacena) wrote :

Hi, cjwatson.

- I'll check both problems you've mentioned on netstring, and open separated MPs for that. I didn't spot those problems when checking, and thanks for pointing that out.

- GitRepository.checkRefPermissions would be my next step, to see query count and other common performance problems. I'll check that too.

- I was planning to adjust the page size after benchmarking a bit better. Yesterday night I've run some quick tests locally, and already realized that 25 was way too small (before your comment, I had already pushed a change to make it 100). The problem is that the reported bug was trying to push 2.000 branches.

- The pagination and parallelization was actually not that complicated. Apart from the changes on where we do the socket creation, make it run in parallel was a change of about 15 lines of code, for example. But I agree that most of our problems are probably somewhere else.

Revision history for this message
Colin Watson (cjwatson) wrote :

It's true that it's not a completely unreasonable amount of complexity. But if we can avoid it in favour of a couple of more surgical and relatively simple changes, then I'd like to try those first. :-)

Unmerged commits

472112b... by Thiago F. Pappacena

Fixing comment

545ac8a... by Thiago F. Pappacena

Merge branch 'paginated-check-refs-permissions' of git+ssh://git.launchpad.net/~pappacena/turnip into paginated-check-refs-permissions

cfa68f9... by Thiago F. Pappacena

Paginating check_ref_permissions RPC calls to avoid timeouts

44036eb... by Thiago F. Pappacena

Paginating check_ref_permissions RPC calls to avoid timeouts

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
diff --git a/requirements.txt b/requirements.txt
index a298c0a..cb86dfe 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -26,6 +26,7 @@ linecache2==1.0.0
26m2r==0.1.1426m2r==0.1.14
27mccabe==0.327mccabe==0.3
28mistune==0.8.328mistune==0.8.3
29mock==3.0.5
29Paste==2.0.230Paste==2.0.2
30PasteDeploy==2.1.031PasteDeploy==2.1.0
31pbr==5.4.432pbr==5.4.4
diff --git a/setup.py b/setup.py
index cd8f4fb..b5821f7 100755
--- a/setup.py
+++ b/setup.py
@@ -34,6 +34,7 @@ test_requires = [
34 'docutils',34 'docutils',
35 'fixtures',35 'fixtures',
36 'flake8',36 'flake8',
37 'mock',
37 'testtools',38 'testtools',
38 'webtest',39 'webtest',
39 ]40 ]
diff --git a/turnip/pack/hooks/hook.py b/turnip/pack/hooks/hook.py
index e135395..2969e21 100755
--- a/turnip/pack/hooks/hook.py
+++ b/turnip/pack/hooks/hook.py
@@ -113,14 +113,20 @@ def netstring_recv(sock):
113 c = sock.recv(1)113 c = sock.recv(1)
114 lengthstr = b''114 lengthstr = b''
115 while c != b':':115 while c != b':':
116 assert c.isdigit()116 if not c.isdigit():
117 raise ValueError(
118 "Invalid response: %s" % (six.ensure_text(c + sock.recv(256))))
117 lengthstr += c119 lengthstr += c
118 c = sock.recv(1)120 c = sock.recv(1)
119 length = int(lengthstr)121 length = int(lengthstr)
120 s = bytearray()122 s = bytearray()
121 while len(s) < length:123 while len(s) < length:
122 s += sock.recv(length - len(s))124 s += sock.recv(length - len(s))
123 assert sock.recv(1) == b','125 ending = sock.recv(1)
126 if ending != b',':
127 raise ValueError(
128 "Length error for message '%s': ending='%s'" %
129 (six.ensure_text(bytes(s)), six.ensure_text(ending)))
124 return bytes(s)130 return bytes(s)
125131
126132
@@ -135,14 +141,24 @@ def rpc_invoke(sock, method, args):
135 return res['result']141 return res['result']
136142
137143
138def check_ref_permissions(sock, rpc_key, ref_paths):144def split_list(lst, n):
145 """Splits the given list into chunks of size n."""
146 lst = list(lst)
147 return [lst[i:i + n] for i in range(0, len(lst), n)]
148
149
150def check_ref_permissions(sock, rpc_key, ref_paths, page_size=100):
139 ref_paths = [base64.b64encode(path).decode('UTF-8') for path in ref_paths]151 ref_paths = [base64.b64encode(path).decode('UTF-8') for path in ref_paths]
140 rule_lines = rpc_invoke(152 permissions = {}
141 sock, 'check_ref_permissions',153 # Paginate the rpc calls to avoid timeouts.
142 {'key': rpc_key, 'paths': ref_paths})154 for ref_paths_chunk in split_list(ref_paths, page_size):
143 return {155 rule_lines = rpc_invoke(
144 base64.b64decode(path.encode('UTF-8')): permissions156 sock, 'check_ref_permissions',
145 for path, permissions in rule_lines.items()}157 {'key': rpc_key, 'paths': ref_paths_chunk})
158 permissions.update({
159 base64.b64decode(path.encode('UTF-8')): permissions
160 for path, permissions in rule_lines.items()})
161 return permissions
146162
147163
148def send_mp_url(received_line):164def send_mp_url(received_line):
diff --git a/turnip/pack/tests/test_hooks.py b/turnip/pack/tests/test_hooks.py
index 449b9f9..b383d6b 100644
--- a/turnip/pack/tests/test_hooks.py
+++ b/turnip/pack/tests/test_hooks.py
@@ -17,7 +17,12 @@ from fixtures import (
17 MonkeyPatch,17 MonkeyPatch,
18 TempDir,18 TempDir,
19 )19 )
20try:
21 from unittest import mock
22except ImportError:
23 import mock
20import pygit224import pygit2
25import six
21from testtools import TestCase26from testtools import TestCase
22from testtools.deferredruntest import AsynchronousDeferredRunTest27from testtools.deferredruntest import AsynchronousDeferredRunTest
23from twisted.internet import (28from twisted.internet import (
@@ -29,6 +34,7 @@ from twisted.internet import (
29from turnip.pack import hookrpc34from turnip.pack import hookrpc
30from turnip.pack.helpers import ensure_hooks35from turnip.pack.helpers import ensure_hooks
31from turnip.pack.hooks import hook36from turnip.pack.hooks import hook
37from turnip.pack.hooks.hook import split_list, check_ref_permissions
3238
3339
34class HookProcessProtocol(protocol.ProcessProtocol):40class HookProcessProtocol(protocol.ProcessProtocol):
@@ -108,15 +114,21 @@ class TestNetstringRecv(TestCase):
108114
109 def test_nondigit(self):115 def test_nondigit(self):
110 sock = MockSocket([b'zzz:abc,'])116 sock = MockSocket([b'zzz:abc,'])
111 self.assertRaises(AssertionError, hook.netstring_recv, sock)117 self.assertRaisesRegex(
118 ValueError, "Invalid response: zzz:abc,",
119 hook.netstring_recv, sock)
112120
113 def test_short(self):121 def test_short(self):
114 sock = MockSocket([b'4:abc,'])122 sock = MockSocket([b'4:abc,'])
115 self.assertRaises(AssertionError, hook.netstring_recv, sock)123 self.assertRaisesRegex(
124 ValueError, "Length error for message 'abc,': ending=''",
125 hook.netstring_recv, sock)
116126
117 def test_unterminated(self):127 def test_unterminated(self):
118 sock = MockSocket([b'4:abcd'])128 sock = MockSocket([b'4:abcd'])
119 self.assertRaises(AssertionError, hook.netstring_recv, sock)129 self.assertRaisesRegex(
130 ValueError, "Length error for message 'abcd': ending=''",
131 hook.netstring_recv, sock)
120132
121 def test_split_data(self):133 def test_split_data(self):
122 sock = MockSocket([b'12:abcd', b'efgh', b'ijkl,'])134 sock = MockSocket([b'12:abcd', b'efgh', b'ijkl,'])
@@ -446,3 +458,38 @@ class TestDeterminePermissions(TestCase):
446 output = hook.determine_permissions_outcome(458 output = hook.determine_permissions_outcome(
447 pygit2.GIT_OID_HEX_ZERO, 'ref', {'ref': ['force_push']})459 pygit2.GIT_OID_HEX_ZERO, 'ref', {'ref': ['force_push']})
448 self.assertEqual(b"You do not have permission to create ref.", output)460 self.assertEqual(b"You do not have permission to create ref.", output)
461
462
463class TestSplitRefPathsCalls(TestCase):
464 def test_split_list(self):
465 self.assertEqual(
466 split_list(range(10), 3), [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]])
467
468 @mock.patch('turnip.pack.hooks.hook.rpc_invoke')
469 def test_rcp_calls_are_paginated(self, rpc_invoke):
470 encode = lambda x: six.ensure_text(base64.b64encode(x))
471 rpc_invoke.side_effect = [
472 {encode(b'master'): [], encode(b'develop'): []},
473 {encode(b'head'): []}
474 ]
475 sock = mock.Mock()
476 # Call it with page size = 2
477 result = check_ref_permissions(
478 sock, "rpc-key", [b"master", b"develop", b"head"], 2)
479
480 # The final result should have been joined into.
481 self.assertEqual(
482 result, {b"master": [], b"develop": [], b"head": []})
483
484 # Check that it called correctly the rpc_invoke method.
485 self.assertEqual(rpc_invoke.call_args_list, [
486 mock.call(
487 sock, 'check_ref_permissions', {
488 'key': 'rpc-key', 'paths': [
489 encode(b"master"), encode(b"develop")]
490 }),
491 mock.call(
492 sock, 'check_ref_permissions', {
493 'key': 'rpc-key', 'paths': [encode(b"head")]
494 }),
495 ])

Subscribers

People subscribed via source and target branches