Merge ~pappacena/turnip:paginated-check-refs-permissions into turnip:master

Proposed by Thiago F. Pappacena
Status: Rejected
Rejected by: Thiago F. Pappacena
Proposed branch: ~pappacena/turnip:paginated-check-refs-permissions
Merge into: turnip:master
Diff against target: 172 lines (+77/-12)
4 files modified
requirements.txt (+1/-0)
setup.py (+1/-0)
turnip/pack/hooks/hook.py (+25/-9)
turnip/pack/tests/test_hooks.py (+50/-3)
Reviewer Review Type Date Requested Status
Colin Watson (community) Needs Information
Review via email: mp+384593@code.launchpad.net

Commit message

Adding better error reporting and paginating the requests to check_ref_permissions on hooks, to avoid connection timeout when a user git pushes dozens of refs at once.

Description of the change

Ideally, when a user pushes a lot of branches at once, we should parallelize (in a thread pool executor, for example) the permissions checks in batches (instead of checking one batch after another).

But such change would require a bit more refactoring to create different sockets, so this goes in a future MP.

To post a comment you must log in.
Revision history for this message
Thiago F. Pappacena (pappacena) wrote :

This branch uses mock package for python 2. So, it depends on https://code.launchpad.net/~pappacena/turnip/+git/turnip-dependencies/+merge/384592.

Revision history for this message
Colin Watson (cjwatson) wrote :

(Please attach bugs to your merge proposals where appropriate; it's handy to have things interlinked. The method described in https://help.launchpad.net/Code/Git#Linking_to_bugs is easiest, but you can also link bugs from the merge proposal web UI after the fact.)

I'm not convinced that this is solving the right problem here. My instinct is that checking a couple of thousand refs shouldn't cause a timeout on the appserver side. Now, my instinct could be wrong, but if so we ought to be able to find an OOPS corresponding to the timeout; but when I went hunting for Andy's problem on 2020-05-26, I could find the push attempt (which took over half an hour), but it looked as though the checkRefPermissions request never made it as far as the appserver. That suggests that the problem is a somewhat different one.

So, I had a look at how the netstring protocol is implemented between turnip.pack.hooks.hook and turnip.pack.hookrpc. I found two separate problems:

 * netstring_send calls sock.send, but it doesn't handle the case of short sends. sock.send is perfectly entitled to send fewer bytes than you asked it to send, and in that case it expects you to call it again to attempt delivery of the remaining data. This could account for the problem all by itself: we'd see the described symptoms if sock.send only sent part of the netstring and then hookrpc waited forever (or at least until a timeout somewhere) waiting to read the rest of it.

 * hookrpc uses twisted.protocols.basic.NetstringReceiver, which imposes a maximum length of 99999 on netstrings. I think we should consider either removing this limit entirely (perhaps by overriding _checkStringSize) or setting it to something a couple of orders of magnitude larger. Our netstring receiver isn't on a significant security boundary, and we've already read the full list of ref paths into memory anyway by the time we get to this point. 99999 bytes may well not be enough for the JSON encoding of 2000 ref paths.

I also spotted one problem in the implementation of GitRepository.checkRefPermissions in Launchpad: it doesn't preload grantees of GitGranteeType.PERSON, so if there are many matching rules with grants for different teams of which the user is a member, it will incur on the order of one query per relevant grantee. This is bounded above by the number of teams of which the user is a member, so it's unlikely to be a timeout-level problem in practice, but it might be worth cleaning up.

I have a strong suspicion that just fixing the two netstring-related bugs will solve the problem without needing to do any complicated pagination, and I think we should try that first. If we do need to resort to pagination, then the page size can almost certainly be very much larger: I'd start at more like 250 than 25, and I wouldn't be surprised if we could go quite a bit higher still. With larger page sizes, there'll be many fewer round trips involved, and there should be no need to resort to the considerable extra complexity of parallel checks.

review: Needs Information
Revision history for this message
Thiago F. Pappacena (pappacena) wrote :

Hi, cjwatson.

- I'll check both problems you've mentioned on netstring, and open separated MPs for that. I didn't spot those problems when checking, and thanks for pointing that out.

- GitRepository.checkRefPermissions would be my next step, to see query count and other common performance problems. I'll check that too.

- I was planning to adjust the page size after benchmarking a bit better. Yesterday night I've run some quick tests locally, and already realized that 25 was way too small (before your comment, I had already pushed a change to make it 100). The problem is that the reported bug was trying to push 2.000 branches.

- The pagination and parallelization was actually not that complicated. Apart from the changes on where we do the socket creation, make it run in parallel was a change of about 15 lines of code, for example. But I agree that most of our problems are probably somewhere else.

Revision history for this message
Colin Watson (cjwatson) wrote :

It's true that it's not a completely unreasonable amount of complexity. But if we can avoid it in favour of a couple of more surgical and relatively simple changes, then I'd like to try those first. :-)

Unmerged commits

472112b... by Thiago F. Pappacena

Fixing comment

545ac8a... by Thiago F. Pappacena

Merge branch 'paginated-check-refs-permissions' of git+ssh://git.launchpad.net/~pappacena/turnip into paginated-check-refs-permissions

cfa68f9... by Thiago F. Pappacena

Paginating check_ref_permissions RPC calls to avoid timeouts

44036eb... by Thiago F. Pappacena

Paginating check_ref_permissions RPC calls to avoid timeouts

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/requirements.txt b/requirements.txt
2index a298c0a..cb86dfe 100644
3--- a/requirements.txt
4+++ b/requirements.txt
5@@ -26,6 +26,7 @@ linecache2==1.0.0
6 m2r==0.1.14
7 mccabe==0.3
8 mistune==0.8.3
9+mock==3.0.5
10 Paste==2.0.2
11 PasteDeploy==2.1.0
12 pbr==5.4.4
13diff --git a/setup.py b/setup.py
14index cd8f4fb..b5821f7 100755
15--- a/setup.py
16+++ b/setup.py
17@@ -34,6 +34,7 @@ test_requires = [
18 'docutils',
19 'fixtures',
20 'flake8',
21+ 'mock',
22 'testtools',
23 'webtest',
24 ]
25diff --git a/turnip/pack/hooks/hook.py b/turnip/pack/hooks/hook.py
26index e135395..2969e21 100755
27--- a/turnip/pack/hooks/hook.py
28+++ b/turnip/pack/hooks/hook.py
29@@ -113,14 +113,20 @@ def netstring_recv(sock):
30 c = sock.recv(1)
31 lengthstr = b''
32 while c != b':':
33- assert c.isdigit()
34+ if not c.isdigit():
35+ raise ValueError(
36+ "Invalid response: %s" % (six.ensure_text(c + sock.recv(256))))
37 lengthstr += c
38 c = sock.recv(1)
39 length = int(lengthstr)
40 s = bytearray()
41 while len(s) < length:
42 s += sock.recv(length - len(s))
43- assert sock.recv(1) == b','
44+ ending = sock.recv(1)
45+ if ending != b',':
46+ raise ValueError(
47+ "Length error for message '%s': ending='%s'" %
48+ (six.ensure_text(bytes(s)), six.ensure_text(ending)))
49 return bytes(s)
50
51
52@@ -135,14 +141,24 @@ def rpc_invoke(sock, method, args):
53 return res['result']
54
55
56-def check_ref_permissions(sock, rpc_key, ref_paths):
57+def split_list(lst, n):
58+ """Splits the given list into chunks of size n."""
59+ lst = list(lst)
60+ return [lst[i:i + n] for i in range(0, len(lst), n)]
61+
62+
63+def check_ref_permissions(sock, rpc_key, ref_paths, page_size=100):
64 ref_paths = [base64.b64encode(path).decode('UTF-8') for path in ref_paths]
65- rule_lines = rpc_invoke(
66- sock, 'check_ref_permissions',
67- {'key': rpc_key, 'paths': ref_paths})
68- return {
69- base64.b64decode(path.encode('UTF-8')): permissions
70- for path, permissions in rule_lines.items()}
71+ permissions = {}
72+ # Paginate the rpc calls to avoid timeouts.
73+ for ref_paths_chunk in split_list(ref_paths, page_size):
74+ rule_lines = rpc_invoke(
75+ sock, 'check_ref_permissions',
76+ {'key': rpc_key, 'paths': ref_paths_chunk})
77+ permissions.update({
78+ base64.b64decode(path.encode('UTF-8')): permissions
79+ for path, permissions in rule_lines.items()})
80+ return permissions
81
82
83 def send_mp_url(received_line):
84diff --git a/turnip/pack/tests/test_hooks.py b/turnip/pack/tests/test_hooks.py
85index 449b9f9..b383d6b 100644
86--- a/turnip/pack/tests/test_hooks.py
87+++ b/turnip/pack/tests/test_hooks.py
88@@ -17,7 +17,12 @@ from fixtures import (
89 MonkeyPatch,
90 TempDir,
91 )
92+try:
93+ from unittest import mock
94+except ImportError:
95+ import mock
96 import pygit2
97+import six
98 from testtools import TestCase
99 from testtools.deferredruntest import AsynchronousDeferredRunTest
100 from twisted.internet import (
101@@ -29,6 +34,7 @@ from twisted.internet import (
102 from turnip.pack import hookrpc
103 from turnip.pack.helpers import ensure_hooks
104 from turnip.pack.hooks import hook
105+from turnip.pack.hooks.hook import split_list, check_ref_permissions
106
107
108 class HookProcessProtocol(protocol.ProcessProtocol):
109@@ -108,15 +114,21 @@ class TestNetstringRecv(TestCase):
110
111 def test_nondigit(self):
112 sock = MockSocket([b'zzz:abc,'])
113- self.assertRaises(AssertionError, hook.netstring_recv, sock)
114+ self.assertRaisesRegex(
115+ ValueError, "Invalid response: zzz:abc,",
116+ hook.netstring_recv, sock)
117
118 def test_short(self):
119 sock = MockSocket([b'4:abc,'])
120- self.assertRaises(AssertionError, hook.netstring_recv, sock)
121+ self.assertRaisesRegex(
122+ ValueError, "Length error for message 'abc,': ending=''",
123+ hook.netstring_recv, sock)
124
125 def test_unterminated(self):
126 sock = MockSocket([b'4:abcd'])
127- self.assertRaises(AssertionError, hook.netstring_recv, sock)
128+ self.assertRaisesRegex(
129+ ValueError, "Length error for message 'abcd': ending=''",
130+ hook.netstring_recv, sock)
131
132 def test_split_data(self):
133 sock = MockSocket([b'12:abcd', b'efgh', b'ijkl,'])
134@@ -446,3 +458,38 @@ class TestDeterminePermissions(TestCase):
135 output = hook.determine_permissions_outcome(
136 pygit2.GIT_OID_HEX_ZERO, 'ref', {'ref': ['force_push']})
137 self.assertEqual(b"You do not have permission to create ref.", output)
138+
139+
140+class TestSplitRefPathsCalls(TestCase):
141+ def test_split_list(self):
142+ self.assertEqual(
143+ split_list(range(10), 3), [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]])
144+
145+ @mock.patch('turnip.pack.hooks.hook.rpc_invoke')
146+ def test_rcp_calls_are_paginated(self, rpc_invoke):
147+ encode = lambda x: six.ensure_text(base64.b64encode(x))
148+ rpc_invoke.side_effect = [
149+ {encode(b'master'): [], encode(b'develop'): []},
150+ {encode(b'head'): []}
151+ ]
152+ sock = mock.Mock()
153+ # Call it with page size = 2
154+ result = check_ref_permissions(
155+ sock, "rpc-key", [b"master", b"develop", b"head"], 2)
156+
157+ # The final result should have been joined into.
158+ self.assertEqual(
159+ result, {b"master": [], b"develop": [], b"head": []})
160+
161+ # Check that it called correctly the rpc_invoke method.
162+ self.assertEqual(rpc_invoke.call_args_list, [
163+ mock.call(
164+ sock, 'check_ref_permissions', {
165+ 'key': 'rpc-key', 'paths': [
166+ encode(b"master"), encode(b"develop")]
167+ }),
168+ mock.call(
169+ sock, 'check_ref_permissions', {
170+ 'key': 'rpc-key', 'paths': [encode(b"head")]
171+ }),
172+ ])

Subscribers

People subscribed via source and target branches