Code review comment for ~pappacena/turnip:paginated-check-refs-permissions

Revision history for this message
Colin Watson (cjwatson) wrote :

(Please attach bugs to your merge proposals where appropriate; it's handy to have things interlinked. The method described in https://help.launchpad.net/Code/Git#Linking_to_bugs is easiest, but you can also link bugs from the merge proposal web UI after the fact.)

I'm not convinced that this is solving the right problem here. My instinct is that checking a couple of thousand refs shouldn't cause a timeout on the appserver side. Now, my instinct could be wrong, but if so we ought to be able to find an OOPS corresponding to the timeout; but when I went hunting for Andy's problem on 2020-05-26, I could find the push attempt (which took over half an hour), but it looked as though the checkRefPermissions request never made it as far as the appserver. That suggests that the problem is a somewhat different one.

So, I had a look at how the netstring protocol is implemented between turnip.pack.hooks.hook and turnip.pack.hookrpc. I found two separate problems:

 * netstring_send calls sock.send, but it doesn't handle the case of short sends. sock.send is perfectly entitled to send fewer bytes than you asked it to send, and in that case it expects you to call it again to attempt delivery of the remaining data. This could account for the problem all by itself: we'd see the described symptoms if sock.send only sent part of the netstring and then hookrpc waited forever (or at least until a timeout somewhere) waiting to read the rest of it.

 * hookrpc uses twisted.protocols.basic.NetstringReceiver, which imposes a maximum length of 99999 on netstrings. I think we should consider either removing this limit entirely (perhaps by overriding _checkStringSize) or setting it to something a couple of orders of magnitude larger. Our netstring receiver isn't on a significant security boundary, and we've already read the full list of ref paths into memory anyway by the time we get to this point. 99999 bytes may well not be enough for the JSON encoding of 2000 ref paths.

I also spotted one problem in the implementation of GitRepository.checkRefPermissions in Launchpad: it doesn't preload grantees of GitGranteeType.PERSON, so if there are many matching rules with grants for different teams of which the user is a member, it will incur on the order of one query per relevant grantee. This is bounded above by the number of teams of which the user is a member, so it's unlikely to be a timeout-level problem in practice, but it might be worth cleaning up.

I have a strong suspicion that just fixing the two netstring-related bugs will solve the problem without needing to do any complicated pagination, and I think we should try that first. If we do need to resort to pagination, then the page size can almost certainly be very much larger: I'd start at more like 250 than 25, and I wouldn't be surprised if we could go quite a bit higher still. With larger page sizes, there'll be many fewer round trips involved, and there should be no need to resort to the considerable extra complexity of parallel checks.

review: Needs Information

« Back to merge proposal