Merge ~ilasc/turnip:stats-collection-point into turnip:master
Status: | Rejected |
---|---|
Rejected by: | Ioana Lasc |
Proposed branch: | ~ilasc/turnip:stats-collection-point |
Merge into: | turnip:master |
Diff against target: |
108 lines (+62/-1) 2 files modified
turnip/pack/git.py (+40/-1) turnip/pack/tests/test_functional.py (+22/-0) |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Colin Watson (community) | Needs Resubmitting | ||
Review via email: mp+391365@code.launchpad.net |
Commit message
Add metrics collection points to GitProcessProtocol
Description of the change
This MP uses print statements to show points in the code where we could potentially collect stats to be sent to statsd. The purpose of the MP is to give us a better starting point in discussing how and where is best to collect stats: in the PackBackendProtocol over pipes, inside the GitProcessProtocol itself, in a separate thread or otherwise.
So far I've tried several approaches:
1. wrap "default_
2. write to GitProcessProtocol from the PackBackendProtocol and back
3. add additional fd to "default_
4. from the failure of option 3 I thought if I could introduce another layer of twisted.
5. spawn a "self written" separate thread to monitor the GitProcessProtocol process from outside
Essentially all 4 major approaches failed by breaking existing PBP <-> Git pipes or by not being able to pass metrics data through, I didn't pursue the 5th as I understand we do not want to go down the multithreading route.
The current proposal uses the GitProcessProtocol itself as a stats collection point.Running the "test_a_
turnip.
PID 4545 rss with psutil: 0 MB
PID 4545 maxrss with getrusage: 88 MB
PID 4545 cpu utilization as a percentage 0.0
PID 4545 hostname: turnip-bionic
PID 4545 snapshot CPU times: pcputimes(user=0.0, system=0.0, children_user=0.0, children_
PID 4545 execution time: 0.856162071228
PID 4545 command: git-upload-pack
PID 4545 repository: /test
PID 4551 rss with psutil: 72 MB
PID 4551 maxrss with getrusage: 88 MB
PID 4551 cpu utilization as a percentage 0.0
PID 4551 hostname: turnip-bionic
PID 4551 snapshot CPU times: pcputimes(user=0.0, system=0.0, children_user=0.0, children_
PID 4551 execution time: 0.970222949982
PID 4551 command: git-receive-pack
PID 4551 repository: /test
ok
You were somewhat close with your option 3, but you had the extra file descriptor backwards. I'm expecting something more along the lines of this:
https:/ /paste. ubuntu. com/p/xB8TshVHj 6/
Things to note:
* the supervising process protocol arrangement remains essentially the same, with only the addition of handling an extra pipe for resource usage data; RUSAGE_ CHILDREN) to collect *only* resource usage information about the git child process (and any descendant processes); erredRunTestFor BrokenTwisted to spin the reactor another couple of times at the end of the test; this isn't ideal, but I wouldn't particularly mind it ending up on master;
* we interpose an extra process between turnip-pack-backend and git, whose job is to collect resource usage information from the child and write it to a dedicated pipe
* because we have an extra *process* (not a thread or something in-process in Twisted), we can reliably use getrusage(
* the data format for sending resource usage information back to turnip-pack-backend is private and can be whatever you like as long as it can be parsed reliably; I just used JSON since that was trivial to implement;
* I had some trouble with the functional tests due to lots of UncleanReactorError exceptions; this looks like some subtle pre-existing race condition rather than something I've introduced, since it had a bad habit of going away when I tried to investigate using strace, so I just whacked it with AsynchronousDef
* I haven't implemented the statsd interaction, nor worked out how to label the metrics with information about the repository in question; that's important to do, but I stopped once I was satisfied that I had the process plumbing essentially right, and you should be able to take it from here.