Merge lp:~jameinel/launchpad/lp-service into lp:launchpad/db-devel
| Status: | Merged |
|---|---|
| Approved by: | Michael Hudson-Doyle on 2010-10-08 |
| Approved revision: | no longer in the source branch. |
| Merged at revision: | 9914 |
| Proposed branch: | lp:~jameinel/launchpad/lp-service |
| Merge into: | lp:launchpad/db-devel |
| Diff against target: |
2134 lines (+1776/-86) 10 files modified
Makefile (+2/-2) bzrplugins/lpserve/__init__.py (+737/-4) bzrplugins/lpserve/test_lpserve.py (+534/-0) configs/development/launchpad-lazr.conf (+1/-0) lib/canonical/config/schema-lazr.conf (+12/-0) lib/canonical/launchpad/scripts/runlaunchpad.py (+47/-0) lib/lp/codehosting/sshserver/session.py (+299/-12) lib/lp/codehosting/sshserver/tests/test_session.py (+73/-0) lib/lp/codehosting/tests/test_acceptance.py (+61/-0) lib/lp/codehosting/tests/test_lpserve.py (+10/-68) |
| To merge this branch: | bzr merge lp:~jameinel/launchpad/lp-service |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Michael Hudson-Doyle | 2010-10-05 | Approve on 2010-10-18 | |
| Gavin Panella (community) | 2010-10-04 | Abstain on 2010-10-07 | |
| Jonathan Lange | 2010-10-05 | Pending | |
|
Review via email:
|
|||
Commit Message
Implement LaunchpadForkin
Description of the Change
Retargetted from https:/
because I accidentally started from db-devel in the beginning. This patch includes all suggestions from the original submission.
The goal of this submission is to improve the time for "bzr+ssh" to connect and be useful. The new method must be activated by setting:
[codehosting]
use_forking_
It is set to be enabled in "development" mode, but the default is still disabled. I can't give a recommendation for the production config, because the branch is private.
This implements a new service (LaunchpadForki
The benefit is seen with: time echo hello | ssh localhost bzr serve --inet ...
Without this patch, it is 2.5s to serve on the loopback. With this patch, it is 0.25s.
I'm very happy to work with someone to smooth out the finer points of this submission. I tried to be explicit about places in the code that I had a decision to make, and why I chose the method I did.
I didn't use the FeatureFlag system, because setting it up via the config file was a lot more straightforward. (globally enabling/disabling the functionality).
As near as I can tell, there should be no impact of rolling this code out on production, if use_forking_daemon: False.
(The make run_codehosting will not actually spawn the daemon, and the twisted Conch code just gets an 'if ' check to see that it should use the old code path.)
I haven't run the full test suite, but I have:
1) Run all of the locally relevant tests "bzr selftest -s bt.lp_serve' and 'bin/test
lp.
2) Manually started the sftp service and run commands through the local instance. Both with
and without the forking service enabled. (And I have confirmed that it isn't used when
disabled, and is used when enabled, etc.)
| Michael Hudson-Doyle (mwhudson) wrote : | # |
| Michael Hudson-Doyle (mwhudson) wrote : | # |
Wow, what a branch. I've been quite picky in this, that's not because
I hate you <wink> but because it's subtle code and has potential to
cause issues, so I wanted to make sure all the comments make sense and
there's no redundant code, etc.
I worry slightly about what would happen if the forking service fell
over -- but clearly, we rely on the sftp server itself staying up, so
this probably isn't such a big deal. I guess we also need to make
sure the forking service is running in production.
All that said, I have no real problems with the code and look forward
to getting my branches onto and off of Launchpad that bit faster!
I'm marking this needs information because I want to see a response to my
comments. But I don't think any of them are deep.
> === modified file 'Makefile'
> --- Makefile 2010-09-07 18:15:01 +0000
> +++ Makefile 2010-10-06 00:19:31 +0000
> @@ -253,7 +253,7 @@
>
> run_codehosting: check_schema inplace stop
> $(RM) thread*.request
> - bin/run -r librarian,
> + bin/run -r librarian,
>
>
> start_librarian: compile
You should add the forker service to the run_all target too.
> === added directory 'bzrplugins/
> === renamed file 'bzrplugins/
> --- bzrplugins/
> +++ bzrplugins/
> @@ -8,15 +8,33 @@
>
> __metaclass__ = type
>
> -__all__ = ['cmd_launchpad
> -
> -
> +__all__ = ['cmd_launchpad
> + 'cmd_launchpad_
> + ]
This shoudl be formatted like this:
__all__ = [
'cmd_
'cmd_
]
> +
> +
> +import errno
> +import os
> import resource
> +import shlex
> +import shutil
> +import signal
> +import socket
> import sys
> +import tempfile
> +import threading
> +import time
>
> from bzrlib.commands import Command, register_command
> from bzrlib.option import Option
> -from bzrlib import lockdir, ui
> +from bzrlib import (
> + commands,
> + errors,
> + lockdir,
> + osutils,
> + trace,
> + ui,
> + )
>
> from bzrlib.smart import medium, server
> from bzrlib.transport import get_transport
> @@ -110,3 +128,724 @@
>
>
> register_
> +
> +
> +class LPForkingServic
> + """A service that can be asked to start a new bzr subprocess via fork.
> +
> + The basic idea is that python startup is very expensive. For example, the
> + original 'lp-serve' command could take 2.5s just to start up, before any
> + actual actions could be performed.
Well, it's not really Python startup, its more the importing thats
slow. Maybe phrase this as "starting Python and importing Launchpad
is very...".
> + This class provides a service sitting on a socket, which can then be
> + requested to fork and run a given bzr command.
> +
> + Clients connect to the socket and make a simple request, which then
> + receives a response. The possible requests are:
> +
> + "hello\n": Trigger a heartbeat to report that the program is still
> + runnin...
| Martin Pool (mbp) wrote : | # |
notes from talking this over with spm:
* at the moment you can see in the process listing which branch is
being accessed; it would be very nice to keep that; but at least we
should log the pid/user/branch
* can run this on staging, then edge, then lpnet
* propose changes into lp:lp-production-configs to gradually turn it on
* might want an rt describing how to monitor the new daemon process,
which just needs to mention the socket location and the hello command;
they can write the script from there
* francis should formally tell tom and charlie we'll be making this change
| Robert Collins (lifeless) wrote : | # |
On Wed, Oct 6, 2010 at 9:43 PM, Martin Pool <email address hidden> wrote:
> * can run this on staging, then edge, then lpnet
There is no edge.
> * francis should formally tell tom and charlie we'll be making this change
The rt describing should mention this; it will get enough airplay ;)
We make architectural changes -all- the time, there's no need to get
particularly cautious about this one.
| Jonathan Lange (jml) wrote : | # |
On Wed, Oct 6, 2010 at 9:52 AM, Robert Collins
<email address hidden> wrote:
> On Wed, Oct 6, 2010 at 9:43 PM, Martin Pool <email address hidden> wrote:
>
>> * can run this on staging, then edge, then lpnet
>
> There is no edge.
Yeah there is.
$ ssh bazaar.
No shells on this server.
Connection to bazaar.
jml
| Robert Collins (lifeless) wrote : | # |
It will be gone extremely soon. I want people to stop thinking of
'edge' as a way to test things.
| Michael Hudson-Doyle (mwhudson) wrote : | # |
On Wed, 06 Oct 2010 08:43:44 -0000, Martin Pool <email address hidden> wrote:
> notes from talking this over with spm:
>
> * at the moment you can see in the process listing which branch is
> being accessed; it would be very nice to keep that; but at least we
> should log the pid/user/branch
This isn't affected by this branch, although the process name ends up
being a bit unwieldy:
21266 pts/1 S+ 0:00 /usr/bin/python /home/mwh/
Maybe inserting a call to setproctitle in become_child would be useful,
especially as we have the dependency already -- one piece of information
we've lost is the user who has connected, although we can probably
figure that out from logs still.
> * can run this on staging, then edge, then lpnet
> * propose changes into lp:lp-production-configs to gradually turn it on
> * might want an rt describing how to monitor the new daemon process,
> which just needs to mention the socket location and the hello command;
> they can write the script from there
> * francis should formally tell tom and charlie we'll be making this change
I still vaguely wonder about controlling this with a feature flag. The
thing is, the codehosting systems don't connect to the database
currently, and that's actually kind of nice in some ways. We could add
an XML-RPC method to query a flag, and have the ssh daemon check that
every 30s or so -- I don't think we want an XML-RPC call delaying the
launch of the lp-servce process on every connection.
Cheers,
mwh
| John A Meinel (jameinel) wrote : | # |
For starters, thanks for the thorough review. It is really nice to get feedback on it, and you have certainly thought a lot about the code.
On 10/6/2010 12:02 AM, Michael Hudson-Doyle wrote:
> Review: Needs Information
> Wow, what a branch. I've been quite picky in this, that's not because
> I hate you <wink> but because it's subtle code and has potential to
> cause issues, so I wanted to make sure all the comments make sense and
> there's no redundant code, etc.
>
> I worry slightly about what would happen if the forking service fell
> over -- but clearly, we rely on the sftp server itself staying up, so
> this probably isn't such a big deal. I guess we also need to make
> sure the forking service is running in production.
We could trap failures to connect to the forking server and fall back to regular spawnProcess. The code is already a simple if/else block. However, I wouldn't really want silent degradation of connection. If we want it, it could probably be done with:
transport = None
if config.
try:
transport = ...
except ???:
pass # We failed, we'll try again
if transport is None:
transport = reactor.
>
> All that said, I have no real problems with the code and look forward
> to getting my branches onto and off of Launchpad that bit faster!
>
> I'm marking this needs information because I want to see a response to my
> comments. But I don't think any of them are deep.
>
>> === modified file 'Makefile'
>> --- Makefile 2010-09-07 18:15:01 +0000
>> +++ Makefile 2010-10-06 00:19:31 +0000
>> @@ -253,7 +253,7 @@
>>
>> run_codehosting: check_schema inplace stop
>> $(RM) thread*.request
>> - bin/run -r librarian,
>> + bin/run -r librarian,
>>
>>
>> start_librarian: compile
>
> You should add the forker service to the run_all target too.
Done.
>
>> === added directory 'bzrplugins/
>> === renamed file 'bzrplugins/
>> --- bzrplugins/
>> +++ bzrplugins/
>> @@ -8,15 +8,33 @@
>>
>> __metaclass__ = type
>>
>> -__all__ = ['cmd_launchpad
>> -
>> -
>> +__all__ = ['cmd_launchpad
>> + 'cmd_launchpad_
>> + ]
>
> This shoudl be formatted like this:
>
> __all__ = [
> 'cmd_launchpad_
> 'cmd_launchpad_
> ]
check
...
>> +class LPForkingServic
>> + """A service that can be asked to start a new bzr subprocess via fork.
>> +
>> + The basic idea is that python startup is very expensive. For example, the
>> + original 'lp-serve' command could take 2.5s just to start up, before any
>> + actual actions could be performed.
>
> Well, it's not really Python startup, its more the importing thats
> slow. Maybe phrase this as "starting Python and importing Launchpad
> is very...".
>
>> + This class provides a service sitting on a socket, which can then be
>> + requested to fork and run a given bzr command.
>> +
>> + Clients connect to the socket and make a simp...
| Martin Pool (mbp) wrote : | # |
On 7 October 2010 07:15, Michael Hudson-Doyle <email address hidden> wrote:
> On Wed, 06 Oct 2010 08:43:44 -0000, Martin Pool <email address hidden> wrote:
>> notes from talking this over with spm:
>>
>> * at the moment you can see in the process listing which branch is
>> being accessed; it would be very nice to keep that; but at least we
>> should log the pid/user/branch
>
> This isn't affected by this branch, although the process name ends up
> being a bit unwieldy:
>
> 21266 pts/1 S+ 0:00 /usr/bin/python /home/mwh/
>
> Maybe inserting a call to setproctitle in become_child would be useful,
> especially as we have the dependency already -- one piece of information
> we've lost is the user who has connected, although we can probably
> figure that out from logs still.
Without the call to setproctitle, we would be losing this thing spm
said he liked, which is to immediately show which user and branch it
corresponds to.
To my mind it's more important to log the username, branch and pid, so
that you can see who it was both while the process is running and
afterwards. It's also nice that jam logs the rusage. But putting
them into the argv is more immediately accessible.
>> * can run this on staging, then edge, then lpnet
>> * propose changes into lp:lp-production-configs to gradually turn it on
>> * might want an rt describing how to monitor the new daemon process,
>> which just needs to mention the socket location and the hello command;
>> they can write the script from there
>> * francis should formally tell tom and charlie we'll be making this change
>
> I still vaguely wonder about controlling this with a feature flag. The
> thing is, the codehosting systems don't connect to the database
> currently, and that's actually kind of nice in some ways. We could add
> an XML-RPC method to query a flag, and have the ssh daemon check that
> every 30s or so -- I don't think we want an XML-RPC call delaying the
> launch of the lp-servce process on every connection.
Actually I was thinking of just having an http URL, accessible only
within the DC, that just presents all the feature flag rules in a
machine readable form: maybe json by default, perhaps others. There's
no need for xmlrpc when we just want to read, and avoiding startup
dependencies is good. Then we can have a FeatureRuleSource that gets
and parses this rather than talking to storm, and it will be fast to
start up for this type of thing; probably fast enough to run on every
request if we want to. bug 656031.
--
Martin
| Michael Hudson-Doyle (mwhudson) wrote : | # |
On Wed, 06 Oct 2010 22:34:40 -0000, Martin Pool <email address hidden> wrote:
> On 7 October 2010 07:15, Michael Hudson-Doyle <email address hidden> wrote:
> > On Wed, 06 Oct 2010 08:43:44 -0000, Martin Pool <email address hidden> wrote:
> >> notes from talking this over with spm:
> >>
> >> Â * at the moment you can see in the process listing which branch is
> >> being accessed; it would be very nice to keep that; but at least we
> >> should log the pid/user/branch
> >
> > This isn't affected by this branch, although the process name ends up
> > being a bit unwieldy:
> >
> > 21266 pts/1 Â Â S+ Â Â 0:00 /usr/bin/python /home/mwh/
> >
> > Maybe inserting a call to setproctitle in become_child would be useful,
> > especially as we have the dependency already -- one piece of information
> > we've lost is the user who has connected, although we can probably
> > figure that out from logs still.
>
> Without the call to setproctitle, we would be losing this thing spm
> said he liked, which is to immediately show which user and branch it
> corresponds to.
Uh? No, that's the bit we retain. The ps output I showed above was
from running with the branch being reviewed.
> To my mind it's more important to log the username, branch and pid, so
> that you can see who it was both while the process is running and
> afterwards. It's also nice that jam logs the rusage. But putting
> them into the argv is more immediately accessible.
I'm not sure if you're recommending this branch be changed before
landing here.
> >> Â * can run this on staging, then edge, then lpnet
> >> Â * propose changes into lp:lp-production-configs to gradually turn it on
> >> Â * might want an rt describing how to monitor the new daemon process,
> >> which just needs to mention the socket location and the hello command;
> >> they can write the script from there
> >> Â * francis should formally tell tom and charlie we'll be making this change
> >
> > I still vaguely wonder about controlling this with a feature flag. Â The
> > thing is, the codehosting systems don't connect to the database
> > currently, and that's actually kind of nice in some ways. Â We could add
> > an XML-RPC method to query a flag, and have the ssh daemon check that
> > every 30s or so -- I don't think we want an XML-RPC call delaying the
> > launch of the lp-servce process on every connection.
>
> Actually I was thinking of just having an http URL, accessible only
> within the DC, that just presents all the feature flag rules in a
> machine readable form: maybe json by default, perhaps others. There's
> no need for xmlrpc when we just want to read, and avoiding startup
> dependencies is good. Then we can have a FeatureRuleSource that gets
> and parses this rather than talking to storm, and it will be fast to
> start up for this type of thing; probably fast enough to run on every
> request if we want to. bug 656031.
That would be interesting. The sftp server being a twisted app, access
to this URL should really be nonblo...
| John A Meinel (jameinel) wrote : | # |
I do log the username and pid, I don't (at that time) have the branch, because we haven't opened it yet. If we changed "bzrlib.
| Martin Pool (mbp) wrote : | # |
On 7 October 2010 10:09, Michael Hudson-Doyle <email address hidden> wrote:
>> > 21266 pts/1 Â Â S+ Â Â 0:00 /usr/bin/python /home/mwh/
>> >
>> > Maybe inserting a call to setproctitle in become_child would be useful,
>> > especially as we have the dependency already -- one piece of information
>> > we've lost is the user who has connected, although we can probably
>> > figure that out from logs still.
>>
>> Without the call to setproctitle, we would be losing this thing spm
>> said he liked, which is to immediately show which user and branch it
>> corresponds to.
>
> Uh? No, that's the bit we retain. The ps output I showed above was
> from running with the branch being reviewed.
The username is not shown. I don't know if it's a big deal; this
branch is old enough already.
>
>> To my mind it's more important to log the username, branch and pid, so
>> that you can see who it was both while the process is running and
>> afterwards. It's also nice that jam logs the rusage. But putting
>> them into the argv is more immediately accessible.
>
> I'm not sure if you're recommending this branch be changed before
> landing here.
what john said.
> That would be interesting. The sftp server being a twisted app, access
> to this URL should really be nonblocking which always adds a wrinkle.
> Maybe that would be fine though, or maybe we can just cheat.
That's actually kind of a feature...
--
Martin
| Michael Hudson-Doyle (mwhudson) wrote : | # |
On Wed, 06 Oct 2010 20:39:01 -0000, John A Meinel <email address hidden> wrote:
> For starters, thanks for the thorough review. It is really nice to get feedback on it, and you have certainly thought a lot about the code.
>
> On 10/6/2010 12:02 AM, Michael Hudson-Doyle wrote:
> > Review: Needs Information
> > Wow, what a branch. I've been quite picky in this, that's not because
> > I hate you <wink> but because it's subtle code and has potential to
> > cause issues, so I wanted to make sure all the comments make sense and
> > there's no redundant code, etc.
> >
> > I worry slightly about what would happen if the forking service fell
> > over -- but clearly, we rely on the sftp server itself staying up, so
> > this probably isn't such a big deal. I guess we also need to make
> > sure the forking service is running in production.
>
>
> We could trap failures to connect to the forking server and fall back
> to regular spawnProcess. The code is already a simple if/else
> block. However, I wouldn't really want silent degradation of
> connection.
Yeah, I don't really like that idea either.
> >> + "fork <command>\n": Request a new subprocess to be started.
> >> + <command> is the bzr command to be run, such as "rocks" or
> >> + "lp-serve --inet 12".
> >> + The immediate response will be the path-on-disk to a directory full
> >> + of named pipes (fifos) that will be the stdout/stderr/stdin of the
> >> + new process.
> >
> > This doesn't quite make it clear what the names of the files will be.
> > The obvious guess is correct, but I'd rather be certain.
> >
> >> + If a client holds the socket open, when the child process exits,
> >> + the exit status (as given by 'wait()') will be written to the
> >> + socket.
> >
> > This appears to be out of date -- there's also a fork-env command now.
> > Is the fork command still useful?
>
> It has been useful in testing, it isn't used in the 'production' code.
If the fork command is still implemented, I think it should still be
documented...
> ...
>
>
> >
> >> + DEFAULT_PATH = '/var/run/
> >
> > I'm not sure of the value of providing a default here, as it seems
> > that its always in practice going to be overridden by the config. If
> > it makes testing easier, it's worth it I guess.
>
> Well, we override it for testing as well. I wanted to support "bzr
> lp-forking-service" as something you could run. Also, that seemed to
> be the 'best' place to put it, which gives hints as to where you would
> want to put it.
OK, let's leave it.
> >> + self._children_
> >> +
> >> + def _create_
> >> + self._server_socket = socket.
> >> + self._server_
> >> + if self._perms is not None:
> >> + os.chmod(
> >
> > The pedant in me thinks this is a bit racy. But I don't think it's
> > important (in fact, I wonder if the permission stuff is necessary
> > really).
>
> I'm sure it isn't necessary ...
| Robert Collins (lifeless) wrote : | # |
On feature flags... I'd suggest an xmlrpc/API for evaluating flags.
evaluating flags requires DB access for group membership of a user and
potentially other things like time limited trials, access to opt-in
features and so forth.
The exact API would need to take a user, possibly feature names, and
return a list of the rules whose scopes need to be locally resolved.
Whether this is json or not is immaterial to me - at least for this
patch, the feature evaluating code would load once.
It would be very interesting to do this though, because we could phase
this in per-user.
-Rob
| Martin Pool (mbp) wrote : | # |
rt 41791 for adding the monitoring service <https:/
| Gavin Panella (allenap) wrote : | # |
Looks like this is taken care of, so abstaining on behalf of the Launchpad code reviewers.
| Michael Hudson-Doyle (mwhudson) wrote : | # |
So I would like to see the code that communicates with the service more twisted-ized, but until it turns out to be an actual problem it's not worth the bother. Let's land this.
| Michael Hudson-Doyle (mwhudson) wrote : | # |
I think your latest changes are basically ok. Some of it feels a bit cargo culted though -- I don't see the reason for the lazy getForker method, just stash the forker on the instance normally? and the atexit calls aren't needed imho. If you inherited the forker helper class from fixture, you'd be able to use addCleanup, which would probably be a bit cleaner. I'm not sure the big TODO comment about paths is warranted -- it's all true, but much of Launchpad suffers this problem. If you can de-cargo-cult a bit, I'll land this asap.
| Robert Collins (lifeless) wrote : | # |
lets just land and iterate.
-Rob
| Michael Hudson-Doyle (mwhudson) wrote : | # |
On Mon, 18 Oct 2010 22:22:39 -0000, Robert Collins <email address hidden> wrote:
> lets just land and iterate.
Yeah fair enough. Throwing it at ec2 now.
Cheers,
mwh

Hi,
This basically looks great and works in local testing. I don't have time for a line-by-line review right now though :(
Cheers,
mwh