Ubuntu CI Engine

Code review comment for lp:~pwlars/uci-engine/versioned-imagebuild

versioned-imagebuild
Merge into trunk

Paul Larson (pwlars) wrote on 2014-07-22:

> I have several questions I don't have a good answer for yet:
>
> - what's the upgrade story for the webops ?
For a small enough deployment, I think the way we've always done upgrades would work. The risk is that, during the upgrade window, a ticket comes in and makes it all the way through to the imagebuilder before the upgrade is complete. That would require the fastest PPA build ever, I guess, but we should still be careful. The situation we want to avoid is the lander sending a request for an API version that NONE of the imagebuild workers (all one of them) support. For a more paranoid upgrade, see below.

> - what happens if an old worker receives a request for the new API (that one
> is the most annoying I think ;) ?
I think the idea is that, to upgrade as carefully as possible, you upgrade things from the bottom up. Upgrade the workers first; if they get a ticket for the previous version of the API, that's OK. Once they are all upgraded, you upgrade the lander. This also makes me realize that we might have much deeper problems with upgrades, don't we? What happens if the upgrade happens while a component is in the middle of something? Does it get upgraded part-way through a ticket?
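
To make the bottom-up idea concrete, here is a rough sketch of how an upgraded worker could keep serving the previous API version while an old worker refuses the new one. Everything in it is hypothetical: the `api_version` and `params` message fields, the handler names and the `arch` parameter are made up for illustration and are not taken from the uci-engine code.

```python
class UnsupportedVersion(Exception):
    """Raised when a request asks for an API version this worker lacks."""


def build_image_v1(params):
    # Stand-in for the old behaviour (hypothetical).
    return {"api_version": 1, "series": params["series"]}


def build_image_v2(params):
    # Stand-in for the new behaviour; "arch" is a made-up new parameter.
    return {
        "api_version": 2,
        "series": params["series"],
        "arch": params.get("arch", "amd64"),
    }


# An upgraded worker still knows the previous version, so an old lander
# can keep talking to it. A worker that has not been upgraded only knows v1.
UPGRADED_WORKER = {1: build_image_v1, 2: build_image_v2}
OLD_WORKER = {1: build_image_v1}


def dispatch(message, handlers):
    """Route a request to the handler matching its API version."""
    # Assumption: requests without a version field predate versioning.
    version = message.get("api_version", 1)
    handler = handlers.get(version)
    if handler is None:
        raise UnsupportedVersion(version)
    return handler(message["params"])
```

With this shape, upgrading the workers first is safe because `UPGRADED_WORKER` accepts both versions; the only failing combination is a new lander talking to `OLD_WORKER`, which is exactly the case the upgrade order is meant to rule out.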

>
> - for how long do we keep old lander/workers running ?
I think for as short a time as possible for now. Eventually we might want to explore bigger testing scenarios where some units are deployed with the new version and some with the old, and ease into the upgrade as we see tickets working. I think we need a lot of things in place before we can really do that, though.
>
> - how can we make the "right" workers talk to each other ?
That, I don't know. I've not seen a good way to do this with RabbitMQ. The best I can suggest is that the worker rejects the request by sending a message back to the lander, and the lander retries the request, maybe after a delay. It's imprecise, but it would hopefully result in either trying a different worker or retrying the same worker after it has upgraded properly. Also, if we're really sure the workers get upgraded before the lander, then they are all the "right" ones to send the request to.
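
As a rough sketch of the reject-and-retry idea on the lander side: the code below leaves the message queue out entirely and takes a plain `send` callable instead. The `unsupported_version` status value, the function name and its parameters are all assumptions made up for this example, not an existing API.

```python
import time


def send_with_retry(send, request, max_attempts=3, delay=5.0, sleep=time.sleep):
    """Resend a request while the worker rejects its API version.

    `send` is any callable that delivers the request and returns the
    worker's reply as a dict (hypothetical; the real transport would be
    the message queue).
    """
    reply = None
    for attempt in range(1, max_attempts + 1):
        reply = send(request)
        # Assumption: a worker that lacks the requested version answers
        # with this status instead of silently dropping the ticket.
        if reply.get("status") != "unsupported_version":
            return reply
        if attempt < max_attempts:
            # Give the worker time to finish upgrading, or let another
            # worker pick the request up on the next attempt.
            sleep(delay)
    # Still rejected after max_attempts; the caller decides what to do.
    return reply
```

The retry count is bounded so that a ticket cannot loop forever if no worker ever supports the requested version; the caller still sees the final rejection and can fail the ticket.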
>
> - do we have different kind of API migrations ?
Not sure what you mean here.
>
> - can we try to sort out the simplest first ? ;-)
This MP seemed like as good a place to start as any.
>
> - is this MP the right place to ask ?
Probably not; I think we should have a broader discussion about this.
>
> None of the above have to be answered right now nor should they block that
> experiment ;)
>
> But a first round of answers can help define the next steps.

review: Needs Resubmitting
