Merge lp:~abentley/launchpad/builder-limits into lp:launchpad
Status: Merged
Approved by: Данило Шеган
Approved revision: no longer in the source branch
Merged at revision: 11943
Proposed branch: lp:~abentley/launchpad/builder-limits
Merge into: lp:launchpad
Diff against target: 19 lines (+2/-0), 1 file modified
  lib/canonical/buildd/buildrecipe (+2/-0)
To merge this branch: bzr merge lp:~abentley/launchpad/builder-limits
Related bugs:
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Данило Шеган (community) | | | Approve
Review via email: mp+41211@code.launchpad.net
Commit message
Memory-limit recipe builds.
Description of the change
= Summary =
Fix bug #676657: recipe builds can use too much memory
== Proposed fix ==
Restrict virtual memory use by a recipe build to 1 GB. This will allow some
swapping, but not excessive swapping.
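The two-line diff itself is not reproduced on this page, but a 1 GB address-space cap of the kind described can be sketched in Python roughly as follows (the function name and structure here are illustrative, not the actual buildrecipe code):

```python
import resource

ONE_GB = 1024 ** 3  # 1 GB of address space, the cap discussed above

def limit_address_space(limit=ONE_GB):
    """Cap virtual memory (RLIMIT_AS) for this process.

    The limit is inherited by child processes, so setting it in the
    build wrapper before spawning the build constrains the whole
    build. Allocations past the cap fail with a memory error rather
    than swapping the builder to death.
    """
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
```

RLIMIT_AS limits total addressable memory, not just resident memory, which is what the mmap question later in the review is about.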
== Pre-implementation notes ==
None
== Implementation details ==
There are really two problems:
1. Recipe builds use too much memory.
2. The build farm behaves badly when builds use too much memory.
Both issues should be addressed. This change addresses problem 2 by killing
builds that use excessive amounts of memory before they can cause real harm.
(The builders only have 1 GB of memory on average.)
== Tests ==
None
== Demo and Q/A ==
Create a recipe using qtwebkit.
See https:/
Request a build of the recipe. It should die with a memory error.
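The same failure mode can be checked locally without a builder by running a child process under an equivalent cap and watching an over-allocation die. This is a sketch under assumed numbers (the 2 GB allocation is just an illustrative overshoot, not taken from the proposal):

```python
import resource
import subprocess
import sys

ONE_GB = 1024 ** 3

def cap_child():
    # Runs in the child between fork and exec, so only the spawned
    # command is constrained, not the parent process.
    resource.setrlimit(resource.RLIMIT_AS, (ONE_GB, ONE_GB))

# A child that tries to allocate 2 GB under a 1 GB address-space cap
# fails with a MemoryError instead of swapping the machine.
proc = subprocess.Popen(
    [sys.executable, "-c", "x = bytearray(2 * 1024 ** 3)"],
    preexec_fn=cap_child,
    stderr=subprocess.PIPE,
)
_, err = proc.communicate()
```

The child exits nonzero, which is the "die with a memory error" behaviour the Q/A step looks for.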
= Launchpad lint =
Checking for conflicts and issues in changed files.
Linting changed files:
lib/canonical
For reference: it'd be good to keep an eye on https://lpstats.canonical.com/graphs/CodeRecipeBuildsDailyStatusCounts after deployment.
<danilos> abentley, isn't addressable memory used for other things than just actual memory? for instance, mmap files can take up a lot of AS
abentley, (I don't know much about RLIMIT_AS, so I am just wondering)
<abentley> danilos: that is interesting, but we don't generally mmap things in bzr, and 1 GB is still huge. Our example https://code.dogfood.launchpad.net/~abentley/+recipe/test/+build/4803 took 1 hour 28 minutes to fail, so I think there is lots of breathing room.
<danilos> abentley, sure, I can see this is only a limit for recipe builders, but I wonder what'd happen if somebody tried to do things like language pack builds where source package itself is a few hundred megs (probably just like qtwebkit)
<abentley> danilos: Also, our python doesn't provide RLIMIT_VMEM, so we don't have a lot of choice.
<danilos> abentley, (though, this is unrelated to my first comment: I guess we want to fail early, so perhaps it's good anyway)
abentley, ok, it sounds good, but I wonder if we have a way to find out if we have been too aggressive
abentley, would we just track 'too many builds are getting killed because of this' or should we have something more specific in place?
<abentley> danilos: We have lotsa logs.
<danilos> abentley, I know, but I am sure we don't have a way to track these easily, and that's one thing I suggest: i.e. figure out a way to track these, especially right after it's rolled out
abentley, unless you count something like "grep SIGKILL buildd-manager.log" as "easy" :)
<abentley> danilos: I view this as a necessary evil. The current behaviour is catastrophic: https://wiki.canonical.com/IncidentReports/2010-11-17-buildd-manager-disabling-builders
<danilos> abentley, yes, I agree, I am just thinking a bit more forward into "what if we killed too many builds that would have succeeded"
abentley, basically, I'm giving you my r=danilo, as long as we have some strategy in place to figure out that we were not too aggressive
abentley, i.e. something that will tell us later that 1GB was the right cut-off point (I trust your judgement in choosing it, it's just that it'd be nice to have a way to confirm it as a good choice later, when we can't do it now)
<abentley> danilos: what would you consider an adequate strategy?
<danilos> abentley, I don't know, a graph tracking number of builds failed because of this particular reason for instance, and a promise to look at it in say week's or two-weeks' time
<abentley> danilos: I don't know how to generate a graph of that.
danilos: You'd have to scrape the builder logs.
<danilos> abentley, right, so is there a way to have this fail in a more specific way?
<abentley> danilos: It's conceivable that there might be.
<danilos> abentley, or, alternatively, at least a promise to do a one-time scraping of the logs so we know we haven't cocked up in significant way (if it's too serious we'll know it anyway, but what if we kill something like 15% of the builds that have worked in the past - how will we know?)
<danilos> abentley, it doesn't have to be too formal, ex...