Merge lp:~abentley/launchpad/builder-limits into lp:launchpad

Proposed by Aaron Bentley
Status: Merged
Approved by: Данило Шеган
Approved revision: no longer in the source branch.
Merged at revision: 11943
Proposed branch: lp:~abentley/launchpad/builder-limits
Merge into: lp:launchpad
Diff against target: 19 lines (+2/-0)
1 file modified
lib/canonical/buildd/buildrecipe (+2/-0)
To merge this branch: bzr merge lp:~abentley/launchpad/builder-limits
Reviewer: Данило Шеган (community)
Status: Approve
Review via email: mp+41211@code.launchpad.net

Commit message

Memory-limit recipe builds.

Description of the change

= Summary =
Fix bug #676657: recipe builds can use too much memory

== Proposed fix ==
Restrict virtual memory use by a recipe build to 1 GB. This will allow some
swapping, but not excessive swapping.
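The mechanism is the standard POSIX resource-limit API: `setrlimit(RLIMIT_AS, ...)` caps a process's total address space, and the limit is inherited by child processes. A minimal sketch of the approach (the `run_limited` helper is illustrative only; the branch itself simply calls `setrlimit` once at the top of buildrecipe's `__main__`):

```python
import subprocess
import sys
from resource import RLIMIT_AS, setrlimit

# The soft limit used by the branch; -1 (RLIM_INFINITY) leaves the
# hard limit unlimited, so the cap could be raised again if needed.
ONE_GB = 1000000000


def run_limited(argv):
    """Run argv with its virtual address space capped at 1 GB.

    preexec_fn runs in the child after fork() and before exec(), so
    only the build command and its descendants see the limit.
    """
    return subprocess.call(
        argv, preexec_fn=lambda: setrlimit(RLIMIT_AS, (ONE_GB, -1)))


if __name__ == '__main__':
    # A 2 GB allocation trips the cap: the child's malloc fails and it
    # dies with MemoryError (nonzero exit) instead of thrashing swap.
    print(run_limited(
        [sys.executable, "-c", "x = bytearray(2 * 1000 * 1000 * 1000)"]))
```

Once the soft limit is exceeded, allocations simply fail in the build process, so the build errors out quickly rather than dragging the whole builder into swap.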

== Pre-implementation notes ==
None

== Implementation details ==
There are really two problems:
1. Recipe builds use too much memory.
2. The build farm behaves badly when builds use too much memory.

Both issues should be addressed. This change addresses problem 2 by killing
builds that use excessive amounts of memory before they can cause real harm.
(The builders have only 1 GB of memory on average.)

== Tests ==
None

== Demo and Q/A ==
Create a recipe using qtwebkit.
See https://code.launchpad.net/~rohangarg/+recipe/qtwebkit

Request a build of the recipe. It should die with a memory error.

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/canonical/buildd/buildrecipe

Revision history for this message
Данило Шеган (danilo) wrote :

For reference. It'd be good to keep an eye on https://lpstats.canonical.com/graphs/CodeRecipeBuildsDailyStatusCounts after deployment.

<danilos> abentley, isn't addressable memory used for other things than just actual memory? for instance, mmap files can take up a lot of AS
 abentley, (I don't know much about RLIMIT_AS, so I am just wondering)
<abentley> danilos: that is interesting, but we don't generally mmap things in bzr, and 1 GB is still huge. Our example https://code.dogfood.launchpad.net/~abentley/+recipe/test/+build/4803 took 1 hour 28 minutes to fail, so I think there is lots of breathing room.
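(danilos' question above can be checked empirically: RLIMIT_AS caps total address space, so a large mapping is refused at mmap time even though almost none of it would be resident. A small sketch, reusing the branch's 1 GB figure; the child process here is purely illustrative:)

```python
import subprocess
import sys

# Child: install the same 1 GB RLIMIT_AS cap as the branch, then try to
# create a 2 GB anonymous mapping.  The mmap(2) call itself fails with
# ENOMEM, because address space -- not resident memory -- is what
# RLIMIT_AS limits; Python surfaces that as OSError.
CHILD = """\
import mmap
from resource import RLIMIT_AS, setrlimit
setrlimit(RLIMIT_AS, (1000000000, -1))
try:
    mmap.mmap(-1, 2 * 1000 * 1000 * 1000)
    print("mapped")
except OSError:
    print("refused")
"""

result = subprocess.run([sys.executable, "-c", CHILD],
                        capture_output=True, text=True)
print(result.stdout.strip())  # prints "refused"
```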
<danilos> abentley, sure, I can see this is only a limit for recipe builders, but I wonder what'd happen if somebody tried to do things like language pack builds where source package itself is a few hundred megs (probably just like qtwebkit)
<abentley> danilos: Also, our python doesn't provide RLIMIT_VMEM, so we don't have a lot of choice.
<danilos> abentley, (though, this is unrelated to my first comment: I guess we want to fail early, so perhaps it's good anyway)
 abentley, ok, it sounds good, but I wonder if we have a way to find out if we have been too aggressive
 abentley, would we just track 'too many builds are getting killed because of this' or should we have something more specific in place?
<abentley> danilos: We have lotsa logs.
<danilos> abentley, I know, but I am sure we don't have a way to track these easily, and that's one thing I suggest: i.e. figure out a way to track these, especially right after it's rolled out
 abentley, unless you count something like "grep SIGKILL buildd-manager.log" as "easy" :)
<abentley> danilos: I view this as a necessary evil. The current behaviour is catastrophic: https://wiki.canonical.com/IncidentReports/2010-11-17-buildd-manager-disabling-builders
<danilos> abentley, yes, I agree, I am just thinking a bit more forward into "what if we killed too many builds that would have succeeded"
 abentley, basically, I'm giving you my r=danilo, as long as we have some strategy in place to figure out that we were not too aggressive
 abentley, i.e. something that will tell us later that 1GB was the right cut-off point (I trust your judgement in choosing it, it's just that it'd be nice to have a way to confirm it as a good choice later, when we can't do it now)
<abentley> danilos: what would you consider an adequate strategy?
<danilos> abentley, I don't know, a graph tracking number of builds failed because of this particular reason for instance, and a promise to look at it in say week's or two-weeks' time
<abentley> danilos: I don't know how to generate a graph of that.
 danilos: You'd have to scrape the builder logs.
<danilos> abentley, right, so is there a way to have this fail in a more specific way?
<abentley> danilos: It's conceivable that there might be.
<danilos> abentley, or, alternatively, at least a promise to do a one-time scraping of the logs so we know we haven't cocked up in significant way (if it's too serious we'll know it anyway, but what if we kill something like 15% of the builds that have worked in the past - how will we know?)
<danilos> abentley, it doesn't have to be too formal, ex...
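(The one-time scraping danilos asks for could be as simple as counting matching log lines after rollout. A hedged sketch: the pattern and the sample log format below are hypothetical, not the real buildd-manager log format, which would need to be inspected first:)

```python
import re

# Hypothetical pattern for memory-limit kills; the real buildd-manager
# log lines would need to be checked to pick the right one.
KILL_PATTERN = re.compile(r"SIGKILL|MemoryError")


def count_memory_kills(log_lines):
    """Count log lines that look like memory-limit kills."""
    return sum(1 for line in log_lines if KILL_PATTERN.search(line))


# Invented sample lines, for illustration only:
sample = [
    "2010-11-18 12:00:01 build 4803 terminated by SIGKILL",
    "2010-11-18 12:05:12 build 4804 completed OK",
    "2010-11-18 12:09:44 build 4805 died: MemoryError",
]
print(count_memory_kills(sample))  # prints 2
```

Comparing that count against the overall failure rate on the CodeRecipeBuildsDailyStatusCounts graph would show whether the 1 GB cut-off is killing builds that used to succeed.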


review: Approve

Preview Diff

=== modified file 'lib/canonical/buildd/buildrecipe'
--- lib/canonical/buildd/buildrecipe	2010-09-30 20:22:15 +0000
+++ lib/canonical/buildd/buildrecipe	2010-11-18 18:23:36 +0000
@@ -11,6 +11,7 @@
 import os
 import pwd
 import re
+from resource import RLIMIT_AS, setrlimit
 import socket
 from subprocess import call, Popen, PIPE
 import sys
@@ -206,6 +207,7 @@
 
 
 if __name__ == '__main__':
+    setrlimit(RLIMIT_AS, (1000000000, -1))
     builder = RecipeBuilder(*sys.argv[1:])
     if builder.buildTree() != 0:
         sys.exit(RETCODE_FAILURE_BUILD_TREE)