Merge lp:~pfalcon/linaro-android-mirror/per-host-config into lp:linaro-android-mirror

Proposed by Paul Sokolovsky
Status: Merged
Merged at revision: 46
Proposed branch: lp:~pfalcon/linaro-android-mirror/per-host-config
Merge into: lp:linaro-android-mirror
Diff against target: 91 lines (+39/-4)
2 files modified
linaro_android_mirror/mirrorer.py (+29/-3)
linaro_android_mirror/settings.py (+10/-1)
To merge this branch: bzr merge lp:~pfalcon/linaro-android-mirror/per-host-config
Reviewer Review Type Date Requested Status
Guilherme Salgado (community) Approve
Review via email: mp+63336@code.launchpad.net

Description of the change

This patch allows to specify individual configuration for each upstream git host we mirror. Currently implemented setting for stale check (don't sync with host if was synced recently enough) and number of concurrent sync jobs (-jN).

This should alleviate issues which were seen/reported with mirror service during last week:
 * git.omapzoom.org needs -j1 to sync reliably, but forcing all hosts to -j1 does affect mirroring performance (Patrik Ryd)
 * yesterday, android.git.kernel.org was barely accessible, failing to sync with connection timed out's, and consequently failing all the builds during a day.

To post a comment you must log in.
Revision history for this message
Guilherme Salgado (salgado) wrote :
Download full text (4.2 KiB)

Hi Paul,

This looks good to me; I've just a couple suggestions below.

 review approve

> === modified file 'linaro_android_mirror/mirrorer.py'
> --- linaro_android_mirror/mirrorer.py 2011-06-02 13:34:21 +0000
> +++ linaro_android_mirror/mirrorer.py 2011-06-03 07:52:34 +0000
> @@ -1,6 +1,7 @@
> from collections import defaultdict
> import errno
> import os
> +import time
>
> from twisted.internet.defer import (
> Deferred,
> @@ -19,9 +20,10 @@
>
> _directory_locks = defaultdict(DeferredLock)
>
> -def check_cmd_status(data, target_directory):
> +def check_cmd_status(data, host_config):
> # print "check_cmd_status:", data
> stdout, stderr, exitcode = data
> + target_directory = os.path.join(settings.MIRROR_PREFIX, host_config["host"])
> if exitcode != 0:
> log.msg("error executing command: stdout: '%s', stderr: '%s', exitcode: %d" % (stdout, stderr, exitcode))
> raise Exception(stderr)
> @@ -46,6 +48,26 @@
> symlink('manifests')
> symlink('manifests.git')
> symlink('repo')
> +
> + if hostname in settings.UPSTREAM_HOSTS:
> + host_config = settings.UPSTREAM_HOSTS[hostname]
> + else:
> + host_config = settings.UPSTREAM_HOSTS["*"]
> + host_config["host"] = hostname

I'm slightly concerned about the fact that this is a long-running
process changing a mutable setting. I see that the host will be
overwritten every time this function gets called, so it shouldn't be a
problem, but we might have problems if the recipient of host_config
decides to change it further. We can easily avoid that by making a copy
of the host config (e.g.
copy.copy(settings.UPSTREAM_HOSTS[hostname|"*"])) and passing that on to
the callback.

> +
> + # Get last successful sync timestamp
> + stamp = os.path.join(target_directory, '.synced')
> + try:
> + stamp_mtime = os.stat(stamp).st_mtime
> + except OSError:
> + stamp_mtime = 0
> +
> + # Check for fresh enough
> + if time.time() - stamp_mtime < host_config.get("stale", 0):
> + log.msg("Mirror %s was synced %ds ago, stale threshold (%ds) not reached" %
> + (target_directory, time.time() - stamp_mtime, host_config.get("stale", 0)))
> + return
> +
> manifest_path = os.path.join(target_directory, '.repo/manifest.xml')
> try:
> os.unlink(manifest_path)
> @@ -54,7 +76,7 @@
> raise
> with open(manifest_path, 'w') as manifest_file:
> manifest_file.write(manifest)
> - args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS
> + args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS + ['-j%d' % host_config.get("jobs", 1)]

Is it worth allowing host configs to not specify the number of jobs to
use? I'm thinking that it may be better to make it required instead of
providing a default that is far from ideal.

> print "running", settings.REPO, args, 'in', target_directory
> if settings.DRY_RUN:
> d = Deferred()
> @@ -62,7 +84,7 @@
> else:
> d = getProcessOutputAndValue(
> ...

Read more...

review: Approve
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Hello Guilherme,

On Fri, 03 Jun 2011 14:24:31 -0000
Guilherme Salgado <email address hidden> wrote:

> Review: Approve
> Hi Paul,
>
> This looks good to me; I've just a couple suggestions below.
>
[]
> > +
> > + if hostname in settings.UPSTREAM_HOSTS:
> > + host_config = settings.UPSTREAM_HOSTS[hostname]
> > + else:
> > + host_config = settings.UPSTREAM_HOSTS["*"]
> > + host_config["host"] = hostname
>
> I'm slightly concerned about the fact that this is a long-running
> process changing a mutable setting. I see that the host will be
> overwritten every time this function gets called, so it shouldn't be a
> problem, but we might have problems if the recipient of host_config
> decides to change it further. We can easily avoid that by making a
> copy of the host config (e.g.
> copy.copy(settings.UPSTREAM_HOSTS[hostname|"*"])) and passing that on
> to the callback.

Good catch. It's probably harmless as it is now, but I have more ideas
what dynamic data to pass to callback, and having that pollute global
config is bad idea.

[]
> > @@ -54,7 +76,7 @@
> > raise
> > with open(manifest_path, 'w') as manifest_file:
> > manifest_file.write(manifest)
> > - args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS
> > + args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS + ['-j%d'
> > % host_config.get("jobs", 1)]
>
> Is it worth allowing host configs to not specify the number of jobs to
> use? I'm thinking that it may be better to make it required instead
> of providing a default that is far from ideal.

Yes, I'm a big proponent of sensible defaults ;-). And -j1 is pretty
good as that's what you get when you just type "repo sync". It's also
environment-friendly, because issues with android.git is apparently due
to (relative) lot of people abusing big -j values (I saw -j50 in blogs).

Thanks,
Paul

47. By Paul Sokolovsky

Make sure we don't change UPSTREAM_HOSTS when passing extra data to callback.

Revision history for this message
Guilherme Salgado (salgado) wrote :

> > > @@ -54,7 +76,7 @@
> > > raise
> > > with open(manifest_path, 'w') as manifest_file:
> > > manifest_file.write(manifest)
> > > - args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS
> > > + args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS + ['-j%d'
> > > % host_config.get("jobs", 1)]
> >
> > Is it worth allowing host configs to not specify the number of jobs to
> > use? I'm thinking that it may be better to make it required instead
> > of providing a default that is far from ideal.
>
> Yes, I'm a big proponent of sensible defaults ;-). And -j1 is pretty
> good as that's what you get when you just type "repo sync". It's also
> environment-friendly, because issues with android.git is apparently due
> to (relative) lot of people abusing big -j values (I saw -j50 in blogs).

Oh, if that's the default for repo sync then it should be fine. Another
alternative would be to not pass the '-j' argument when it's not
specified in the host config; I think that'd be even better (would make
sure we always use repo's default), but it's not a big deal.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'linaro_android_mirror/mirrorer.py'
2--- linaro_android_mirror/mirrorer.py 2011-06-02 13:34:21 +0000
3+++ linaro_android_mirror/mirrorer.py 2011-06-03 19:35:23 +0000
4@@ -1,6 +1,8 @@
5 from collections import defaultdict
6 import errno
7 import os
8+import time
9+import copy
10
11 from twisted.internet.defer import (
12 Deferred,
13@@ -19,9 +21,10 @@
14
15 _directory_locks = defaultdict(DeferredLock)
16
17-def check_cmd_status(data, target_directory):
18+def check_cmd_status(data, host_data):
19 # print "check_cmd_status:", data
20 stdout, stderr, exitcode = data
21+ target_directory = os.path.join(settings.MIRROR_PREFIX, host_data["host"])
22 if exitcode != 0:
23 log.msg("error executing command: stdout: '%s', stderr: '%s', exitcode: %d" % (stdout, stderr, exitcode))
24 raise Exception(stderr)
25@@ -46,6 +49,29 @@
26 symlink('manifests')
27 symlink('manifests.git')
28 symlink('repo')
29+
30+ if hostname in settings.UPSTREAM_HOSTS:
31+ host_data = settings.UPSTREAM_HOSTS[hostname]
32+ else:
33+ host_data = settings.UPSTREAM_HOSTS["*"]
34+ # We'll use host_data to pass extra data to callback,
35+ # so copy it to make sure we don't pollute settings.UPSTREAM_HOSTS
36+ host_data = copy.copy(host_data)
37+ host_data["host"] = hostname
38+
39+ # Get last successful sync timestamp
40+ stamp = os.path.join(target_directory, '.synced')
41+ try:
42+ stamp_mtime = os.stat(stamp).st_mtime
43+ except OSError:
44+ stamp_mtime = 0
45+
46+ # Check for fresh enough
47+ if time.time() - stamp_mtime < host_data.get("stale", 0):
48+ log.msg("Mirror %s was synced %ds ago, stale threshold (%ds) not reached" %
49+ (target_directory, time.time() - stamp_mtime, host_data.get("stale", 0)))
50+ return
51+
52 manifest_path = os.path.join(target_directory, '.repo/manifest.xml')
53 try:
54 os.unlink(manifest_path)
55@@ -54,7 +80,7 @@
56 raise
57 with open(manifest_path, 'w') as manifest_file:
58 manifest_file.write(manifest)
59- args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS
60+ args = ['sync'] + settings.REPO_SYNC_EXTRA_ARGS + ['-j%d' % host_data.get("jobs", 1)]
61 print "running", settings.REPO, args, 'in', target_directory
62 if settings.DRY_RUN:
63 d = Deferred()
64@@ -62,7 +88,7 @@
65 else:
66 d = getProcessOutputAndValue(
67 settings.REPO, args, path=target_directory)
68- return d.addCallback(lambda data: check_cmd_status(data, target_directory))
69+ return d.addCallback(lambda data: check_cmd_status(data, host_data))
70 return d.run(_mirror)
71
72
73
74=== modified file 'linaro_android_mirror/settings.py'
75--- linaro_android_mirror/settings.py 2011-05-13 09:30:26 +0000
76+++ linaro_android_mirror/settings.py 2011-06-03 19:35:23 +0000
77@@ -2,4 +2,13 @@
78 MIRROR_PREFIX = '/mnt/mirror'
79 DRY_RUN = False
80 REPO = '/usr/local/bin/repo'
81-REPO_SYNC_EXTRA_ARGS = ['--no-repo-verify', '-j8']
82+REPO_SYNC_EXTRA_ARGS = ['--no-repo-verify']
83+
84+# stale - don't sync if last successfully synced less than this time in seconds ago
85+# jobs - use -jN repo sync param
86+UPSTREAM_HOSTS = {
87+ "android.git.kernel.org": {"stale": 60*60, "jobs": 6},
88+ "git.omapzoom.org": {"stale": 30*60, "jobs": 1},
89+ "git.linaro.org": {"stale": 1*60, "jobs": 8},
90+ "*": {"stale": 10*60, "jobs": 4},
91+}

Subscribers

People subscribed via source and target branches