Merge lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 into lp:lava-dispatcher
- cache-tarballs-v1
- Merge into trunk
Status: | Merged |
---|---|
Approved by: | Michael Hudson-Doyle |
Approved revision: | 276 |
Merged at revision: | 277 |
Proposed branch: | lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 |
Merge into: | lp:lava-dispatcher |
Diff against target: |
248 lines (+136/-36) 2 files modified
lava_dispatcher/client/master.py (+113/-13) lava_dispatcher/utils.py (+23/-23) |
To merge this branch: | bzr merge lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 |
Related bugs: | |
Related blueprints: |
Cache rootfs & boot tarballs
(Medium)
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Zygmunt Krynicki (community) | Approve | ||
Le Chi Thu (community) | Needs Resubmitting | ||
Review via email: mp+102181@code.launchpad.net |
Commit message
Description of the change
BP https:/
The solution I did was only caching the tarballs when the build is of type image which all health check jobs are using. I am not sure how much reuse of tarballs for jobs which are using hwpack and rootfs, they are right now mostly CI jobs. Maybe be we need to add a new blueprint to investigate the hwpack & rootfs case.
Le Chi Thu (le-chi-thu) wrote : | # |
Michael Hudson-Doyle (mwhudson) wrote : | # |
On Mon, 16 Apr 2012 21:01:25 -0000, Le Chi Thu <email address hidden> wrote:
> Le Chi Thu has proposed merging lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 into lp:lava-dispatcher.
>
> Requested reviews:
> Linaro Validation Team (linaro-validation)
>
> For more details, see:
> https:/
>
> BP https:/
>
> The solution I did was only caching the tarballs when the build is of
> type image which all health check jobs are using. I am not sure how
> much reuse of tarballs for jobs which are using hwpack and rootfs,
> they are right now mostly CI jobs. Maybe be we need to add a new
> blueprint to investigate the hwpack & rootfs case.
I'm a bit sad that this reuses the existing cache machinery, as I'd
hoped that we could delete all of that when we've got squid running
sensibly. But I don't have any better ideas really.
I don't think it's a safe idea to cache the outputs of all jobs that run
with images -- we'll soon be running tests on images that Andy is
building in Jenkins. I'd rather do something more explicit, like adding
a "cache_tarballs" option to the deploy_linaro action and changing
behaviour based on that...
> https:/
> You are subscribed to branch lp:lava-dispatcher.
> === modified file 'lava_dispatche
> --- lava_dispatcher
> +++ lava_dispatcher
> @@ -37,7 +37,7 @@
> logging_spawn,
> logging_system,
> string_to_list,
> - )
> + url_to_cache, link_or_copy_file)
> from lava_dispatcher
> CommandRunner,
> CriticalError,
> @@ -58,13 +58,13 @@
> :param partno: The index of the partition in the image
> :param tarfile: path and filename of the tgz to output
> """
> +
> with image_partition
> cmd = "sudo tar -C %s -czf %s ." % (mntdir, tarfile)
> rc = logging_system(cmd)
> if rc:
> raise RuntimeError(
>
> -
> def _deploy_
> decompression_char = ''
> if tarball_
> @@ -289,31 +289,79 @@
> return uncompressed_name
> return image_file
I have a couple of English-language requests for the method names...
> + def _tarball_
> + cache_loc = url_to_cache(url, cachedir)
> + return os.path.
> +
> + def _is_tarballs_
_are_tarballs_
> + cache_loc = self._tarball_
> + return os.path.
> + os.path.
> +
> + def _get_cached_
> + ...
Zygmunt Krynicki (zyga) wrote : | # |
I agree with mwhudson on the feel that we should do less caching but at the same time this code probably saves a good deal of time and IO. In the long term I see a way to get that fixed but it's not something I want to talk about now.
I agree on method names, we should keep a close watch on that to ensure they stay readable.
+1
- 276. By Le Chi Thu <email address hidden> <email address hidden>
-
Changed the method names. Cache file creation sync between lava-dispatcher instances.
Le Chi Thu (le-chi-thu) wrote : | # |
Method names are updated. I use a directory to sync between lava-dispatcher instances for creation of tarballs cache. I think it is a safe and simple solution. The worse case is if the lava-dispatcher instance who is creating the tarballs, crashed, other instance will timeout (20 minutes) and continue to download the image and create the tarballs themselves.
The caching of download files can be replace by squid. This solution did not depend on that feature except reusing some help functions in the utils.py
Michael Hudson-Doyle (mwhudson) wrote : | # |
OK then. I still think we'll want to tone down the caching to situations where it's useful, but this is an improvement indeed.
Le Chi Thu (le-chi-thu) wrote : | # |
Fixed a bug when caching the downloaded file. Introduced by earlier refactory of the code.
Preview Diff
1 | === modified file 'lava_dispatcher/client/master.py' |
2 | --- lava_dispatcher/client/master.py 2012-04-17 15:40:25 +0000 |
3 | +++ lava_dispatcher/client/master.py 2012-04-25 13:09:41 +0000 |
4 | @@ -30,6 +30,7 @@ |
5 | import traceback |
6 | |
7 | import pexpect |
8 | +import errno |
9 | |
10 | from lava_dispatcher.utils import ( |
11 | download, |
12 | @@ -37,7 +38,7 @@ |
13 | logging_spawn, |
14 | logging_system, |
15 | string_to_list, |
16 | - ) |
17 | + url_to_cache, link_or_copy_file) |
18 | from lava_dispatcher.client.base import ( |
19 | CommandRunner, |
20 | CriticalError, |
21 | @@ -58,13 +59,13 @@ |
22 | :param partno: The index of the partition in the image |
23 | :param tarfile: path and filename of the tgz to output |
24 | """ |
25 | + |
26 | with image_partition_mounted(image, partno) as mntdir: |
27 | cmd = "sudo tar -C %s -czf %s ." % (mntdir, tarfile) |
28 | rc = logging_system(cmd) |
29 | if rc: |
30 | raise RuntimeError("Failed to create tarball: %s" % tarfile) |
31 | |
32 | - |
33 | def _deploy_tarball_to_board(session, tarball_url, dest, timeout=-1): |
34 | decompression_char = '' |
35 | if tarball_url.endswith('.gz') or tarball_url.endswith('.tgz'): |
36 | @@ -289,31 +290,129 @@ |
37 | return uncompressed_name |
38 | return image_file |
39 | |
40 | + def _tarball_url_to_cache(self, url, cachedir): |
41 | + cache_loc = url_to_cache(url, cachedir) |
42 | + # can't have a folder name same as file name. replacing '.' with '.' |
43 | + return os.path.join(cache_loc.replace('.','-'), "tarballs") |
44 | + |
45 | + def _are_tarballs_cached(self, image, lava_cachedir): |
46 | + cache_loc = self._tarball_url_to_cache(image, lava_cachedir) |
47 | + cached = os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \ |
48 | + os.path.exists(os.path.join(cache_loc, "root.tgz")) |
49 | + |
50 | + if cached: |
51 | + return True; |
52 | + |
53 | + # Check if there is an other lava-dispatch instance have start to cache the same image |
54 | + # see the _about_to_cache_tarballs |
55 | + if not os.path.exists(os.path.join(cache_loc, "tarballs-cache-ongoing")): |
56 | + return False |
57 | + |
58 | + # wait x minute for caching is done. |
59 | + waittime=20 |
60 | + |
61 | + logging.info("Waiting for the other instance of lava-dispatcher to finish the caching of %s", image) |
62 | + while waittime > 0: |
63 | + if not os.path.exists(os.path.join(cache_loc, "tarballs-cache-ongoing")): |
64 | + waittime = 0 |
65 | + else: |
66 | + time.sleep(60) |
67 | + waittime = waittime - 1 |
68 | + if (waittime % 5) == 0: |
69 | + logging.info("%d minute left..." % waittime) |
70 | + |
71 | + return os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \ |
72 | + os.path.exists(os.path.join(cache_loc, "root.tgz")) |
73 | + |
74 | + def _get_cached_tarballs(self, image, tarball_dir, lava_cachedir): |
75 | + cache_loc = self._tarball_url_to_cache(image, lava_cachedir) |
76 | + |
77 | + boot_tgz = os.path.join(tarball_dir,"boot.tgz") |
78 | + root_tgz = os.path.join(tarball_dir,"root.tgz") |
79 | + link_or_copy_file(os.path.join(cache_loc, "root.tgz"), root_tgz) |
80 | + link_or_copy_file(os.path.join(cache_loc, "boot.tgz"), boot_tgz) |
81 | + |
82 | + return (boot_tgz,root_tgz) |
83 | + |
84 | + def _about_to_cache_tarballs(self, image, lava_cachedir): |
85 | + # create this folder to indicate this instance of lava-dispatcher is caching this image. |
86 | + # see _are_tarballs_cached |
87 | + # return false if unable to create the directory. The caller should not cache the tarballs |
88 | + cache_loc = self._tarball_url_to_cache(image, lava_cachedir) |
89 | + path = os.path.join(cache_loc, "tarballs-cache-ongoing") |
90 | + try: |
91 | + os.makedirs(path) |
92 | + except OSError as exc: # Python >2.5 |
93 | + if exc.errno == errno.EEXIST: |
94 | + # other dispatcher process already caching - concurrency issue |
95 | + return False |
96 | + else: |
97 | + raise |
98 | + return True |
99 | + |
100 | + def _cache_tarballs(self, image, boot_tgz, root_tgz, lava_cachedir): |
101 | + cache_loc = self._tarball_url_to_cache(image, lava_cachedir) |
102 | + if not os.path.exists(cache_loc): |
103 | + os.makedirs(cache_loc) |
104 | + c_boot_tgz = os.path.join(cache_loc, "boot.tgz") |
105 | + c_root_tgz = os.path.join(cache_loc, "root.tgz") |
106 | + shutil.copy(boot_tgz, c_boot_tgz) |
107 | + shutil.copy(root_tgz, c_root_tgz) |
108 | + path = os.path.join(cache_loc, "tarballs-cache-ongoing") |
109 | + if os.path.exists(path): |
110 | + shutil.rmtree(path) |
111 | |
112 | def deploy_linaro(self, hwpack=None, rootfs=None, image=None, |
113 | kernel_matrix=None, use_cache=True, rootfstype='ext3'): |
114 | LAVA_IMAGE_TMPDIR = self.context.lava_image_tmpdir |
115 | LAVA_IMAGE_URL = self.context.lava_image_url |
116 | + |
117 | + # validate in parameters |
118 | + if image is None: |
119 | + if hwpack is None or rootfs is None: |
120 | + raise CriticalError( |
121 | + "must specify both hwpack and rootfs when not specifying image") |
122 | + else: |
123 | + if hwpack is not None or rootfs is not None or kernel_matrix is not None: |
124 | + raise CriticalError( |
125 | + "cannot specify hwpack or rootfs when specifying image") |
126 | + |
127 | + # generate image if needed |
128 | try: |
129 | if image is None: |
130 | - if hwpack is None or rootfs is None: |
131 | - raise CriticalError( |
132 | - "must specify both hwpack and rootfs when not specifying image") |
133 | - else: |
134 | - image_file = generate_image(self, hwpack, rootfs, kernel_matrix, use_cache) |
135 | + image_file = generate_image(self, hwpack, rootfs, kernel_matrix, use_cache) |
136 | + boot_tgz, root_tgz = self._generate_tarballs(image_file) |
137 | else: |
138 | - if hwpack is not None or rootfs is not None or kernel_matrix is not None: |
139 | - raise CriticalError( |
140 | - "cannot specify hwpack or rootfs when specifying image") |
141 | tarball_dir = mkdtemp(dir=LAVA_IMAGE_TMPDIR) |
142 | os.chmod(tarball_dir, 0755) |
143 | if use_cache: |
144 | lava_cachedir = self.context.lava_cachedir |
145 | - image_file = download_with_cache(image, tarball_dir, lava_cachedir) |
146 | + if self._are_tarballs_cached(image, lava_cachedir): |
147 | + logging.info("Reusing cached tarballs") |
148 | + boot_tgz, root_tgz = self._get_cached_tarballs(image, tarball_dir, lava_cachedir) |
149 | + else: |
150 | + logging.info("Downloading and caching the tarballs") |
151 | + # in some corner case, there can be more than one lava-dispatchers execute |
152 | + # caching of same tarballs exact at the same time. One of them will successfully |
153 | + # get the lock directory. The rest will skip the caching if _about_to_cache_tarballs |
154 | + # return false. |
155 | + should_cache = self._about_to_cache_tarballs(image, lava_cachedir) |
156 | + image_file = download_with_cache(image, tarball_dir, lava_cachedir) |
157 | + image_file = self.decompress(image_file) |
158 | + boot_tgz, root_tgz = self._generate_tarballs(image_file) |
159 | + if should_cache: |
160 | + self._cache_tarballs(image, boot_tgz, root_tgz, lava_cachedir) |
161 | else: |
162 | image_file = download(image, tarball_dir) |
163 | - image_file = self.decompress(image_file) |
164 | - boot_tgz, root_tgz = self._generate_tarballs(image_file) |
165 | + image_file = self.decompress(image_file) |
166 | + boot_tgz, root_tgz = self._generate_tarballs(image_file) |
167 | + # remove the cached tarballs |
168 | + cache_loc = self._tarball_url_to_cache(image, lava_cachedir) |
169 | + shutil.rmtree(cache_loc, ignore_errors = true) |
170 | + # remove the cached image files |
171 | + cache_loc = url_to_cache |
172 | + shutil.rmtree(cache_loc, ignore_errors = true) |
173 | + |
174 | except CriticalError: |
175 | raise |
176 | except: |
177 | @@ -322,6 +421,7 @@ |
178 | self.sio.write(tb) |
179 | raise CriticalError("Deployment tarballs preparation failed") |
180 | |
181 | + # deploy the boot image and rootfs to target |
182 | logging.info("Booting master image") |
183 | try: |
184 | self.boot_master_image() |
185 | |
186 | === modified file 'lava_dispatcher/utils.py' |
187 | --- lava_dispatcher/utils.py 2012-03-19 18:39:24 +0000 |
188 | +++ lava_dispatcher/utils.py 2012-04-25 13:09:41 +0000 |
189 | @@ -48,6 +48,27 @@ |
190 | raise RuntimeError("Could not retrieve %s" % url) |
191 | return filename |
192 | |
193 | +def link_or_copy_file(src, dest): |
194 | + try: |
195 | + dir = os.path.dirname(dest) |
196 | + if not os.path.exists(dir): |
197 | + os.makedirs(dir) |
198 | + os.link(src, dest) |
199 | + except OSError, err: |
200 | + if err.errno == errno.EXDEV: |
201 | + shutil.copy(src, dest) |
202 | + if err.errno == errno.EEXIST: |
203 | + logging.debug("Cached copy of %s already exists" % dest) |
204 | + else: |
205 | + logging.exception("os.link '%s' with '%s' failed" % (src, dest)) |
206 | + |
207 | +def copy_file(src, dest): |
208 | + dir = os.path.dirname(dest) |
209 | + if not os.path.exists(dir): |
210 | + os.makedirs(dir) |
211 | + shutil.copy(src, dest) |
212 | + |
213 | + |
214 | # XXX: duplication, we have similar code in lava-test, we need to move that to |
215 | # lava.utils -> namespace as standalone package |
216 | def download_with_cache(url, path="", cachedir=""): |
217 | @@ -55,31 +76,10 @@ |
218 | if os.path.exists(cache_loc): |
219 | filename = os.path.basename(cache_loc) |
220 | file_location = os.path.join(path, filename) |
221 | - try: |
222 | - os.link(cache_loc, file_location) |
223 | - except OSError, err: |
224 | - if err.errno == errno.EXDEV: |
225 | - shutil.copy(cache_loc, file_location) |
226 | - if err.errno == errno.EEXIST: |
227 | - logging.debug("Cached copy of %s already exists" % url) |
228 | - else: |
229 | - logging.exception("os.link '%s' with '%s' failed" % (cache_loc, |
230 | - file_location)) |
231 | + link_or_copy_file(cache_loc, file_location) |
232 | else: |
233 | file_location = download(url, path) |
234 | - try: |
235 | - cache_dir = os.path.dirname(cache_loc) |
236 | - if not os.path.exists(cache_dir): |
237 | - os.makedirs(cache_dir) |
238 | - os.link(file_location, cache_loc) |
239 | - except OSError, err: |
240 | - #errno.EXDEV(18) is Invalid cross-device link |
241 | - if err.errno == errno.EXDEV: |
242 | - shutil.copy(file_location, cache_loc) |
243 | - if err.errno == errno.EEXIST: |
244 | - logging.debug("Cached copy of %s already exists" % url) |
245 | - else: |
246 | - logging.exception("os.link failed") |
247 | + copy_file(file_location, cache_loc) |
248 | return file_location |
249 | |
250 |
What about concurrency of multiple instance of lava-dispatcher on the same image file ? Shall we use lockfile ? http:// linux.about. com/library/ cmd/blcmdl1_ lockfile. htm