Merge lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 into lp:lava-dispatcher

Proposed by Le Chi Thu
Status: Merged
Approved by: Michael Hudson-Doyle
Approved revision: 276
Merged at revision: 277
Proposed branch: lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1
Merge into: lp:lava-dispatcher
Diff against target: 248 lines (+136/-36)
2 files modified
lava_dispatcher/client/master.py (+113/-13)
lava_dispatcher/utils.py (+23/-23)
To merge this branch: bzr merge lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1
Reviewer Review Type Date Requested Status
Zygmunt Krynicki (community) Approve
Le Chi Thu (community) Needs Resubmitting
Review via email: mp+102181@code.launchpad.net

Description of the change

BP https://blueprints.launchpad.net/lava-dispatcher/+spec/cache-rootfs-boot-tarballs

The solution I did was only caching the tarballs when the build is of type image which all health check jobs are using. I am not sure how much reuse of tarballs for jobs which are using hwpack and rootfs, they are right now mostly CI jobs. Maybe be we need to add a new blueprint to investigate the hwpack & rootfs case.

To post a comment you must log in.
Revision history for this message
Le Chi Thu (le-chi-thu) wrote :

What about concurrency of multiple instance of lava-dispatcher on the same image file ? Shall we use lockfile ? http://linux.about.com/library/cmd/blcmdl1_lockfile.htm

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :
Download full text (4.0 KiB)

On Mon, 16 Apr 2012 21:01:25 -0000, Le Chi Thu <email address hidden> wrote:
> Le Chi Thu has proposed merging lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 into lp:lava-dispatcher.
>
> Requested reviews:
> Linaro Validation Team (linaro-validation)
>
> For more details, see:
> https://code.launchpad.net/~le-chi-thu/lava-dispatcher/cache-tarballs-v1/+merge/102181
>
> BP https://blueprints.launchpad.net/lava-dispatcher/+spec/cache-rootfs-boot-tarballs
>
> The solution I did was only caching the tarballs when the build is of
> type image which all health check jobs are using. I am not sure how
> much reuse of tarballs for jobs which are using hwpack and rootfs,
> they are right now mostly CI jobs. Maybe be we need to add a new
> blueprint to investigate the hwpack & rootfs case.

I'm a bit sad that this reuses the existing cache machinery, as I'd
hoped that we could delete all of that when we've got squid running
sensibly. But I don't have any better ideas really.

I don't think it's a safe idea to cache the outputs of all jobs that run
with images -- we'll soon be running tests on images that Andy is
building in Jenkins. I'd rather do something more explicit, like adding
a "cache_tarballs" option to the deploy_linaro action and changing
behaviour based on that...

> https://code.launchpad.net/~le-chi-thu/lava-dispatcher/cache-tarballs-v1/+merge/102181
> You are subscribed to branch lp:lava-dispatcher.
> === modified file 'lava_dispatcher/client/master.py'
> --- lava_dispatcher/client/master.py 2012-04-02 11:36:56 +0000
> +++ lava_dispatcher/client/master.py 2012-04-16 21:00:29 +0000
> @@ -37,7 +37,7 @@
> logging_spawn,
> logging_system,
> string_to_list,
> - )
> + url_to_cache, link_or_copy_file)
> from lava_dispatcher.client.base import (
> CommandRunner,
> CriticalError,
> @@ -58,13 +58,13 @@
> :param partno: The index of the partition in the image
> :param tarfile: path and filename of the tgz to output
> """
> +
> with image_partition_mounted(image, partno) as mntdir:
> cmd = "sudo tar -C %s -czf %s ." % (mntdir, tarfile)
> rc = logging_system(cmd)
> if rc:
> raise RuntimeError("Failed to create tarball: %s" % tarfile)
>
> -
> def _deploy_tarball_to_board(session, tarball_url, dest, timeout=-1):
> decompression_char = ''
> if tarball_url.endswith('.gz') or tarball_url.endswith('.tgz'):
> @@ -289,31 +289,79 @@
> return uncompressed_name
> return image_file

I have a couple of English-language requests for the method names...

> + def _tarball_url_to_cache(self, url, cachedir):
> + cache_loc = url_to_cache(url, cachedir)
> + return os.path.join(cache_loc.replace('.','-'), "tarballs")
> +
> + def _is_tarballs_cached(self, image, lava_cachedir):

_are_tarballs_cached seems better to me.

> + cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
> + return os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \
> + os.path.exists(os.path.join(cache_loc, "root.tgz"))
> +
> + def _get_cached_tarballs(self, image, tarball_dir, lava_cachedir):
> + ...

Read more...

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I agree with mwhudson on the feel that we should do less caching but at the same time this code probably saves a good deal of time and IO. In the long term I see a way to get that fixed but it's not something I want to talk about now.

I agree on method names, we should keep a close watch on that to ensure they stay readable.

+1

review: Approve
276. By Le Chi Thu <email address hidden> <email address hidden>

Changed the method names. Cache file creation sync between lava-dispatcher instances.

Revision history for this message
Le Chi Thu (le-chi-thu) wrote :

Method names are updated. I use a directory to sync between lava-dispatcher instances for creation of tarballs cache. I think it is a safe and simple solution. The worse case is if the lava-dispatcher instance who is creating the tarballs, crashed, other instance will timeout (20 minutes) and continue to download the image and create the tarballs themselves.

The caching of download files can be replace by squid. This solution did not depend on that feature except reusing some help functions in the utils.py

review: Needs Resubmitting
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

OK then. I still think we'll want to tone down the caching to situations where it's useful, but this is an improvement indeed.

Revision history for this message
Le Chi Thu (le-chi-thu) wrote :

Fixed a bug when caching the downloaded file. Introduced by earlier refactory of the code.

review: Needs Resubmitting
277. By Le Chi Thu <email address hidden> <email address hidden>

Fixed a bug when cache the downloaded files

278. By Le Chi Thu <email address hidden> <email address hidden>

Remove race condition in tarballs caching

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

+1

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lava_dispatcher/client/master.py'
2--- lava_dispatcher/client/master.py 2012-04-17 15:40:25 +0000
3+++ lava_dispatcher/client/master.py 2012-04-25 13:09:41 +0000
4@@ -30,6 +30,7 @@
5 import traceback
6
7 import pexpect
8+import errno
9
10 from lava_dispatcher.utils import (
11 download,
12@@ -37,7 +38,7 @@
13 logging_spawn,
14 logging_system,
15 string_to_list,
16- )
17+ url_to_cache, link_or_copy_file)
18 from lava_dispatcher.client.base import (
19 CommandRunner,
20 CriticalError,
21@@ -58,13 +59,13 @@
22 :param partno: The index of the partition in the image
23 :param tarfile: path and filename of the tgz to output
24 """
25+
26 with image_partition_mounted(image, partno) as mntdir:
27 cmd = "sudo tar -C %s -czf %s ." % (mntdir, tarfile)
28 rc = logging_system(cmd)
29 if rc:
30 raise RuntimeError("Failed to create tarball: %s" % tarfile)
31
32-
33 def _deploy_tarball_to_board(session, tarball_url, dest, timeout=-1):
34 decompression_char = ''
35 if tarball_url.endswith('.gz') or tarball_url.endswith('.tgz'):
36@@ -289,31 +290,129 @@
37 return uncompressed_name
38 return image_file
39
40+ def _tarball_url_to_cache(self, url, cachedir):
41+ cache_loc = url_to_cache(url, cachedir)
42+ # can't have a folder name same as file name. replacing '.' with '.'
43+ return os.path.join(cache_loc.replace('.','-'), "tarballs")
44+
45+ def _are_tarballs_cached(self, image, lava_cachedir):
46+ cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
47+ cached = os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \
48+ os.path.exists(os.path.join(cache_loc, "root.tgz"))
49+
50+ if cached:
51+ return True;
52+
53+ # Check if there is an other lava-dispatch instance have start to cache the same image
54+ # see the _about_to_cache_tarballs
55+ if not os.path.exists(os.path.join(cache_loc, "tarballs-cache-ongoing")):
56+ return False
57+
58+ # wait x minute for caching is done.
59+ waittime=20
60+
61+ logging.info("Waiting for the other instance of lava-dispatcher to finish the caching of %s", image)
62+ while waittime > 0:
63+ if not os.path.exists(os.path.join(cache_loc, "tarballs-cache-ongoing")):
64+ waittime = 0
65+ else:
66+ time.sleep(60)
67+ waittime = waittime - 1
68+ if (waittime % 5) == 0:
69+ logging.info("%d minute left..." % waittime)
70+
71+ return os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \
72+ os.path.exists(os.path.join(cache_loc, "root.tgz"))
73+
74+ def _get_cached_tarballs(self, image, tarball_dir, lava_cachedir):
75+ cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
76+
77+ boot_tgz = os.path.join(tarball_dir,"boot.tgz")
78+ root_tgz = os.path.join(tarball_dir,"root.tgz")
79+ link_or_copy_file(os.path.join(cache_loc, "root.tgz"), root_tgz)
80+ link_or_copy_file(os.path.join(cache_loc, "boot.tgz"), boot_tgz)
81+
82+ return (boot_tgz,root_tgz)
83+
84+ def _about_to_cache_tarballs(self, image, lava_cachedir):
85+ # create this folder to indicate this instance of lava-dispatcher is caching this image.
86+ # see _are_tarballs_cached
87+ # return false if unable to create the directory. The caller should not cache the tarballs
88+ cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
89+ path = os.path.join(cache_loc, "tarballs-cache-ongoing")
90+ try:
91+ os.makedirs(path)
92+ except OSError as exc: # Python >2.5
93+ if exc.errno == errno.EEXIST:
94+ # other dispatcher process already caching - concurrency issue
95+ return False
96+ else:
97+ raise
98+ return True
99+
100+ def _cache_tarballs(self, image, boot_tgz, root_tgz, lava_cachedir):
101+ cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
102+ if not os.path.exists(cache_loc):
103+ os.makedirs(cache_loc)
104+ c_boot_tgz = os.path.join(cache_loc, "boot.tgz")
105+ c_root_tgz = os.path.join(cache_loc, "root.tgz")
106+ shutil.copy(boot_tgz, c_boot_tgz)
107+ shutil.copy(root_tgz, c_root_tgz)
108+ path = os.path.join(cache_loc, "tarballs-cache-ongoing")
109+ if os.path.exists(path):
110+ shutil.rmtree(path)
111
112 def deploy_linaro(self, hwpack=None, rootfs=None, image=None,
113 kernel_matrix=None, use_cache=True, rootfstype='ext3'):
114 LAVA_IMAGE_TMPDIR = self.context.lava_image_tmpdir
115 LAVA_IMAGE_URL = self.context.lava_image_url
116+
117+ # validate in parameters
118+ if image is None:
119+ if hwpack is None or rootfs is None:
120+ raise CriticalError(
121+ "must specify both hwpack and rootfs when not specifying image")
122+ else:
123+ if hwpack is not None or rootfs is not None or kernel_matrix is not None:
124+ raise CriticalError(
125+ "cannot specify hwpack or rootfs when specifying image")
126+
127+ # generate image if needed
128 try:
129 if image is None:
130- if hwpack is None or rootfs is None:
131- raise CriticalError(
132- "must specify both hwpack and rootfs when not specifying image")
133- else:
134- image_file = generate_image(self, hwpack, rootfs, kernel_matrix, use_cache)
135+ image_file = generate_image(self, hwpack, rootfs, kernel_matrix, use_cache)
136+ boot_tgz, root_tgz = self._generate_tarballs(image_file)
137 else:
138- if hwpack is not None or rootfs is not None or kernel_matrix is not None:
139- raise CriticalError(
140- "cannot specify hwpack or rootfs when specifying image")
141 tarball_dir = mkdtemp(dir=LAVA_IMAGE_TMPDIR)
142 os.chmod(tarball_dir, 0755)
143 if use_cache:
144 lava_cachedir = self.context.lava_cachedir
145- image_file = download_with_cache(image, tarball_dir, lava_cachedir)
146+ if self._are_tarballs_cached(image, lava_cachedir):
147+ logging.info("Reusing cached tarballs")
148+ boot_tgz, root_tgz = self._get_cached_tarballs(image, tarball_dir, lava_cachedir)
149+ else:
150+ logging.info("Downloading and caching the tarballs")
151+ # in some corner case, there can be more than one lava-dispatchers execute
152+ # caching of same tarballs exact at the same time. One of them will successfully
153+ # get the lock directory. The rest will skip the caching if _about_to_cache_tarballs
154+ # return false.
155+ should_cache = self._about_to_cache_tarballs(image, lava_cachedir)
156+ image_file = download_with_cache(image, tarball_dir, lava_cachedir)
157+ image_file = self.decompress(image_file)
158+ boot_tgz, root_tgz = self._generate_tarballs(image_file)
159+ if should_cache:
160+ self._cache_tarballs(image, boot_tgz, root_tgz, lava_cachedir)
161 else:
162 image_file = download(image, tarball_dir)
163- image_file = self.decompress(image_file)
164- boot_tgz, root_tgz = self._generate_tarballs(image_file)
165+ image_file = self.decompress(image_file)
166+ boot_tgz, root_tgz = self._generate_tarballs(image_file)
167+ # remove the cached tarballs
168+ cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
169+ shutil.rmtree(cache_loc, ignore_errors = true)
170+ # remove the cached image files
171+ cache_loc = url_to_cache
172+ shutil.rmtree(cache_loc, ignore_errors = true)
173+
174 except CriticalError:
175 raise
176 except:
177@@ -322,6 +421,7 @@
178 self.sio.write(tb)
179 raise CriticalError("Deployment tarballs preparation failed")
180
181+ # deploy the boot image and rootfs to target
182 logging.info("Booting master image")
183 try:
184 self.boot_master_image()
185
186=== modified file 'lava_dispatcher/utils.py'
187--- lava_dispatcher/utils.py 2012-03-19 18:39:24 +0000
188+++ lava_dispatcher/utils.py 2012-04-25 13:09:41 +0000
189@@ -48,6 +48,27 @@
190 raise RuntimeError("Could not retrieve %s" % url)
191 return filename
192
193+def link_or_copy_file(src, dest):
194+ try:
195+ dir = os.path.dirname(dest)
196+ if not os.path.exists(dir):
197+ os.makedirs(dir)
198+ os.link(src, dest)
199+ except OSError, err:
200+ if err.errno == errno.EXDEV:
201+ shutil.copy(src, dest)
202+ if err.errno == errno.EEXIST:
203+ logging.debug("Cached copy of %s already exists" % dest)
204+ else:
205+ logging.exception("os.link '%s' with '%s' failed" % (src, dest))
206+
207+def copy_file(src, dest):
208+ dir = os.path.dirname(dest)
209+ if not os.path.exists(dir):
210+ os.makedirs(dir)
211+ shutil.copy(src, dest)
212+
213+
214 # XXX: duplication, we have similar code in lava-test, we need to move that to
215 # lava.utils -> namespace as standalone package
216 def download_with_cache(url, path="", cachedir=""):
217@@ -55,31 +76,10 @@
218 if os.path.exists(cache_loc):
219 filename = os.path.basename(cache_loc)
220 file_location = os.path.join(path, filename)
221- try:
222- os.link(cache_loc, file_location)
223- except OSError, err:
224- if err.errno == errno.EXDEV:
225- shutil.copy(cache_loc, file_location)
226- if err.errno == errno.EEXIST:
227- logging.debug("Cached copy of %s already exists" % url)
228- else:
229- logging.exception("os.link '%s' with '%s' failed" % (cache_loc,
230- file_location))
231+ link_or_copy_file(cache_loc, file_location)
232 else:
233 file_location = download(url, path)
234- try:
235- cache_dir = os.path.dirname(cache_loc)
236- if not os.path.exists(cache_dir):
237- os.makedirs(cache_dir)
238- os.link(file_location, cache_loc)
239- except OSError, err:
240- #errno.EXDEV(18) is Invalid cross-device link
241- if err.errno == errno.EXDEV:
242- shutil.copy(file_location, cache_loc)
243- if err.errno == errno.EEXIST:
244- logging.debug("Cached copy of %s already exists" % url)
245- else:
246- logging.exception("os.link failed")
247+ copy_file(file_location, cache_loc)
248 return file_location
249
250

Subscribers

People subscribed via source and target branches