LAVA Dispatcher

Merge lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 into lp:lava-dispatcher

Proposed by Le Chi Thu on 2012-04-16

Status:	Merged
Approved by:	Michael Hudson-Doyle on 2012-04-18
Approved revision:	276
Merged at revision:	277
Proposed branch:	lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1
Merge into:	lp:lava-dispatcher
Diff against target:	248 lines (+136/-36), 2 files modified: lava_dispatcher/client/master.py (+113/-13), lava_dispatcher/utils.py (+23/-23)
To merge this branch:	bzr merge lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1
Related blueprints:	Cache rootfs & boot tarballs (Medium)

Reviewer	Date Requested	Status
Zygmunt Krynicki (community)	2012-04-16	Approve on 2012-04-25
Le Chi Thu (community)		Needs Resubmitting on 2012-04-18
Review via email: mp+102181@code.launchpad.net

Description of the change

BP https://blueprints.launchpad.net/lava-dispatcher/+spec/cache-rootfs-boot-tarballs

My solution caches the tarballs only when the build is of type image, which all health check jobs use. I am not sure how much tarball reuse there would be for jobs that use hwpack and rootfs; right now those are mostly CI jobs. Maybe we need to add a new blueprint to investigate the hwpack & rootfs case.

Revision history for this message

Le Chi Thu (le-chi-thu) wrote on 2012-04-16:

What about concurrency when multiple instances of lava-dispatcher work on the same image file? Shall we use lockfile? http://linux.about.com/library/cmd/blcmdl1_lockfile.htm

Revision history for this message

Michael Hudson-Doyle (mwhudson) wrote on 2012-04-17:

On Mon, 16 Apr 2012 21:01:25 -0000, Le Chi Thu <le.chi.thu@linaro.org> wrote:
> Le Chi Thu has proposed merging lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 into lp:lava-dispatcher.
>
> Requested reviews:
>   Linaro Validation Team (linaro-validation)
>
> For more details, see:
> https://code.launchpad.net/~le-chi-thu/lava-dispatcher/cache-tarballs-v1/+merge/102181
>
> BP https://blueprints.launchpad.net/lava-dispatcher/+spec/cache-rootfs-boot-tarballs
>
> The solution I did was only caching the tarballs when the build is of
> type image which all health check jobs are using. I am not sure how
> much reuse of tarballs for jobs which are using hwpack and rootfs,
> they are right now mostly CI jobs. Maybe we need to add a new
> blueprint to investigate the hwpack & rootfs case.

I'm a bit sad that this reuses the existing cache machinery, as I'd
hoped that we could delete all of that when we've got squid running
sensibly.  But I don't have any better ideas really.

I don't think it's a safe idea to cache the outputs of all jobs that run
with images -- we'll soon be running tests on images that Andy is
building in Jenkins.  I'd rather do something more explicit, like adding
a "cache_tarballs" option to the deploy_linaro action and changing
behaviour based on that...

> https://code.launchpad.net/~le-chi-thu/lava-dispatcher/cache-tarballs-v1/+merge/102181
> You are subscribed to branch lp:lava-dispatcher.
> === modified file 'lava_dispatcher/client/master.py'
> --- lava_dispatcher/client/master.py	2012-04-02 11:36:56 +0000
> +++ lava_dispatcher/client/master.py	2012-04-16 21:00:29 +0000
> @@ -37,7 +37,7 @@
>      logging_spawn,
>      logging_system,
>      string_to_list,
> -    )
> +    url_to_cache, link_or_copy_file)
>  from lava_dispatcher.client.base import (
>      CommandRunner,
>      CriticalError,
> @@ -58,13 +58,13 @@
>      :param partno: The index of the partition in the image
>      :param tarfile: path and filename of the tgz to output
>      """
> +
>      with image_partition_mounted(image, partno) as mntdir:
>          cmd = "sudo tar -C %s -czf %s ." % (mntdir, tarfile)
>          rc = logging_system(cmd)
>          if rc:
>              raise RuntimeError("Failed to create tarball: %s" % tarfile)
>  
> -
>  def _deploy_tarball_to_board(session, tarball_url, dest, timeout=-1):
>      decompression_char = ''
>      if tarball_url.endswith('.gz') or tarball_url.endswith('.tgz'):
> @@ -289,31 +289,79 @@
>                  return uncompressed_name
>          return image_file

I have a couple of English-language requests for the method names...

> +    def _tarball_url_to_cache(self, url, cachedir):
> +        cache_loc = url_to_cache(url, cachedir)
> +        return os.path.join(cache_loc.replace('.','-'), "tarballs")
> +
> +    def _is_tarballs_cached(self, image, lava_cachedir):

_are_tarballs_cached seems better to me.

> +        cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
> +        return os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \
> +               os.path.exists(os.path.join(cache_loc, "root.tgz"))
> +
> +    def _get_cached_tarballs(self, image, tarball_dir, lava_cachedir):
> +        cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
> +
> +        boot_tgz = os.path.join(tarball_dir,"boot.tgz")
> +        root_tgz = os.path.join(tarball_dir,"root.tgz")
> +        link_or_copy_file(os.path.join(cache_loc, "root.tgz"), root_tgz)
> +        link_or_copy_file(os.path.join(cache_loc, "boot.tgz"), boot_tgz)
> +
> +        return (boot_tgz,root_tgz)
> +
> +    def _cached_tarballs(self, image, boot_tgz, root_tgz, lava_cachedir):

This should be _cache_tarballs.

> +        cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
> +        if not os.path.exists(cache_loc):
> +              os.makedirs(cache_loc)
> +        c_boot_tgz = os.path.join(cache_loc, "boot.tgz")
> +        c_root_tgz = os.path.join(cache_loc, "root.tgz")
> +        shutil.copy(boot_tgz, c_boot_tgz)
> +        shutil.copy(root_tgz, c_root_tgz)
>

Cheers,
mwh

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2012-04-17:

I agree with mwhudson's feeling that we should do less caching, but at the same time this code probably saves a good deal of time and I/O. In the long term I see a way to get that fixed, but it's not something I want to talk about now.

I agree on the method names; we should keep a close watch on them to ensure they stay readable.

review: Approve

lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 updated on 2012-04-17

276. By Le Chi Thu <email address hidden> on 2012-04-17: Changed the method names. Cache file creation sync between lava-dispatcher instances.

Revision history for this message

Le Chi Thu (le-chi-thu) wrote on 2012-04-17:

Method names are updated. I use a directory to synchronize creation of the tarball cache between lava-dispatcher instances. I think it is a safe and simple solution. In the worst case, if the lava-dispatcher instance that is creating the tarballs crashes, the other instances will time out (after 20 minutes) and go on to download the image and create the tarballs themselves.

The caching of downloaded files can be replaced by squid. This solution does not depend on that feature, except that it reuses some helper functions in utils.py.
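The directory-based synchronization described here can be sketched roughly as follows. This is a minimal Python 3 illustration, not the code from the branch (which targets Python 2): the function names `try_acquire_cache_lock`, `release_cache_lock` and `wait_for_cache` and the `timeout`/`poll` parameters are hypothetical, while the marker name `tarballs-cache-ongoing` and the tarball names `boot.tgz` and `root.tgz` are taken from the diff.

```python
import os
import shutil
import time

MARKER = "tarballs-cache-ongoing"  # marker directory name used in the diff


def try_acquire_cache_lock(cache_loc):
    """Return True if this process created the marker directory.

    Creating a directory either succeeds or fails with "already exists",
    so at most one process can win the race.
    """
    os.makedirs(cache_loc, exist_ok=True)
    try:
        os.mkdir(os.path.join(cache_loc, MARKER))
    except FileExistsError:
        # Another process is already caching this image.
        return False
    return True


def release_cache_lock(cache_loc):
    """Remove the marker directory once the tarballs have been copied."""
    shutil.rmtree(os.path.join(cache_loc, MARKER), ignore_errors=True)


def wait_for_cache(cache_loc, timeout=20 * 60, poll=60):
    """Wait until the marker disappears or the timeout (seconds) expires.

    Returns True only if both tarballs are present afterwards; a False
    result tells the caller to download and build the tarballs itself.
    """
    marker = os.path.join(cache_loc, MARKER)
    deadline = time.monotonic() + timeout
    while os.path.exists(marker) and time.monotonic() < deadline:
        time.sleep(poll)
    return all(os.path.exists(os.path.join(cache_loc, name))
               for name in ("boot.tgz", "root.tgz"))
```

If the process holding the marker crashes, the marker is never removed, so every later waiter pays the full timeout until someone cleans it up by hand; that is the trade-off of this simple scheme.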

review: Needs Resubmitting

Revision history for this message

Michael Hudson-Doyle (mwhudson) wrote on 2012-04-18:

OK then. I still think we'll want to tone down the caching to situations where it's useful, but this is an improvement indeed.

Revision history for this message

Le Chi Thu (le-chi-thu) wrote on 2012-04-18:

Fixed a bug in caching the downloaded file. It was introduced by the earlier refactoring of the code.

review: Needs Resubmitting

lp:~le-chi-thu/lava-dispatcher/cache-tarballs-v1 updated on 2012-04-25

277. By Le Chi Thu <email address hidden> on 2012-04-18: Fixed a bug when cache the downloaded files
278. By Le Chi Thu <email address hidden> on 2012-04-25: Remove race condition in tarballs caching

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2012-04-25:

review: Approve

Preview Diff

Subscribers

People subscribed via source and target branches

to all changes:

Ash Charles

Le Chi Thu

Michael Hudson-Doyle

Spring Zhang

 === modified file 'lava_dispatcher/client/master.py'
 --- lava_dispatcher/client/master.py	2012-04-17 15:40:25 +0000
 +++ lava_dispatcher/client/master.py	2012-04-25 13:09:41 +0000
@@ -30,6 +30,7 @@
  import traceback
  import pexpect
++import errno
  from lava_dispatcher.utils import (
      download,
@@ -37,7 +38,7 @@
      logging_spawn,
      logging_system,
      string_to_list,
--    )
++    url_to_cache, link_or_copy_file)
  from lava_dispatcher.client.base import (
      CommandRunner,
      CriticalError,
@@ -58,13 +59,13 @@
      :param partno: The index of the partition in the image
      :param tarfile: path and filename of the tgz to output
      """
++
      with image_partition_mounted(image, partno) as mntdir:
          cmd = "sudo tar -C %s -czf %s ." % (mntdir, tarfile)
          rc = logging_system(cmd)
          if rc:
              raise RuntimeError("Failed to create tarball: %s" % tarfile)
--
  def _deploy_tarball_to_board(session, tarball_url, dest, timeout=-1):
      decompression_char = ''
      if tarball_url.endswith('.gz') or tarball_url.endswith('.tgz'):
@@ -289,31 +290,129 @@
                  return uncompressed_name
          return image_file
++    def _tarball_url_to_cache(self, url, cachedir):
++        cache_loc = url_to_cache(url, cachedir)
++        # can't have a folder name same as file name. replacing '.' with '-'
++        return os.path.join(cache_loc.replace('.','-'), "tarballs")
++
++    def _are_tarballs_cached(self, image, lava_cachedir):
++        cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
++        cached = os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \
++               os.path.exists(os.path.join(cache_loc, "root.tgz"))
++
++        if cached:
++            return True
++
++        # Check whether another lava-dispatcher instance has started caching the same image
++        # (see _about_to_cache_tarballs)
++        if not os.path.exists(os.path.join(cache_loc, "tarballs-cache-ongoing")):
++            return False
++
++        # wait up to 20 minutes for the caching to finish
++        waittime=20
++
++        logging.info("Waiting for the other instance of lava-dispatcher to finish the caching of %s", image)
++        while waittime > 0:
++            if not os.path.exists(os.path.join(cache_loc, "tarballs-cache-ongoing")):
++                waittime = 0
++            else:
++                time.sleep(60)
++                waittime = waittime - 1
++                if (waittime % 5) == 0:
++                    logging.info("%d minute left..." % waittime)
++
++        return os.path.exists(os.path.join(cache_loc, "boot.tgz")) and \
++               os.path.exists(os.path.join(cache_loc, "root.tgz"))
++
++    def _get_cached_tarballs(self, image, tarball_dir, lava_cachedir):
++        cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
++
++        boot_tgz = os.path.join(tarball_dir,"boot.tgz")
++        root_tgz = os.path.join(tarball_dir,"root.tgz")
++        link_or_copy_file(os.path.join(cache_loc, "root.tgz"), root_tgz)
++        link_or_copy_file(os.path.join(cache_loc, "boot.tgz"), boot_tgz)
++
++        return (boot_tgz,root_tgz)
++
++    def _about_to_cache_tarballs(self, image, lava_cachedir):
++        # create this folder to indicate this instance of lava-dispatcher is caching this image.
++        # see _are_tarballs_cached
++        # return false if unable to create the directory. The caller should not cache the tarballs
++        cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
++        path = os.path.join(cache_loc, "tarballs-cache-ongoing")
++        try:
++          os.makedirs(path)
++        except OSError as exc: # Python >2.5
++            if exc.errno == errno.EEXIST:
++                # other dispatcher process already caching - concurrency issue
++                return False
++            else:
++                raise
++        return True
++
++    def _cache_tarballs(self, image, boot_tgz, root_tgz, lava_cachedir):
++        cache_loc = self._tarball_url_to_cache(image, lava_cachedir)
++        if not os.path.exists(cache_loc):
++              os.makedirs(cache_loc)
++        c_boot_tgz = os.path.join(cache_loc, "boot.tgz")
++        c_root_tgz = os.path.join(cache_loc, "root.tgz")
++        shutil.copy(boot_tgz, c_boot_tgz)
++        shutil.copy(root_tgz, c_root_tgz)
++        path = os.path.join(cache_loc, "tarballs-cache-ongoing")
++        if os.path.exists(path):
++            shutil.rmtree(path)
      def deploy_linaro(self, hwpack=None, rootfs=None, image=None,
                        kernel_matrix=None, use_cache=True, rootfstype='ext3'):
          LAVA_IMAGE_TMPDIR = self.context.lava_image_tmpdir
          LAVA_IMAGE_URL = self.context.lava_image_url
++
++        # validate in parameters
++        if image is None:
++            if hwpack is None or rootfs is None:
++                raise CriticalError(
++                    "must specify both hwpack and rootfs when not specifying image")
++        else:
++            if hwpack is not None or rootfs is not None or kernel_matrix is not None:
++                raise CriticalError(
++                        "cannot specify hwpack or rootfs when specifying image")
++
++        # generate image if needed
          try:
              if image is None:
--                if hwpack is None or rootfs is None:
--                    raise CriticalError(
--                        "must specify both hwpack and rootfs when not specifying image")
--                else:
--                    image_file = generate_image(self, hwpack, rootfs, kernel_matrix, use_cache)
++                image_file = generate_image(self, hwpack, rootfs, kernel_matrix, use_cache)
++                boot_tgz, root_tgz = self._generate_tarballs(image_file)
              else:
--                if hwpack is not None or rootfs is not None or kernel_matrix is not None:
--                    raise CriticalError(
--                        "cannot specify hwpack or rootfs when specifying image")
                  tarball_dir = mkdtemp(dir=LAVA_IMAGE_TMPDIR)
                  os.chmod(tarball_dir, 0755)
                  if use_cache:
                      lava_cachedir = self.context.lava_cachedir
--                    image_file = download_with_cache(image, tarball_dir, lava_cachedir)
++                    if self._are_tarballs_cached(image, lava_cachedir):
++                        logging.info("Reusing cached tarballs")
++                        boot_tgz, root_tgz = self._get_cached_tarballs(image, tarball_dir, lava_cachedir)
++                    else:
++                        logging.info("Downloading and caching the tarballs")
++                        # In a corner case, more than one lava-dispatcher may try to cache
++                        # the same tarballs at exactly the same time. One of them will get
++                        # the lock directory; the rest skip the caching because
++                        # _about_to_cache_tarballs returns False.
++                        should_cache = self._about_to_cache_tarballs(image, lava_cachedir)
++                        image_file = download_with_cache(image, tarball_dir, lava_cachedir)
++                        image_file = self.decompress(image_file)
++                        boot_tgz, root_tgz = self._generate_tarballs(image_file)
++                        if should_cache:
++                            self._cache_tarballs(image, boot_tgz, root_tgz, lava_cachedir)
                  else:
                      image_file = download(image, tarball_dir)
--                image_file = self.decompress(image_file)
--            boot_tgz, root_tgz = self._generate_tarballs(image_file)
++                    image_file = self.decompress(image_file)
++                    boot_tgz, root_tgz = self._generate_tarballs(image_file)
++                    # remove the cached tarballs
++                    cache_loc = self._tarball_url_to_cache(image, self.context.lava_cachedir)
++                    shutil.rmtree(cache_loc, ignore_errors=True)
++                    # remove the cached image files
++                    cache_loc = url_to_cache(image, self.context.lava_cachedir)
++                    shutil.rmtree(cache_loc, ignore_errors=True)
++
          except CriticalError:
              raise
          except:
@@ -322,6 +421,7 @@
              self.sio.write(tb)
              raise CriticalError("Deployment tarballs preparation failed")
++        # deploy the boot image and rootfs to target
          logging.info("Booting master image")
          try:
              self.boot_master_image()
 === modified file 'lava_dispatcher/utils.py'
 --- lava_dispatcher/utils.py	2012-03-19 18:39:24 +0000
 +++ lava_dispatcher/utils.py	2012-04-25 13:09:41 +0000
@@ -48,6 +48,27 @@
          raise RuntimeError("Could not retrieve %s" % url)
      return filename
++def link_or_copy_file(src, dest):
++    try:
++        dir = os.path.dirname(dest)
++        if not os.path.exists(dir):
++            os.makedirs(dir)
++        os.link(src, dest)
++    except OSError, err:
++        if err.errno == errno.EXDEV:
++            shutil.copy(src, dest)
++        elif err.errno == errno.EEXIST:
++            logging.debug("Cached copy of %s already exists" % dest)
++        else:
++            logging.exception("os.link '%s' with '%s' failed" % (src, dest))
++
++def copy_file(src, dest):
++    dir = os.path.dirname(dest)
++    if not os.path.exists(dir):
++        os.makedirs(dir)
++    shutil.copy(src, dest)
++
++
  # XXX: duplication, we have similar code in lava-test, we need to move that to
  # lava.utils -> namespace as standalone package
  def download_with_cache(url, path="", cachedir=""):
@@ -55,31 +76,10 @@
      if os.path.exists(cache_loc):
          filename = os.path.basename(cache_loc)
          file_location = os.path.join(path, filename)
--        try:
--            os.link(cache_loc, file_location)
--        except OSError, err:
--            if err.errno == errno.EXDEV:
--                shutil.copy(cache_loc, file_location)
--            if err.errno == errno.EEXIST:
--                logging.debug("Cached copy of %s already exists" % url)
--            else:
--                logging.exception("os.link '%s' with '%s' failed" % (cache_loc,
--                                                                file_location))
++        link_or_copy_file(cache_loc, file_location)
      else:
          file_location = download(url, path)
--        try:
--            cache_dir = os.path.dirname(cache_loc)
--            if not os.path.exists(cache_dir):
--                os.makedirs(cache_dir)
--            os.link(file_location, cache_loc)
--        except OSError, err:
--            #errno.EXDEV(18) is Invalid cross-device link
--            if err.errno == errno.EXDEV:
--                shutil.copy(file_location, cache_loc)
--            if err.errno == errno.EEXIST:
--                logging.debug("Cached copy of %s already exists" % url)
--            else:
--                logging.exception("os.link failed")
++        copy_file(file_location, cache_loc)
      return file_location
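
For readers on Python 3, the hard-link-with-copy-fallback idea behind link_or_copy_file can be sketched as below. This is an illustrative rewrite under stated assumptions, not the merged code: the name `link_or_copy` is hypothetical, and unlike the helper in the diff, which logs unexpected errors, this sketch re-raises them.

```python
import errno
import logging
import os
import shutil


def link_or_copy(src, dest):
    """Hard-link src to dest, copying instead when a link is impossible.

    The three outcomes are mutually exclusive: dest already exists
    (nothing to do), src and dest are on different filesystems (copy),
    or an unexpected error (re-raised so the caller sees it).
    """
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    try:
        os.link(src, dest)
    except FileExistsError:
        logging.debug("Cached copy of %s already exists", dest)
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
        # EXDEV: hard links cannot cross filesystem boundaries.
        shutil.copy(src, dest)
```

A hard link costs no extra disk space or I/O, which is why it is tried first; the copy is only a fallback for a cache directory that lives on another device.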
