Merge ~chad.smith/cloud-init:cyaml-loading into cloud-init:master

Proposed by Chad Smith on 2017-04-25
Status: Work in progress
Proposed branch: ~chad.smith/cloud-init:cyaml-loading
Merge into: cloud-init:master
Diff against target: 110 lines (+36/-6)
4 files modified
cloudinit/safeyaml.py (+7/-3)
packages/debian/control.in (+1/-1)
requirements.txt (+6/-2)
tests/unittests/test_util.py (+22/-0)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Needs Fixing on 2017-04-25
cloud-init Commiters 2017-04-25 Pending
Review via email: mp+323088@code.launchpad.net

Description of the change

[Leaving this branch as work in progress until we have more concrete timing analysis of the impact of loading yam's C-bindings instead of the python implementation]

Use CSafeLoader C-bindings instead of python for a faster yaml.load when processing yaml files.

For faster yaml processing import CSafeLoader in cloudinit/safeyaml.py during yaml loads. Also adds a comment to requirements.txt about the libyaml-dev packages required for unit tests to properly build. Previous test deployments may need to run tox -r to rebuild the virtual env cache if necessary libyaml-dev python-dev packages were absent.

LP: # 1685939

To test:
 tox -r
 tox

To post a comment you must log in.
Chad Smith (chad.smith) wrote :

From Ryan:
Some thoughts on this:

1) we don't want users who apt install cloud-init to also pull down python-devel, gcc and have to compile the extension, so please don't change package deps here

2) this is currently an issue for the tox/venv environment, so let's focus on how to enable the SafeLoader library for that path

3) Once we make (2) an optional testing path (env or parameter can trigger the dep install prior to calling tox/venv which triggers the compile during pip install of the module IIUC), we can have two travis paths, one without the library (representing how things work today) and one *with* the library enabled. This allows us to compare results, unittests all pass, and time it takes to complete (some estimate of the improvement it may provide).

4) With results from (3), we may propose changes to the underlying images (server seed, cloud-image seed, etc) which could ensure that the library is present in the default cloud images.

~chad.smith/cloud-init:cyaml-loading updated on 2017-05-04
35bb6bd... by Chad Smith on 2017-05-04

on CSafeLoader ImportError, load SafeLoader. Add unit tests to execise both paths.

b1833ad... by Chad Smith on 2017-05-04

add libyaml-0-2 as Recommends for cloud-init package

Chad Smith (chad.smith) wrote :

1) We won't have libyaml-dev as a dependency just the shared lib of libyaml-0-2. I've added it as a Recommends and put in logic to handle ImportErrors on CSafeLoader which fall back to SafeLoader

2) I've got a branch up in github which installs tox-needed system debs that we can review as part of the migration to github https://github.com/blackboxsw/cloud-init/pull/3/files

3) I have added a skipUnles decorator on unit tests to ensure we exercise the CSafeLoader path only if that module is present

4) depending on what we want to do with a strict yaml C-binding dependency, we might end up tweaking the debian/control file to just expect this as a mandatory deb (libyaml-0-2).

We might need further team discussion on whether that's the approach we want

Scott Moser (smoser) wrote :

> From Ryan:
> Some thoughts on this:
>
> 1) we don't want users who apt install cloud-init to also pull down python-
> devel, gcc and have to compile the extension, so please don't change package
> deps here

I don't follow this. In ubuntu at least cloud-init does depend on python3-yaml and python3-yaml on libyaml-0-2.

$ apt-cache show python3-yaml | grep Dep
Depends: python3 (<< 3.6), python3 (>= 3.5~), libc6 (>= 2.14), libyaml-0-2

There is no reason to explicitly depend on libyaml-0-2, and we have a fallback
path in the code.

Why did you think we need python-devel ? This is where tox falls apart honestly, when you get into C python extensions. But we already have some of those in the test path (pylxd).

> 2) this is currently an issue for the tox/venv environment, so let's focus on
> how to enable the SafeLoader library for that path

if we do skipIf, then the package build environment will run though the CLoader path, and the tox path would not. We could put in some way of failing if the package build environment didn't take the CLoader path. That'd ensure that we were getting tests on both paths.

I think that might be over-engineering.

> 3) Once we make (2) an optional testing path (env or parameter can trigger the
> dep install prior to calling tox/venv which triggers the compile during pip
> install of the module IIUC), we can have two travis paths, one without the
> library (representing how things work today) and one *with* the library
> enabled. This allows us to compare results, unittests all pass, and time it
> takes to complete (some estimate of the improvement it may provide).
>
> 4) With results from (3), we may propose changes to the underlying images
> (server seed, cloud-image seed, etc) which could ensure that the library is
> present in the default cloud images.

I think over-engineering, especially for the size of the gain. Its entirely possible that there is overhead of loading c that makes this path slower, meaning we'd be doing this work for negative result.

Best case, i think the amount of yaml that cloud-init loads is probably small enough to look like noise here.

Ryan Harper (raharper) wrote :

On Fri, May 5, 2017 at 8:21 AM, Scott Moser <email address hidden> wrote:

> > From Ryan:
> > Some thoughts on this:
> >
> > 1) we don't want users who apt install cloud-init to also pull down
> python-
> > devel, gcc and have to compile the extension, so please don't change
> package
> > deps here
>
> I don't follow this. In ubuntu at least cloud-init does depend on
> python3-yaml and python3-yaml on libyaml-0-2.
>
> $ apt-cache show python3-yaml | grep Dep
> Depends: python3 (<< 3.6), python3 (>= 3.5~), libc6 (>= 2.14), libyaml-0-2
>
> There is no reason to explicitly depend on libyaml-0-2, and we have a
> fallback
> path in the code.
>
> Why did you think we need python-devel ? This is where tox falls apart
> honestly, when you get into C python extensions. But we already have some
> of those in the test path (pylxd).
>

tox compiling

>
> > 2) this is currently an issue for the tox/venv environment, so let's
> focus on
> > how to enable the SafeLoader library for that path
>
> if we do skipIf, then the package build environment will run though the
> CLoader path, and the tox path would not. We could put in some way of
> failing if the package build environment didn't take the CLoader path.
> That'd ensure that we were getting tests on both paths.
>
> I think that might be over-engineering.
>
> > 3) Once we make (2) an optional testing path (env or parameter can
> trigger the
> > dep install prior to calling tox/venv which triggers the compile during
> pip
> > install of the module IIUC), we can have two travis paths, one without
> the
> > library (representing how things work today) and one *with* the library
> > enabled. This allows us to compare results, unittests all pass, and time
> it
> > takes to complete (some estimate of the improvement it may provide).
> >
> > 4) With results from (3), we may propose changes to the underlying images
> > (server seed, cloud-image seed, etc) which could ensure that the library
> is
> > present in the default cloud images.
>
> I think over-engineering, especially for the size of the gain. Its
> entirely possible that there is overhead of loading c that makes this path
> slower, meaning we'd be doing this work for negative result.
>
> Best case, i think the amount of yaml that cloud-init loads is probably
> small enough to look like noise here.
>

Ideally we'll generate some actual numbers here; if it's not worth it, then
we can dump the whole thing. If it is, ideally we'd have
a path through both w.r.t tox/unittesting; I can't say that it's
over-engineering to want to exercise both paths we might expect
cloud-init users might execute.

>
>
> --
> https://code.launchpad.net/~chad.smith/cloud-init/+git/
> cloud-init/+merge/323088
> Your team cloud init development team is requested to review the proposed
> merge of ~chad.smith/cloud-init:cyaml-loading into cloud-init:master.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~cloud-init-dev
> Post to : <email address hidden>
> Unsubscribe : https://launchpad.net/~cloud-init-dev
> More help : https://help.launchpad.net/ListHelp
>

Scott Moser (smoser) wrote :

On Fri, 5 May 2017, Ryan Harper wrote:

> > Best case, i think the amount of yaml that cloud-init loads is probably
> > small enough to look like noise here.
> >
>
> Ideally we'll generate some actual numbers here; if it's not worth it, then
> we can dump the whole thing. If it is, ideally we'd have
> a path through both w.r.t tox/unittesting; I can't say that it's
> over-engineering to want to exercise both paths we might expect
> cloud-init users might execute.

Maybe.
There are literally hundreds of different paths that software under
cloud-init might take based on some condition. We should have reasonable
code coverage on code we write, but we can't possibly excercise all
possible paths of code underneith us.

As an example, consider the affect on code path through common libraries
based on locale. Should cloud-init run unit tests with multiple different
locales? If that is a non-trivial operation in development time or in
runtime, should we still do that?

It is quite reasonable that pyyaml project should test their coverage of
CLoader and the python path. It is also quite reasonable that cloud-init
assume that path function as it is expected to when it is available.

~chad.smith/cloud-init:cyaml-loading updated on 2017-05-08
7439b57... by Chad Smith on 2017-05-08

drop stray pdb from unit tests

Unmerged commits

7439b57... by Chad Smith on 2017-05-08

drop stray pdb from unit tests

b1833ad... by Chad Smith on 2017-05-04

add libyaml-0-2 as Recommends for cloud-init package

35bb6bd... by Chad Smith on 2017-05-04

on CSafeLoader ImportError, load SafeLoader. Add unit tests to execise both paths.

4813c9f... by Chad Smith on 2017-04-25

use CSafeLoader instead of SafeLoader for faster yaml processing speeds

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cloudinit/safeyaml.py b/cloudinit/safeyaml.py
2index 7bcf9dd..43ad674 100644
3--- a/cloudinit/safeyaml.py
4+++ b/cloudinit/safeyaml.py
5@@ -4,10 +4,14 @@
6 #
7 # This file is part of cloud-init. See LICENSE file for license information.
8
9-import yaml
10+try:
11+ from yaml import CSafeLoader as SafeLoader # Speed things up if we can
12+except ImportError:
13+ from yaml import SafeLoader as SafeLoader
14+from yaml import load as yaml_load
15
16
17-class _CustomSafeLoader(yaml.SafeLoader):
18+class _CustomSafeLoader(SafeLoader):
19 def construct_python_unicode(self, node):
20 return self.construct_scalar(node)
21
22@@ -18,6 +22,6 @@ _CustomSafeLoader.add_constructor(
23
24
25 def load(blob):
26- return(yaml.load(blob, Loader=_CustomSafeLoader))
27+ return(yaml_load(blob, Loader=_CustomSafeLoader))
28
29 # vi: ts=4 expandtab
30diff --git a/packages/debian/control.in b/packages/debian/control.in
31index b58561e..cd08bda 100644
32--- a/packages/debian/control.in
33+++ b/packages/debian/control.in
34@@ -22,7 +22,7 @@ Depends: procps,
35 ${python},
36 ${misc:Depends},
37 ${${python}:Depends}
38-Recommends: eatmydata, sudo, software-properties-common, gdisk
39+Recommends: eatmydata, sudo, software-properties-common, gdisk, libyaml-0-2
40 XB-Python-Version: ${python:Versions}
41 Description: Init scripts for cloud instances
42 Cloud instances need special scripts to run during initialisation
43diff --git a/requirements.txt b/requirements.txt
44index 0c4951f..157592a 100644
45--- a/requirements.txt
46+++ b/requirements.txt
47@@ -24,10 +24,14 @@ oauthlib
48 # section)...
49 configobj>=5.0.2
50
51-# All new style configurations are in the yaml format
52+# cloudinit/safeyaml.py depends on libyaml c-bindings which are built by tox
53+# when libyaml-(dev|devel), libpython2.7-(dev|devel) and libpython3-(dev|devel)
54+# headers packages are installed. If tox fails importing CSafeLoader, we need
55+# these dev packages.
56+# All new style configurations are in the yaml format.
57 pyyaml
58
59-# The new main entrypoint uses argparse instead of optparse
60+# The new main entrypoint uses argparse instead of optparse
61 argparse
62
63 # Requests handles ssl correctly!
64diff --git a/tests/unittests/test_util.py b/tests/unittests/test_util.py
65index 5d21b4b..569cc5f 100644
66--- a/tests/unittests/test_util.py
67+++ b/tests/unittests/test_util.py
68@@ -7,6 +7,7 @@ import os
69 import shutil
70 import stat
71 import tempfile
72+from unittest import skipUnless
73
74 import six
75 import yaml
76@@ -19,6 +20,12 @@ try:
77 except ImportError:
78 import mock
79
80+try:
81+ from yaml import CSafeLoader
82+ _has_yaml_cbindings = True
83+except ImportError:
84+ _has_yaml_cbindings = False
85+
86
87 class FakeSelinux(object):
88
89@@ -263,6 +270,21 @@ class TestGetCmdline(helpers.TestCase):
90 class TestLoadYaml(helpers.TestCase):
91 mydefault = "7b03a8ebace993d806255121073fed52"
92
93+ @skipUnless(_has_yaml_cbindings, "No yaml CSafeloader present")
94+ def test_simple_with_csafeloader(self):
95+ """load_yaml uses CSafeLoader instead of SafeLoader when present."""
96+ mydata = {'1': "one", '2': "two"}
97+ # Setting SafeLoader as None to ensure we aren't loading SafeLoader
98+ with mock.patch.dict("sys.modules", values={"yaml.SafeLoader": None}):
99+ self.assertEqual(util.load_yaml(yaml.dump(mydata)), mydata)
100+
101+ def test_simple_without_csafeloader(self):
102+ """load_yaml falls back to SafeLoader when of CSafeLoader is absent."""
103+ mydata = {'1': "one", '2': "two"}
104+ # Setting CSafeLoader as None to ensure we try to load SafeLoader
105+ with mock.patch.dict("sys.modules", values={"yaml.CSafeLoader": None}):
106+ self.assertEqual(util.load_yaml(yaml.dump(mydata)), mydata)
107+
108 def test_simple(self):
109 mydata = {'1': "one", '2': "two"}
110 self.assertEqual(util.load_yaml(yaml.dump(mydata)), mydata)

Subscribers

People subscribed via source and target branches