Merge ~mgerdts/cloud-init:lp1746605 into cloud-init:master

Proposed by Mike Gerdts
Status: Merged
Approved by: Scott Moser
Approved revision: c24efa09f7a3bc00157eabd5761c307af7229d3a
Merge reported by: Scott Moser
Merged at revision: 4ed164592fe8cb15758cacf3cb3f8c7d5ab7c82e
Proposed branch: ~mgerdts/cloud-init:lp1746605
Merge into: cloud-init:master
Diff against target: 118 lines (+67/-2)
2 files modified
cloudinit/sources/ (+2/-0)
tests/unittests/test_datasource/ (+65/-2)
Reviewer Review Type Date Requested Status
Scott Moser Approve
Server Team CI bot continuous-integration Approve
Review via email:

Commit message

DataSourceSmartOS: add locking of serial device.

cloud-init and mdata-get each have their own implementation of the
SmartOS metadata protocol. If cloud-init and other services that call
mdata-get are run concurrently, crosstalk on the serial port can cause
them both to become confused.

This change makes it so that cloud-init uses the same cooperative
locking scheme that's used by mdata-get, thus preventing cross-talk
between mdata-get and cloud-init.

For testing, a VM running on a SmartOS host and pyserial are required.
If the tests are run on a platform other than SmartOS, those that use a
real serial port are skipped. pyserial remains commented in
requirements.txt because most testers will not be running atop SmartOS.

LP: #1746605

To post a comment you must log in.
Revision history for this message
Mike Gerdts (mgerdts) wrote :
Download full text (4.8 KiB)

The fix for bug #1667735 is a better fix for the retries that was originally part of this via

This merge proposal obsoletes merge proposal 337271.

The change to DataSourceSmartOS is now quite small. The new tests will only run on VMs running on a SmartOS host and then only if they have write access to /dev/ttyS1. This test code includes Python 2.6 fixes offered by smoser as part of the previous merge proposal.

Proof that the new test runs under tox:

$ tox tests.unittests.test_datasource.test_smartos.TestSerialConcurrency
GLOB sdist-make: /home/mgerdts/cloud-init/
py27 inst-nodeps: /home/mgerdts/cloud-init/.tox/dist/
py27 installed: certifi==2018.1.18,chardet==3.0.4,cloud-init==18.2,configobj==5.0.6,contextlib2==0.5.5,coverage==4.5.1,funcsigs==1.0.2,functools32==3.2.3.post2,httpretty==0.8.14,idna==2.6,Jinja2==2.10,jsonpatch==1.23,jsonpointer==2.0,jsonschema==2.6.0,linecache2==1.0.0,MarkupSafe==1.0,mock==2.0.0,nose==1.3.7,oauthlib==2.0.7,pbr==4.0.2,pkg-resources==0.0.0,pyserial==3.4,PyYAML==3.12,requests==2.18.4,six==1.11.0,traceback2==1.4.0,unittest2==1.1.0,urllib3==1.22
py27 runtests: PYTHONHASHSEED='1425861219'
py27 runtests: commands[0] | python -m nose tests.unittests.test_datasource.test_smartos.TestSerialConcurrency

Ran 0 tests in 0.000s

py3 inst-nodeps: /home/mgerdts/cloud-init/.tox/dist/
py3 installed: certifi==2018.1.18,chardet==3.0.4,cloud-init==18.2,configobj==5.0.6,contextlib2==0.5.5,coverage==4.5.1,httpretty==0.8.14,idna==2.6,Jinja2==2.10,jsonpatch==1.23,jsonpointer==2.0,jsonschema==2.6.0,linecache2==1.0.0,MarkupSafe==1.0,mock==2.0.0,nose==1.3.7,nose-timer==0.7.1,oauthlib==2.0.7,pbr==4.0.2,pkg-resources==0.0.0,pyserial==3.4,PyYAML==3.12,requests==2.18.4,six==1.11.0,termcolor==1.1.0,traceback2==1.4.0,unittest2==1.1.0,urllib3==1.22
py3 runtests: PYTHONHASHSEED='1425861219'
py3 runtests: commands[0] | /home/mgerdts/cloud-init/.tox/py3/bin/python -m nose --with-timer --timer-top-n 10 tests.unittests.test_datasource.test_smartos.TestSerialConcurrency
[success] 100.00% tests.unittests.test_datasource.test_smartos.TestSerialConcurrency.test_all_keys: 22.6319s
Ran 1 test in 22.633s

flake8 inst-nodeps: /home/mgerdts/cloud-init/.tox/dist/
flake8 installed: certifi==2018.1.18,chardet==3.0.4,cloud-init==18.2,configobj==5.0.6,flake8==3.3.0,hacking==0.13.0,idna==2.6,Jinja2==2.10,jsonpatch==1.23,jsonpointer==2.0,jsonschema==2.6.0,MarkupSafe==1.0,mccabe==0.6.1,oauthlib==2.0.7,pbr==4.0.2,pep8==1.5.7,pkg-resources==0.0.0,pycodestyle==2.3.1,pyflakes==1.5.0,PyYAML==3.12,requests==2.18.4,six==1.11.0,urllib3==1.22
flake8 runtests: PYTHONHASHSEED='1425861219'
flake8 runtests: commands[0] | /home/mgerdts/cloud-init/.tox/flake8/bin/python -m flake8 tests.unittests.test_datasource.test_smartos.TestSerialConcurrency
xenial inst-nodeps: /home/mgerdts/cloud-init/.tox/dist/
xenial installed: cloud-init==18.2,configobj==5.0.6,contextlib2==0.5.1,...


Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    SUCCESS: MAAS Compatability Testing
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:

review: Approve (continuous-integration)
Revision history for this message
Scott Moser (smoser) wrote :

fcntl.lockf(fp, fcntl.LOCK_EX)

Do we not need to unlock at some point ? (fcntl.lockf(fp, fcntl.LOCK_UN)
Or are you relying on the lock being let go of on process exit?
or am I just missing something entirely?

Revision history for this message
Mike Gerdts (mgerdts) wrote :

The lock is automatically released on close() or process exit.

On Fri, Apr 20, 2018 at 1:33 PM, Scott Moser <email address hidden>

> fcntl.lockf(fp, fcntl.LOCK_EX)
> Do we not need to unlock at some point ? (fcntl.lockf(fp, fcntl.LOCK_UN)
> Or are you relying on the lock being let go of on process exit?
> or am I just missing something entirely?
> --
> init/+merge/343576
> You are the owner of ~mgerdts/cloud-init:lp1746605.

Revision history for this message
Scott Moser (smoser) :
review: Approve
Revision history for this message
Scott Moser (smoser) wrote :


I enabled unused variables warnings in pylint a few commits
back. if you rebase this you'll see the errors.

the warning looks like this:
  tests/unittests/test_datasource/ [W0612(unused-variable), TestSerialConcurrency.test_all_keys] Unused variable 'it'

The fix is just to name that variable '_it' rather than 'it'. Or
even just '_'.

then it wont' complain.

Easiest thing is for you to commit that and push.
as it is right nwo our tools rebase yo to trunk and run tox and then merge... and if i want to use the tools (I do) then you fixing this branch is the easiest thing.

~mgerdts/cloud-init:lp1746605 updated
2b4bdbb... by Mike Gerdts

feedback from smoser: unused variable

Revision history for this message
Scott Moser (smoser) wrote :

An upstream commit landed for this bug.

To view that commit see the following URL:

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/cloudinit/sources/ b/cloudinit/sources/
2index 0ef1003..4ea00eb 100644
3--- a/cloudinit/sources/
4+++ b/cloudinit/sources/
5@@ -23,6 +23,7 @@
6 import base64
7 import binascii
8 import errno
9+import fcntl
10 import json
11 import os
12 import random
13@@ -526,6 +527,7 @@ class JoyentMetadataSerialClient(JoyentMetadataClient):
14 if not ser.isOpen():
15 raise SystemError("Unable to open %s" % self.device)
16 self.fp = ser
17+ fcntl.lockf(ser, fcntl.LOCK_EX)
18 self._flush()
19 self._negotiate()
21diff --git a/tests/unittests/test_datasource/ b/tests/unittests/test_datasource/
22index b926263..706e8eb 100644
23--- a/tests/unittests/test_datasource/
24+++ b/tests/unittests/test_datasource/
25@@ -16,23 +16,27 @@ from __future__ import print_function
27 from binascii import crc32
28 import json
29+import multiprocessing
30 import os
31 import os.path
32 import re
33 import shutil
34+import signal
35 import stat
36 import tempfile
37+import unittest2
38 import uuid
40 from cloudinit import serial
41 from cloudinit.sources import DataSourceSmartOS
42 from cloudinit.sources.DataSourceSmartOS import (
43- convert_smartos_network_data as convert_net)
44+ convert_smartos_network_data as convert_net,
45+ SMARTOS_ENV_KVM, SERIAL_DEVICE, get_smartos_environ)
47 import six
49 from cloudinit import helpers as c_helpers
50-from cloudinit.util import b64e
51+from cloudinit.util import (b64e, subp)
53 from cloudinit.tests.helpers import mock, FilesystemMockingTestCase, TestCase
55@@ -1023,4 +1027,63 @@ class TestNetworkConversion(TestCase):
56 found = convert_net(SDC_NICS_SINGLE_GATEWAY)
57 self.assertEqual(expected, found)
60+@unittest2.skipUnless(get_smartos_environ() == SMARTOS_ENV_KVM,
61+ "Only supported on KVM and bhyve guests under SmartOS")
62+@unittest2.skipUnless(os.access(SERIAL_DEVICE, os.W_OK),
63+ "Requires write access to " + SERIAL_DEVICE)
64+class TestSerialConcurrency(TestCase):
65+ """
66+ This class tests locking on an actual serial port, and as such can only
67+ be run in a kvm or bhyve guest running on a SmartOS host. A test run on
68+ a metadata socket will not be valid because a metadata socket ensures
69+ there is only one session over a connection. In contrast, in the
70+ absence of proper locking multiple processes opening the same serial
71+ port can corrupt each others' exchanges with the metadata server.
72+ """
73+ def setUp(self):
74+ self.mdata_proc = multiprocessing.Process(target=self.start_mdata_loop)
75+ self.mdata_proc.start()
76+ super(TestSerialConcurrency, self).setUp()
78+ def tearDown(self):
79+ # os.kill() rather than mdata_proc.terminate() to avoid console spam.
80+ os.kill(, signal.SIGKILL)
81+ self.mdata_proc.join()
82+ super(TestSerialConcurrency, self).tearDown()
84+ def start_mdata_loop(self):
85+ """
86+ The mdata-get command is repeatedly run in a separate process so
87+ that it may try to race with metadata operations performed in the
88+ main test process. Use of mdata-get is better than two processes
89+ using the protocol implementation in DataSourceSmartOS because we
90+ are testing to be sure that cloud-init and mdata-get respect each
91+ others locks.
92+ """
93+ rcs = list(range(0, 256))
94+ while True:
95+ subp(['mdata-get', 'sdc:routes'], rcs=rcs)
97+ def test_all_keys(self):
98+ self.assertIsNotNone(
99+ ds = DataSourceSmartOS
100+ keys = [tup[0] for tup in ds.SMARTOS_ATTRIB_MAP.values()]
101+ keys.extend(ds.SMARTOS_ATTRIB_JSON.values())
103+ client = ds.jmc_client_factory()
104+ self.assertIsNotNone(client)
106+ # The behavior that we are testing for was observed mdata-get running
107+ # 10 times at roughly the same time as cloud-init fetched each key
108+ # once. cloud-init would regularly see failures before making it
109+ # through all keys once.
110+ for _ in range(0, 3):
111+ for key in keys:
112+ # We don't care about the return value, just that it doesn't
113+ # thrown any exceptions.
114+ client.get(key)
116+ self.assertIsNone(self.mdata_proc.exitcode)
118 # vi: ts=4 expandtab


People subscribed via source and target branches