Merge lp:~edouardb/cloud-init/scaleway-datasource into lp:~cloud-init-dev/cloud-init/trunk

Proposed by edouardb
Status: Rejected
Rejected by: Scott Moser
Proposed branch: lp:~edouardb/cloud-init/scaleway-datasource
Merge into: lp:~cloud-init-dev/cloud-init/trunk
Diff against target: 443 lines (+414/-2)
3 files modified
cloudinit/sources/DataSourceScaleway.py (+216/-0)
cloudinit/url_helper.py (+5/-2)
tests/unittests/test_datasource/test_scaleway.py (+193/-0)
To merge this branch: bzr merge lp:~edouardb/cloud-init/scaleway-datasource
Reviewer Review Type Date Requested Status
cloud-init Commiters Pending
Review via email: mp+274861@code.launchpad.net

Description of the change

Add Datasource for Scaleway's metadata service

Julien Castets (jcastets) wrote :

Unlike other providers, the Scaleway user-data API only accepts requests from privileged source ports (< 1024), to prevent non-root users from accessing it.

We added a new parameter to readurl to pass in the requests session object, so that the request can be bound to a specific source port.
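
To illustrate the source-port constraint described above, here is a minimal sketch that uses only the Python standard library rather than requests. The helper name fetch_from_source_port and its parameters are hypothetical and not part of the branch. It assumes that a source port already in use surfaces as EADDRINUSE, in which case the next candidate port is tried.

```python
import errno
import http.client


def fetch_from_source_port(host, port, path, source_ports, timeout=10):
    """GET `path`, binding the client socket to the first usable source port.

    Hypothetical helper for illustration only.
    Returns a (status, body, source_port) tuple.
    """
    for source_port in source_ports:
        conn = http.client.HTTPConnection(
            host, port, timeout=timeout,
            source_address=("0.0.0.0", source_port),
        )
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            return resp.status, resp.read(), source_port
        except OSError as exc:
            # Source port already taken locally: move on to the next one.
            if exc.errno == errno.EADDRINUSE:
                continue
            raise
        finally:
            conn.close()
    raise RuntimeError("no usable source port among the candidates")
```

On Scaleway the candidates would be range(1, 1024) and the caller would need to run as root; without that privilege, binding below 1024 typically fails with a permission error, which this sketch re-raises rather than hides.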

Scott Moser (smoser) wrote :

Hey,

This looks well done, thanks.
A couple of comments:

a.) we'll need some unit tests to ensure we don't inadvertently break this.
b.) is there some way (any way) we can detect if we're on Scaleway? As it is right now, it looks like we're just going to block and retry for the availability of the MD. That is much less than ideal, and the only current "on by default" datasource that does that is EC2 (which only gets that privilege from being first). Other vendors provide some DMI data or another quick local test.
c.) you'll need to sign the Canonical Contributors License Agreement (http://www.ubuntu.com/legal/contributors)
d.) vendor-data would be nice (and helpful to you as the operator of the cloud).

Again, though, thanks. It looks really good.
Feel free to ping in #cloud-init if you have questions.

1152. By Edouard Bonlieu <email address hidden>

Add a check to ensure we are on Scaleway

1153. By Edouard Bonlieu <email address hidden>

Pass userdata url and retries as parameters

1154. By Edouard Bonlieu <email address hidden>

Merge jcastets PR

Julien Castets (jcastets) wrote :

Hi,

a) Done
b) Done. Unfortunately, there's no way to ensure you're running on Scaleway without hitting a network resource
c) Done
d) Indeed. Can we consider adding them later?

Scott Moser (smoser) wrote :

Julien,

Thanks.
For 'd', sure, you can add vendor-data later.
If it's not in the cloud provider anyway, there's not much use for cloud-init to support it.

The big issue is just 'b'. We can't enable this by default without a solid way of knowing that we should hit an HTTP source that might hang indefinitely.

I'll review shortly.
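
As a rough illustration of the concern in 'b', the sketch below bounds the network probe with a per-request timeout and a fixed retry count, so it cannot block indefinitely. It follows the status handling described in this proposal (403 means Scaleway, 429 and 5xx are retried, anything else means not Scaleway), but the names head_status and looks_like_scaleway, the default timeout, and the injectable head/sleep parameters are assumptions made for this example and are not part of the branch.

```python
import time
import urllib.error
import urllib.request


def head_status(url, timeout):
    """Issue one HEAD request and return the HTTP status code."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # urllib reports 4xx/5xx responses as exceptions


def looks_like_scaleway(url, retries=5, timeout=2.0, pause=1.0,
                        head=head_status, sleep=time.sleep):
    """Probe `url` without privileges; True only on an HTTP 403 answer."""
    for attempt in range(retries):
        try:
            status = head(url, timeout)
        except OSError:
            # Unreachable or timed out: assume this is not Scaleway.
            return False
        if status == 403:
            return True
        if status != 429 and status < 500:
            return False
        if attempt < retries - 1:
            sleep(pause)  # rate limited or server error: wait, then retry
    return False
```

With these defaults the probe gives up after at most retries * (timeout + pause) seconds. That bounds the cost of the check but does not remove the network dependency itself.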

Julien Castets (jcastets) wrote :

Great, thanks :) Waiting for your review then.

Scott Moser (smoser) wrote :

Two nitpicks, but this looks good other than the failing tests.

Manfred Touron (moul) wrote :

To check if you are on Scaleway:

    $ test -f /run/oc-metadata.cache; echo $?
    0

This file is populated when something fetches the API metadata.

Our initrd (https://github.com/scaleway/initrd/tree/master/Linux) creates this file automatically on each boot.
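
The shell test above translates directly to Python. This is a minimal sketch that assumes the marker path given in this comment; the constant and function names are hypothetical and do not appear in the branch.

```python
import os

# Marker path taken from the comment above; populated by the Scaleway initrd.
SCALEWAY_MARKER = "/run/oc-metadata.cache"


def has_scaleway_marker(marker_path=SCALEWAY_MARKER):
    """Return True if the marker exists as a regular file (like `test -f`)."""
    return os.path.isfile(marker_path)
```

A check like this needs no network access, so it returns immediately on other platforms.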

Scott Moser (smoser) wrote :

Hi Edouard,
Looking at this again.

First, could you please sign the contributors agreement (http://www.ubuntu.com/legal/contributors)? Please feel free to contact me if you have any questions (freenode 'smoser').

Second, I think the check for /run/oc-metadata.cache is probably OK, as long as that file is created early enough in boot (prior to cloud-init.service or the upstart job running).

Scott Moser (smoser) wrote :

Hello,
Thank you for taking the time to contribute to cloud-init. Cloud-init has moved its revision control system to git. As a result, we are marking all bzr merge proposals as 'rejected'. If you would like to re-submit this proposal for review, please do so by following the current HACKING documentation at http://cloudinit.readthedocs.io/en/latest/topics/hacking.html .

Unmerged revisions

1154. By Edouard Bonlieu <email address hidden>

Merge jcastets PR

1153. By Edouard Bonlieu <email address hidden>

Pass userdata url and retries as parameters

1152. By Edouard Bonlieu <email address hidden>

Add a check to ensure we are on Scaleway

1151. By Edouard Bonlieu <email address hidden>

Add Datasource for Scaleway's metadata service (https://www.scaleway.com)

1150. By Edouard Bonlieu <email address hidden>

Add optional session parameter to readurl

Preview Diff

=== added file 'cloudinit/sources/DataSourceScaleway.py'
--- cloudinit/sources/DataSourceScaleway.py 1970-01-01 00:00:00 +0000
+++ cloudinit/sources/DataSourceScaleway.py 2015-10-28 09:51:17 +0000
@@ -0,0 +1,216 @@
+# vi: ts=4 expandtab
+#
+# Author: Edouard Bonlieu <ebonlieu@ocs.online.net>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 3, as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+import functools
+import errno
+import json
+import time
+
+from requests.packages.urllib3.poolmanager import PoolManager
+import requests
+
+from cloudinit import log as logging
+from cloudinit import sources
+from cloudinit import url_helper
+from cloudinit import util
+
+
+LOG = logging.getLogger(__name__)
+
+BUILTIN_DS_CONFIG = {
+    'metadata_url': 'http://169.254.42.42/conf?format=json',
+    'userdata_url': 'http://169.254.42.42/user_data/cloud-init'
+}
+
+DEF_MD_RETRIES = 5
+DEF_MD_TIMEOUT = 10
+
+
+def on_scaleway(user_data_url, retries=5):
+    """ Check if we are on Scaleway.
+
+    If Scaleway's user-data API isn't queried from a privileged source port
+    (i.e. below 1024), it returns HTTP/403.
+    """
+    for _ in range(retries):
+        try:
+            code = requests.head(user_data_url).status_code
+            if code not in (403, 429) and code < 500:
+                return False
+            if code == 403:
+                return True
+        except (requests.exceptions.ConnectionError,
+                requests.exceptions.Timeout):
+            return False
+
+        time.sleep(1)  # be nice, and wait a bit before retrying
+    return False
+
+
+class SourceAddressAdapter(requests.adapters.HTTPAdapter):
+    """ Adapter for requests to choose the local address to bind to.
+    """
+
+    def __init__(self, source_address, **kwargs):
+        self.source_address = source_address
+        super(SourceAddressAdapter, self).__init__(**kwargs)
+
+    def init_poolmanager(self, connections, maxsize, block=False):
+        self.poolmanager = PoolManager(num_pools=connections,
+                                       maxsize=maxsize,
+                                       block=block,
+                                       source_address=self.source_address)
+
+
+def _get_user_data(userdata_address, timeout, retries, session):
+    """ Retrieve user data.
+
+    Scaleway userdata API returns HTTP/404 if user data is not set.
+
+    This function wraps `url_helper.readurl` but instead of considering
+    HTTP/404 as an error that requires a retry, it considers it as empty user
+    data.
+
+    Also, user data API requires the source port to be below 1024. If requests
+    raises ConnectionError (EADDRINUSE), we raise immediately instead of
+    retrying. This way, the caller can retry to call this function on another
+    port.
+    """
+    try:
+        # exception_cb is used to re-raise the exception if the API responds
+        # HTTP/404.
+        resp = url_helper.readurl(
+            userdata_address,
+            data=None,
+            timeout=timeout,
+            retries=retries,
+            session=session,
+            exception_cb=lambda _, exc: exc.code == 404 or isinstance(
+                exc.cause, requests.exceptions.ConnectionError
+            )
+        )
+        return util.decode_binary(resp.contents)
+    except url_helper.UrlError as exc:
+        # Empty user data
+        if exc.code == 404:
+            return None
+
+        # `retries` is reached, re-raise
+        raise
+
+
+class DataSourceScaleway(sources.DataSource):
+
+    def __init__(self, sys_cfg, distro, paths):
+        LOG.debug('Init scw')
+        sources.DataSource.__init__(self, sys_cfg, distro, paths)
+
+        self.metadata = {}
+        self.ds_cfg = util.mergemanydict([
+            util.get_cfg_by_path(sys_cfg, ["datasource", "Scaleway"], {}),
+            BUILTIN_DS_CONFIG
+        ])
+
+        self.metadata_address = self.ds_cfg['metadata_url']
+        self.userdata_address = self.ds_cfg['userdata_url']
+
+        self.retries = self.ds_cfg.get('retries', DEF_MD_RETRIES)
+        self.timeout = self.ds_cfg.get('timeout', DEF_MD_TIMEOUT)
+
+    def _get_metadata(self):
+        resp = url_helper.readurl(
+            self.metadata_address,
+            timeout=self.timeout,
+            retries=self.retries
+        )
+        metadata = json.loads(util.decode_binary(resp.contents))
+        LOG.debug('metadata downloaded')
+
+        # try to make a request on the first privileged port available
+        for port in range(1, 1024):
+            try:
+                LOG.debug(
+                    'Trying to get user data (bind on port %d)...' % port
+                )
+                session = requests.Session()
+                session.mount(
+                    'http://',
+                    SourceAddressAdapter(source_address=('0.0.0.0', port))
+                )
+                user_data = _get_user_data(
+                    self.userdata_address,
+                    timeout=self.timeout,
+                    retries=self.retries,
+                    session=session
+                )
+                LOG.debug('user-data downloaded')
+                return metadata, user_data
+
+            except url_helper.UrlError as exc:
+                # local port already in use, try next port
+                if isinstance(exc.cause, requests.exceptions.ConnectionError):
+                    continue
+
+                # unexpected exception
+                raise
+
+    def get_data(self):
+        if on_scaleway(self.ds_cfg['userdata_url'], self.retries) is False:
+            return False
+
+        metadata, metadata['user-data'] = self._get_metadata()
+        self.metadata = {
+            'id': metadata['id'],
+            'hostname': metadata['hostname'],
+            'user-data': metadata['user-data'],
+            'ssh_public_keys': [
+                key['key'] for key in metadata['ssh_public_keys']
+            ]
+        }
+        return True
+
+    @property
+    def launch_index(self):
+        return None
+
+    def get_instance_id(self):
+        return self.metadata['id']
+
+    def get_public_ssh_keys(self):
+        return self.metadata['ssh_public_keys']
+
+    def get_hostname(self, fqdn=False, resolve_ip=False):
+        return self.metadata['hostname']
+
+    def get_userdata_raw(self):
+        return self.metadata['user-data']
+
+    @property
+    def availability_zone(self):
+        return None
+
+    @property
+    def region(self):
+        return None
+
+
+datasources = [
+    (DataSourceScaleway, (sources.DEP_FILESYSTEM, sources.DEP_NETWORK)),
+]
+
+
+def get_datasource_list(depends):
+    return sources.list_from_depends(depends, datasources)

=== modified file 'cloudinit/url_helper.py'
--- cloudinit/url_helper.py 2015-09-29 21:17:49 +0000
+++ cloudinit/url_helper.py 2015-10-28 09:51:17 +0000
@@ -183,7 +183,8 @@
 
 def readurl(url, data=None, timeout=None, retries=0, sec_between=1,
             headers=None, headers_cb=None, ssl_details=None,
-            check_status=True, allow_redirects=True, exception_cb=None):
+            check_status=True, allow_redirects=True, exception_cb=None,
+            session=None):
     url = _cleanurl(url)
     req_args = {
         'url': url,
@@ -242,7 +243,9 @@
                 LOG.debug("[%s/%s] open '%s' with %s configuration", i,
                           manual_tries, url, filtered_req_args)
 
-            r = requests.request(**req_args)
+            if session is None:
+                session = requests.Session()
+            r = session.request(**req_args)
             if check_status:
                 r.raise_for_status()
             LOG.debug("Read from %s (%s, %sb) after %s attempts", url,

=== added file 'tests/unittests/test_datasource/test_scaleway.py'
--- tests/unittests/test_datasource/test_scaleway.py 1970-01-01 00:00:00 +0000
+++ tests/unittests/test_datasource/test_scaleway.py 2015-10-28 09:51:17 +0000
@@ -0,0 +1,193 @@
+#
+# Copyright (C) 2015 Julien Castets
+#
+# Author: Julien Castets <castets.j@gmail.com>
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 3, as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+import json
+
+import requests
+
+from cloudinit import settings
+from cloudinit import helpers
+from cloudinit.sources import DataSourceScaleway
+
+from .. import helpers as test_helpers
+
+
+httpretty = test_helpers.import_httpretty()
+
+
+class UserDataResponses(object):
+    """ Possible responses of the API endpoint
+    169.254.42.42/user_data/cloud-init.
+
+    HEAD requests are made to check if the server is on Scaleway.
+    GET requests are made to get user data.
+    """
+
+    FAKE_USER_DATA = '#!/bin/bash\necho "user-data"'
+
+    @staticmethod
+    def head_ok(method, uri, headers):
+        """ To ensure it's running on Scaleway, the datasource makes a HEAD
+        request to 169.254.42.42/user_data/cloud-init and expects a HTTP/403
+        response, because this endpoint needs to be queried with a privileged
+        source port (below 1024).
+        """
+        return 403, headers, ''
+
+    @staticmethod
+    def connection_error(method, uri, headers):
+        """ Unable to connect to the user data API.
+        """
+        raise requests.exceptions.ConnectionError()
+
+    @staticmethod
+    def rate_limited(method, uri, headers):
+        return 429, headers, ''
+
+    @staticmethod
+    def api_error(method, uri, headers):
+        return 500, headers, ''
+
+    @classmethod
+    def get_ok(cls, method, uri, headers):
+        return 200, headers, cls.FAKE_USER_DATA
+
+    @staticmethod
+    def empty(method, uri, headers):
+        """ No user data for this server.
+        """
+        return 404, headers, ''
+
+
+class MetadataResponses(object):
+    """ Possible responses of the metadata API.
+    """
+
+    FAKE_METADATA = {
+        'id': '00000000-0000-0000-0000-000000000000',
+        'hostname': 'scaleway.host',
+        'ssh_public_keys': [{
+            'key': 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABA',
+            'fingerprint': '2048 06:ae:... login (RSA)'
+        }, {
+            'key': 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABCCCCC',
+            'fingerprint': '2048 06:ff:... login2 (RSA)'
+        }]
+    }
+
+    @classmethod
+    def get_ok(cls, method, uri, headers):
+        return 200, headers, json.dumps(cls.FAKE_METADATA)
+
+
+class TestDataSourceScaleway(test_helpers.HttprettyTestCase):
+
+    def setUp(self):
+        self.datasource = DataSourceScaleway.DataSourceScaleway(
+            settings.CFG_BUILTIN, None, helpers.Paths({})
+        )
+        super(TestDataSourceScaleway, self).setUp()
+
+        self.metadata_url = \
+            DataSourceScaleway.BUILTIN_DS_CONFIG['metadata_url']
+        self.userdata_url = \
+            DataSourceScaleway.BUILTIN_DS_CONFIG['userdata_url']
+
+    @httpretty.activate
+    @test_helpers.mock.patch('time.sleep', return_value=None)
+    def test_on_scaleway(self, sleep):
+        # Test API ok
+        httpretty.register_uri(httpretty.HEAD, self.userdata_url,
+                               body=UserDataResponses.head_ok)
+        self.assertTrue(DataSourceScaleway.on_scaleway(self.userdata_url))
+
+        # API returns something else than 403: we're not on scaleway
+        httpretty.register_uri(httpretty.HEAD, self.userdata_url,
+                               body='ok')
+        self.assertFalse(DataSourceScaleway.on_scaleway(self.userdata_url))
+
+        # Connection error
+        httpretty.register_uri(httpretty.HEAD, self.userdata_url,
+                               body=UserDataResponses.connection_error)
+        self.assertFalse(DataSourceScaleway.on_scaleway(self.userdata_url))
+
+        # Rate limited 2 times, then API error, then ok
+        httpretty.register_uri(
+            httpretty.HEAD, self.userdata_url,
+            responses=[
+                httpretty.Response(body=UserDataResponses.rate_limited),
+                httpretty.Response(body=UserDataResponses.rate_limited),
+                httpretty.Response(body=UserDataResponses.api_error),
+                httpretty.Response(body=UserDataResponses.head_ok),
+            ]
+        )
+        self.assertTrue(DataSourceScaleway.on_scaleway(self.userdata_url))
+        self.assertEqual(sleep.call_count, 3)
+
+    @httpretty.activate
+    @test_helpers.mock.patch('time.sleep', return_value=None)
+    def test_metadata(self, sleep):
+        # Not on scaleway
+        httpretty.register_uri(httpretty.HEAD, self.userdata_url,
+                               body=UserDataResponses.connection_error)
+        self.assertFalse(self.datasource.get_data())
+
+        # Make on_scaleway return true
+        httpretty.register_uri(httpretty.HEAD, self.userdata_url,
+                               body=UserDataResponses.head_ok)
+
+        # Make user data API return a valid response
+        httpretty.register_uri(httpretty.GET, self.metadata_url,
+                               body=MetadataResponses.get_ok)
+        httpretty.register_uri(httpretty.GET, self.userdata_url,
+                               body=UserDataResponses.get_ok)
+        self.datasource.get_data()
+
+        self.assertEqual(self.datasource.get_instance_id(),
+                         MetadataResponses.FAKE_METADATA['id'])
+        self.assertEqual(self.datasource.get_public_ssh_keys(), [
+            elem['key'] for elem in
+            MetadataResponses.FAKE_METADATA['ssh_public_keys']
+        ])
+        self.assertEqual(self.datasource.get_hostname(),
+                         MetadataResponses.FAKE_METADATA['hostname'])
+        self.assertEqual(self.datasource.get_userdata_raw(),
+                         UserDataResponses.FAKE_USER_DATA)
+        self.assertIsNone(self.datasource.availability_zone)
+        self.assertIsNone(self.datasource.region)
+
+        # Make user data API return HTTP/404, which means there is no user data
+        # for the server.
+        httpretty.register_uri(httpretty.GET, self.userdata_url,
+                               body=UserDataResponses.empty)
+        self.datasource.get_data()
+        self.assertIsNone(self.datasource.get_userdata_raw())
+
+        # Make user data API rate limit 2 times, then ConnectionError (ie.
+        # local port is used), then API ok
+        httpretty.register_uri(
+            httpretty.GET, self.userdata_url,
+            responses=[
+                httpretty.Response(body=UserDataResponses.rate_limited),
+                httpretty.Response(body=UserDataResponses.rate_limited),
+                httpretty.Response(body=UserDataResponses.connection_error),
+                httpretty.Response(body=UserDataResponses.get_ok),
+            ]
+        )
+        self.datasource.get_data()
+        self.assertEqual(self.datasource.get_userdata_raw(),
+                         UserDataResponses.FAKE_USER_DATA)