Merge ~ogayot/curtin:sfdisk-accept-utf8 into curtin:master

Proposed by Olivier Gayot
Status: Merged
Merged at revision: fb70e57035375abc63c7c2c757c4c64ab928d5b4
Proposed branch: ~ogayot/curtin:sfdisk-accept-utf8
Merge into: curtin:master
Diff against target: 192 lines (+87/-13)
5 files modified
curtin/commands/block_meta_v2.py (+22/-1)
curtin/udev.py (+17/-3)
tests/integration/test_block_meta.py (+6/-2)
tests/unittests/helpers.py (+2/-2)
tests/unittests/test_commands_block_meta.py (+40/-5)
Reviewer Review Type Date Requested Status
Dan Bungert Approve
Server Team CI bot continuous-integration Approve
curtin developers Pending
Review via email: mp+442225@code.launchpad.net

Commit message

block_meta_v2: make GPT partition name support more robust

    In some cases, the sfdisk script will include GPT partition names. GPT
    partition names support characters outside the ASCII range (they are
    stored in UTF-16 in the GPT metadata). Therefore, it is currently a
    wrong assumption that the sfdisk input is always valid ASCII.

    Furthermore, we inject the name straight inside the sfdisk input script
    (surrounded with double quotes)..
    This can result in various problems if the name contains a '"' character
    (the sfdisk field delimiter) or other characters such as '\\'.

    Fixed by converting all the characters of the partition names into their
    utf-8 \x notation. sfdisk natively supports reading this format. This
    will have a slight implact on the readability of the sfdisk script, but
    should make it way more robust.

udev: fix crash when PARTNAME is invalid utf-8

    (We already merged a similar fix in probert)

    GPT partition names are basically free-text fields but the udev PARTNAME
    field ends up butchered by the kernel when it contains characters
    outside the ASCII range.

    When using pyudev, the values are all expected to be decodable using the
    system's default encoding.

    In curtin, it can result in a crash when iterating over all the
    properties of a block device, that are mapped by pyudev.

    To work around the issue, we now stop iterating over all the values. We
    iterate over the keys and only access the value if we need it.

tests: make random_string a staticmethod instead of classmethod

    CiTestCase.random_string was decorated with @classmethod.
    I guess the purpose was to make the function usable without a CiTestCase
    instance. However, @classmethod is used when a function should return an
    instance of the class.

    This is not what we want. @staticmethod is the right decorator in this
    context.

Description of the change

GPT partition names are basically free-text, but curtin expects the sfdisk input script to be ASCII.

We also inject the partition name in the sfdisk script without properly sanitizing the name. This results in problems if the name contains '"' or other characters that have a special meaning in sfdisk.

Fixed by converting the partition name to a utf-8 hex notation.

e.g.

name="name is réservé"

becomes:

name="name\x20is\x20r\xc3\xa9serv\xc3\xa9"

To post a comment you must log in.
Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Dan Bungert (dbungert) wrote :

Thanks, this all looks reasonable.
An integration test would be nice - I think it should play out similar to test_gpt_uuid_persistent. WDYT?

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Off that was a bit of a brain-o on my part. As Dan says, an integration test would be nice.

Revision history for this message
Olivier Gayot (ogayot) wrote :

> Thanks, this all looks reasonable.
An integration test would be nice - I think it should play out similar to test_gpt_uuid_persistent. WDYT?

Thanks! There was test_gpt_name_persistent that existed already. But since you asked, I've added a new test that covers more use-cases (accents + other unicode characters). Hope that's better.

> Off that was a bit of a brain-o on my part. As Dan says, an integration test would be nice.

Done!

NOTE: I'm reluctant to let the CI squash the 3 commits together. If you don't mind, let's merge using the merge guidelines rather than leaving that up to the CI. Thanks!

Revision history for this message
Olivier Gayot (ogayot) :
Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Dan Bungert (dbungert) wrote :

> I've added a new test that covers more use-cases (accents + other unicode characters). Hope that's better.

Absolutely, thank you.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/curtin/commands/block_meta_v2.py b/curtin/commands/block_meta_v2.py
2index ae0b058..7f5ebce 100644
3--- a/curtin/commands/block_meta_v2.py
4+++ b/curtin/commands/block_meta_v2.py
5@@ -26,6 +26,23 @@ from curtin.storage_config import (
6 from curtin.udev import udevadm_settle
7
8
9+def to_utf8_hex_notation(string: str) -> str:
10+ ''' Convert a string into a valid ASCII string where all characters outside
11+ the alphanumerical range (according to bytes.isalnum()) are translated to
12+ their corresponding \\x notation. E.g.:
13+ to_utf8_hex_notation("hello") => "hello"
14+ to_utf8_hex_notation("réservée") => "r\\xc3\\xa9serv\\xc3\\xa9e"
15+ to_utf8_hex_notation("sp ace") => "sp\\x20ace"
16+ '''
17+ result = ''
18+ for c in bytearray(string, 'utf-8'):
19+ if bytes([c]).isalnum():
20+ result += bytes([c]).decode()
21+ else:
22+ result += f'\\x{c:02x}'
23+ return result
24+
25+
26 @attr.s(auto_attribs=True)
27 class PartTableEntry:
28 # The order listed here matches the order sfdisk represents these fields
29@@ -49,7 +66,11 @@ class PartTableEntry:
30 if v is not None:
31 r += ' {}={}'.format(a, v)
32 if self.name is not None:
33- r += ' name="{}"'.format(self.name)
34+ # Partition names are basically free-text fields. Injecting some
35+ # characters such as '"', '\' and '\n' will result in lots of
36+ # trouble. Fortunately, sfdisk supports \x notation, so we can
37+ # rely on it.
38+ r += ' name="{}"'.format(to_utf8_hex_notation(self.name))
39 if self.attrs:
40 r += ' attrs="{}"'.format(' '.join(self.attrs))
41 if self.bootable:
42diff --git a/curtin/udev.py b/curtin/udev.py
43index fa4f940..c43e7a5 100644
44--- a/curtin/udev.py
45+++ b/curtin/udev.py
46@@ -131,11 +131,25 @@ def udevadm_info(path=None):
47
48 def udev_all_block_device_properties():
49 import pyudev
50- props = []
51+ devices_props = []
52 c = pyudev.Context()
53 for device in c.list_devices(subsystem='block'):
54- props.append(dict(device.properties))
55- return props
56+ # When dereferencing device[prop], pyudev calls bytes.decode(), which
57+ # can fail if the value is invalid utf-8. We don't want a single
58+ # invalid value to completely prevent probing. So we iterate
59+ # over each value manually and ignore those which are invalid. We know
60+ # that PARTNAME is subject to failures when accents and other special
61+ # characters are used in a GPT partition name.
62+ # See LP: 2017862
63+ props = {}
64+ for prop in device.properties:
65+ try:
66+ props[prop] = device.properties[prop]
67+ except UnicodeDecodeError:
68+ LOG.warning('ignoring property %s because it is not valid'
69+ ' utf-8', prop)
70+ devices_props.append(props)
71+ return devices_props
72
73
74 # vi: ts=4 expandtab syntax=python
75diff --git a/tests/integration/test_block_meta.py b/tests/integration/test_block_meta.py
76index b25d2ce..d701ca0 100644
77--- a/tests/integration/test_block_meta.py
78+++ b/tests/integration/test_block_meta.py
79@@ -1084,9 +1084,13 @@ class TestBlockMeta(IntegrationTestCase):
80 actual_name = sfdisk_info['partitions'][0]['name']
81 self.assertEqual(name, actual_name)
82
83- def test_gpt_name_persistent(self):
84+ @parameterized.expand([
85+ ('random', CiTestCase.random_string(),),
86+ # "écrasé" means "overwritten"
87+ ('unicode', "'name' must not be écrasé/덮어쓴!"),
88+ ])
89+ def test_gpt_name_persistent(self, title, name):
90 self.img = self.tmp_path('image.img')
91- name = self.random_string()
92 config = StorageConfigBuilder(version=2)
93 config.add_image(path=self.img, size='20M', ptable='gpt')
94 p1 = config.add_part(number=1, offset=1 << 20, size=18 << 20,
95diff --git a/tests/unittests/helpers.py b/tests/unittests/helpers.py
96index 819e2c5..93b5b5b 100644
97--- a/tests/unittests/helpers.py
98+++ b/tests/unittests/helpers.py
99@@ -152,8 +152,8 @@ class CiTestCase(TestCase):
100 return os.path.normpath(
101 os.path.abspath(os.path.sep.join((_dir, path))))
102
103- @classmethod
104- def random_string(cls, length=8):
105+ @staticmethod
106+ def random_string(length=8):
107 """ return a random lowercase string with default length of 8"""
108 return ''.join(
109 random.choice(string.ascii_lowercase) for _ in range(length))
110diff --git a/tests/unittests/test_commands_block_meta.py b/tests/unittests/test_commands_block_meta.py
111index dc03359..5599886 100644
112--- a/tests/unittests/test_commands_block_meta.py
113+++ b/tests/unittests/test_commands_block_meta.py
114@@ -25,6 +25,34 @@ def random_uuid():
115 empty_context = block_meta.BlockMetaContext({})
116
117
118+class TestToUTF8HexNotation(CiTestCase):
119+ def test_alpha(self):
120+ self.assertEqual(
121+ block_meta_v2.to_utf8_hex_notation("HelloWorld"), "HelloWorld")
122+
123+ def test_alphanum(self):
124+ self.assertEqual(
125+ block_meta_v2.to_utf8_hex_notation("Hello1234"), "Hello1234")
126+
127+ def test_alnum_space(self):
128+ self.assertEqual(
129+ block_meta_v2.to_utf8_hex_notation("Hello 1234"),
130+ "Hello\\x201234")
131+
132+ def test_with_accent(self):
133+ # '\xe9'.isalpha(), which is equivalent to 'é'.isalpha(), is True
134+ # because 'é' is considered a letter according to the unicode standard.
135+ # b'\xe9'.isalpha(), on the other hand, is False.
136+ self.assertEqual(
137+ block_meta_v2.to_utf8_hex_notation("réservée"),
138+ "r\\xc3\\xa9serv\\xc3\\xa9e")
139+
140+ def test_hangul(self):
141+ self.assertEqual(
142+ block_meta_v2.to_utf8_hex_notation("리눅스"),
143+ '\\xeb\\xa6\\xac\\xeb\\x88\\x85\\xec\\x8a\\xa4')
144+
145+
146 class TestGetPathToStorageVolume(CiTestCase):
147
148 def setUp(self):
149@@ -3050,14 +3078,20 @@ label: gpt
150 table.add(dict(number=1, offset=1 << 20, size=9 << 20, flag='boot',
151 partition_name=name))
152 type_id = "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
153+ to_hex = block_meta_v2.to_utf8_hex_notation
154 expected = f'''\
155 label: gpt
156
157-1: start=2048 size=18432 type={type_id} name="{name}"'''
158+1: start=2048 size=18432 type={type_id} name="{to_hex(name)}"'''
159 self.assertEqual(expected, table.render())
160
161- def test_gpt_name_spaces(self):
162- name = self.random_string() + " " + self.random_string()
163+ def test_gpt_name_free_text(self):
164+ name = 'my "분할" réservée'
165+ expected_name = ''.join([
166+ 'my',
167+ '\\x20\\x22\\xeb\\xb6\\x84\\xed\\x95\\xa0\\x22',
168+ '\\x20r\\xc3\\xa9serv\\xc3\\xa9e',
169+ ])
170 table = block_meta_v2.GPTPartTable(512)
171 table.add(dict(number=1, offset=1 << 20, size=9 << 20, flag='boot',
172 partition_name=name))
173@@ -3065,7 +3099,7 @@ label: gpt
174 expected = f'''\
175 label: gpt
176
177-1: start=2048 size=18432 type={type_id} name="{name}"'''
178+1: start=2048 size=18432 type={type_id} name="{expected_name}"'''
179 self.assertEqual(expected, table.render())
180
181 def test_gpt_attrs_none(self):
182@@ -3255,8 +3289,9 @@ label: dos
183 number=1, start=2, size=3, type='04', bootable=False,
184 uuid=None, name=None, attrs=None)
185 pte.preserve({'uuid': uuid, 'name': name, 'attrs': attrs})
186+ to_hex = block_meta_v2.to_utf8_hex_notation
187 expected = f'1: start=2 size=3 type=04 uuid={uuid} ' + \
188- f'name="{name}" attrs="{attrs}"'
189+ f'name="{to_hex(name)}" attrs="{attrs}"'
190 self.assertEqual(expected, pte.render())
191
192 def test_v2_dos_is_logical(self):

Subscribers

People subscribed via source and target branches