Merge ~raharper/curtin:fix/udevadm-info-shlex-quote into curtin:master

Proposed by Ryan Harper
Status: Merged
Approved by: Ryan Harper
Approved revision: 5ceb185c76810d3052190b73451cd3e02479a966
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~raharper/curtin:fix/udevadm-info-shlex-quote
Merge into: curtin:master
Diff against target: 198 lines (+122/-8)
3 files modified
curtin/udev.py (+25/-2)
tests/data/udevadm_info_sandisk_cruzer.txt (+54/-0)
tests/unittests/test_udev.py (+43/-6)
Reviewer Review Type Date Requested Status
Dan Watkins (community) Approve
Server Team CI bot continuous-integration Approve
Shawn Weeks (community) Needs Information
Review via email: mp+382993@code.launchpad.net

Commit message

udev: use shlex.quote when shlex.split errors on shell-escape chars

The udev database may include shell escape characters in the output.
We want to avoid using shlex_quote unless needed as curtin does not
use the quoted value in our code and applying it to all of the values
breaks parsing of the udevadm info content.

If we encounter a ValueError while invoking shlex.split() then we
first use shlex_quote() and try parsing again and if that fails we
will try replacing shell-quote values with '_'. We log warning
whenever we have to quote or replace values.

- Python2.7 does not have shlex.quote, use pipes.quote (which was
  renamed to shlex.quote in Py3:
  https://docs.python.org/2/library/pipes.html#pipes.quote)
- Add unittest with original info dump from bug submitters system

LP: #1875085

To post a comment you must log in.
Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Shawn Weeks (absolutesantaja) wrote :

I wrote a little program to test this and the output may not be what you want. I feel like this has added too many quotes and still has an uneven number of single quotes.

file = open("output.txt","r")
fileString = file.read()
output = "'" + fileString.replace("'", "\'\"\'\"\'") + "'"
print output

Here is a small sample of the output.

SCSI_TPGS='"'"'0'"'"'
SCSI_TYPE='"'"'disk'"'"'
SCSI_VENDOR='"'"'SanDisk'"'"''"'"'
SCSI_VENDOR_ENC='"'"'SanDisk'"'"''"'"'
SCSI_MODEL='"'"'Cruzer_Fit'"'"'
SCSI_MODEL_ENC='"'"'Cruzer\x20Fit\x20\x20\x20\x20\x20\x20'"'"'
SCSI_REVISION='"'"'1.00'"'"'
ID_TYPE='"'"'disk'"'"'

review: Needs Information
Revision history for this message
Ryan Harper (raharper) wrote :

Thanks. shlex.quote() is designed to make the string usable directly in shell, like so:

% lxc exec x1 bash
root@x1:~# lsb_release -rd
Description: Ubuntu 16.04.6 LTS
Release: 16.04
root@x1:~# python3
Python 3.5.2 (default, Apr 16 2020, 17:47:17)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> def shlex_quote(v):
... return "'" + v.replace("'", "\'\"\'\"\'") + "'"
...
>>> value="SanDisk'"
>>> value
"SanDisk'"
>>> shlex.quote(value)
'\'SanDisk\'"\'"\'\''
>>> shlex_quote(value)
'\'SanDisk\'"\'"\'\''
>>> print(shlex.quote(value))
'SanDisk'"'"''
>>>
root@x1:~# echo 'SanDisk'"'"''
SanDisk'

Revision history for this message
Shawn Weeks (absolutesantaja) wrote :

Thanks for the better test. I didn't fully understand what shlex was aiming for. I see the correct output now.

#!/bin/bash

outputVar=$(
cat << EOF | python
fileString = "'SanDisk''"
output = "'" + fileString.replace("'", "\'\"\'\"\'") + "'"
print output
quit()
EOF
)

sh -c "echo $outputVar"

'SanDisk''

Revision history for this message
Ryan Harper (raharper) wrote :

This unittest is not exposing the path we're seeing. I suspect there's some python string stuff happening here, I'll likely need to dump udevadm info --query=property --export output into a file and provide the decoded file output into the return of the mocked subp to have it trigger what I see a runtime.

First we shouldn't shlex.quote() everything, that leaves shell-escape characters in the values we'd like to use as they are, DEVTYPE="disk" for example, instead of DEVTYPE="'disk'".

>>> shlex.split("'disk'")
['disk']
>>> shlex.split(shlex.quote("'disk'"))
["'disk'"]

Revision history for this message
Shawn Weeks (absolutesantaja) wrote :

> This unittest is not exposing the path we're seeing. I suspect there's some
> python string stuff happening here, I'll likely need to dump udevadm info
> --query=property --export output into a file and provide the decoded file
> output into the return of the mocked subp to have it trigger what I see a
> runtime.
>
> First we shouldn't shlex.quote() everything, that leaves shell-escape
> characters in the values we'd like to use as they are, DEVTYPE="disk" for
> example, instead of DEVTYPE="'disk'".
>
> >>> shlex.split("'disk'")
> ['disk']
> >>> shlex.split(shlex.quote("'disk'"))
> ["'disk'"]

You're more than welcome to use output.txt on the bug report It was made by just piping the output of udevadm to a text file.

Revision history for this message
Ryan Harper (raharper) wrote :

Heh, that's exactly what I did =)

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Dan Watkins (oddbloke) wrote :

Broadly speaking this looks good, thanks for your work! I do have a few comments inline (though none of them are enormous).

review: Needs Fixing
Revision history for this message
Ryan Harper (raharper) wrote :

Thanks

a9c7968... by Ryan Harper

Use pipes.quote for py27

cf07a93... by Ryan Harper

Rework ValueError, update comment, conslidate trimmed value use

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Dan Watkins (oddbloke) wrote :

On Tue, Apr 28, 2020 at 04:53:30PM -0000, Ryan Harper wrote:
> > +try:
> > + shlex_quote = shlex.quote
> > +except AttributeError:
> > + # python2.7 shlex does not have quote, give it a try
> > + def shlex_quote(value):
>
> Yes, that seems *much* better. Thanks for tracking it down.

No problem!

> > + # strip the leading/ending single tick from udev output
>
> I'll rework.

Thanks!

> > + self.assertEqual('SanDisk'"'"'', info['SCSI_VENDOR'])
>
> What's wrong with matching the exact output from shlex_quote() ?
>
> % python3
> Python 3.8.2 (default, Mar 13 2020, 10:14:16)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import shlex
> >>> shlex.quote("SanDisk'")
> '\'SanDisk\'"\'"\'\''
> >>> print(shlex.quote("SanDisk'"))
> 'SanDisk'"'"''
> >>>

I don't think that's what you're doing:

    In [4]: 'SanDisk'"'"'' == shlex.quote("SanDisk'")
    Out[4]: False

'SanDisk'"'"'' is equivalent to `'SanDisk' + "'" + ''`, which:

    In [5]: 'SanDisk'"'"'' == "SanDisk'"
    Out[5]: True

effectively undoes the escaping. This is a little easier to see in
triple quotes (where all the double and single quotes do not need
escaping, and I added newlines and a .strip() for further clarity):

    In [13]: """
        ...: 'SanDisk'"'"''
        ...: """.strip() == shlex.quote("SanDisk'")
    Out[13]: True

5ceb185... by Ryan Harper

Rework unittest and document expected result, being careful with quoting

Revision history for this message
Server Team CI bot (server-team-bot) wrote :
review: Approve (continuous-integration)
Revision history for this message
Dan Watkins (oddbloke) wrote :

LGTM, thanks! I've updated the commit message for the pipes change, but will leave it to you to mark Approved.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/curtin/udev.py b/curtin/udev.py
2index e2e3dd0..f83f216 100644
3--- a/curtin/udev.py
4+++ b/curtin/udev.py
5@@ -4,7 +4,14 @@ import shlex
6 import os
7
8 from curtin import util
9-from curtin.log import logged_call
10+from curtin.log import logged_call, LOG
11+
12+try:
13+ shlex_quote = shlex.quote
14+except AttributeError:
15+ # python2.7 uses pipes.quote
16+ import pipes
17+ shlex_quote = pipes.quote
18
19
20 def compose_udev_equality(key, value):
21@@ -90,7 +97,23 @@ def udevadm_info(path=None):
22 value = None
23 if value:
24 # preserve spaces in values to match udev database
25- parsed = shlex.split(value)
26+ try:
27+ parsed = shlex.split(value)
28+ except ValueError:
29+ # strip the leading/ending single tick from udev output before
30+ # escaping the value to prevent their inclusion in the result.
31+ trimmed_value = value[1:-1]
32+ try:
33+ quoted = shlex_quote(trimmed_value)
34+ LOG.debug('udevadm_info: quoting shell-escape chars '
35+ 'in %s=%s -> %s', key, value, quoted)
36+ parsed = shlex.split(quoted)
37+ except ValueError:
38+ escaped_value = (
39+ trimmed_value.replace("'", "_").replace('"', "_"))
40+ LOG.debug('udevadm_info: replacing shell-escape chars '
41+ 'in %s=%s -> %s', key, value, escaped_value)
42+ parsed = shlex.split(escaped_value)
43 if ' ' not in value:
44 info[key] = parsed[0]
45 else:
46diff --git a/tests/data/udevadm_info_sandisk_cruzer.txt b/tests/data/udevadm_info_sandisk_cruzer.txt
47new file mode 100644
48index 0000000..a605afe
49--- /dev/null
50+++ b/tests/data/udevadm_info_sandisk_cruzer.txt
51@@ -0,0 +1,54 @@
52+DEVPATH='/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5/2-1.5.1/2-1.5.1:1.0/host6/target6:0:0/6:0:0:0/block/sdc/sdc1'
53+DEVNAME='/dev/sdc1'
54+DEVTYPE='partition'
55+PARTN='1'
56+MAJOR='8'
57+MINOR='33'
58+SUBSYSTEM='block'
59+USEC_INITIALIZED='5265867'
60+SCSI_TPGS='0'
61+SCSI_TYPE='disk'
62+SCSI_VENDOR='SanDisk''
63+SCSI_VENDOR_ENC='SanDisk''
64+SCSI_MODEL='Cruzer_Fit'
65+SCSI_MODEL_ENC='Cruzer\x20Fit\x20\x20\x20\x20\x20\x20'
66+SCSI_REVISION='1.00'
67+ID_SCSI='1'
68+ID_VENDOR='SanDisk_'
69+ID_VENDOR_ENC='SanDisk\x27'
70+ID_MODEL='Cruzer_Fit'
71+ID_MODEL_ENC='Cruzer\x20Fit\x20\x20\x20\x20\x20\x20'
72+ID_REVISION='1.00'
73+ID_TYPE='disk'
74+ID_SCSI_INQUIRY='1'
75+ID_VENDOR_ID='0781'
76+ID_MODEL_ID='5571'
77+ID_SERIAL='SanDisk__Cruzer_Fit_4C530000140118216265-0:0'
78+ID_SERIAL_SHORT='4C530000140118216265'
79+ID_INSTANCE='0:0'
80+ID_BUS='usb'
81+ID_USB_INTERFACES=':080650:'
82+ID_USB_INTERFACE_NUM='00'
83+ID_USB_DRIVER='usb-storage'
84+ID_PATH='pci-0000:00:1d.0-usb-0:1.5.1:1.0-scsi-0:0:0:0'
85+ID_PATH_TAG='pci-0000_00_1d_0-usb-0_1_5_1_1_0-scsi-0_0_0_0'
86+ID_PART_TABLE_UUID='36b64baf'
87+ID_PART_TABLE_TYPE='dos'
88+ID_FS_UUID='2020-04-23-08-02-07-00'
89+ID_FS_UUID_ENC='2020-04-23-08-02-07-00'
90+ID_FS_BOOT_SYSTEM_ID='EL\x20TORITO\x20SPECIFICATION'
91+ID_FS_VERSION='Joliet Extension'
92+ID_FS_LABEL='Ubuntu-Server_20.04_LTS_amd64'
93+ID_FS_LABEL_ENC='Ubuntu-Server\x2020.04\x20LTS\x20amd64'
94+ID_FS_TYPE='iso9660'
95+ID_FS_USAGE='filesystem'
96+ID_PART_ENTRY_SCHEME='dos'
97+ID_PART_ENTRY_UUID='36b64baf-01'
98+ID_PART_ENTRY_TYPE='0x0'
99+ID_PART_ENTRY_FLAGS='0x80'
100+ID_PART_ENTRY_NUMBER='1'
101+ID_PART_ENTRY_OFFSET='0'
102+ID_PART_ENTRY_SIZE='1859584'
103+ID_PART_ENTRY_DISK='8:32'
104+DEVLINKS='/dev/disk/by-path/pci-0000:00:1d.0-usb-0:1.5.1:1.0-scsi-0:0:0:0-part1 /dev/disk/by-id/usb-SanDisk__Cruzer_Fit_4C530000140118216265-0:0-part1 /dev/disk/by-label/Ubuntu-Server\x2020.04\x20LTS\x20amd64 /dev/disk/by-partuuid/36b64baf-01 /dev/disk/by-uuid/2020-04-23-08-02-07-00'
105+TAGS=':systemd:'
106diff --git a/tests/unittests/test_udev.py b/tests/unittests/test_udev.py
107index 33d5f44..919c7c0 100644
108--- a/tests/unittests/test_udev.py
109+++ b/tests/unittests/test_udev.py
110@@ -1,14 +1,18 @@
111 # This file is part of curtin. See LICENSE file for copyright and license info.
112
113 import mock
114+import shlex
115
116-from curtin import udev
117+from curtin.udev import (
118+ udevadm_info,
119+ shlex_quote,
120+ )
121 from curtin import util
122 from .helpers import CiTestCase
123
124
125 UDEVADM_INFO_QUERY = """\
126-DEVLINKS='/dev/disk/by-id/nvme-eui.0025388b710116a1'
127+DEVLINKS='/dev/disk/by-id/nvme-eui.0025388b710116a1 /dev/disk/by-id/nvme-n1'
128 DEVNAME='/dev/nvme0n1'
129 DEVPATH='/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/nvme/nvme0/nvme0n1'
130 DEVTYPE='disk'
131@@ -24,7 +28,8 @@ USEC_INITIALIZED='2026691'
132 """
133
134 INFO_DICT = {
135- 'DEVLINKS': ['/dev/disk/by-id/nvme-eui.0025388b710116a1'],
136+ 'DEVLINKS': ['/dev/disk/by-id/nvme-eui.0025388b710116a1',
137+ '/dev/disk/by-id/nvme-n1'],
138 'DEVNAME': '/dev/nvme0n1',
139 'DEVPATH':
140 '/devices/pci0000:00/0000:00:1c.4/0000:05:00.0/nvme/nvme0/nvme0n1',
141@@ -48,17 +53,49 @@ class TestUdevInfo(CiTestCase):
142 """ udevadm_info returns dictionary for specified device """
143 mypath = '/dev/nvme0n1'
144 m_subp.return_value = (UDEVADM_INFO_QUERY, "")
145- info = udev.udevadm_info(mypath)
146+ info = udevadm_info(mypath)
147 m_subp.assert_called_with(
148 ['udevadm', 'info', '--query=property', '--export', mypath],
149 capture=True)
150 self.assertEqual(sorted(INFO_DICT), sorted(info))
151
152+ @mock.patch('curtin.util.subp')
153+ def test_udevadm_info_escape_quotes(self, m_subp):
154+ """verify we escape quotes when we fail to split. """
155+ mypath = '/dev/sdz'
156+ datafile = 'tests/data/udevadm_info_sandisk_cruzer.txt'
157+ m_subp.return_value = (util.load_file(datafile), "")
158+ info = udevadm_info(mypath)
159+ m_subp.assert_called_with(
160+ ['udevadm', 'info', '--query=property', '--export', mypath],
161+ capture=True)
162+
163+ """
164+ Replicate what udevadm_info parsing does and use pdb to examine what's
165+ happening.
166+
167+ (Pdb) original_value
168+ "SanDisk'"
169+ (Pdb) quoted_value
170+ '\'SanDisk\'"\'"\'\''
171+ (Pdb) split_value
172+ ["SanDisk'"]
173+ (Pdb) expected_value
174+ "SanDisk'"
175+ """
176+ original_value = "SanDisk'"
177+ quoted_value = shlex_quote(original_value)
178+ split_value = shlex.split(quoted_value)
179+ expected_value = split_value if ' ' in split_value else split_value[0]
180+
181+ self.assertEqual(expected_value, info['SCSI_VENDOR'])
182+ self.assertEqual(expected_value, info['SCSI_VENDOR_ENC'])
183+
184 def test_udevadm_info_no_path(self):
185 """ udevadm_info raises ValueError for invalid path value"""
186 mypath = None
187 with self.assertRaises(ValueError):
188- udev.udevadm_info(mypath)
189+ udevadm_info(mypath)
190
191 @mock.patch('curtin.util.subp')
192 def test_udevadm_info_path_not_exists(self, m_subp):
193@@ -66,4 +103,4 @@ class TestUdevInfo(CiTestCase):
194 mypath = self.random_string()
195 m_subp.side_effect = util.ProcessExecutionError()
196 with self.assertRaises(util.ProcessExecutionError):
197- udev.udevadm_info(mypath)
198+ udevadm_info(mypath)

Subscribers

People subscribed via source and target branches