Merge ppa-dev-tools:add_unpack_to_dict into ppa-dev-tools:main

Proposed by Bryce Harrington
Status: Merged
Merge reported by: Bryce Harrington
Merged at revision: 8a1b18cebbfd84db7011df6569febb757531fa9e
Proposed branch: ppa-dev-tools:add_unpack_to_dict
Merge into: ppa-dev-tools:main
Diff against target: 312 lines (+300/-0)
2 files modified
ppa/dict.py (+183/-0)
tests/test_dict.py (+117/-0)
Reviewer Review Type Date Requested Status
Lena Voytek (community) Approve
Canonical Server Pending
Canonical Server Reporter Pending
Review via email: mp+437970@code.launchpad.net

Description of the change

ppa-dev-tools needs to be able to parse strings of comma-separated values. There are three different types of these strings: Argument values (i.e. --architectures=...); Debian control fields (i.e. "Depends: foo, bar (>= 1.2), baw | bay | baz"); and autopkgtest trigger lists.

This branch adds an "unpack_to_dict()" helper routine that strives to solve all these needs. It supports overriding the markers to allow customizing the parsing behavior to cover the aforementioned use cases. Hopefully it'll be useful in other situations too.

I recognize one issue with trying to solve multiple special cases with one generic routine is there can be exceptions for one case that interfere with others, and handling them can make the generic routine less useful than just having separate routines for each case. Thankfully, I haven't yet run into any exceptions this can't handle, but admittedly I haven't used this in practice very widely. To mitigate this problem, I've implemented an extensive set of test cases that can be expanded as exceptions and new use cases are discovered to ensure that the existing use cases remain correctly handled.

The tests can be invoked via:

    $ pytest-3 ./tests/test_dict.py

There is also a smoketest that demonstrates parsing of some different real-world examples, to give a better feel for how this will be used in practice.

    $ python3 ./ppa/dict.py

To post a comment you must log in.
Revision history for this message
Lena Voytek (lvoytek) wrote :

The additions here look good to me. I also ran the unit tests on my system and everything passed. I added some comments for nitpicks + code clarity below

review: Approve
Revision history for this message
Bryce Harrington (bryce) wrote :

Thanks! I've incorporated all your feedback (and some from Athos on another MP). I've force-pushed the branch with all the corrections here, and will land the same branch to main.

To git+ssh://git.launchpad.net/ppa-dev-tools
 + 8a1b18c...555b089 add_unpack_to_dict -> add_unpack_to_dict (forced update)

To git+ssh://git.launchpad.net/ppa-dev-tools
   8ca9a46..555b089 main -> main

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/ppa/dict.py b/ppa/dict.py
2new file mode 100644
3index 0000000..b4c74f1
4--- /dev/null
5+++ b/ppa/dict.py
6@@ -0,0 +1,183 @@
7+#!/usr/bin/env python3
8+# -*- Mode: Python; coding: utf-8; indent-tabs-mode: nil; tab-width: 4 -*-
9+
10+# Copyright (C) 2022 Bryce W. Harrington
11+#
12+# Released under GNU GPLv2 or later, read the file 'LICENSE.GPLv2+' for
13+# more information.
14+#
15+# Authors:
16+# Bryce Harrington <bryce@canonical.com>
17+
18+'''
19+Routines for dealing with dictionary objects.
20+'''
21+
22+
23+def unpack_to_dict(text, key_cut=':', key_sep='=') -> dict:
24+ """Converts comma-delimited data into a dictionary.
25+
26+ For each item, if @param key_sep is present split on it to get the
27+ key and value. If @param key_sep is not present, then the item will
28+ be stored as the key with an empty string as the value.
29+
30+ The key is further processed by excluding anything after @param
31+ key_cut. For example, with the default values for @param key_sep
32+ and @param key_cut, the string "a,b=1.2.3,c:xyz=42" will unpack
33+ into:
34+
35+ {
36+ 'a': '',
37+ 'b': '1.2.3',
38+ 'c': '42',
39+ }
40+
41+ A basic use case of this routine is to split out comma-separated
42+ items in a command line parameter's value string. The @param
43+ key_sep parameter facilitates parsing lists of key=value items. The
44+ @param key_cut provides a mechanism for filtering out unwanted
45+ portions of key names; for example, in "mypackage/universe=1.2.3" we
46+ may want to ignore the 'universe' detail.
47+
48+ This routine is intended to handle parsing of Debian control fields
49+ sufficiently to determine the package names. This routine is NOT
50+ intended to be a specification-compliant parser, and in particular
51+ is neither idempotent nor thorough in its parsing. See
52+ https://wiki.debian.org/BuildProfileSpec for details about Debian's
53+ control field format.
54+
55+ To support these control fields, the '|' symbol is recognized as a
56+ way of representing multiple alternative items via a tuple of keys
57+ and a dict for the value. For example, the string 'a,b=1|c=2,d=3'
58+ will unpack into:
59+
60+ {
61+ 'a': '',
62+ ('b', 'c'): {'b': '1', 'c': '2'},
63+ 'd': '3'
64+ }
65+
66+ :param str text: Comma-separated textual data collection.
67+ :param str key_cut: Ignore anything in key after this character.
68+ :param str key_sep: Character used to separate an item's key from its value.
69+ :returns: Dictionary of data->value items
70+ :rtype: dict[str, str]
71+ """
72+ if not text or not key_sep or not key_cut:
73+ # None is not handled for any of the parameters so far.
74+ raise ValueError("unpack_to_dict() requires non-None values for all arguments")
75+ elif key_sep.strip() == ',':
76+ # Comma is used internally as the separator character, and
77+ # cannot currently be configured differently, thus it can't be
78+ # used by key_sep as a secondary separator.
79+ raise ValueError("comma is reserved and cannot be used for key_sep")
80+ elif not key_cut.strip() or key_cut.strip() == ',':
81+ # Whitespace is permitted for key_sep, but not for key_cut.
82+ # Comma is not allowed for key_cut, for same reason as for key_sep.
83+ raise ValueError("key_cut must be at least one (non-comma, non-whitespace) character")
84+ elif key_sep.strip() == key_cut.strip():
85+ # Since we're splitting on key_sep, using the same string to then split
86+ # key_cut would be redundant and ineffective.
87+ raise ValueError("key_sep and key_cut must not be the same string")
88+
89+ def _split_item(item, key_sep, key_cut):
90+ if key_sep in item:
91+ key, value = item.split(key_sep, 1)
92+ else:
93+ key = item
94+ value = ''
95+ if key_cut:
96+ key = key.split(key_cut, 1)[0]
97+
98+ # Blank value is allowed, but not key. Neither can be None.
99+ if not key or value is None:
100+ raise ValueError
101+ return key.strip(), value.strip()
102+
103+ d = {}
104+ for item in text.split(','):
105+ if not item:
106+ raise ValueError
107+
108+ item = item.strip()
109+ if '|' in item:
110+ # Handled items with are actually multiple items OR-ed together.
111+ subitems = {}
112+ for subitem in item.split('|'):
113+ subitem = subitem.strip()
114+ if not subitem:
115+ raise ValueError("Undefined element of OR ('|') clause")
116+ subitems.update(dict([_split_item(subitem, key_sep, key_cut)]))
117+
118+ # Store multi-values using a tuple key rather than a simple string.
119+ d[tuple(subitems.keys())] = subitems
120+ else:
121+ # Store single-values as a simple string key.
122+ d.update(dict([_split_item(item, key_sep, key_cut)]))
123+ return d
124+
125+
126+if __name__ == "__main__":
127+ # This is a smoketest
128+
129+ import pprint
130+ pp = pprint.PrettyPrinter(indent=4)
131+
132+ text = "a, b=1.2.3, c:x=4"
133+ print(text)
134+ pp.pprint(unpack_to_dict(text))
135+ print()
136+
137+ print("Conflicts:")
138+ text = "binutils-mingw-w64-i686 (<< 2.23.52.20130612-1+3), zlib1g (>= 1:1.1.4)"
139+ print(text)
140+ pp.pprint(unpack_to_dict(text, key_sep=' '))
141+ print()
142+
143+ print("Build-Depends:")
144+ text = "autoconf (>= 2.64), dpkg-dev (>= 1.19.0.5), bison, flex, gettext, texinfo, dejagnu, quilt, chrpath, dwz, debugedit (>= 4.16), python3:any, file, xz-utils, lsb-release, zlib1g-dev, procps, g++-aarch64-linux-gnu [amd64 i386 x32] <!nocheck>, g++-arm-linux-gnueabi [amd64 arm64 i386 x32] <!nocheck>, g++-arm-linux-gnueabihf [amd64 arm64 i386 x32] <!nocheck>, g++-powerpc64le-linux-gnu [amd64 arm64 i386 ppc64 x32] <!nocheck>, g++-s390x-linux-gnu [amd64 arm64 i386 ppc64el x32] <!nocheck>, g++-alpha-linux-gnu [amd64 i386 x32] <!nocheck>, g++-hppa-linux-gnu [amd64 i386 x32] <!nocheck>, g++-m68k-linux-gnu [amd64 i386 x32] <!nocheck>, g++-powerpc-linux-gnu [amd64 i386 ppc64el x32] <!nocheck>, g++-powerpc64-linux-gnu [amd64 i386 x32] <!nocheck>, g++-riscv64-linux-gnu [amd64 arm64 i386 ppc64el x32] <!nocheck>, g++-sh4-linux-gnu [amd64 i386 x32] <!nocheck>, g++-sparc64-linux-gnu [amd64 i386 x32] <!nocheck>, g++-i686-linux-gnu [amd64 arm64 ppc64el x32] <!nocheck>, g++-x86-64-linux-gnu [arm64 i386 ppc64el] <!nocheck>, g++-x86-64-linux-gnux32 [amd64 arm64 i386 ppc64el] <!nocheck>"
145+ print(text)
146+ pp.pprint(unpack_to_dict(text, key_sep=' '))
147+ print()
148+
149+ print("Depends:")
150+ text = "binutils-common (= 2.38.50.20220707-1ubuntu1), libbinutils (= 2.38.50.20220707-1ubuntu1), binutils-x86-64-linux-gnu (= 2.38.50.20220707-1ubuntu1)"
151+ print(text)
152+ pp.pprint(unpack_to_dict(text, key_cut='#', key_sep=' '))
153+ print()
154+
155+ print("Depends:")
156+ text = """ adduser,
157+ libpam-runtime,
158+ lsb-base,
159+ openssl,
160+ ssl-cert,
161+ ucf,
162+ ${misc:Depends},
163+ ${shlibs:Depends}
164+ """
165+ print(text)
166+ pp.pprint(unpack_to_dict(text, key_cut='#', key_sep=' '))
167+ print()
168+
169+ print("Depends:")
170+ text = """ ${misc:Depends},
171+ debhelper-compat (= 13),
172+ dpkg-dev (>= 1.15.5),
173+ nginx-core (<< 5.1~) | nginx-light (<< 5.1~) | nginx-extras (<< 5.1~),
174+ nginx-core (>= 5) | nginx-light (>= 5) | nginx-extras (>= 5)
175+ """
176+ print(text)
177+ pp.pprint(unpack_to_dict(text, key_cut='#', key_sep=' '))
178+ print()
179+
180+ print("Depends:")
181+ text = """ ${misc:Depends},
182+ debhelper-compat (= 13),
183+ dpkg-dev (>= 1.15.5),
184+ nginx-core (<< 5.1~) | nginx-light (<< 5.1~) | nginx-extras (<< 5.1~),
185+ nginx-core (>= 5) | nginx-light (>= 5) | nginx-extras (>= 5)
186+ """
187+ print(text)
188+ pp.pprint(unpack_to_dict(text, key_sep=' ', key_cut='#'))
189+ print()
190diff --git a/tests/test_dict.py b/tests/test_dict.py
191new file mode 100644
192index 0000000..8650d69
193--- /dev/null
194+++ b/tests/test_dict.py
195@@ -0,0 +1,117 @@
196+#!/usr/bin/env python3
197+# -*- Mode: Python; coding: utf-8; indent-tabs-mode: nil; tab-width: 4 -*-
198+
199+# Author: Bryce Harrington <bryce@canonical.com>
200+#
201+# Copyright (C) 2023 Bryce W. Harrington
202+#
203+# Released under GNU GPLv2 or later, read the file 'LICENSE.GPLv2+' for
204+# more information.
205+
206+"""Tests helper routines from the dict module."""
207+
208+import os
209+import sys
210+import pytest
211+
212+sys.path.insert(0, os.path.realpath(
213+ os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")))
214+
215+from ppa.dict import unpack_to_dict
216+
217+
218+@pytest.mark.parametrize('text, expected', [
219+ ('a', {'a': ''}),
220+ ('a=1', {'a': '1'}),
221+ ('a=1:2', {'a': '1:2'}),
222+ ('a = 1', {'a': '1'}),
223+ ('a:x', {'a': ''}),
224+ ('a:x:', {'a': ''}),
225+ ('a:x:y', {'a': ''}),
226+ ('a:x=1', {'a': '1'}),
227+ ('a:x = 1', {'a': '1'}),
228+ ('a=x=1', {'a': 'x=1'}),
229+ ('a : x=1', {'a': '1'}),
230+ ('a,b', {'a': '', 'b': ''}),
231+ ('a, b', {'a': '', 'b': ''}),
232+ ('a,b=1.2.3,c:x=4', {'a': '', 'b': '1.2.3', 'c': '4'}),
233+ ('a, b=1.2.3, c:x=4', {'a': '', 'b': '1.2.3', 'c': '4'}),
234+ ('a, b|c', {'a': '', ('b', 'c'): {'b': '', 'c': ''}}),
235+ ('a, b=1|c=2', {'a': '', ('b', 'c'): {'b': '1', 'c': '2'}}),
236+ ('a, b=1|c:x=2, d=3', {'a': '', ('b', 'c'): {'b': '1', 'c': '2'}, 'd': '3'}),
237+])
238+def test_unpack_to_dict(text, expected):
239+ """Checks the unpack_to_dict() routine's string unpacking."""
240+ result = unpack_to_dict(text)
241+
242+ assert result
243+ assert isinstance(result, dict)
244+ assert result == expected
245+
246+
247+@pytest.mark.parametrize('text, expected_exception', [
248+ (None, ValueError),
249+ ('', ValueError),
250+ (',', ValueError),
251+ (',z', ValueError),
252+ (':', ValueError),
253+ (':z', ValueError),
254+ ('=', ValueError),
255+ ('=z', ValueError),
256+])
257+def test_unpack_to_dict_error(text, expected_exception):
258+ """Checks the unpack_to_dict() routine's string unpacking."""
259+ with pytest.raises(expected_exception):
260+ unpack_to_dict(text)
261+
262+
263+@pytest.mark.parametrize('text, key_cut, sep, expected', [
264+ ('a:x=1', ':', '=', {'a': '1'}),
265+
266+ ('a.x=1', '.', '=', {'a': '1'}),
267+ ('a+x=1', '+', '=', {'a': '1'}),
268+ ('a-x=1', '-', '=', {'a': '1'}),
269+ ('a~x=1', '~', '=', {'a': '1'}),
270+ ('a_x=1', '_', '=', {'a': '1'}),
271+ ('a!x=1', '!', '=', {'a': '1'}),
272+ ('a;x=1', ';', '=', {'a': '1'}),
273+ ('a/x=1', '/', '=', {'a': '1'}),
274+
275+ ('a:x.1', ':', '.', {'a': '1'}),
276+ ('a:x+1', ':', '+', {'a': '1'}),
277+ ('a:x-1', ':', '-', {'a': '1'}),
278+ ('a:x~1', ':', '~', {'a': '1'}),
279+ ('a:x_1', ':', '_', {'a': '1'}),
280+ ('a:x!1', ':', '!', {'a': '1'}),
281+ ('a:x;1', ':', ';', {'a': '1'}),
282+ ('a:x/1', ':', '/', {'a': '1'}),
283+
284+ # Spaces are allowed as separators
285+ ('a 1', ':', ' ', {'a': '1'}),
286+ ('a 1, b, c 3', ':', ' ', {'a': '1', 'b': '', 'c': '3'}),
287+
288+])
289+def test_unpack_to_dict_parameters(text, sep, key_cut, expected):
290+ """Checks the unpack_to_dict() routine's string unpacking."""
291+ result = unpack_to_dict(text, key_sep=sep, key_cut=key_cut)
292+
293+ assert result
294+ assert isinstance(result, dict)
295+ assert result == expected
296+
297+
298+@pytest.mark.parametrize('text, key_cut, sep, expected_exception', [
299+ # Commas are reserved as the item separator
300+ ('a:x=1', ',', '=', ValueError),
301+ ('a:x=1', ':', ',', ValueError),
302+ ('a:x=1', ',', ',', ValueError),
303+
304+ # key_cut and sep have to be different
305+ ('a:x=1', ':', ':', ValueError),
306+ ('a:x=1', '=', '=', ValueError),
307+ ('a:x=1', '.', '.', ValueError),
308+])
309+def test_unpack_to_dict_parameters_error(text, sep, key_cut, expected_exception):
310+ """Checks the unpack_to_dict() error handling, with invalid parameters."""
311+ with pytest.raises(expected_exception):
312+ unpack_to_dict(text, key_sep=sep, key_cut=key_cut)

Subscribers

People subscribed via source and target branches

to all changes: