Merge ppa-dev-tools:add_unpack_to_dict into ppa-dev-tools:main

Proposed by Bryce Harrington
Status: Merged
Merge reported by: Bryce Harrington
Merged at revision: 8a1b18cebbfd84db7011df6569febb757531fa9e
Proposed branch: ppa-dev-tools:add_unpack_to_dict
Merge into: ppa-dev-tools:main
Diff against target: 312 lines (+300/-0)
2 files modified
ppa/dict.py (+183/-0)
tests/test_dict.py (+117/-0)
Reviewer Review Type Date Requested Status
Lena Voytek (community) Approve
Canonical Server Pending
Canonical Server Reporter Pending
Review via email: mp+437970@code.launchpad.net

Description of the change

ppa-dev-tools needs to be able to parse strings of comma-separated values. There are three different types of these strings: Argument values (i.e. --architectures=...); Debian control fields (i.e. "Depends: foo, bar (>= 1.2), baw | bay | baz"); and autopkgtest trigger lists.

This branch adds an "unpack_to_dict()" helper routine that strives to solve all these needs. It supports overriding the markers to allow customizing the parsing behavior to cover the aforementioned use cases. Hopefully it'll be useful in other situations too.

I recognize one issue with trying to solve multiple special cases with one generic routine is there can be exceptions for one case that interfere with others, and handling them can make the generic routine less useful than just having separate routines for each case. Thankfully, I haven't yet run into any exceptions this can't handle, but admittedly I haven't used this in practice very widely. To mitigate this problem, I've implemented an extensive set of test cases that can be expanded as exceptions and new use cases are discovered to ensure that the existing use cases remain correctly handled.

The tests can be invoked via:

    $ pytest-3 ./tests/test_dict.py

There is also a smoketest that demonstrates parsing of some different real-world examples, to give a better feel for how this will be used in practice.

    $ python3 ./ppa/dict.py

To post a comment you must log in.
Revision history for this message
Lena Voytek (lvoytek) wrote :

The additions here look good to me. I also ran the unit tests on my system and everything passed. I added some comments for nitpicks + code clarity below

review: Approve
Revision history for this message
Bryce Harrington (bryce) wrote :

Thanks! I've incorporated all your feedback (and some from Athos on another MP). I've force-pushed the branch with all the corrections here, and will land the same branch to main.

To git+ssh://git.launchpad.net/ppa-dev-tools
 + 8a1b18c...555b089 add_unpack_to_dict -> add_unpack_to_dict (forced update)

To git+ssh://git.launchpad.net/ppa-dev-tools
   8ca9a46..555b089 main -> main

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
diff --git a/ppa/dict.py b/ppa/dict.py
0new file mode 1006440new file mode 100644
index 0000000..b4c74f1
--- /dev/null
+++ b/ppa/dict.py
@@ -0,0 +1,183 @@
1#!/usr/bin/env python3
2# -*- Mode: Python; coding: utf-8; indent-tabs-mode: nil; tab-width: 4 -*-
3
4# Copyright (C) 2022 Bryce W. Harrington
5#
6# Released under GNU GPLv2 or later, read the file 'LICENSE.GPLv2+' for
7# more information.
8#
9# Authors:
10# Bryce Harrington <bryce@canonical.com>
11
12'''
13Routines for dealing with dictionary objects.
14'''
15
16
17def unpack_to_dict(text, key_cut=':', key_sep='=') -> dict:
18 """Converts comma-delimited data into a dictionary.
19
20 For each item, if @param key_sep is present split on it to get the
21 key and value. If @param key_sep is not present, then the item will
22 be stored as the key with an empty string as the value.
23
24 The key is further processed by excluding anything after @param
25 key_cut. For example, with the default values for @param key_sep
26 and @param key_cut, the string "a,b=1.2.3,c:xyz=42" will unpack
27 into:
28
29 {
30 'a': '',
31 'b': '1.2.3',
32 'c': '42',
33 }
34
35 A basic use case of this routine is to split out comma-separated
36 items in a command line parameter's value string. The @param
37 key_sep parameter facilitates parsing lists of key=value items. The
38 @param key_cut provides a mechanism for filtering out unwanted
39 portions of key names; for example, in "mypackage/universe=1.2.3" we
40 may want to ignore the 'universe' detail.
41
42 This routine is intended to handle parsing of Debian control fields
43 sufficiently to determine the package names. This routine is NOT
44 intended to be a specification-compliant parser, and in particular
45 is neither idempotent nor thorough in its parsing. See
46 https://wiki.debian.org/BuildProfileSpec for details about Debian's
47 control field format.
48
49 To support these control fields, the '|' symbol is recognized as a
50 way of representing multiple alternative items via a tuple of keys
51 and a dict for the value. For example, the string 'a,b=1|c=2,d=3'
52 will unpack into:
53
54 {
55 'a': '',
56 ('b', 'c'): {'b': '1', 'c': '2'},
57 'd': '3'
58 }
59
60 :param str text: Comma-separated textual data collection.
61 :param str key_cut: Ignore anything in key after this character.
62 :param str key_sep: Character used to separate an item's key from its value.
63 :returns: Dictionary of data->value items
64 :rtype: dict[str, str]
65 """
66 if not text or not key_sep or not key_cut:
67 # None is not handled for any of the parameters so far.
68 raise ValueError("unpack_to_dict() requires non-None values for all arguments")
69 elif key_sep.strip() == ',':
70 # Comma is used internally as the separator character, and
71 # cannot currently be configured differently, thus it can't be
72 # used by key_sep as a secondary separator.
73 raise ValueError("comma is reserved and cannot be used for key_sep")
74 elif not key_cut.strip() or key_cut.strip() == ',':
75 # Whitespace is permitted for key_sep, but not for key_cut.
76 # Comma is not allowed for key_cut, for same reason as for key_sep.
77 raise ValueError("key_cut must be at least one (non-comma, non-whitespace) character")
78 elif key_sep.strip() == key_cut.strip():
79 # Since we're splitting on key_sep, using the same string to then split
80 # key_cut would be redundant and ineffective.
81 raise ValueError("key_sep and key_cut must not be the same string")
82
83 def _split_item(item, key_sep, key_cut):
84 if key_sep in item:
85 key, value = item.split(key_sep, 1)
86 else:
87 key = item
88 value = ''
89 if key_cut:
90 key = key.split(key_cut, 1)[0]
91
92 # Blank value is allowed, but not key. Neither can be None.
93 if not key or value is None:
94 raise ValueError
95 return key.strip(), value.strip()
96
97 d = {}
98 for item in text.split(','):
99 if not item:
100 raise ValueError
101
102 item = item.strip()
103 if '|' in item:
104 # Handled items with are actually multiple items OR-ed together.
105 subitems = {}
106 for subitem in item.split('|'):
107 subitem = subitem.strip()
108 if not subitem:
109 raise ValueError("Undefined element of OR ('|') clause")
110 subitems.update(dict([_split_item(subitem, key_sep, key_cut)]))
111
112 # Store multi-values using a tuple key rather than a simple string.
113 d[tuple(subitems.keys())] = subitems
114 else:
115 # Store single-values as a simple string key.
116 d.update(dict([_split_item(item, key_sep, key_cut)]))
117 return d
118
119
120if __name__ == "__main__":
121 # This is a smoketest
122
123 import pprint
124 pp = pprint.PrettyPrinter(indent=4)
125
126 text = "a, b=1.2.3, c:x=4"
127 print(text)
128 pp.pprint(unpack_to_dict(text))
129 print()
130
131 print("Conflicts:")
132 text = "binutils-mingw-w64-i686 (<< 2.23.52.20130612-1+3), zlib1g (>= 1:1.1.4)"
133 print(text)
134 pp.pprint(unpack_to_dict(text, key_sep=' '))
135 print()
136
137 print("Build-Depends:")
138 text = "autoconf (>= 2.64), dpkg-dev (>= 1.19.0.5), bison, flex, gettext, texinfo, dejagnu, quilt, chrpath, dwz, debugedit (>= 4.16), python3:any, file, xz-utils, lsb-release, zlib1g-dev, procps, g++-aarch64-linux-gnu [amd64 i386 x32] <!nocheck>, g++-arm-linux-gnueabi [amd64 arm64 i386 x32] <!nocheck>, g++-arm-linux-gnueabihf [amd64 arm64 i386 x32] <!nocheck>, g++-powerpc64le-linux-gnu [amd64 arm64 i386 ppc64 x32] <!nocheck>, g++-s390x-linux-gnu [amd64 arm64 i386 ppc64el x32] <!nocheck>, g++-alpha-linux-gnu [amd64 i386 x32] <!nocheck>, g++-hppa-linux-gnu [amd64 i386 x32] <!nocheck>, g++-m68k-linux-gnu [amd64 i386 x32] <!nocheck>, g++-powerpc-linux-gnu [amd64 i386 ppc64el x32] <!nocheck>, g++-powerpc64-linux-gnu [amd64 i386 x32] <!nocheck>, g++-riscv64-linux-gnu [amd64 arm64 i386 ppc64el x32] <!nocheck>, g++-sh4-linux-gnu [amd64 i386 x32] <!nocheck>, g++-sparc64-linux-gnu [amd64 i386 x32] <!nocheck>, g++-i686-linux-gnu [amd64 arm64 ppc64el x32] <!nocheck>, g++-x86-64-linux-gnu [arm64 i386 ppc64el] <!nocheck>, g++-x86-64-linux-gnux32 [amd64 arm64 i386 ppc64el] <!nocheck>"
139 print(text)
140 pp.pprint(unpack_to_dict(text, key_sep=' '))
141 print()
142
143 print("Depends:")
144 text = "binutils-common (= 2.38.50.20220707-1ubuntu1), libbinutils (= 2.38.50.20220707-1ubuntu1), binutils-x86-64-linux-gnu (= 2.38.50.20220707-1ubuntu1)"
145 print(text)
146 pp.pprint(unpack_to_dict(text, key_cut='#', key_sep=' '))
147 print()
148
149 print("Depends:")
150 text = """ adduser,
151 libpam-runtime,
152 lsb-base,
153 openssl,
154 ssl-cert,
155 ucf,
156 ${misc:Depends},
157 ${shlibs:Depends}
158 """
159 print(text)
160 pp.pprint(unpack_to_dict(text, key_cut='#', key_sep=' '))
161 print()
162
163 print("Depends:")
164 text = """ ${misc:Depends},
165 debhelper-compat (= 13),
166 dpkg-dev (>= 1.15.5),
167 nginx-core (<< 5.1~) | nginx-light (<< 5.1~) | nginx-extras (<< 5.1~),
168 nginx-core (>= 5) | nginx-light (>= 5) | nginx-extras (>= 5)
169 """
170 print(text)
171 pp.pprint(unpack_to_dict(text, key_cut='#', key_sep=' '))
172 print()
173
174 print("Depends:")
175 text = """ ${misc:Depends},
176 debhelper-compat (= 13),
177 dpkg-dev (>= 1.15.5),
178 nginx-core (<< 5.1~) | nginx-light (<< 5.1~) | nginx-extras (<< 5.1~),
179 nginx-core (>= 5) | nginx-light (>= 5) | nginx-extras (>= 5)
180 """
181 print(text)
182 pp.pprint(unpack_to_dict(text, key_sep=' ', key_cut='#'))
183 print()
diff --git a/tests/test_dict.py b/tests/test_dict.py
0new file mode 100644184new file mode 100644
index 0000000..8650d69
--- /dev/null
+++ b/tests/test_dict.py
@@ -0,0 +1,117 @@
1#!/usr/bin/env python3
2# -*- Mode: Python; coding: utf-8; indent-tabs-mode: nil; tab-width: 4 -*-
3
4# Author: Bryce Harrington <bryce@canonical.com>
5#
6# Copyright (C) 2023 Bryce W. Harrington
7#
8# Released under GNU GPLv2 or later, read the file 'LICENSE.GPLv2+' for
9# more information.
10
11"""Tests helper routines from the dict module."""
12
13import os
14import sys
15import pytest
16
17sys.path.insert(0, os.path.realpath(
18 os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")))
19
20from ppa.dict import unpack_to_dict
21
22
23@pytest.mark.parametrize('text, expected', [
24 ('a', {'a': ''}),
25 ('a=1', {'a': '1'}),
26 ('a=1:2', {'a': '1:2'}),
27 ('a = 1', {'a': '1'}),
28 ('a:x', {'a': ''}),
29 ('a:x:', {'a': ''}),
30 ('a:x:y', {'a': ''}),
31 ('a:x=1', {'a': '1'}),
32 ('a:x = 1', {'a': '1'}),
33 ('a=x=1', {'a': 'x=1'}),
34 ('a : x=1', {'a': '1'}),
35 ('a,b', {'a': '', 'b': ''}),
36 ('a, b', {'a': '', 'b': ''}),
37 ('a,b=1.2.3,c:x=4', {'a': '', 'b': '1.2.3', 'c': '4'}),
38 ('a, b=1.2.3, c:x=4', {'a': '', 'b': '1.2.3', 'c': '4'}),
39 ('a, b|c', {'a': '', ('b', 'c'): {'b': '', 'c': ''}}),
40 ('a, b=1|c=2', {'a': '', ('b', 'c'): {'b': '1', 'c': '2'}}),
41 ('a, b=1|c:x=2, d=3', {'a': '', ('b', 'c'): {'b': '1', 'c': '2'}, 'd': '3'}),
42])
43def test_unpack_to_dict(text, expected):
44 """Checks the unpack_to_dict() routine's string unpacking."""
45 result = unpack_to_dict(text)
46
47 assert result
48 assert isinstance(result, dict)
49 assert result == expected
50
51
52@pytest.mark.parametrize('text, expected_exception', [
53 (None, ValueError),
54 ('', ValueError),
55 (',', ValueError),
56 (',z', ValueError),
57 (':', ValueError),
58 (':z', ValueError),
59 ('=', ValueError),
60 ('=z', ValueError),
61])
62def test_unpack_to_dict_error(text, expected_exception):
63 """Checks the unpack_to_dict() routine's string unpacking."""
64 with pytest.raises(expected_exception):
65 unpack_to_dict(text)
66
67
68@pytest.mark.parametrize('text, key_cut, sep, expected', [
69 ('a:x=1', ':', '=', {'a': '1'}),
70
71 ('a.x=1', '.', '=', {'a': '1'}),
72 ('a+x=1', '+', '=', {'a': '1'}),
73 ('a-x=1', '-', '=', {'a': '1'}),
74 ('a~x=1', '~', '=', {'a': '1'}),
75 ('a_x=1', '_', '=', {'a': '1'}),
76 ('a!x=1', '!', '=', {'a': '1'}),
77 ('a;x=1', ';', '=', {'a': '1'}),
78 ('a/x=1', '/', '=', {'a': '1'}),
79
80 ('a:x.1', ':', '.', {'a': '1'}),
81 ('a:x+1', ':', '+', {'a': '1'}),
82 ('a:x-1', ':', '-', {'a': '1'}),
83 ('a:x~1', ':', '~', {'a': '1'}),
84 ('a:x_1', ':', '_', {'a': '1'}),
85 ('a:x!1', ':', '!', {'a': '1'}),
86 ('a:x;1', ':', ';', {'a': '1'}),
87 ('a:x/1', ':', '/', {'a': '1'}),
88
89 # Spaces are allowed as separators
90 ('a 1', ':', ' ', {'a': '1'}),
91 ('a 1, b, c 3', ':', ' ', {'a': '1', 'b': '', 'c': '3'}),
92
93])
94def test_unpack_to_dict_parameters(text, sep, key_cut, expected):
95 """Checks the unpack_to_dict() routine's string unpacking."""
96 result = unpack_to_dict(text, key_sep=sep, key_cut=key_cut)
97
98 assert result
99 assert isinstance(result, dict)
100 assert result == expected
101
102
103@pytest.mark.parametrize('text, key_cut, sep, expected_exception', [
104 # Commas are reserved as the item separator
105 ('a:x=1', ',', '=', ValueError),
106 ('a:x=1', ':', ',', ValueError),
107 ('a:x=1', ',', ',', ValueError),
108
109 # key_cut and sep have to be different
110 ('a:x=1', ':', ':', ValueError),
111 ('a:x=1', '=', '=', ValueError),
112 ('a:x=1', '.', '.', ValueError),
113])
114def test_unpack_to_dict_parameters_error(text, sep, key_cut, expected_exception):
115 """Checks the unpack_to_dict() error handling, with invalid parameters."""
116 with pytest.raises(expected_exception):
117 unpack_to_dict(text, key_sep=sep, key_cut=key_cut)

Subscribers

People subscribed via source and target branches

to all changes: