Merge ~mdeslaur/ubuntu-cve-tracker:perf-part3 into ubuntu-cve-tracker:master

Proposed by Marc Deslauriers
Status: Needs review
Proposed branch: ~mdeslaur/ubuntu-cve-tracker:perf-part3
Merge into: ubuntu-cve-tracker:master
Diff against target: 131 lines (+21/-23)
3 files modified
scripts/active_edit (+1/-1)
scripts/check-syntax (+4/-4)
scripts/source_map.py (+16/-18)
Reviewer Review Type Date Requested Status
Mark Esler Approve
Alex Murray Approve
Review via email: mp+461759@code.launchpad.net

Commit message

commit 0606059918cd6e9c9daecf362066377c37324033
Author: Marc Deslauriers <email address hidden>
Date: Mon Mar 4 17:09:01 2024 -0500

    improve performance in source_map.py by skipping package details

    When source_map.py parses the Sources files, it loads a lot of
    details including doing version comparisons. This is expensive and
    isn't needed for active_edit. Add a details parameter to
    load_sources_collection() that skips the extra details.

commit 42d41242937e972e3061d4782130daefe8b840cb
Author: Marc Deslauriers <email address hidden>
Date: Mon Mar 4 17:06:31 2024 -0500

    source_map.py: apt_pkg.TagFile() natively supports gzip files

Description of the change

Skip some unneeded parts of Source files loading to improve performance.

BEFORE:
$ time ./scripts/active_edit -c CVE-2024-NNN1 -p python-django

real 0m4.912s
user 0m5.278s
sys 0m0.459s

AFTER:
$ time ./scripts/active_edit -c CVE-2024-NNN1 -p python-django

real 0m3.411s
user 0m3.279s
sys 0m0.133s

To post a comment you must log in.
5a6a72e... by Marc Deslauriers

source_map.py: add some missing whitespace (cosmetic only)

5a6d47e... by Marc Deslauriers

check-syntax: improve performance by skipping certain source details

The only place where version information is required for source
packages is when --strict is used, so when not using that option,
don't load detailed source map information.

Revision history for this message
Marc Deslauriers (mdeslaur) wrote :

The last commit shaved a few seconds more off of check-syntax:

BEFORE:
$ time ./scripts/check-syntax -j8

real 0m50.371s
user 5m58.289s
sys 0m6.823s

AFTER:
$ time ./scripts/check-syntax -j8

real 0m45.963s
user 5m24.200s
sys 0m5.888s

Revision history for this message
Alex Murray (alexmurray) wrote :

Nice - yep can confirm the improvement in check-syntax too

review: Approve
Revision history for this message
Alex Murray (alexmurray) wrote :

I wonder if _apt_get_tags() should decompress the files and leave them uncompressed on disk to save time on subsequent runs? Or whether packages-mirror should just do this itself?

Revision history for this message
Mark Esler (eslerm) wrote :

Nice .gz find.

review: Approve
Revision history for this message
Marc Deslauriers (mdeslaur) wrote (last edit ):

> I wonder if _apt_get_tags() should decompress the files and leave them
> uncompressed on disk to save time on subsequent runs? Or whether packages-
> mirror should just do this itself?

That's a good question, I'll have to time that to see if it would save any time.

Unmerged commits

5a6d47e... by Marc Deslauriers

check-syntax: improve performance by skipping certain source details

The only place where version information is required for source
packages is when --strict is used, so when not using that option,
don't load detailed source map information.

Succeeded
[SUCCEEDED] unit-tests:0 (build)
[SUCCEEDED] check-cves:0 (build)
12 of 2 results
5a6a72e... by Marc Deslauriers

source_map.py: add some missing whitespace (cosmetic only)

Succeeded
[SUCCEEDED] unit-tests:0 (build)
[SUCCEEDED] check-cves:0 (build)
12 of 2 results
0606059... by Marc Deslauriers

improve performance in source_map.py by skipping package details

When source_map.py parses the Sources files, it loads a lot of
details including doing version comparisons. This is expensive and
isn't needed for active_edit. Add a details parameter to
load_sources_collection() that skips the extra details.

Succeeded
[SUCCEEDED] unit-tests:0 (build)
[SUCCEEDED] check-cves:0 (build)
12 of 2 results
42d4124... by Marc Deslauriers

source_map.py: apt_pkg.TagFile() natively supports gzip files

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/scripts/active_edit b/scripts/active_edit
2index 9a1c18a..321f39a 100755
3--- a/scripts/active_edit
4+++ b/scripts/active_edit
5@@ -42,7 +42,7 @@ source_releases = []
6 for release in cve_lib.all_releases:
7 if cve_lib.is_active_esm_release(release) or cve_lib.is_active_release(release):
8 source_releases.append(release)
9-source = source_map.load(releases=source_releases, skip_eol_releases=False)
10+source = source_map.load(releases=source_releases, skip_eol_releases=False, detailed=False)
11
12 added_rel_pkg = []
13
14diff --git a/scripts/check-syntax b/scripts/check-syntax
15index 1dd3781..78640b3 100755
16--- a/scripts/check-syntax
17+++ b/scripts/check-syntax
18@@ -30,10 +30,6 @@ import usn_lib
19
20 import source_map
21
22-source = source_map.load()
23-if cve_lib.devel_release:
24- dev_proposed = source_map.load(pockets=["-proposed"], releases=[cve_lib.devel_release])
25-
26 # add to this list when adding tracking for kernels that have not been
27 # published to updates/security yet. Once published, can be removed
28 # (script should warn about that). If a kernel is published in one
29@@ -285,6 +281,10 @@ parser.add_option(
30 # parser.add_option("-c", "--cna", help="Ensure every CVE assigned by Canonical's CNA has an entry", action='store_true')
31 (opt, args) = parser.parse_args()
32
33+source = source_map.load(detailed=opt.strict)
34+if cve_lib.devel_release:
35+ dev_proposed = source_map.load(pockets=["-proposed"], releases=[cve_lib.devel_release], detailed=opt.strict)
36+
37 if opt.debug:
38 pp = pprint.PrettyPrinter(indent=4)
39
40diff --git a/scripts/source_map.py b/scripts/source_map.py
41index 48c22f4..226b92e 100755
42--- a/scripts/source_map.py
43+++ b/scripts/source_map.py
44@@ -212,21 +212,21 @@ def _find_sources_from_apt(pockets=None, releases=None):
45
46
47 # release -> pkg -> dict( 'section', 'pocket', 'version' )
48-def load(data_type='sources', pockets=None, releases=None, skip_eol_releases=True, arch="amd64"):
49+def load(data_type='sources', pockets=None, releases=None, skip_eol_releases=True, arch="amd64", detailed=True):
50 if data_type not in ['sources', 'packages']:
51 raise ValueError("'data_type' should be either 'sources' or 'packages'")
52
53 map = dict()
54 if data_type == 'sources':
55 for item in _find_sources(pockets=pockets, releases=releases, skip_eol_releases=skip_eol_releases, arch=arch):
56- load_sources_collection(item, map)
57+ load_sources_collection(item, map, detailed)
58 else:
59 for item in _find_packages(pockets=pockets, releases=releases, skip_eol_releases=skip_eol_releases, arch=arch):
60 load_packages_collection(item, map)
61
62 # subprojects only do sources, not binaries
63 if data_type == 'sources':
64- subproject_lists = load_subprojects_lists(releases=releases)
65+ subproject_lists = load_subprojects_lists(releases=releases, detailed=detailed)
66 for item in subproject_lists:
67 map.setdefault(item, dict())
68 for pkg in subproject_lists[item]:
69@@ -240,9 +240,7 @@ def load(data_type='sources', pockets=None, releases=None, skip_eol_releases=Tru
70
71 def _get_apt_tags(tagfile):
72 tags = None
73- if tagfile.endswith('.gz'):
74- tags = subprocess.Popen(['/bin/gunzip', '-c', tagfile], stdout=subprocess.PIPE).stdout
75- elif tagfile.endswith('.bz2'):
76+ if tagfile.endswith('.bz2'):
77 tags = subprocess.Popen(['/bin/bunzip2', '-c', tagfile], stdout=subprocess.PIPE).stdout
78 elif tagfile.endswith('.xz'):
79 tags = subprocess.Popen(['/usr//bin/xzcat', tagfile], stdout=subprocess.PIPE).stdout
80@@ -252,7 +250,7 @@ def _get_apt_tags(tagfile):
81 return tags
82
83
84-def load_sources_collection(item, map):
85+def load_sources_collection(item, map, detailed=True):
86 tagfile, release, pocket, section = item
87
88 parser = apt_pkg.TagFile(_get_apt_tags(tagfile))
89@@ -260,15 +258,15 @@ def load_sources_collection(item, map):
90 pkg = parser.section['Package']
91 map.setdefault(release, dict()).setdefault(pkg, {'section': 'unset', 'version': '~', 'pocket': 'unset'})
92 map[release][pkg]['section'] = section
93- if 'Description' in parser.section:
94- map[release][pkg]['description'] = parser.section['Description']
95- if not pocket:
96- map[release][pkg]['release_version'] = parser.section['Version']
97- if apt_pkg.version_compare(parser.section['Version'], map[release][pkg]['version']) > 0:
98- map[release][pkg]['pocket'] = pocket
99- map[release][pkg]['version'] = parser.section['Version']
100- map[release][pkg]['binaries'] = parser.section['Binary'].split(', ')
101-
102+ if detailed:
103+ if 'Description' in parser.section:
104+ map[release][pkg]['description'] = parser.section['Description']
105+ if not pocket:
106+ map[release][pkg]['release_version'] = parser.section['Version']
107+ if apt_pkg.version_compare(parser.section['Version'], map[release][pkg]['version']) > 0:
108+ map[release][pkg]['pocket'] = pocket
109+ map[release][pkg]['version'] = parser.section['Version']
110+ map[release][pkg]['binaries'] = parser.section['Binary'].split(', ')
111 return map
112
113
114@@ -416,7 +414,7 @@ def get_aliases_of_ubuntu_package(sources, pkg_name, rel):
115 aliases.append(pkg)
116 return aliases
117
118-def load_subprojects_lists(releases=None):
119+def load_subprojects_lists(releases=None, detailed=True):
120 map = dict()
121
122 if releases is None:
123@@ -424,7 +422,7 @@ def load_subprojects_lists(releases=None):
124
125 all_sources_esm = {}
126 for item in _find_sources(releases=cve_lib.get_active_releases_with_esm(), skip_eol_releases=False):
127- load_sources_collection(item, all_sources_esm)
128+ load_sources_collection(item, all_sources_esm, detailed=detailed)
129
130 for rel in releases:
131 _, _, _, details = cve_lib.get_subproject_details(rel)

Subscribers

People subscribed via source and target branches