Ubuntu CVE Tracker

Merge ~mdeslaur/ubuntu-cve-tracker:perf-part4 into ubuntu-cve-tracker:master

Proposed by Marc Deslauriers on 2024-03-05

Status:	Merged
Merge reported by:	Marc Deslauriers
Merged at revision:	06df23418f8851d45c435207f0e5831826805293
Proposed branch:	~mdeslaur/ubuntu-cve-tracker:perf-part4
Merge into:	ubuntu-cve-tracker:master
Diff against target:	129 lines (+80/-6) 4 files modified scripts/active_edit (+1/-6) scripts/gen-source-map-cache (+50/-0) scripts/packages-mirror (+9/-0) scripts/source_map.py (+20/-0)
Related bugs:	Link a bug report

Reviewer	Review Type	Date Requested	Status
Alex Murray		2024-03-05	Approve on 2024-03-06
Review via email: mp+461826@code.launchpad.net

Commit message

commit 7e46124a0687ea0b4ae955cd23d4d3fbde6ec9ed
Author: Marc Deslauriers <email address hidden>
Date: Tue Mar 5 12:40:42 2024 -0500

gen-source-map-cache: create pickles of cached source map data

    Generating a source map is an expensive operation. This script
    generates source map data and stores it into pickles in the
    packages-mirror directory for use by source_map.py later on.

commit 45f1855791946615b8b7fd261979e7168addadcb
Author: Marc Deslauriers <email address hidden>
Date: Tue Mar 5 12:42:59 2024 -0500

packages-mirror: call gen-source-map-cache to update pickles

commit 52c25d36857a9284fd4066f4f1dff1c580dd8691
Author: Marc Deslauriers <email address hidden>
Date: Tue Mar 5 12:43:26 2024 -0500

source_map.py: improve performance by using cached data

If a pickle containing cached information is available, load that
instead of parsing all the Sources files.

commit 5762b55746375254778688fe3d0d64f1811d45f7 (HEAD -> perf-part4)
Author: Marc Deslauriers <email address hidden>
Date: Tue Mar 5 12:45:45 2024 -0500

active_edit: use the default source_map options to get the cache

Description of the change

This branch creates pickles of cached source map data whenever packages-mirror has been run. This massively improves runtime of small scripts such as active_edit:

active_edit before:
$ time ./scripts/active_edit -c CVE-2024-NNN1 -p python-django

real 0m3.406s
user 0m3.290s
sys 0m0.116s

active_edit after:
$ time ./scripts/active_edit -c CVE-2024-NNN1 -p python-django

real 0m0.443s
user 0m0.387s
sys 0m0.056s

check-syntax before:
$ time ./scripts/check-syntax -j8
real 0m50.289s
user 6m6.409s
sys 0m6.995s

check-syntax after:
$ time ./scripts/check-syntax -j8
real 0m48.658s
user 6m19.725s
sys 0m7.410s

Revision history for this message

Marc Deslauriers (mdeslaur) wrote on 2024-03-05:

This needs work still, 5762b55746375254778688fe3d0d64f1811d45f7 is incorrect.

Revision history for this message

Marc Deslauriers (mdeslaur) wrote on 2024-03-05:

So 5762b55746375254778688fe3d0d64f1811d45f7 modifies the way active_edit will create a new CVE file. It will no longer add the end of standard support releases...for example:

Patches_anacron:
upstream_anacron: needs-triage
trusty/esm_anacron: needs-triage
esm-infra/xenial_anacron: needs-triage
esm-infra/bionic_anacron: needs-triage
focal_anacron: needs-triage
jammy_anacron: needs-triage
mantic_anacron: needs-triage
devel_anacron: needs-triage

instead of:

Patches_anacron:
upstream_anacron: needs-triage
trusty_anacron: ignored (end of standard support)
trusty/esm_anacron: needs-triage
xenial_anacron: ignored (end of standard support)
esm-infra/xenial_anacron: needs-triage
bionic_anacron: ignored (end of standard support)
esm-infra/bionic_anacron: needs-triage
focal_anacron: needs-triage
jammy_anacron: needs-triage
mantic_anacron: needs-triage
devel_anacron: needs-triage

Do we really need those releases to be listed in new CVE files? All they do is add noise, and if nothing uses them, perhaps we should simply leave them out?

Revision history for this message

Alex Murray (alexmurray) wrote on 2024-03-05:

Re: the old releaase - I always thought they seemed redundant so I am happy to seem them go.

One question - why not use `detailed=True` for the cache so that it is as complete as possible since we are generating it out-of-band then we may as well put as much in it as we can IMO.

Revision history for this message

Marc Deslauriers (mdeslaur) wrote on 2024-03-05:

> Re: the old releaase - I always thought they seemed redundant so I am happy to
> seem them go.
>
> One question - why not use `detailed=True` for the cache so that it is as
> complete as possible since we are generating it out-of-band then we may as
> well put as much in it as we can IMO.

Good point, I'll measure if there's a runtime search performance difference with a bigger dataset, and if it's negligible, I'll switch it to detailed.

Revision history for this message

Marc Deslauriers (mdeslaur) wrote on 2024-03-05:

So there's a good difference with the bigger dataset, almost half a second longer:

detailed=True:
$ time ./scripts/active_edit -c CVE-2024-NNN1 -p python-django

real 0m0.848s
user 0m0.752s
sys 0m0.096s

detailed=False:
$ time ./scripts/active_edit -c CVE-2024-NNN1 -p python-django

real 0m0.440s
user 0m0.383s
sys 0m0.056s

I think we're better off keeping a non-detailed dataset, and if we think other tools could benefit from this, we can certainly generate a detailed dataset too. There's no reason why we can't generate a few different pickles there, depending on usage.

Revision history for this message

Alex Murray (alexmurray) wrote on 2024-03-06:

Awesome - LGTM - thanks for doing that additional analysis Marc - I agree, the extra overhead is not worth it so let's go for performance and like you say we can add additional caches in the future if needed.

review: Approve

Revision history for this message

Eduardo Barretto (ebarretto) wrote on 2024-03-06:

> So 5762b55746375254778688fe3d0d64f1811d45f7 modifies the way active_edit will
> create a new CVE file. It will no longer add the end of standard support
> releases...for example:
>
> Patches_anacron:
> upstream_anacron: needs-triage
> trusty/esm_anacron: needs-triage
> esm-infra/xenial_anacron: needs-triage
> esm-infra/bionic_anacron: needs-triage
> focal_anacron: needs-triage
> jammy_anacron: needs-triage
> mantic_anacron: needs-triage
> devel_anacron: needs-triage
>
>
> instead of:
>
> Patches_anacron:
> upstream_anacron: needs-triage
> trusty_anacron: ignored (end of standard support)
> trusty/esm_anacron: needs-triage
> xenial_anacron: ignored (end of standard support)
> esm-infra/xenial_anacron: needs-triage
> bionic_anacron: ignored (end of standard support)
> esm-infra/bionic_anacron: needs-triage
> focal_anacron: needs-triage
> jammy_anacron: needs-triage
> mantic_anacron: needs-triage
> devel_anacron: needs-triage
>
> Do we really need those releases to be listed in new CVE files? All they do is
> add noise, and if nothing uses them, perhaps we should simply leave them out?

from an OVAL point of view, I don't think this will be a problem as we moved into having one OVAL (CVE- or PKG-based) per release, therefore it will consider trusty/esm in that case, instead of trusty, for example.

I believe that won't create an issue for the Web CVE Tracker, but maybe it might be worth testing with a single CVE file and seeing if anything happens in the website.

Revision history for this message

Paulo Flabiano Smorigo (pfsmorigo) wrote on 2024-03-06:

LGTM. Maybe just fix the lpci failures:
scripts/packages-mirror: line 358: ./scripts/gen-source-map-cache: No such file or directory

~mdeslaur/ubuntu-cve-tracker:perf-part4 updated on 2024-03-06

06df234... by Marc Deslauriers on 2024-03-06: packages-mirror: don't generate pickles if the tool is missing

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2024-03-07:

Nice, LGTM :)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk

Subscribers

People subscribed via source and target branches

to all changes:

Aravind

Marc Deslauriers

Philip Roche

Robert C Jennings

Simon Quigley

Steve Beattie

Ubuntu Security Team

 diff --git a/scripts/active_edit b/scripts/active_edit
 index 321f39a..a0e0efa 100755
 --- a/scripts/active_edit
 +++ b/scripts/active_edit
@@ -37,12 +37,7 @@ parser.add_option("-C", "--cvss", help="CVSS3.1 rating", metavar="CVSS:3.1/AV:_/
  parser.add_option("-d", "--description", help="Description", default=None)
  (options, args) = parser.parse_args()
--
--source_releases = []
--for release in cve_lib.all_releases:
--    if cve_lib.is_active_esm_release(release) or cve_lib.is_active_release(release):
--        source_releases.append(release)
--source = source_map.load(releases=source_releases, skip_eol_releases=False, detailed=False)
++source = source_map.load(detailed=False)
  added_rel_pkg = []
 diff --git a/scripts/gen-source-map-cache b/scripts/gen-source-map-cache
 new file mode 100755
 index 0000000..afc6956
 --- /dev/null
 +++ b/scripts/gen-source-map-cache
@@ -0,0 +1,50 @@
++#!/usr/bin/python3
++# -*- coding: utf-8 -*-
++#
++# This script pre-calculates some sources maps and saves them as pickle
++# files to speed up being used by various tools.
++#
++# Author: Marc Deslauriers <marc.deslauriers@canonical.com>
++# Copyright (C) 2024 Canonical Ltd.
++#
++# This script is distributed under the terms and conditions of the GNU General
++# Public License, Version 3 or later. See http://www.gnu.org/copyleft/gpl.html
++# for details.
++#
++
++import sys
++import os
++import pickle
++import cve_lib
++import source_map
++
++def main():
++    config = source_map.read_config_file(os.path.expanduser("~/.ubuntu-cve-tracker.conf"))
++    if 'packages_mirror' not in config:
++        sys.stderr.write("\rERROR: Could not find packages_mirror in config file!\n")
++        sys.exit(1)
++
++    mirror = config['packages_mirror']
++    sources_pickle = os.path.join(mirror, 'sources_map.pickle')
++    sources_pickle_dev_proposed = os.path.join(mirror, 'sources_map_dev_proposed.pickle')
++
++    # Remove these first so that source_map.load doesn't load them!
++    if os.path.exists(sources_pickle):
++        os.unlink(sources_pickle)
++    if os.path.exists(sources_pickle_dev_proposed):
++        os.unlink(sources_pickle_dev_proposed)
++
++    sources = source_map.load(detailed=False)
++    sources_dev_proposed = source_map.load(pockets=["-proposed"],
++                                           releases=[cve_lib.devel_release],
++                                           detailed=False)
++
++    # Use protocol=3 to remain compatible with focal
++    with open(sources_pickle, 'wb') as handle:
++        pickle.dump(sources, handle, protocol=3)
++
++    with open(sources_pickle_dev_proposed, 'wb') as handle:
++        pickle.dump(sources_dev_proposed, handle, protocol=3)
++
++if __name__ == '__main__':
++    main()
 diff --git a/scripts/packages-mirror b/scripts/packages-mirror
 index f7a9886..067e128 100755
 --- a/scripts/packages-mirror
 +++ b/scripts/packages-mirror
@@ -350,3 +350,12 @@ fi
  for path in "$outPath" "$debianPath"; do
    find "$path" -name Sources.gz -exec zgrep '^Package: .*' {} \; | cut -c10- | sort -u > "$path/sources"
  done
++
++# generate source map cache pickles
++SOURCE_MAP_SCRIPT=$(dirname "$0")/gen-source-map-cache
++if [ -f "$SOURCE_MAP_SCRIPT" ]; then
++    if [ "$verbosity_args" != "-q" ]; then
++        echo "Generating Source Map Cache"
++    fi
++    $SOURCE_MAP_SCRIPT
++fi
 diff --git a/scripts/source_map.py b/scripts/source_map.py
 index 226b92e..c5ec9ad 100755
 --- a/scripts/source_map.py
 +++ b/scripts/source_map.py
@@ -18,6 +18,7 @@ import subprocess
  import sys
  import cve_lib
  import yaml
++import pickle
  built_using_tags = ["Built-Using", "Static-Built-Using", "X-Cargo-Built-Using"]
  apt_pkg.init_system()
@@ -216,6 +217,25 @@ def load(data_type='sources', pockets=None, releases=None, skip_eol_releases=Tru
      if data_type not in ['sources', 'packages']:
          raise ValueError("'data_type' should be either 'sources' or 'packages'")
++    # Attempt to load from cached pickle files
++    config = read_config_file(os.path.expanduser("~/.ubuntu-cve-tracker.conf"))
++    if 'packages_mirror' in config:
++        mirror = config['packages_mirror']
++
++        if data_type == 'sources' and skip_eol_releases == True and arch == "amd64" and detailed == False:
++            if pockets == None and releases == None:
++                sources_pickle = os.path.join(mirror, 'sources_map.pickle')
++                if os.path.exists(sources_pickle):
++                    with open(sources_pickle, 'rb') as handle:
++                        map = pickle.load(handle)
++                    return map
++            elif pockets == ["-proposed"] and releases == [cve_lib.devel_release]:
++                sources_pickle_dev_proposed = os.path.join(mirror, 'sources_map_dev_proposed.pickle')
++                if os.path.exists(sources_pickle_dev_proposed):
++                    with open(sources_pickle_dev_proposed, 'rb') as handle:
++                        map = pickle.load(handle)
++                    return map
++
      map = dict()
      if data_type == 'sources':
          for item in _find_sources(pockets=pockets, releases=releases, skip_eol_releases=skip_eol_releases, arch=arch):