Merge ~ebarretto/ubuntu-cve-tracker:new-pkg-cache into ubuntu-cve-tracker:master

Proposed by Eduardo Barretto
Status: Merged
Merged at revision: 4ea99b2f2274de8b79513decac903995de67f68b
Proposed branch: ~ebarretto/ubuntu-cve-tracker:new-pkg-cache
Merge into: ubuntu-cve-tracker:master
Diff against target: 204 lines (+198/-0)
1 file modified
scripts/generate_pkg_cache.py (+198/-0)
Reviewer: David Fernandez Gonzalez (status: Approve)
Review via email: mp+443872@code.launchpad.net

Description of the change

Currently we have package cache code inside generate-oval that is not performing as well as it could, and it also does not obtain all the information we need to generate OVAL data. To solve that, I'm proposing the following script, which generates the package cache in a decoupled step rather than inside generate-oval.

This is the first step of a 4-step process that I've designed:

1. Propose this new script to generate a package cache by getting all the needed information from Launchpad. Our current cache does not have information such as binary versions, only source package versions; the two can differ, which is currently creating false positives for some packages. The cache will be a .json file for each of the supported Ubuntu releases. I'm only querying Launchpad for the Release, Security and Updates pockets. (The resulting file layout is sketched after this list.)

2. After this proposal is merged, I will add this script to the OVAL generation cron job so it starts generating cache data. I already have some starting cache data locally, so that we don't lose too much time creating it from scratch. In the cron job I will also copy the cache data to another server, so we have a backup and a fetch-db-like process that lets us simply fetch the latest cache data.

3. Propose a fetch-db-like script to get the cache data from the backup server.

4. Propose the changes to generate-oval, and probably oval_lib, to use this new cache instead of the current one.
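
To make this concrete, each per-release .json file ends up looking roughly like the following (the package name, versions and architectures are only illustrative; the shape follows what the script in the diff builds, plus a latest_date_created bookkeeping key):

    {
      "latest_date_created": 1684340000.0,
      "openssl": {
        "3.0.2-0ubuntu1.10": {
          "component": "main",
          "pocket": "Security",
          "binaries": {
            "libssl3": {
              "arch": ["amd64", "arm64"],
              "component": "main",
              "version": "3.0.2-0ubuntu1.10"
            }
          }
        }
      }
    }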

Querying a new Ubuntu release from scratch does take a long time, but since we only get 2 new releases per year, this is still a good trade-off. And being able to keep backups of this data provides some assurance, rather than recreating it from scratch multiple times.

I'm not sure whether this cache can also replace packages-mirror, with source_map using it instead. I feel we need to check whether there is other information in source_map that is not available through this cache. One that comes to mind is the package description, which is used for package-based OVAL and which we couldn't figure out how to fetch from LP. Also, if a package is removed during the devel cycle, it will still show up in the pkg cache, so we would need to filter out deleted packages, and do it in a smart way.

Finally, the cache stores the date of the last queried publication, so the next run will only query publications from that timestamp forward.
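
For reference, the incremental part boils down to the following (simplified from the preview diff; on a first run real_threshold stays None and the full publication history for the pockets above is fetched):

    # Resume from the stored timestamp, minus a one-hour grace period to
    # cope with publications arriving out of order.
    real_threshold = None
    if latest_date_created:
        real_threshold = latest_date_created - datetime.timedelta(hours=1)

    sources = archive.getPublishedSources(order_by_date=True,
                                          created_since_date=real_threshold,
                                          distro_series=series)
    for s in sources:
        if latest_date_created is None or s.date_created > latest_date_created:
            latest_date_created = s.date_created
    # latest_date_created is then written back to the cache as seconds
    # since the epoch.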

Revision history for this message
David Fernandez Gonzalez (litios) wrote :

LGTM! Thanks for this :)

A couple of side notes:

* As the cache-dir is effectively mandatory for the script to run, I would change it to be a mandatory CLI argument. Alternatively, we could define a default directory (somewhere in UCT) and let the user select an alternative one. If we consider this a replacement for source_map, we should have a default. (A quick sketch of the second option follows these notes.)

* Regarding "if a package is removed during devel cycle, it still will show up in the pkg cache", we could include the source package status as a new key in the structure. That would allow the tooling to quickly retrieve only the Published packages but still retain the information regarding other statuses for OVAL.

Revision history for this message
David Fernandez Gonzalez (litios) :
review: Approve
Revision history for this message
Eduardo Barretto (ebarretto) wrote :

Thanks for the review.
I've fixed the cache-dir; it shouldn't be mandatory.

* Regarding "if a package is removed during devel cycle, it still will show up in the pkg cache"
I will do some tests and try to come up with something in a future PR.

Preview Diff

1diff --git a/scripts/generate_pkg_cache.py b/scripts/generate_pkg_cache.py
2new file mode 100644
3index 0000000..cc2efee
4--- /dev/null
5+++ b/scripts/generate_pkg_cache.py
6@@ -0,0 +1,198 @@
7+#!/usr/bin/python3
8+# -*- coding: utf-8 -*-
9+# Script to query launchpad and get package publication history for Release
10+# Security and Updates pockets. This is needed for creating OVAL content.
11+#
12+# Author: Eduardo Barretto <eduardo.barretto@canonical.com>
13+# Copyright (C) 2023 Canonical Ltd.
14+#
15+# This script is distributed under the terms and conditions of the GNU General
16+# Public License, Version 3 or later. See http://www.gnu.org/copyleft/gpl.html
17+# for details.
18+#
19+
20+from cve_lib import (all_releases, devel_release, eol_releases, needs_oval, product_series, release_parent, release_ppa)
21+
22+import argparse
23+import datetime
24+import json
25+import lpl_common
26+import os
27+import sys
28+
29+is_debug = False
30+
31+pockets = []
32+
33+def load_cache_file(cache_dir, release):
34+ if not cache_dir:
35+ cache_dir = os.getcwd()
36+ elif not os.path.exists(cache_dir):
37+ error("Cache directory does not exist")
38+
39+ rel = release.replace('/', '_')
40+ filename = f"{rel}-pkg-cache.json"
41+ cache = {}
42+ try:
43+ with open(os.path.join(cache_dir, filename), 'r') as json_file:
44+ cache = json.load(json_file)
45+ debug(f"Reading {filename}")
46+ pockets.append("Security")
47+ pockets.append("Updates")
48+ except OSError:
49+ debug(f"File {filename} not found!")
50+ debug(f"Creating {filename}")
51+ pockets.append("Release")
52+ except json.decoder.JSONDecodeError:
53+ error(f"There was a problem loading JSON file: {filename}")
54+
55+ return cache
56+
57+
58+def write_to_cache(cache_dir, release, cache):
59+ rel = release.replace('/', '_')
60+ debug(f"{rel}, {release}")
61+ filename = f"{rel}-pkg-cache.json"
62+ try:
63+ with open(os.path.join(cache_dir, filename), 'w+') as json_file:
64+ json.dump(cache, json_file, indent=2, sort_keys=True)
65+ except Exception as e:
66+ error(e)
67+
68+
69+def update_cache(release, cache, ppa=None, latest_date_created=None):
70+ lp = lpl_common.connect(version='devel')
71+ ubuntu = lp.distributions['ubuntu']
72+ series = ubuntu.getSeries(name_or_version=product_series(release)[1])
73+
74+ real_threshold = None
75+ if latest_date_created:
76+ # Allow a grace period to cope with publications arriving out of
77+ # order during long transactions.
78+ real_threshold = latest_date_created - datetime.timedelta(hours=1)
79+
80+ if ppa:
81+ archive, group, ppa_full_name = lpl_common.get_archive(
82+ ppa,
83+ lp,
84+ False,
85+ distribution=ubuntu
86+ )
87+ else:
88+ archive = ubuntu.main_archive
89+
90+ debug(f"Retrieving Launchpad publications since {real_threshold}")
91+ sources = archive.getPublishedSources(order_by_date=True, created_since_date=real_threshold, distro_series=series)
92+ for s in sources:
93+ if s.pocket not in pockets:
94+ continue
95+
96+ debug(f"{s.source_package_name}, {s.date_created}, {s.status}")
97+ if latest_date_created is None or s.date_created > latest_date_created:
98+ latest_date_created = s.date_created
99+
100+ src = s.source_package_name
101+ src_ver = s.source_package_version
102+ src_component = None
103+ if not ppa:
104+ src_component = s.component_name
105+
106+ binaries = s.getPublishedBinaries()
107+ for b in binaries:
108+ bin_name = b.binary_package_name
109+ bin_version = b.binary_package_version
110+ bin_component = b.component_name
111+ pocket = b.pocket
112+ bin_arch = b.display_name.split(' ')[-1]
113+
114+ if src not in cache:
115+ cache[src] = {}
116+ if src_ver not in cache[src]:
117+ cache[src][src_ver] = {
118+ "binaries": {},
119+ "component": src_component,
120+ "pocket": pocket,
121+ }
122+ if bin_name not in cache[src][src_ver]["binaries"]:
123+ cache[src][src_ver]["binaries"][bin_name] = {
124+ "arch": [],
125+ "component": bin_component,
126+ "version": bin_version
127+ }
128+ if bin_arch not in cache[src][src_ver]["binaries"][bin_name]["arch"]:
129+ cache[src][src_ver]["binaries"][bin_name]["arch"].append(bin_arch)
130+
131+ return latest_date_created
132+
133+
134+def warn(message):
135+ """ print a warning message """
136+ sys.stdout.write(f"\rWARNING: {message}\n")
137+
138+
139+def error(message):
140+ """ print an error message """
141+ sys.stderr.write(f"\rERROR: {message}\n")
142+ sys.exit(1)
143+
144+
145+def debug(message):
146+ """ print a debugging message """
147+ if is_debug:
148+ sys.stdout.write(f"\rDEBUG: {message}\n")
149+
150+
151+def parse_args():
152+ argparser = argparse.ArgumentParser()
153+ argparser.add_argument("--cache-dir", help="cache files directory")
154+ argparser.add_argument("-d", "--debug", action="store_true",
155+ help="make execution verbose")
156+ return argparser.parse_args()
157+
158+
159+def main():
160+ args = parse_args()
161+
162+ global is_debug
163+ is_debug = args.debug
164+
165+ supported_releases = []
166+ for r in set(all_releases).difference(set(eol_releases)).difference(set([devel_release])):
167+ if needs_oval(r):
168+ supported_releases.append(r)
169+ parent = release_parent(r)
170+ if parent and parent not in supported_releases:
171+ supported_releases.append(parent)
172+
173+ try:
174+ for release in supported_releases:
175+ cache = load_cache_file(args.cache_dir, release)
176+
177+ latest_date_created = None
178+ if "latest_date_created" in cache:
179+ latest_date_created = datetime.datetime.fromtimestamp(
180+ cache["latest_date_created"],
181+ tz=datetime.timezone.utc
182+ )
183+
184+ debug('UPDATING CACHE')
185+
186+ ppa = release_ppa(release)
187+ latest_date_created = update_cache(release, cache, ppa, latest_date_created)
188+
189+ if latest_date_created is not None:
190+ epoch = datetime.datetime.fromtimestamp(0, tz=datetime.timezone.utc)
191+ new_threshold = (latest_date_created - epoch).total_seconds()
192+ cache["latest_date_created"] = new_threshold
193+
194+ debug(f"NEW DATE: {cache['latest_date_created']}")
195+
196+ debug('WRITING TO CACHE')
197+ write_to_cache(args.cache_dir, release, cache)
198+
199+ except Exception as e:
200+ error(e)
201+
202+
203+if __name__ == '__main__':
204+ main()
