Merge ~ebarretto/ubuntu-cve-tracker:new-pkg-cache into ubuntu-cve-tracker:master
Status: | Merged |
---|---|
Merged at revision: | 4ea99b2f2274de8b79513decac903995de67f68b |
Proposed branch: | ~ebarretto/ubuntu-cve-tracker:new-pkg-cache |
Merge into: | ubuntu-cve-tracker:master |
Diff against target: | 204 lines (+198/-0), 1 file modified: scripts/generate_pkg_cache.py (+198/-0) |
Related bugs: | |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
David Fernandez Gonzalez | Approve | ||
Review via email: mp+443872@code.launchpad.net |
Description of the change
Currently, generate-oval contains package cache code that does not work as well as it could, and it also does not obtain all the information we need to generate OVAL data. To solve that, I'm proposing the following script, which generates a package cache in a separate, decoupled step instead of inside generate-oval.
This is the first step of a four-step process that I've designed:
1. Propose this new script, which generates a package cache by fetching all the needed information from Launchpad. Our current cache does not have information such as binary package versions, only source package versions; the two can differ, and this is currently causing false positives for some packages. The cache will be one .json file per supported Ubuntu release. I'm only querying Launchpad for the Release, Security and Updates pockets.
2. After this proposal is merged, I will add this script to the OVAL generation cron job so that it starts generating cache data. I already have initial cache data locally, so we don't lose too much time creating it from scratch. The cron job will also copy the cache data to another server, so that we have a backup and, with a process similar to fetch-db, can simply fetch the latest cache data.
3. Propose a fetch-db-like script to get the cache data from the backup server.
4. Propose the changes to generate-oval, and probably oval_lib, to use this new cache instead of the current one.
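To make step 1 more concrete, here is a minimal sketch of what a per-release cache file could hold, assuming a hypothetical layout in which each source package records its source version separately from the versions of the binaries it built. The field names (`source_version`, `binaries`, `pocket`, `last_build_date`), the `<release>.json` file naming and the helper functions are all illustrative assumptions, not the format actually written by scripts/generate_pkg_cache.py.

```python
import json
import os
import tempfile


# Hypothetical entry layout: binary versions are stored separately because
# they can differ from the source package version.
def make_entry(source_version, binaries, pocket):
    return {
        "source_version": source_version,
        "binaries": dict(binaries),  # binary package name -> binary version
        "pocket": pocket,            # Release, Security or Updates
    }


def save_release_cache(cache_dir, release, packages, last_build_date):
    """Write the cache for one release to <cache_dir>/<release>.json."""
    path = os.path.join(cache_dir, f"{release}.json")
    data = {"last_build_date": last_build_date, "packages": packages}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)
    return path


def load_release_cache(cache_dir, release):
    """Return the cached data for a release, or an empty cache if missing."""
    path = os.path.join(cache_dir, f"{release}.json")
    if not os.path.exists(path):
        return {"last_build_date": None, "packages": {}}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Made-up package and versions, for illustration only.
        pkgs = {"example-src": make_entry("1.0-1", {"example-bin": "2.5-1"}, "Updates")}
        save_release_cache(tmp, "jammy", pkgs, "2023-06-01T00:00:00+00:00")
        cache = load_release_cache(tmp, "jammy")
        print(cache["packages"]["example-src"]["binaries"]["example-bin"])  # prints 2.5-1
```

Keeping one file per release means a new release can be built (slowly) on its own without touching the others, and each file can be copied to the backup server independently.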
Querying Launchpad for a new Ubuntu release from scratch does take a long time, but since there are only two new releases per year, this is still a good trade-off. Keeping backups of this data also gives us some assurance, rather than having to recreate it from scratch multiple times.
I'm not sure whether this cache can also replace packages-mirror, with source_map using it instead. We need to check whether source_map holds other information that is not available through this cache. One example that comes to mind is the package description, which is used for package-based OVAL and which we couldn't figure out how to fetch from LP. Also, if a package is removed during the devel cycle, it will still show up in the package cache, so we would need to filter out deleted packages, and do so in a smart way.
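One possible "smart" filter, sketched under the assumption that the cache kept a short publishing history per source package with a hypothetical `date` and `status` field on each record (the status values follow Launchpad's publishing statuses, such as Published, Superseded, Deleted and Obsolete). Rather than dropping a package as soon as any Deleted record appears, it looks only at the newest record, so a package that was deleted and later re-uploaded during the devel cycle is kept.

```python
from datetime import datetime

# Statuses treated as "removed"; the per-record "status" and "date" fields
# are assumptions, not part of the proposed cache format.
REMOVED_STATUSES = {"Deleted", "Obsolete"}


def live_packages(history):
    """Return source names whose most recent publishing record is not removed.

    `history` maps a source package name to a list of records, each a dict
    with a timezone-aware ISO 8601 "date" and a "status".
    """
    live = set()
    for name, records in history.items():
        if not records:
            continue
        newest = max(records, key=lambda r: datetime.fromisoformat(r["date"]))
        if newest["status"] not in REMOVED_STATUSES:
            live.add(name)
    return live


if __name__ == "__main__":
    history = {
        # Deleted, then published again later: still live.
        "kept": [
            {"date": "2023-05-01T00:00:00+00:00", "status": "Deleted"},
            {"date": "2023-06-01T00:00:00+00:00", "status": "Published"},
        ],
        # Published, then deleted: filtered out.
        "dropped": [
            {"date": "2023-05-01T00:00:00+00:00", "status": "Published"},
            {"date": "2023-06-01T00:00:00+00:00", "status": "Deleted"},
        ],
    }
    print(sorted(live_packages(history)))  # prints ['kept']
```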
Finally, the cache stores the date of the last queried build, so the next run only queries builds from that timestamp onward.
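The incremental update can be sketched as below. This assumes a hypothetical `fetch_builds_since(release, since)` callable that wraps the Launchpad queries (the launchpadlib calls themselves are not shown) and returns build records with `source`, `source_version`, `binaries` and a timezone-aware ISO 8601 `date`; the record shape is an illustration only.

```python
from datetime import datetime


def update_cache(cache, release, fetch_builds_since):
    """Merge builds newer than the stored timestamp into `cache` in place.

    `cache` has the shape {"last_build_date": str | None, "packages": {...}}.
    `fetch_builds_since` is a hypothetical stand-in for the Launchpad
    queries; it receives the release name and the stored timestamp
    (None on a first run) and returns an iterable of build dicts.
    """
    since = cache.get("last_build_date")
    newest = datetime.fromisoformat(since) if since else None
    # Apply builds oldest first so that, per package, the newest build wins.
    builds = sorted(fetch_builds_since(release, since),
                    key=lambda b: datetime.fromisoformat(b["date"]))
    for build in builds:
        entry = cache["packages"].setdefault(
            build["source"], {"source_version": None, "binaries": {}})
        entry["source_version"] = build["source_version"]
        entry["binaries"].update(build["binaries"])
        built = datetime.fromisoformat(build["date"])
        if newest is None or built > newest:
            newest = built
    if newest is not None:
        # Persisted so the next run only asks for builds from here onward.
        cache["last_build_date"] = newest.isoformat()
    return cache


if __name__ == "__main__":
    def fake_fetch(release, since):
        # Stand-in for Launchpad: returns one made-up build.
        return [{"source": "example-src", "source_version": "1.0-2",
                 "binaries": {"example-bin": "1.0-2"},
                 "date": "2023-06-02T00:00:00+00:00"}]

    cache = {"last_build_date": "2023-06-01T00:00:00+00:00", "packages": {}}
    update_cache(cache, "jammy", fake_fetch)
    print(cache["last_build_date"])  # prints 2023-06-02T00:00:00+00:00
```

Because merging the same build twice is harmless, it does not matter if a query "from that timestamp onward" returns the last build of the previous run again.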
LGTM! Thanks for this :)
A couple of side notes:
* Since the cache directory is required for the script to run, I would make it a mandatory CLI argument. Alternatively, we could set a default directory (somewhere in UCT) and let the user choose a different one. If we consider this a replacement for source_map, we should have a default.
* Regarding "if a package is removed during devel cycle, it still will show up in the pkg cache", we could include the source package status as a new key in the structure. That would allow the tooling to quickly retrieve only the Published packages while still retaining the information about other statuses for OVAL.
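The cache-directory suggestion in the first bullet could look roughly like this with argparse. The default value `pkg-cache` is a hypothetical placeholder for "somewhere in UCT"; the mandatory-argument variant is noted in a comment.

```python
import argparse

# Hypothetical default location for the cache inside a UCT checkout.
DEFAULT_CACHE_DIR = "pkg-cache"


def build_parser():
    parser = argparse.ArgumentParser(
        description="Generate the per-release package cache.")
    # Default directory that the user can override. The stricter variant
    # would drop `default` and pass required=True instead.
    parser.add_argument(
        "--cache-dir", default=DEFAULT_CACHE_DIR,
        help="directory holding the per-release JSON files "
             "(default: %(default)s)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args([]).cache_dir)  # prints pkg-cache
    print(build_parser().parse_args(["--cache-dir", "/tmp/cache"]).cache_dir)  # prints /tmp/cache
```

A default makes the script usable as a drop-in data source (as source_map would need), while still allowing a different location for testing or for the backup copy.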