ktl/kernel_series: use environment to pick a cycle by default
Since commit d62fc2a89ba4 ("ktl/kernel_series: add per-cycle view
support"), it is possible to use kernel series data for a particular
cycle. That requires, however, that code be changed to accept the cycle
as input and use it to instantiate that KernelSeries.
Tools that are not yet prepared for that could still benefit from using
per-cycle data and one simple way to do that is to use an environment
variable. In this case, KERNEL_SERIES_CYCLE.
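A minimal sketch of the fallback described above. Only the variable name
KERNEL_SERIES_CYCLE comes from this commit; the helper resolve_cycle() is
hypothetical and the real KernelSeries code may structure this differently:

```python
import os


def resolve_cycle(cycle=None, environ=None):
    """Return the cycle to use: explicit argument first, then the environment.

    Hypothetical helper for illustration; not the actual ktl API.
    """
    if environ is None:
        environ = os.environ
    if cycle is not None:
        return cycle
    # An unset or empty variable means "no particular cycle" (None).
    return environ.get("KERNEL_SERIES_CYCLE") or None
```

With something along these lines, a tool that does not yet accept a cycle
as input would pick one up when the variable is set in its environment.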
This was tested with cranky-update-snap and cranky-dput-sources.
Enhances: d62fc2a89ba4 ("ktl/kernel_series: add per-cycle view support")
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
swm: add support for cycle-specific data and improve performance
During the life-cycle of a package, its variants, package-set, or even its routing may change. For example we may switch the main variant (--) from linux-hwe-5.15 to linux-hwe-5.19 in a new cycle. When we respin the affected kernel we now need to use the appropriate variants for the cycle for which the respin is intended. To support these changes we maintain cycle-specific versions of kernel-series.yaml in kernel-versions (alongside the dkms-versions data).
As part of this series we also add some significant performance and reliability improvements. Parsing large YAML documents is horribly slow (see the individual commits for details). By auto-converting the YAML files to compressed JSON we can reduce load times to under 20% of the original. By hosting these in PS5 we avoid the need to wait for Launchpad authentication and avoid the recurrent performance issues with that hosting.
Add support to KernelSeries for requesting a version for a specific cycle (or spin). Add support for using kernel-series.json.gz format data sources, and switch to those by default. New interfaces are provided for instantiating cycle-specific data (the existing API is also maintained):
ktl/kernel_series: expand compress support to cycles
We are now maintaining the kernel-series.json.gz form for all cycles;
switch to these by default. Also add a new KERNEL_SERIES_USE
environment variable which lets you select from the main sources:
  launchpad: the raw original data as committed to launchpad
  local:     local files in the tree
  json:      the kernel-series.json.gz data from kernel.ubuntu.com
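The selection could be sketched roughly as follows. The variable name and
the three source names are taken from this commit; the helper
select_source(), its error handling, and the choice of "json" as the
fallback (following "switch to these by default") are assumptions made
for illustration:

```python
import os

# The three source names come from the commit message; the helper itself
# is an assumption made for this sketch.
KNOWN_SOURCES = ("launchpad", "local", "json")


def select_source(environ=None, default="json"):
    """Pick a data source name from KERNEL_SERIES_USE, validating the value."""
    if environ is None:
        environ = os.environ
    # An unset or empty variable falls back to the default source.
    use = environ.get("KERNEL_SERIES_USE") or default
    if use not in KNOWN_SOURCES:
        raise ValueError(
            "KERNEL_SERIES_USE={!r}: expected one of {}".format(
                use, ", ".join(KNOWN_SOURCES)))
    return use
```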
Signed-off-by: Andy Whitcroft <email address hidden>
ktl/kernel_series: support gzip compressed json formatted data
We use YAML form for kernel-series because it is significantly easier
for a human to parse. It also allows reuse of common data helping to
reduce the size of the data itself. However it turns out that decoding
large YAML files is horribly expensive, taking approximately a full second
on a ThinkPad X250 laptop. Converting the same file mechanically to
JSON form creates a file 2.5x the size, but this still decodes 8x
faster. By compressing this using gzip we more than make up for that
size increase and significantly improve KernelSeries() instantiation
time by close to 5x.
We can maintain the convenience of a simple YAML format file for human
manipulation by maintaining a rapid mirror of the git based YAML file in
a compressed JSON form. By expressing this data in our new PS5
environment we also gain further benefits from avoiding git.l.n
permissions checks. The only negative is introduction of a short (of
the order of a minute) delay to updates to kernel-series.
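The compressed JSON round trip can be sketched with the standard library
alone. The helper names and the synthetic sample below are assumptions;
the sample merely imitates the repeated routing tables that a mechanical
YAML to JSON conversion duplicates:

```python
import gzip
import json


def dump_json_gz(data):
    """Serialise a parsed kernel-series style mapping to gzip-compressed JSON."""
    return gzip.compress(json.dumps(data).encode("utf-8"))


def load_json_gz(blob):
    """Decompress and decode data produced by dump_json_gz()."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Synthetic, highly repetitive stand-in for the real data.
routing = {"build": ["ppa-a", "ppa-b"], "proposed": ["ubuntu"]}
sample = {"series-%02d" % i: {"routing": routing} for i in range(200)}

raw = json.dumps(sample).encode("utf-8")
blob = dump_json_gz(sample)
print(len(raw), len(blob))  # uncompressed and compressed sizes in bytes
```

The exact ratio depends on the data, so the sizes printed here say nothing
about the real kernel-series file.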
Detailed analysis. Taking a snapshot of kernel-series.yaml for
comparison, it is currently 285Kb and a raw conversion
(json.dumps(yaml.safe_load())) is 716Kb:
Loading and parsing the kernel-series.yaml takes about 1.17s (averaged
over 10 loads):
1: ./test7.py
        Mean      Std.Dev.  Min       Median    Max
real    1.174     0.082     1.095     1.143     1.331
user    1.130     0.067     1.066     1.104     1.262
sys     0.029     0.010     0.012     0.028     0.044
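The test7.py harness is not part of this commit message. A comparable
measurement could be sketched as below, with the caveat that this version
records wall-clock ("real") time only, not user or sys time, and that
time_runs() is a hypothetical helper:

```python
import statistics
import time


def time_runs(func, runs=10):
    """Call func() `runs` times and summarise the wall-clock durations."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # needs at least two runs
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Placeholder workload; a real run would load and parse kernel-series here.
summary = time_runs(lambda: sum(range(100000)))
print("real " + " ".join("%.3f" % summary[k]
                         for k in ("mean", "stdev", "min", "median", "max")))
```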
Loading and parsing the kernel-series.json takes about 0.12s (averaged
over 10 runs) despite being 2.5x the size due to routing-table
duplication in the conversion:
1: ./test7.py
        Mean      Std.Dev.  Min       Median    Max
real    0.123     0.016     0.111     0.116     0.166
user    0.103     0.007     0.094     0.101     0.118
sys     0.016     0.005     0.008     0.016     0.029
Things are obviously worse when this data is remote in the DC. Loading
kernel-series.yaml from the DC costs an average of 0.22s to download:
1: ./test7.py
        Mean      Std.Dev.  Min       Median    Max
real    1.389     0.118     1.307     1.347     1.730
user    1.083     0.047     1.019     1.072     1.183
sys     0.033     0.026     0.012     0.020     0.096
Loading kernel-series.json from the DC costs an average of 0.36s to
download due to its size:
1: ./test7.py
        Mean      Std.Dev.  Min       Median    Max
real    0.478     0.018     0.460     0.474     0.522
user    0.116     0.013     0.102     0.113     0.138
sys     0.019     0.009     0.004     0.021     0.031
It should be noted that the kernel-series data is highly repetitive and
as such is highly compressible, compressing to 2% of its original size:
Loading and decompressing kernel-series.json.gz locally is very close to the
same performance as the uncompressed load, and indeed the difference here is
within the noise of repeated runs:
1: ./test7.py
        Mean      Std.Dev.  Min       Median    Max
real    0.131     0.009     0.121     0.128     0.149
user    0.115     0.009     0.096     0.115     0.125
sys     0.015     0.009     0.000     0.014     0.027
Loading this kernel-series.json.gz from the DC is significantly faster,
reducing the remote overhead to 0.16s:
1: ./test7.py
        Mean      Std.Dev.  Min       Median    Max
real    0.287     0.015     0.272     0.282     0.327
user    0.129     0.014     0.108     0.127     0.159
sys     0.015     0.006     0.004     0.015     0.025
Finally it should be noted that this comparison was done using a mirror
of the kernel-series.yaml data to allow the times to be compared fairly.
Pulling kernel-series.yaml from git.launchpad.net (the current default)
is a further 0.2s slower:
1: ./test7.py
        Mean      Std.Dev.  Min       Median    Max
real    1.606     0.162     1.424     1.548     1.896
user    1.056     0.033     1.028     1.040     1.137
sys     0.026     0.004     0.020     0.024     0.036
Signed-off-by: Andy Whitcroft <email address hidden>
Add support for requesting a KernelSeries object for a specific SRU
cycle. kernel-series.yaml is frozen into kernel-versions for each live
SRU cycle. Add support for selecting a specific cycle from these
copies. Instantiated cycles are cached.
Cycle selection can be made in one of two ways. Firstly, you can use
the KernelSeries class directly to request a cycle (or spin) specific
instance:
Secondly, you can instantiate a KernelSeriesCycles() object and perform
the same requests against that. This allows greater control over the
built-in caching, which follows the lifecycle of that object.
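The caching behaviour can be illustrated with a toy stand-in. Nothing
below is the real KernelSeries or KernelSeriesCycles API: the class
CycleCache, its method names, the example cycle name, and the assumed
"<cycle>-<n>" spin naming are all hypothetical:

```python
class CycleCache:
    """Toy stand-in for a per-cycle KernelSeries cache (hypothetical API)."""

    def __init__(self, loader):
        self._loader = loader   # callable: cycle name -> parsed data
        self._cache = {}        # lives exactly as long as this object

    def for_cycle(self, cycle):
        """Return the data for a cycle, loading it only on first request."""
        if cycle not in self._cache:
            self._cache[cycle] = self._loader(cycle)
        return self._cache[cycle]

    def for_spin(self, spin):
        """Map a spin to its cycle, assuming a '<cycle>-<n>' spin name."""
        cycle, _, _ = spin.rpartition("-")
        # A name without a '-' is treated as a cycle name already.
        return self.for_cycle(cycle or spin)


loads = []  # records each time the loader actually runs
cache = CycleCache(lambda cycle: loads.append(cycle) or {"cycle": cycle})
first = cache.for_cycle("2023.01.02")
second = cache.for_spin("2023.01.02-1")
```

Here the second request is served from the cache, and dropping the
CycleCache object discards everything it loaded.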
Signed-off-by: Andy Whitcroft <email address hidden>