~canonical-kernel/+git/kteam-tools:cascardo/cranky_kernel_series_per_cycle

Last commit made on 2023-06-27
Get this branch:
git clone -b cascardo/cranky_kernel_series_per_cycle https://git.launchpad.net/~canonical-kernel/+git/kteam-tools
Members of Canonical Kernel can upload to this branch.

Branch information

Name:
cascardo/cranky_kernel_series_per_cycle
Repository:
lp:~canonical-kernel/+git/kteam-tools

Recent commits

84459c4... by Thadeu Lima de Souza Cascardo

ktl/kernel_series: use environment to pick a cycle by default

Since commit d62fc2a89ba4 ("ktl/kernel_series: add per-cycle view
support"), it is possible to use kernel series data for a particular
cycle. That requires, however, that code is changed to accept it as input
and use it to instantiate that KernelSeries.

Tools that are not yet prepared for that could still benefit from using
per-cycle data, and one simple way to do that is to use an environment
variable, in this case KERNEL_SERIES_CYCLE.
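A minimal sketch of the idea, assuming a tool that otherwise loads the tip data. Only the variable name KERNEL_SERIES_CYCLE comes from this commit; the helper `select_view` and its return shape are hypothetical and are not the ktl API.

```python
import os
from typing import Mapping, Optional, Tuple


def select_view(environ: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: decide which kernel-series view to load.

    Returns ("tip", None) when KERNEL_SERIES_CYCLE is unset or empty,
    otherwise ("cycle", <name>) for the requested cycle.
    """
    cycle = environ.get("KERNEL_SERIES_CYCLE", "").strip()
    if not cycle:
        # No override: behave exactly as an unmodified tool would.
        return ("tip", None)
    return ("cycle", cycle)


if __name__ == "__main__":
    # A real tool would map these onto KernelSeries.tip() or
    # KernelSeries.for_cycle(name); here we only print the decision.
    print(select_view())
```

The attraction of this approach is that the tool's own argument handling does not change: exporting, say, KERNEL_SERIES_CYCLE=2023.05.15 in its environment is enough.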

This was tested with cranky-update-snap and cranky-dput-sources.

Enhances: d62fc2a89ba4 ("ktl/kernel_series: add per-cycle view support")
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>

44bc17e... by Andy Whitcroft

swm: add support for cycle-specific data and improve performance

During the life-cycle of a package its variants, package-set, or even its routing may change. For example, we may switch the main variant (--) from linux-hwe-5.15 to linux-hwe-5.19 in a new cycle. When we respin the affected kernel, we now need to use the appropriate variants for the cycle in which the respin is intended. To support these changes we maintain cycle-specific versions of kernel-series.yaml in kernel-versions (alongside the dkms-versions data).

As part of this series we also add some significant performance and reliability improvements. Parsing large YAML documents is horribly slow (see the individual commits for details). By auto-converting the YAML files to compressed JSON we can reduce load times to under 20% of the original. By hosting these in PS5 we avoid the need to wait for Launchpad authentication and avoid the recurring performance issues with that hosting.

Add support to KernelSeries for requesting a version for a specific cycle (or spin). Add support for using kernel-series.json.gz format data sources, and switch to those by default. New interfaces are provided for instantiating cycle-specific data (the existing API is also maintained):

    KernelSeries.for_cycle("2023.05.15")
    KernelSeries.for_spin("2023.05.15-1")
    KernelSeries.tip()
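The cycle and spin names above appear to follow a simple pattern. Assuming a spin is a cycle name (YYYY.MM.DD) followed by a "-N" suffix, which is an inference from these two examples rather than a documented rule, a hypothetical helper for mapping a spin back to its cycle might look like:

```python
import re
from typing import NamedTuple

# Assumed format, inferred from the examples above: a cycle is YYYY.MM.DD
# and a spin is that cycle followed by "-<number>".
SPIN_RE = re.compile(r"^(?P<cycle>\d{4}\.\d{2}\.\d{2})-(?P<spin>\d+)$")


class Spin(NamedTuple):
    cycle: str
    number: int


def parse_spin(name: str) -> Spin:
    """Hypothetical helper: split "2023.05.15-1" into its cycle and spin."""
    match = SPIN_RE.match(name)
    if match is None:
        raise ValueError(f"not a spin name: {name!r}")
    return Spin(match.group("cycle"), int(match.group("spin")))
```

For example, parse_spin("2023.05.15-1") returns Spin(cycle='2023.05.15', number=1), while a bare cycle name such as "2023.05.15" raises ValueError.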

Acked-by: Stefan Bader <email address hidden>
Acked-by: Cory Todd <email address hidden>
Acked-by: Juerg Haefliger <email address hidden>
Signed-off-by: Andy Whitcroft <email address hidden>

2d031b1... by Andy Whitcroft

ktl/kernel_series: document and enforce deprecation of environmental overrides

Signed-off-by: Andy Whitcroft <email address hidden>

f05b39d... by Andy Whitcroft

ktl/kernel_series: drop redundant bool() operations
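For illustration only (these functions are hypothetical, not taken from ktl/kernel_series): the membership operator `in` always returns a `bool`, so wrapping it in `bool()` changes nothing.

```python
def has_variant_redundant(variants: list, name: str) -> bool:
    # The membership test already yields True/False, so bool() adds nothing.
    return bool(name in variants)


def has_variant(variants: list, name: str) -> bool:
    # Equivalent result without the redundant conversion.
    return name in variants
```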

Signed-off-by: Andy Whitcroft <email address hidden>

44d8a08... by Andy Whitcroft

ktl/kernel_series: KernelSeriesCache -- switch naming

Switch naming of KernelSeriesCycles to KernelSeriesCache to make it more
obvious what this is used for.

Signed-off-by: Andy Whitcroft <email address hidden>

92f6034... by Andy Whitcroft

ktl/kernel_series: KernelRoutingEntryRoute -- add length

KernelRoutingEntryRoute is a list-like object which can be iterated and
indexed. Add __len__ so we can also measure it.
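A sketch of what the list-like protocol looks like once __len__ is added. `RouteList` is a hypothetical stand-in: the real KernelRoutingEntryRoute's internals are not shown in this log, and the string entries are placeholders.

```python
from typing import Iterator, List, Sequence


class RouteList:
    """Hypothetical stand-in for a list-like route object."""

    def __init__(self, entries: Sequence[str]) -> None:
        self._entries: List[str] = list(entries)

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)

    def __getitem__(self, index: int) -> str:
        return self._entries[index]

    def __len__(self) -> int:
        # With __len__ defined, len(route) works and an empty route is falsy.
        return len(self._entries)
```

One side effect worth knowing: once __len__ exists, Python uses it for truth testing, so `if route:` becomes False for an empty route where it was previously always True.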

Signed-off-by: Andy Whitcroft <email address hidden>

7f129c9... by Andy Whitcroft

ktl/kernel_series: handle missing cycle data better

Catch HTTP 404 for the kernel-series cycle data and convert that into a
None return from the lookup to allow callers to detect this more simply.
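A minimal sketch of the pattern using only the standard library. The function name and the injectable `opener` parameter are hypothetical, and the actual ktl code may fetch the data differently.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_cycle_data(url: str,
                     opener: Callable = urllib.request.urlopen) -> Optional[bytes]:
    """Return the raw cycle data at url, or None if the server says 404.

    Any other HTTP error is re-raised so genuine failures stay visible.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Missing cycle: let the caller test for None instead of catching.
            return None
        raise
```

Narrowing the conversion to 404 keeps "this cycle has no data" distinct from transient server or network errors, which still propagate.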

Signed-off-by: Andy Whitcroft <email address hidden>

eefeb52... by Andy Whitcroft

ktl/kernel_series: expand compress support to cycles

We are now maintaining kernel-series.json.gz form for all cycles;
switch to these by default. Also add a new KERNEL_SERIES_USE
environment variable which lets you select from the main sources:

  launchpad: the raw original data as committed to launchpad
  local: local files in the tree
  json: the kernel-series.json.gz data from kernel.ubuntu.com

Signed-off-by: Andy Whitcroft <email address hidden>

626da35... by Andy Whitcroft

ktl/kernel_series: support gzip compressed json formatted data

We use YAML form for kernel-series because it is significantly easier
for a human to parse. It also allows reuse of common data, helping to
reduce the size of the data itself. However, it turns out that decoding
large YAML files is horribly expensive, taking approximately a full second
on a ThinkPad X250 laptop. Converting the same file mechanically to
JSON form creates a file 2.5x the size, but this still decodes 8x
faster. By compressing this using gzip we more than make up for that
size increase and significantly improve KernelSeries() instantiation
time by close to 5x.
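The size growth comes from YAML aliases: PyYAML returns the same Python object for an anchored node and each of its aliases, but JSON has no syntax for a shared reference, so json.dumps() writes the object out in full at every use. A small illustration with made-up routing data (not the real kernel-series schema):

```python
import json

# A shared routing table (placeholder contents), standing in for data that
# kernel-series.yaml writes once with an anchor and reuses via aliases.
routing = {"build": ["build-archive"], "proposed": ["proposed-archive"]}

# Two packages referencing the same Python object, as yaml.safe_load()
# produces for aliased nodes.
data = {"pkg-a": {"routing": routing}, "pkg-b": {"routing": routing}}

encoded = json.dumps(data)
# JSON cannot express the sharing, so the table is serialised twice.
print(encoded.count('"proposed"'))
```

The final line prints 2: one copy of the routing table per referencing package.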

We can maintain the convenience of a simple YAML format file for human
manipulation by maintaining a rapid mirror of the git-based YAML file in
a compressed JSON form. By hosting this data in our new PS5
environment we also gain further benefits from avoiding git.l.n
permissions checks. The only downside is a short delay (of the order
of a minute) before updates to kernel-series reach the mirror.
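A minimal sketch of the two halves of such a mirror using only the standard library, assuming the YAML has already been parsed (for example with yaml.safe_load(), as in the json.dumps(yaml.safe_load()) conversion used for the measurements). The function names are hypothetical and the real mirroring job is not shown in this log.

```python
import gzip
import json
from typing import Any


def to_json_gz(data: Any) -> bytes:
    """Serialise already-parsed data as gzip-compressed JSON."""
    text = json.dumps(data)
    return gzip.compress(text.encode("utf-8"))


def from_json_gz(blob: bytes) -> Any:
    """Inverse of to_json_gz: decompress and decode in one step."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Because gzip works over the whole document, the repetition that inflates the JSON is exactly what the compressor removes, which is why the compressed form can end up smaller than even the original YAML.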

Detailed analysis: taking a snapshot of kernel-series.yaml for
comparison, it is currently 285KB, and a raw conversion
(json.dumps(yaml.safe_load())) is 716KB (sizes in bytes):

kernel-series.yaml 285585
kernel-series.json 716432

Loading and parsing the kernel-series.yaml takes about 1.17s (averaged
over 10 loads):

1: ./test7.py
        Mean    Std.Dev.  Min     Median  Max
real    1.174   0.082     1.095   1.143   1.331
user    1.130   0.067     1.066   1.104   1.262
sys     0.029   0.010     0.012   0.028   0.044
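These tables report real/user/sys times over repeated runs of a script (test7.py, whose contents are not shown). As a rough, hypothetical in-process equivalent, the same Mean/Std.Dev./Min/Median/Max summary can be computed with Python's statistics module. Note that this only measures the wall-clock time of a callable and excludes interpreter start-up, unlike the tables, and it assumes the sample standard deviation.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def time_runs(fn: Callable[[], object], runs: int = 10) -> List[float]:
    """Call fn repeatedly and record each call's wall-clock duration."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarise(samples: Sequence[float]) -> Dict[str, float]:
    """Summarise timings in the same columns as the tables here.

    Uses the sample standard deviation (statistics.stdev); the tool that
    produced these tables may compute it differently.
    """
    return {
        "mean": statistics.mean(samples),
        "stddev": statistics.stdev(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

For example, summarise(time_runs(lambda: json.load(open("kernel-series.json")))) would time just the load and parse step, with the file name and loader standing in for whatever test7.py actually does.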

Loading and parsing kernel-series.json takes about 0.12s (over 10
runs), despite being 2.5x the size (the conversion duplicates the
routing tables):

1: ./test7.py
        Mean    Std.Dev.  Min     Median  Max
real    0.123   0.016     0.111   0.116   0.166
user    0.103   0.007     0.094   0.101   0.118
sys     0.016   0.005     0.008   0.016   0.029

Things are worse when this data is remote in the DC. Loading
kernel-series.yaml from the DC adds an average of 0.22s for the download:

1: ./test7.py
        Mean    Std.Dev.  Min     Median  Max
real    1.389   0.118     1.307   1.347   1.730
user    1.083   0.047     1.019   1.072   1.183
sys     0.033   0.026     0.012   0.020   0.096

Loading kernel-series.json from the DC adds an average of 0.36s
for the download due to its size:

1: ./test7.py
        Mean    Std.Dev.  Min     Median  Max
real    0.478   0.018     0.460   0.474   0.522
user    0.116   0.013     0.102   0.113   0.138
sys     0.019   0.009     0.004   0.021   0.031

It should be noted that the kernel-series data is highly repetitive and
as such is highly compressible, compressing to under 3% of its original
size (sizes in bytes):

kernel-series.json 716432
kernel-series.json.gz 20635

Loading and decompressing kernel-series.json.gz locally performs very close
to the uncompressed load; indeed, the difference here is within the noise of
repeated runs:

1: ./test7.py
        Mean    Std.Dev.  Min     Median  Max
real    0.131   0.009     0.121   0.128   0.149
user    0.115   0.009     0.096   0.115   0.125
sys     0.015   0.009     0.000   0.014   0.027

Loading this kernel-series.json.gz from the DC is significantly faster,
reducing the remote overhead to 0.16s:

1: ./test7.py
        Mean    Std.Dev.  Min     Median  Max
real    0.287   0.015     0.272   0.282   0.327
user    0.129   0.014     0.108   0.127   0.159
sys     0.015   0.006     0.004   0.015   0.025

Finally, it should be noted that this comparison was done using a mirror
of the kernel-series.yaml data to allow the times to be compared fairly.
Pulling kernel-series.yaml from git.launchpad.net (the current default)
is a further 0.2s slower:

1: ./test7.py
        Mean    Std.Dev.  Min     Median  Max
real    1.606   0.162     1.424   1.548   1.896
user    1.056   0.033     1.028   1.040   1.137
sys     0.026   0.004     0.020   0.024   0.036

Signed-off-by: Andy Whitcroft <email address hidden>

d62fc2a... by Andy Whitcroft

ktl/kernel_series: add per-cycle view support

Add support for requesting a KernelSeries object for a specific SRU
cycle. kernel-series.yaml is frozen into kernel-versions for each live
SRU cycle. Add support for selecting a specific cycle from these
copies. Instantiated cycles are cached.

Cycle selection can be made in one of two ways. Firstly you can
directly use the KernelSeries class to request an instantiated cycle
(or spin) specific instance:

    ks = KernelSeries.for_cycle("2023.05.15")
or:
    ks = KernelSeries.for_spin("2023.05.15-4")
or:
    ks = KernelSeries.tip()

Secondly, you can instantiate a KernelSeriesCycles() object and perform
the same requests against that. This allows greater control over the
built-in caching, which follows the lifecycle of that object.
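The second approach can be sketched as a small object-scoped cache. Everything here is hypothetical: the class name, the `loader` callback (which would fetch and decode the frozen kernel-series data for a key), the "tip" key, and the choice to use the spin name itself as a cache key.

```python
from typing import Callable, Dict, Optional


class CycleCache:
    """Hypothetical object-scoped cache: each distinct cycle or spin is
    loaded at most once for the lifetime of this object."""

    TIP = "tip"  # placeholder key for the current (non-cycle) data

    def __init__(self, loader: Callable[[str], Optional[object]]) -> None:
        self._loader = loader
        self._instances: Dict[str, Optional[object]] = {}

    def _get(self, key: str) -> Optional[object]:
        if key not in self._instances:
            # Missing data (None) is cached too, so repeat lookups stay cheap.
            self._instances[key] = self._loader(key)
        return self._instances[key]

    def for_cycle(self, cycle: str) -> Optional[object]:
        return self._get(cycle)

    def for_spin(self, spin: str) -> Optional[object]:
        return self._get(spin)

    def tip(self) -> Optional[object]:
        return self._get(self.TIP)
```

Dropping the cache object discards every instance it holds, which gives the caller the lifecycle control described above; a single shared, class-level cache would instead suit the KernelSeries.for_cycle() style of call.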

Signed-off-by: Andy Whitcroft <email address hidden>