Merge ~philroche/simplestreams/+git/simplestreams:feture/filter-items-before-filter-versions into simplestreams:master

Proposed by Philip Roche
Status: Merged
Approved by: Robert C Jennings
Approved revision: 3075f5c857e66cb4732747cbbe576b30055b99ea
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~philroche/simplestreams/+git/simplestreams:feture/filter-items-before-filter-versions
Merge into: simplestreams:master
Diff against target: 37 lines (+17/-3)
1 file modified
simplestreams/mirrors/__init__.py (+17/-3)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve
Robert C Jennings (community) Approve
Tobias Koch (community) Approve
Cody Shepherd (community) Approve
Review via email: mp+369682@code.launchpad.net

Commit message

Apply filters to items before filtering versions

Filters were applied to versions before being applied to
items which meant if you were using --max option then
you often were not getting the expected output as the
version filter was filtering out the expected items.

This happens if there is a version greater than the version
of the item you expect despite the fact that the item
with the greater version never passing your supplied
filters.

Description of the change

Apply filters to items before filtering versions

Filters were applied to versions before being applied to
items which meant if you were using --max option then
you often were not getting the expected output as the
version filter was filtering out the expected items.

This happens if there is a version greater than the version
of the item you expect despite the fact that the item
with the greater version never passing your supplied
filters.

To post a comment you must log in.
Revision history for this message
Philip Roche (philroche) wrote :

An example query where the problem this commit fixes is

```
sstream-query --max=1 -U http://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:aws-cn.sjson crsn=cn-north-1 release=xenial arch=amd64 virt=hvm root_store=ssd --output-format="16.04 %(version_name)s %(arch)s %(id)s %(root_store)s %(virt)s"
```

This would return no results despite there being 41 images that satisfy those filters.

Revision history for this message
Cody Shepherd (codyshepherd) :
review: Approve
Revision history for this message
Tobias Koch (tobijk) wrote :

Nice work figuring this out!

review: Approve
Revision history for this message
Scott Moser (smoser) wrote :

I don't see any obvious holes in the implementation, but this was kind of functioning as designed. It seems you'd only hit this if "the version greater than the item you expect" had a subset of the items that you expected.

Why would that be?

Within a given product, I would not have expected that there was any case where a newer version contained a subset of an older version *AND* that older version's content was still desirable.

One solution would be to just fix your published data to have "full" content.

Revision history for this message
Philip Roche (philroche) wrote :

Thank you for your comment.

I take your point about publishing full content but this is not always possible to do at the same time, for example publishing to aws-cn partition is very error prone and can result in serials not being published to all regions. This shouldn't mean that streams start returning incorrect results though.

Eventually full content would be published but this MP fixes the issue of incorrect results being returned in the interim.

As you say, the older versions content is not desirable but we do require that content to be filterable instead of returning no results.

Revision history for this message
Scott Moser (smoser) wrote :

> Thank you for your comment.
>
> I take your point about publishing full content but this is not always
> possible to do at the same time, for example publishing to aws-cn partition is
> very error prone and can result in serials not being published to all regions.
> This shouldn't mean that streams start returning incorrect results though.
>
> Eventually full content would be published but this MP fixes the issue of

> incorrect results being returned in the interim.

Well... you clearly do not have to publish data until it is correct and
complete. I'm not going to argue all that much here, but I think it could
be argued that you're just taking a shortcut in publishing of data that
pushes complexity on to the consumer.

>
> As you say, the older versions content is not desirable but we do require that
> content to be filterable instead of returning no results.

Also... for us pesky humans, it might be nice to dump your json data with
sorted keys. (json.dumps(data, sort_keys=True))

Revision history for this message
Philip Roche (philroche) wrote :

I do agree that all data should be complete but when publication fails in one region for reasons outside of our control it shouldn't block updating of streams, especially if the publication is to address a CVE.

I will add sorting of streams to our process.

Revision history for this message
Robert C Jennings (rcj) wrote :

The changes look good. It should not occur that publication is incomplete but the ability to quickly service CVEs with images as they become available is important.

review: Approve
Revision history for this message
Server Team CI bot (server-team-bot) :
review: Approve (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

Commit message lints:
 - Line #3 has 1 too many characters. Line starts with: "if you were using --max"... - Line #6 has 8 too many characters. Line starts with: "This happens if there"... - Line #7 has 18 too many characters. Line starts with: "despite the fact that"...

review: Needs Fixing
Revision history for this message
Philip Roche (philroche) wrote :

Commit message now updated

Revision history for this message
Server Team CI bot (server-team-bot) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/simplestreams/mirrors/__init__.py b/simplestreams/mirrors/__init__.py
2index 2a1dc7e..60d5ade 100644
3--- a/simplestreams/mirrors/__init__.py
4+++ b/simplestreams/mirrors/__init__.py
5@@ -280,6 +280,23 @@ class BasicMirrorWriter(MirrorWriter):
6
7 filtered_products = []
8 prodname = None
9+
10+ # Apply filters to items before filtering versions
11+ for prodname, product in list(stree.items()):
12+
13+ for vername, version in list(product.get('versions', {}).items()):
14+ for itemname, item in list(version.get('items', {}).items()):
15+ pgree = (prodname, vername, itemname)
16+ if not self.filter_item(item, src, target, pgree):
17+ LOG.debug("Filtered out item: %s/%s", itemname, item)
18+ del stree[prodname]['versions'][vername]['items'][
19+ itemname]
20+ if not stree[prodname]['versions'][vername].get(
21+ 'items', {}):
22+ del stree[prodname]['versions'][vername]
23+ if not stree[prodname].get('versions', {}):
24+ del stree[prodname]
25+
26 for prodname, product in stree.items():
27 if not self.filter_product(product, src, target, (prodname,)):
28 filtered_products.append(prodname)
29@@ -320,9 +337,6 @@ class BasicMirrorWriter(MirrorWriter):
30 added_items = []
31 for itemname, item in version.get('items', {}).items():
32 pgree = (prodname, vername, itemname)
33- if not self.filter_item(item, src, target, pgree):
34- LOG.debug("Filtered out item: %s/%s", itemname, item)
35- continue
36
37 added_items.append(itemname)
38

Subscribers

People subscribed via source and target branches