Merge ~philroche/simplestreams/+git/simplestreams:feture/filter-items-before-filter-versions into simplestreams:master

Proposed by Philip Roche
Status: Merged
Approved by: Robert C Jennings
Approved revision: 3075f5c857e66cb4732747cbbe576b30055b99ea
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~philroche/simplestreams/+git/simplestreams:feture/filter-items-before-filter-versions
Merge into: simplestreams:master
Diff against target: 37 lines (+17/-3)
1 file modified
simplestreams/mirrors/__init__.py (+17/-3)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve
Robert C Jennings (community) Approve
Tobias Koch (community) Approve
Cody Shepherd (community) Approve
Review via email: mp+369682@code.launchpad.net

Commit message

Apply filters to items before filtering versions

Filters were applied to versions before being applied to
items which meant if you were using --max option then
you often were not getting the expected output as the
version filter was filtering out the expected items.

This happens if there is a version greater than the version
of the item you expect despite the fact that the item
with the greater version never passing your supplied
filters.

Description of the change

Apply filters to items before filtering versions

Filters were applied to versions before being applied to
items which meant if you were using --max option then
you often were not getting the expected output as the
version filter was filtering out the expected items.

This happens if there is a version greater than the version
of the item you expect despite the fact that the item
with the greater version never passing your supplied
filters.

To post a comment you must log in.
Revision history for this message
Philip Roche (philroche) wrote :

An example query where the problem this commit fixes is

```
sstream-query --max=1 -U http://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:aws-cn.sjson crsn=cn-north-1 release=xenial arch=amd64 virt=hvm root_store=ssd --output-format="16.04 %(version_name)s %(arch)s %(id)s %(root_store)s %(virt)s"
```

This would return no results despite there being 41 images that satisfy those filters.

Revision history for this message
Cody Shepherd (codyshepherd) :
review: Approve
Revision history for this message
Tobias Koch (tobijk) wrote :

Nice work figuring this out!

review: Approve
Revision history for this message
Scott Moser (smoser) wrote :

I don't see any obvious holes in the implementation, but this was kind of functioning as designed. It seems you'd only hit this if "the version greater than the item you expect" had a subset of the items that you expected.

Why would that be?

Within a given product, I would not have expected that there was any case where a newer version contained a subset of an older version *AND* that older version's content was still desirable.

One solution would be to just fix your published data to have "full" content.

Revision history for this message
Philip Roche (philroche) wrote :

Thank you for your comment.

I take your point about publishing full content but this is not always possible to do at the same time, for example publishing to aws-cn partition is very error prone and can result in serials not being published to all regions. This shouldn't mean that streams start returning incorrect results though.

Eventually full content would be published but this MP fixes the issue of incorrect results being returned in the interim.

As you say, the older versions content is not desirable but we do require that content to be filterable instead of returning no results.

Revision history for this message
Scott Moser (smoser) wrote :

> Thank you for your comment.
>
> I take your point about publishing full content but this is not always
> possible to do at the same time, for example publishing to aws-cn partition is
> very error prone and can result in serials not being published to all regions.
> This shouldn't mean that streams start returning incorrect results though.
>
> Eventually full content would be published but this MP fixes the issue of

> incorrect results being returned in the interim.

Well... you clearly do not have to publish data until it is correct and
complete. I'm not going to argue all that much here, but I think it could
be argued that you're just taking a shortcut in publishing of data that
pushes complexity on to the consumer.

>
> As you say, the older versions content is not desirable but we do require that
> content to be filterable instead of returning no results.

Also... for us pesky humans, it might be nice to dump your json data with
sorted keys. (json.dumps(data, sort_keys=True))

Revision history for this message
Philip Roche (philroche) wrote :

I do agree that all data should be complete but when publication fails in one region for reasons outside of our control it shouldn't block updating of streams, especially if the publication is to address a CVE.

I will add sorting of streams to our process.

Revision history for this message
Robert C Jennings (rcj) wrote :

The changes look good. It should not occur that publication is incomplete but the ability to quickly service CVEs with images as they become available is important.

review: Approve
Revision history for this message
Server Team CI bot (server-team-bot) :
review: Approve (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

Commit message lints:
 - Line #3 has 1 too many characters. Line starts with: "if you were using --max"... - Line #6 has 8 too many characters. Line starts with: "This happens if there"... - Line #7 has 18 too many characters. Line starts with: "despite the fact that"...

review: Needs Fixing
Revision history for this message
Philip Roche (philroche) wrote :

Commit message now updated

Revision history for this message
Server Team CI bot (server-team-bot) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
diff --git a/simplestreams/mirrors/__init__.py b/simplestreams/mirrors/__init__.py
index 2a1dc7e..60d5ade 100644
--- a/simplestreams/mirrors/__init__.py
+++ b/simplestreams/mirrors/__init__.py
@@ -280,6 +280,23 @@ class BasicMirrorWriter(MirrorWriter):
280280
281 filtered_products = []281 filtered_products = []
282 prodname = None282 prodname = None
283
284 # Apply filters to items before filtering versions
285 for prodname, product in list(stree.items()):
286
287 for vername, version in list(product.get('versions', {}).items()):
288 for itemname, item in list(version.get('items', {}).items()):
289 pgree = (prodname, vername, itemname)
290 if not self.filter_item(item, src, target, pgree):
291 LOG.debug("Filtered out item: %s/%s", itemname, item)
292 del stree[prodname]['versions'][vername]['items'][
293 itemname]
294 if not stree[prodname]['versions'][vername].get(
295 'items', {}):
296 del stree[prodname]['versions'][vername]
297 if not stree[prodname].get('versions', {}):
298 del stree[prodname]
299
283 for prodname, product in stree.items():300 for prodname, product in stree.items():
284 if not self.filter_product(product, src, target, (prodname,)):301 if not self.filter_product(product, src, target, (prodname,)):
285 filtered_products.append(prodname)302 filtered_products.append(prodname)
@@ -320,9 +337,6 @@ class BasicMirrorWriter(MirrorWriter):
320 added_items = []337 added_items = []
321 for itemname, item in version.get('items', {}).items():338 for itemname, item in version.get('items', {}).items():
322 pgree = (prodname, vername, itemname)339 pgree = (prodname, vername, itemname)
323 if not self.filter_item(item, src, target, pgree):
324 LOG.debug("Filtered out item: %s/%s", itemname, item)
325 continue
326340
327 added_items.append(itemname)341 added_items.append(itemname)
328342

Subscribers

People subscribed via source and target branches