Glance

Code review comment for lp:~rackspace-titan/glance/api-pagination-limiting

api-pagination-limiting
Merge into trunk

Revision history for this message

Jay Pipes (jaypipes) wrote on 2011-05-19:

On Tue, May 17, 2011 at 8:42 PM, Brian Waldon
<email address hidden> wrote:
> According to the last versioning-related discussion we had, these changes were going to be part of v1.0. We haven't cut an official v1.0 release yet, and the limiting changes are definitely a major version change. I say we let the filtering and pagination branches in then call it v1.0. Any new feature (sets?) bump the version.

Cool, yep, that sounds fine.

>> 2) I'd prefer to set the default limit to < 1000. I'm thinking 25 or 50 would
>> be a more appropriate default limit. Also, please make this a constant
>> somewhere, not just a number in the method signature...
>
> I meant to do that before I pushed the branch. Now it's a constant in registry/server.py, set to 25

Cheers.

>> 3) Unless I'm mistaken these functions:
>>
>> 50 +def limit_collection(items, marker, limit):
>> 243 + def _limit_collection(self, req, images):
>>
>> in glance.common.wsgi and glance.registry.server, respectively, can be
>> removed. You've pushed (correctly) the limiting and offset down into the
>> registry db query.
>
> Good catch. Removed them.

Cool.

>> 4) This:
>>
>> 114 if filters != None:
>> 115 - action = "/images?%s" % urllib.urlencode(filters)
>> 116 + params = filters
>> 117 else:
>> 118 - action = "/images"
>> 119 + params = {}
>>
>> Might be better as:
>>
>> params = filters or {}
>
> Yep. Did this in a bunch of places.

Thanks.

>> 5)
>>
>> 184 + order_by(models.Image.id)
>>
>> I think I would prefer:
>>
>> order_by(desc(models.Image.created_at))
>
> Done. I kept the secondary order_by id for predictability in the behavior of marker b/c created_at is not unique.

Hmm, so you actually don't need created_at to be unique if you replace
the concept of marker with that of offset. See more below...

> Shoot for custom ordering in a future milestone (for v1.1)?

Cool with me :)

>> 6)
>>
>> 192 + if marker != None:
>> 193 + query = query.filter(models.Image.id > marker)
>>
>> Should be:
>>
>> if marker != None:
>> query = query.offset(marker)
>>
>> That way, SQLAlchemy will pass the marker as-is to the query as a LIMIT X
>> OFFSET Y. This allows more than just the Image.id column to be used for
>> ordering, and will allow you to add custom ordering to the API ;)
>
> Actually, the term "offset" means something different than "marker". Offset is an index in a list of images, while marker is the actual id of an image in that list. I did have to change the code to support the new created_at sort, but I kept the original functionality.

I understand that marker is different from offset. I was recommending
that instead of using the concept of a marker, that you use the
concept of an offset. A marker isn't particularly useful unless you
have do ordering with a unique column. In addition, using a marker
requires you to do the pagination in code, whereas using an offset
allows you to do the pagination in the SQL query itself, which will be
much more efficient than in Python.

The code I demonstrated above would send the offset to the SQL query
and allow you to do pagination in the query instead of this:

398 + images = sorted(images, key=lambda i: i['created_at'])
399 + images.reverse()
400 +
401 + if marker == None:
402 + start_index = 0
403 + else:
404 + start_index = -1
405 + for i, image in enumerate(images):
406 + if image['id'] == marker:
407 + start_index = i + 1
408 + break
409 +
410 + return images[start_index:start_index + limit]

So, I'm still going to stick with my recommendation of using an offset
versus a marker :)

-jay

On Tue, May 17, 2011 at 8:42 PM, Brian Waldon
<brian.waldon@rackspace.com> wrote:
> According to the last versioning-related discussion we had, these changes were going to be part of v1.0. We haven't cut an official v1.0 release yet, and the limiting changes are definitely a major version change. I say we let the filtering and pagination branches in then call it v1.0. Any new feature (sets?) bump the version.

Cool, yep, that sounds fine.

Cheers.

>> 3) Unless I'm mistaken these functions:
>>
>> 50      +def limit_collection(items, marker, limit):
>> 243     + def _limit_collection(self, req, images):
>>
>> in glance.common.wsgi and glance.registry.server, respectively, can be
>> removed. You've pushed (correctly) the limiting and offset down into the
>> registry db query.
>
> Good catch. Removed them.

Cool.

>> 4) This:
>>
>> 114     if filters != None:
>> 115     - action = "/images?%s" % urllib.urlencode(filters)
>> 116     + params = filters
>> 117     else:
>> 118     - action = "/images"
>> 119     + params = {}
>>
>> Might be better as:
>>
>> params = filters or {}
>
> Yep. Did this in a bunch of places.

Thanks.

>> 5)
>>
>> 184     + order_by(models.Image.id)
>>
>> I think I would prefer:
>>
>> order_by(desc(models.Image.created_at))
>
> Done. I kept the secondary order_by id for predictability in the behavior of marker b/c created_at is not unique.

Hmm, so you actually don't need created_at to be unique if you replace
the concept of marker with that of offset. See more below...

> Shoot for custom ordering in a future milestone (for v1.1)?

Cool with me :)

>> 6)
>>
>> 192     + if marker != None:
>> 193     + query = query.filter(models.Image.id > marker)
>>
>> Should be:
>>
>> if marker != None:
>>     query = query.offset(marker)
>>
>> That way, SQLAlchemy will pass the marker as-is to the query as a LIMIT X
>> OFFSET Y. This allows more than just the Image.id column to be used for
>> ordering, and will allow you to add custom ordering to the API ;)
>
> Actually, the term "offset" means something different than "marker". Offset is an index in a list of images, while marker is the actual id of an image in that list. I did have to change the code to support the new created_at sort, but I kept the original functionality.

The code I demonstrated above would send the offset to the SQL query
and allow you to do pagination in the query instead of this:

398	+ images = sorted(images, key=lambda i: i['created_at'])
399	+ images.reverse()
400	+
401	+ if marker == None:
402	+ start_index = 0
403	+ else:
404	+ start_index = -1
405	+ for i, image in enumerate(images):
406	+ if image['id'] == marker:
407	+ start_index = i + 1
408	+ break
409	+
410	+ return images[start_index:start_index + limit]

So, I'm still going to stick with my recommendation of using an offset
versus a marker :)

-jay

« Back to merge proposal