Code review comment for lp:~rvb/maas/nonce-bug-1190986

Gavin Panella (allenap) wrote :

> > [1]
> >
> > +    create_checkpoint_nonce()
> > +    # Delete old nonces.
> > +    checkpoint = find_checkpoint_nonce()
> > +    if checkpoint is None:
> > +        return 0
> > +    return delete_old_nonces(checkpoint)
> >
> > Seems odd to create the new nonce before deleting; there is a very
> > slow race condition there.
>
> Where do you see the race condition exactly?

In cleanup_old_nonces():

+    create_checkpoint_nonce()
+    # Delete old nonces.
+    checkpoint = find_checkpoint_nonce()

If there's a delay between create_checkpoint_nonce() and
find_checkpoint_nonce() of more than timestamp_threshold - which I
grant is *extremely* unlikely - then it'll delete the nonce just
created.

However, it doesn't matter anyway: by then it'll be time to delete the
new nonce! So, it's fine to leave it as it is.
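
A toy simulation of that window, for what it's worth. The helper
semantics below are my guesses from the diff — in particular that
find_checkpoint_nonce() only returns a checkpoint at least
timestamp_threshold old, and that delete_old_nonces() removes
everything up to and including it — not the branch's actual code:

```python
# Toy model of the ordering concern; all names and behaviour here are
# assumptions for illustration only.
timestamp_threshold = 5.0  # seconds, arbitrary

def run_cleanup(now, nonces, delay_between_create_and_find):
    # create_checkpoint_nonce(): lay down a fresh checkpoint.
    nonces = nonces + [("checkpoint", now)]
    # Normally this delay is ~0; the race needs it > timestamp_threshold.
    now += delay_between_create_and_find
    # find_checkpoint_nonce(): newest checkpoint old enough to act on.
    candidates = [t for key, t in nonces
                  if key == "checkpoint" and now - t >= timestamp_threshold]
    if not candidates:
        return nonces
    checkpoint = max(candidates)
    # delete_old_nonces(): drop everything up to and including it.
    return [(k, t) for k, t in nonces if t > checkpoint]

# With a normal (tiny) delay the fresh checkpoint survives:
assert run_cleanup(100.0, [], 0.001) == [("checkpoint", 100.0)]
# With a pathological delay longer than the threshold, it is deleted:
assert run_cleanup(100.0, [], 10.0) == []
```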

>
> > [2]
> >
> > +def delete_old_nonces(checkpoint):
> > +    """Delete nonces older than the given nonce."""
> > +    nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
> > +    count = nonce_to_delete.count()
> > +    nonce_to_delete.delete()
> > +    return count
> >
> > Do we need the count?
>
> Well, we could do without, but it's pretty cheap, and I think it's pretty
> useful to give the caller some feedback about the actual clean-up (i.e. did it
> really happen or not); especially considering the two-phase strategy.

The count is not too bad as long as indexes are being used -
http://wiki.postgresql.org/wiki/Slow_Counting - but note the caveat:

  PostgreSQL will still need to read the resulting rows to verify that
  they exist; other database systems may only need to reference the
  index in this situation.

I guess it's going to be visiting those rows again very soon to delete
them, so they'll be hot and the delete will be quicker, but that might
be a false assumption. When regularly deleting nonces it probably
won't be a significant overhead, but the first run of this code might
be punishing.

PostgreSQL itself reports the number of rows deleted, which would
eliminate the need for a separate count, so that might be a way to
mitigate this. I wonder if Django surfaces that in some way?

See http://www.postgresql.org/docs/9.1/interactive/sql-delete.html.
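
Whether Django exposes it I don't know, but at the DB-API level the
count is there without a separate SELECT count(*). A self-contained
sketch — sqlite3 here purely so it runs standalone; psycopg2 sets
cursor.rowcount the same way, from PostgreSQL's own deleted-row count:

```python
import sqlite3

# Build a small stand-in for the Nonce table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nonce (id INTEGER PRIMARY KEY, key TEXT)")
conn.executemany(
    "INSERT INTO nonce (id, key) VALUES (?, ?)",
    [(i, "key-%d" % i) for i in range(1, 11)])

# Delete "old" nonces and read the count straight off the cursor.
checkpoint_id = 6  # hypothetical checkpoint nonce id
cur = conn.execute("DELETE FROM nonce WHERE id <= ?", (checkpoint_id,))
print(cur.rowcount)  # → 6
```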

Another thought: key__endswith probably produces a LIKE '%SUFFIX'
clause, which an ordinary B-tree index cannot help with, so it can
force a sequential scan over the whole table. I suggest talking to
Jeroen or Stuart Bishop about this. Can this code be changed to use a
prefix match instead? Can you check that an index on Nonce.key exists,
and that it supports prefix (or suffix) matches?
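
To make the prefix-vs-suffix point concrete, here's a quick look at
the query plans. sqlite3 again as a portable stand-in; PostgreSQL
behaves analogously for a B-tree index (though there you may also need
a text_pattern_ops/varchar_pattern_ops index for LIKE prefix matches
in non-C locales):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Let LIKE use the case-sensitive B-tree index.
conn.execute("PRAGMA case_sensitive_like = ON")
conn.execute("CREATE TABLE nonce (id INTEGER PRIMARY KEY, key TEXT)")
conn.execute("CREATE INDEX nonce_key_idx ON nonce (key)")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is the plan detail.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

# Prefix match: an index range SEARCH.
print(plan("SELECT * FROM nonce WHERE key LIKE 'abc%'"))
# Suffix match: a full SCAN, the index can't narrow it.
print(plan("SELECT * FROM nonce WHERE key LIKE '%abc'"))
```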
