Merge lp:~rvb/maas/nonce-bug-1190986 into lp:~maas-committers/maas/trunk

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 1697
Proposed branch: lp:~rvb/maas/nonce-bug-1190986
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 222 lines (+213/-0)
2 files modified
src/maasserver/nonces_cleanup.py (+94/-0)
src/maasserver/tests/test_nonces_cleanup.py (+119/-0)
To merge this branch: bzr merge lp:~rvb/maas/nonce-bug-1190986
Reviewer Review Type Date Requested Status
Gavin Panella (community) Approve
Review via email: mp+189790@code.launchpad.net

Commit message

Add utility to remove old nonces.

Description of the change

Note that this is *just* the utility itself (function cleanup_old_nonces). Now this needs to be "croned" using either a cronjob or — preferably — a celery beat tasks + an API call.

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good, and easy to understand!

[1]

+    create_checkpoint_nonce()
+    # Delete old nonces.
+    checkpoint = find_checkpoint_nonce()
+    if checkpoint is None:
+        return 0
+    return delete_old_nonces(checkpoint)

Seems odd to create the new nonce before deleting; there is a very
slow race condition there.

[2]

+def delete_old_nonces(checkpoint):
+    """Delete nonces older than the given nonce."""
+    nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
+    count = nonce_to_delete.count()
+    nonce_to_delete.delete()
+    return count

Do we need the count?

[3]

+from __future__ import (
+    absolute_import,
+    print_function,
+    unicode_literals,
+    )
+
+__metaclass__ = type
+__all__ = []

Please merge trunk and update this to the newest template; there's a
'str = None' missing from there.

[4]

+        self.patch(module_time, "time").return_value = (
+            now - timestamp_threshold)

I'd hesitate from patching time.time() - it may be used by nose before
the patching machinery is invoked to repair its work. How about:

        from maasserver import nonces_cleanup

        timemod = self.patch(nonces_cleanup, "time")
        timemod.time.return_value = now - timestamp_threshold
        ...
        timemod.time.return_value = now

?

[5]

+        self.assertLessEqual(before, float(checkpoint.key.strip(key_suffix)))
+        self.assertGreaterEqual(
+            after, float(checkpoint.key.strip(key_suffix)))

strip() doesn't work like that; key_suffix will be used as
*characters* to strip. In this instance it works, but perhaps a
different mechanism ought to be used, e.g.:

        checkpoint_time = float(re.search(r'[\d.]+', checkpoint.key).group(0))

Also, with testtools you can do the above check in one go:

        self.assertThat(checkpoint_time, MatchesAll(
            MatchesAny(Equals(before), GreaterThan(before)),
            MatchesAny(Equals(after), LessThan(after))))

Though it's not really any clearer.

review: Approve
Revision history for this message
Raphaël Badin (rvb) wrote :

Thanks for the review!
>
>
> [1]
>
> +    create_checkpoint_nonce()
> +    # Delete old nonces.
> +    checkpoint = find_checkpoint_nonce()
> +    if checkpoint is None:
> +        return 0
> +    return delete_old_nonces(checkpoint)
>
> Seems odd to create the new nonce before deleting; there is a very
> slow race condition there.

Where do you see the race condition exactly?

> [2]
>
> +def delete_old_nonces(checkpoint):
> +    """Delete nonces older than the given nonce."""
> +    nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
> +    count = nonce_to_delete.count()
> +    nonce_to_delete.delete()
> +    return count
>
> Do we need the count?

Well, we could do without, but it's pretty cheap, and I think it's pretty useful to give the caller some feedback about the actual clean-up (i.e. did it really happen or not); especially considering the two-phase strategy.

> [3]
>
> +from __future__ import (
> +    absolute_import,
> +    print_function,
> +    unicode_literals,
> +    )
> +
> +__metaclass__ = type
> +__all__ = []
>
> Please merge trunk and update this to the newest template; there's a
> 'str = None' missing from there.

All right, done.

> [4]
>
> +        self.patch(module_time, "time").return_value = (
> +            now - timestamp_threshold)
>
> I'd hesitate from patching time.time() - it may be used by nose before
> the patching machinery is invoked to repair its work. How about:
>
>        from maasserver import nonces_cleanup
>
>        timemod = self.patch(nonces_cleanup, "time")
>        timemod.time.return_value = now - timestamp_threshold
>        ...
>        timemod.time.return_value = now

Indeed, it's a tad cleaner, done.

> [5]
>
> +        self.assertLessEqual(before, float(checkpoint.key.strip(key_suffix)))
> +        self.assertGreaterEqual(
> +            after, float(checkpoint.key.strip(key_suffix)))
>
> strip() doesn't work like that; key_suffix will be used as
> *characters* to strip. In this instance it works, but perhaps a
> different mechanism ought to be used, e.g.:
>
>        checkpoint_time = float(re.search(r'[\d.]+', checkpoint.key).group(0))
>
> Also, with testtools you can do the above check in one go:
>
>        self.assertThat(checkpoint_time, MatchesAll(
>            MatchesAny(Equals(before), GreaterThan(before)),
>            MatchesAny(Equals(after), LessThan(after))))
>
> Though it's not really any clearer.

I've refactored that code a bit, instead of the heavy artillery, I've used "checkpoint_time = checkpoint.key[:-len(key_suffix)]".

Revision history for this message
Gavin Panella (allenap) wrote :

> > [1]
> >
> > +    create_checkpoint_nonce()
> > +    # Delete old nonces.
> > +    checkpoint = find_checkpoint_nonce()
> > +    if checkpoint is None:
> > +        return 0
> > +    return delete_old_nonces(checkpoint)
> >
> > Seems odd to create the new nonce before deleting; there is a very
> > slow race condition there.
>
> Where do you see the race condition exactly?

In cleanup_old_nonces():

+    create_checkpoint_nonce()
+    # Delete old nonces.
+    checkpoint = find_checkpoint_nonce()

If there's a delay between create_checkpoint_nonce() and
find_checkpoint_nonce() of more than timestamp_threshold - which I
grant is *extremely* unlikely - then it'll delete the nonce just
created.

However, it doesn't matter anyway: by then it'll be time to delete the
new nonce! So, it's fine to leave it as it is.

>
> > [2]
> >
> > +def delete_old_nonces(checkpoint):
> > +    """Delete nonces older than the given nonce."""
> > +    nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
> > +    count = nonce_to_delete.count()
> > +    nonce_to_delete.delete()
> > +    return count
> >
> > Do we need the count?
>
> Well, we could do without, but it's pretty cheap, and I think it's pretty
> useful to give the caller some feedback about the actual clean-up (i.e. did it
> really happen or not); especially considering the two-phase strategy.

The count is not too bad as long as indexes are being used -
http://wiki.postgresql.org/wiki/Slow_Counting - but note the caveat:

  PostgreSQL will still need to read the resulting rows to verify that
  they exist; other database systems may only need to reference the
  index in this situation.

I guess it's going to be visiting those rows again very soon to delete
them, so they'll be hot and the delete will be quicker, but that might
be a false assumption. When regularly deleting nonces it probably
won't be a significant overhead, but the first run of this code might
be punishing.

PostgreSQL does return a count of deleted rows, eliminating the need
for a separate count, which might be a way to mitigate this. I wonder
if Django surfaces that in some way?

See http://www.postgresql.org/docs/9.1/interactive/sql-delete.html.

Another thought: key__endswith probably creates a LIKE '%SUFFIX'
clause, which can be very inefficient. I suggest talking to Jeroen or
Stuart Bishop about this. Can this code be changed to use a prefix
match instead? Can you check that an index on Nonce.key exists, and
that it supports prefix (or suffix) matches?

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

> PostgreSQL does return a count of deleted rows, eliminating the need
> for a separate count, which might be a way to mitigate this. I wonder
> if Django surfaces that in some way?

Even if it doesn't, psycopg2 does. It's cursor.rowcount, right after the deletion.

.

> Another thought: key__endswith probably creates a LIKE '%SUFFIX'
> clause, which can be very inefficient. I suggest talking to Jeroen or
> Stuart Bishop about this. Can this code be changed to use a prefix
> match instead? Can you check that an index on Nonce.key exists, and
> that it supports prefix (or suffix) matches?

You can create a function index on the result of "LIKE '%SUFFIX'". But I do wonder: after the initial cleanup, how many nonces can there ever be? Maybe it's worth dealing with the initial cleanup as a separate situation, where we can cut corners and/or just accept a delay.

Revision history for this message
Raphaël Badin (rvb) wrote :

> > > [1]
> > >
> > > +    create_checkpoint_nonce()
> > > +    # Delete old nonces.
> > > +    checkpoint = find_checkpoint_nonce()
> > > +    if checkpoint is None:
> > > +        return 0
> > > +    return delete_old_nonces(checkpoint)
> > >
> > > Seems odd to create the new nonce before deleting; there is a very
> > > slow race condition there.
> >
> > Where do you see the race condition exactly?
>
> In cleanup_old_nonces():
>
> +    create_checkpoint_nonce()
> +    # Delete old nonces.
> +    checkpoint = find_checkpoint_nonce()
>
> If there's a delay between create_checkpoint_nonce() and
> find_checkpoint_nonce() of more than timestamp_threshold - which I
> grant is *extremely* unlikely - then it'll delete the nonce just
> created.
>
> However, it doesn't matter anyway: by then it'll be time to delete the
> new nonce! So, it's fine to leave it as it is.

That's what I had in mind, we can call that a race, because technically, this is a race but it has no bad consequence whatsoever. Plus it's highly unlikely.

> PostgreSQL does return a count of deleted rows, eliminating the need
> for a separate count, which might be a way to mitigate this. I wonder
> if Django surfaces that in some way?

I don't think Django exposes that. There is a still-open tick for it in Django bugtracker: https://code.djangoproject.com/ticket/16891.

I tried to see if I could find the 'rowcount' in Django's guts after the query has been performed but I can't so far…

> Another thought: key__endswith probably creates a LIKE '%SUFFIX'
> clause, which can be very inefficient. I suggest talking to Jeroen or
> Stuart Bishop about this. Can this code be changed to use a prefix
> match instead? Can you check that an index on Nonce.key exists, and
> that it supports prefix (or suffix) matches?

I changed that to using prefixes.

But I'm not really concerned about the efficiency here considering that only the *second* run can be a bit inefficient.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'src/maasserver/nonces_cleanup.py'
2--- src/maasserver/nonces_cleanup.py 1970-01-01 00:00:00 +0000
3+++ src/maasserver/nonces_cleanup.py 2013-10-10 07:40:34 +0000
4@@ -0,0 +1,94 @@
5+# Copyright 2013 Canonical Ltd. This software is licensed under the
6+# GNU Affero General Public License version 3 (see the file LICENSE).
7+
8+"""Nonces cleanup utilities."""
9+
10+from __future__ import (
11+ absolute_import,
12+ print_function,
13+ unicode_literals,
14+ )
15+
16+str = None
17+
18+__metaclass__ = type
19+__all__ = [
20+ 'cleanup_old_nonces',
21+ ]
22+
23+
24+import time
25+
26+from oauth.oauth import OAuthServer
27+from piston.models import Nonce
28+
29+
30+timestamp_threshold = OAuthServer.timestamp_threshold
31+
32+
33+def cleanup_old_nonces():
34+ """Clean up old nonces.
35+
36+ This method does two things:
37+ - it creates a "checkpoint" nonce which stores its creation
38+ time.
39+ - it removes the expired nonces by using existing "checkpoint"
40+ nonces to figure out which nonces can be safely deleted.
41+
42+ Given the two-step nature of this method, it needs to be called
43+ at least twice, the two calls being separated by 5 minutes, to
44+ actually be able to delete old nonces.
45+ The typical usage is to call this at regular intervals in a
46+ cron-like fashion.
47+ """
48+ create_checkpoint_nonce()
49+ # Delete old nonces.
50+ checkpoint = find_checkpoint_nonce()
51+ if checkpoint is None:
52+ return 0
53+ return delete_old_nonces(checkpoint)
54+
55+
56+def create_checkpoint_nonce():
57+ """Create a "checkpoint" nonce.
58+
59+ A "checkpoint" nonce is a Nonce with empty strings in
60+ 'token_key' and 'consumer_key' and its creation time in the
61+ 'key' field.
62+ """
63+ now = time.time()
64+ Nonce.objects.get_or_create(
65+ token_key='', consumer_key='', key=get_time_string(now))
66+
67+# Key prefix used when creating checkpoint nonces to avoid clashing with
68+# real-world nonce created by piston.
69+key_prefix = "CHECKPOINT#"
70+
71+
72+def get_time_string(time_value):
73+ """Convert a time value, as returned by time.time() into a string."""
74+ return "%s%f" % (key_prefix, time_value)
75+
76+
77+def delete_old_nonces(checkpoint):
78+ """Delete nonces older than the given nonce."""
79+ nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
80+ count = nonce_to_delete.count()
81+ nonce_to_delete.delete()
82+ return count
83+
84+
85+def find_checkpoint_nonce():
86+ """Return the most recent "checkpoint" nonce.
87+
88+ The returned nonce will be older than 'timestamp_threshold'.
89+ Returns None if such a nonce does not exist.
90+ """
91+ time_limit = get_time_string(time.time() - timestamp_threshold)
92+ nonces = Nonce.objects.filter(
93+ token_key='', consumer_key='',
94+ key__lte=time_limit, key__startswith=key_prefix)
95+ try:
96+ return nonces.latest('id')
97+ except Nonce.DoesNotExist:
98+ return None
99
100=== added file 'src/maasserver/tests/test_nonces_cleanup.py'
101--- src/maasserver/tests/test_nonces_cleanup.py 1970-01-01 00:00:00 +0000
102+++ src/maasserver/tests/test_nonces_cleanup.py 2013-10-10 07:40:34 +0000
103@@ -0,0 +1,119 @@
104+# Copyright 2013 Canonical Ltd. This software is licensed under the
105+# GNU Affero General Public License version 3 (see the file LICENSE).
106+
107+"""Tests for the nonces cleanup module."""
108+
109+from __future__ import (
110+ absolute_import,
111+ print_function,
112+ unicode_literals,
113+ )
114+
115+str = None
116+
117+__metaclass__ = type
118+__all__ = []
119+
120+
121+import time
122+
123+from maasserver.nonces_cleanup import (
124+ cleanup_old_nonces,
125+ create_checkpoint_nonce,
126+ delete_old_nonces,
127+ find_checkpoint_nonce,
128+ get_time_string,
129+ key_prefix,
130+ time as module_time,
131+ timestamp_threshold,
132+ )
133+from maasserver.testing.testcase import MAASServerTestCase
134+from maastesting.matchers import ContainsAll
135+from piston.models import Nonce
136+from testtools.matchers import StartsWith
137+
138+
139+class TestCleanupOldNonces(MAASServerTestCase):
140+
141+ def test_cleanup_old_nonces_returns_0_if_no_checkpoint(self):
142+ self.assertEquals(0, cleanup_old_nonces())
143+
144+ def test_cleanup_old_nonces_cleans_up_old_nonces(self):
145+ now = time.time()
146+ # Patch the module's time module so that the nonces appear as if
147+ # they were created now - timestamp_threshold seconds ago.
148+ timemod = self.patch(module_time, "time")
149+ timemod.return_value = now - timestamp_threshold
150+ old_nonces = [Nonce.objects.create() for i in range(3)]
151+ self.assertEquals(0, cleanup_old_nonces())
152+ # Patch the module's time module back.
153+ timemod.return_value = now
154+ new_nonces = [Nonce.objects.create() for i in range(3)]
155+
156+ cleanup_count = cleanup_old_nonces()
157+
158+ # The old nonces plus the checkpoint nonce are deleted.
159+ self.assertEquals(len(old_nonces) + 1, cleanup_count)
160+ self.assertThat(Nonce.objects.all(), ContainsAll(new_nonces))
161+ self.assertEqual(len(new_nonces) + 1, Nonce.objects.all().count())
162+
163+
164+class TestUtilities(MAASServerTestCase):
165+
166+ def test_create_checkpoint_nonce_creates_checkpoint_nonce(self):
167+ before = time.time()
168+ create_checkpoint_nonce()
169+ checkpoint = Nonce.objects.get(token_key='', consumer_key='')
170+ after = time.time()
171+ checkpoint_time = checkpoint.key[len(key_prefix):]
172+ self.assertLessEqual(before, float(checkpoint_time))
173+ self.assertGreaterEqual(after, float(checkpoint_time))
174+
175+ def test_create_checkpoint_nonce_gets_checkpoint_if_exists(self):
176+ now = time.time()
177+ self.patch(module_time, "time").return_value = now
178+ create_checkpoint_nonce()
179+ nonce1 = Nonce.objects.filter(
180+ token_key='', consumer_key='').latest('id')
181+ create_checkpoint_nonce()
182+ nonce2 = Nonce.objects.filter(
183+ token_key='', consumer_key='').latest('id')
184+ self.assertEqual(nonce1.id, nonce2.id)
185+
186+ def test_delete_old_nonces_delete_nonces(self):
187+ # Create old nonces.
188+ [Nonce.objects.create() for i in range(3)]
189+ checkpoint = Nonce.objects.create()
190+ new_nonces = [Nonce.objects.create() for i in range(3)]
191+ delete_old_nonces(checkpoint)
192+ self.assertItemsEqual(new_nonces, Nonce.objects.all())
193+
194+ def test_find_checkpoint_nonce_returns_None_if_no_checkpoint(self):
195+ self.assertIsNone(find_checkpoint_nonce())
196+
197+ def test_find_checkpoint_nonce_returns_most_recent_checkpoint(self):
198+ now = time.time()
199+ self.patch(module_time, "time").return_value = now
200+ # Create a "checkpoint" nonce created timestamp_threshold + 5
201+ # seconds ago.
202+ Nonce.objects.create(
203+ token_key='', consumer_key='',
204+ key=get_time_string(now - 5 - timestamp_threshold))
205+ # Create a "checkpoint" nonce created timestamp_threshold
206+ # seconds ago.
207+ checkpoint = Nonce.objects.create(
208+ token_key='', consumer_key='',
209+ key=get_time_string(now - timestamp_threshold))
210+ # Create a "checkpoint" nonce created 1 second ago.
211+ Nonce.objects.create(
212+ token_key='', consumer_key='', key=get_time_string(now - 1))
213+
214+ self.assertEqual(checkpoint, find_checkpoint_nonce())
215+
216+ def test_get_time_string_returns_comparable_string(self):
217+ now = time.time()
218+ self.assertGreater(get_time_string(now + 1), get_time_string(now))
219+
220+ def test_get_time_string_ends_with_suffix(self):
221+ now = time.time()
222+ self.assertThat(get_time_string(now), StartsWith(key_prefix))