Merge lp:~jcsackett/launchpad/dont-expose-domains into lp:launchpad

Proposed by j.c.sackett
Status: Merged
Approved by: j.c.sackett
Approved revision: no longer in the source branch.
Merged at revision: 13114
Proposed branch: lp:~jcsackett/launchpad/dont-expose-domains
Merge into: lp:launchpad
Diff against target: 206 lines (+75/-103)
3 files modified
lib/lp/registry/doc/nickname.txt (+0/-96)
lib/lp/registry/model/person.py (+1/-7)
lib/lp/registry/tests/test_nickname.py (+74/-0)
To merge this branch: bzr merge lp:~jcsackett/launchpad/dont-expose-domains
Reviewer Review Type Date Requested Status
Benji York (community) code Approve
Review via email: mp+62023@code.launchpad.net

Commit message

[r=benji][bug=228355] Removes the use of email domain information in the creation of nicks in generate_nick

Description of the change

Summary
=======
Even though users can set lp to not show their email address, if their id has been automatically generated the email can still essentially be divulged b/c it's used to create the idea. (e.g. <email address hidden> can become bob-gmail if bob is already taken or bob-gmail-com).

This code removes the domain bit from the nick generation, as there is already a random suffix generation piece as a final fall back. If bob is already taken as a username, than the next bob@somedomain becomes bob-c or similar.

Preimplementation
=================
Spoke with Curtis Hovey

Implementation
==============
lib/lp/registry/model/person.py
-------------------------------
Removed the portion of generate_nick that used the email domain to create nicks.

lib/lp/registry/tests/test_nickname.py
lib/lp/registry/doc/nickname.txt
--------------------------------
Removed the doc tests and replaced it with unittests, which are more appropriate.

Tests
=====
bin/test -t test_nickname

QA
==
Create a user on qastaging with a an email that could collide with an existing user (e.g. <email address hidden>). The domain should not be part of the generated nick.

Lint
====

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/registry/model/person.py
  lib/lp/registry/tests/test_nickname.py

To post a comment you must log in.
Revision history for this message
Benji York (benji) wrote :

The conversion from doctest to unittest looks like it could use some
more fleshing out.

Two of the scenarios from the doctest that look important aren't
represented in the unittests:

    >>> generate_nick("<email address hidden>")
    'foo-bar'
    >>> generate_nick("<email address hidden>")
    'foo+bar'

The second half of the doctest (starting with "It should be nearly
impossible to..." and going through the testing of
NicknameGenerationError) looks important too and those tests aren't
represented in the unittests.

Here are some smaller things that occurred to me during the review:

The regex in lib/lp/registry/model/person.py doesn't need the second set
of capturing parens now that we're not using the domain name.

These two lines from the end of test_can_create_noncolliding_nicknames
appear to be copypasta (and therefore can be removed):

    self._useNicknames(['bar-c'])
    self.assertEqual('bar-c', nick)

It would be nice if test_enforces_minimum_length asserts that "i" isn't
registered at test start to ensure that we're really seeing
nick-too-short behavior and not nick-already-taken behavior.

Revision history for this message
j.c.sackett (jcsackett) wrote :

> The conversion from doctest to unittest looks like it could use some
> more fleshing out.
>
> Two of the scenarios from the doctest that look important aren't
> represented in the unittests:
>
> >>> generate_nick("<email address hidden>")
> 'foo-bar'
> >>> generate_nick("<email address hidden>")
> 'foo+bar'

Fair, I've updated the symbol handling test to show several versions.

> The second half of the doctest (starting with "It should be nearly
> impossible to..." and going through the testing of
> NicknameGenerationError) looks important too and those tests aren't
> represented in the unittests.

They're not anymore; that part tests the use of email domains as part of what's being used for nick generation to the point where random prefixes and suffixes start being used. Now we skip straight to the randomized stuff, and that's tested in the non-colliding part.

> Here are some smaller things that occurred to me during the review:
>
> The regex in lib/lp/registry/model/person.py doesn't need the second set
> of capturing parens now that we're not using the domain name.

Correct, I've updated the second group to noncapturing.

> These two lines from the end of test_can_create_noncolliding_nicknames
> appear to be copypasta (and therefore can be removed):
>
> self._useNicknames(['bar-c'])
> self.assertEqual('bar-c', nick)

Actually, that was supposed to be a test to show what happened if it got another bar@ email address after one collision. It's been updated.

> It would be nice if test_enforces_minimum_length asserts that "i" isn't
> registered at test start to ensure that we're really seeing
> nick-too-short behavior and not nick-already-taken behavior.

Done.

Revision history for this message
Benji York (benji) wrote :

Thanks for the fixes and for fixing my misconceptions. Some parting
thoughts:

The bit verifying that "i" name hasn't been used could use a comment as
to what's going on. Similarly for the second half of
test_can_create_noncolliding_nicknames where you're verifying that
prefixes are added.

Also, I think the last line of _useNicknames is commented out code that
you meant to remove.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== removed file 'lib/lp/registry/doc/nickname.txt'
2--- lib/lp/registry/doc/nickname.txt 2010-10-09 16:36:22 +0000
3+++ lib/lp/registry/doc/nickname.txt 1970-01-01 00:00:00 +0000
4@@ -1,96 +0,0 @@
5-= Nickname =
6-
7-For each Person we create a unique nickname that must be generated using
8-generate_nick.
9-
10-A valid nick can contain lower case letters, dashes, and numbers,
11-must start with a letter or a number, and must be a minimum of
12-four characters.
13-
14- >>> from lp.registry.model.person import generate_nick
15- >>> generate_nick("foop@example@com")
16- Traceback (most recent call last):
17- ...
18- NicknameGenerationError: foop@example@com is not a valid email address
19- >>> generate_nick("foop@example.com")
20- 'foop'
21- >>> generate_nick("bar@example.com")
22- 'bar'
23- >>> generate_nick("spam@example.com")
24- 'spam'
25- >>> generate_nick("foo.bar@canonical.com")
26- 'foo-bar'
27- >>> generate_nick("foo+bar@example.com")
28- 'foo+bar'
29- >>> generate_nick("--foo@example.com")
30- 'foo'
31-
32-We already have a salgado, so generate_nick needs to try a little harder.
33-
34- >>> generate_nick("salgado@async.co.br")
35- 'salgado-async'
36-
37-And it needs to maintain a minimum length
38-
39- >>> generate_nick("i@tv")
40- 'i-tv'
41-
42- >>> _counter = 0
43- >>> from zope.security.proxy import removeSecurityProxy
44- >>> def new_person(email):
45- ... from lp.registry.interfaces.person import IPersonSet
46- ... from canonical.launchpad.interfaces.emailaddress import IEmailAddressSet
47- ... from lp.registry.interfaces.person import (
48- ... PersonCreationRationale)
49- ... UNKNOWN = PersonCreationRationale.UNKNOWN
50- ... person_set = getUtility(IPersonSet)
51- ... person, ignored = person_set.createPersonAndEmail(
52- ... email, UNKNOWN)
53- ... # Switch the email to avoid conflicts when generating clashing names
54- ... email_address_set = getUtility(IEmailAddressSet)
55- ... global _counter
56- ... for email_address in email_address_set.getByPerson(person):
57- ... removeSecurityProxy(email_address).email = (
58- ... 'nodupe%d@example.com' % _counter)
59- ... _counter += 1
60- ... flush_database_updates()
61- ... assert email_address_set.getByEmail(email) is None, 'Oops'
62- ... return person
63-
64-The email address is used to generate nicknames. If domain is used to
65-attempt to avoid clashes.
66-
67- >>> new_person('bongo@example.com').name
68- u'bongo'
69- >>> new_person('bongo@example.com').name
70- u'bongo-example'
71- >>> new_person('bongo@example.com').name
72- u'bongo-example-com'
73-
74-It should be nearly impossible to make the method fail even in the
75-extreme cases where using all components of the domain name fails.
76-If nicknames still clash, random suffixes are tried:
77-
78- >>> new_person('bongo@example.com').name
79- u'bongo-example-com-c'
80-
81-Random prefixes are tried:
82-
83- >>> new_person('bongo@example.com').name
84- u'q-bongo-example-com'
85-
86-And random mutations are tried:
87-
88- >>> new_person('bongo@example.com').name
89- u'bongo-examale-com'
90-
91-The only way it can fail is if some idiot admin goes and registers a
92-'match-everything' pattern in the blacklist:
93-
94- >>> from canonical.database.sqlbase import cursor
95- >>> cur = cursor()
96- >>> cur.execute("INSERT INTO NameBlacklist (regexp) VALUES ('.')")
97- >>> generate_nick("foo@example.com")
98- Traceback (most recent call last):
99- ...
100- NicknameGenerationError: No nickname could be generated...
101
102=== modified file 'lib/lp/registry/model/person.py'
103--- lib/lp/registry/model/person.py 2011-05-18 03:36:29 +0000
104+++ lib/lp/registry/model/person.py 2011-05-24 13:43:18 +0000
105@@ -4541,9 +4541,8 @@
106 raise NicknameGenerationError("%s is not a valid email address"
107 % email_addr)
108
109- user, domain = re.match("^(\S+)@(\S+)$", email_addr).groups()
110+ user = re.match("^(\S+)@(?:\S+)$", email_addr).groups()[0]
111 user = user.replace(".", "-").replace("_", "-")
112- domain_parts = domain.split(".")
113
114 person_set = PersonSet()
115
116@@ -4561,11 +4560,6 @@
117 if _valid_nick(generated_nick):
118 return generated_nick
119
120- for domain_part in domain_parts:
121- generated_nick = sanitize_name(generated_nick + "-" + domain_part)
122- if _valid_nick(generated_nick):
123- return generated_nick
124-
125 # We seed the random number generator so we get consistent results,
126 # making the algorithm repeatable and thus testable.
127 random_state = random.getstate()
128
129=== added file 'lib/lp/registry/tests/test_nickname.py'
130--- lib/lp/registry/tests/test_nickname.py 1970-01-01 00:00:00 +0000
131+++ lib/lp/registry/tests/test_nickname.py 2011-05-24 13:43:18 +0000
132@@ -0,0 +1,74 @@
133+# Copyright 2011 Canonical Ltd. This software is licensed under the
134+# GNU Affero General Public License version 3 (see the file LICENSE).
135+
136+"""Tests for nickname generation"""
137+
138+__metaclass__ = type
139+
140+from zope.component import getUtility
141+
142+from canonical.testing.layers import DatabaseFunctionalLayer
143+from lp.registry.interfaces.person import IPersonSet
144+from lp.registry.model.person import (
145+ generate_nick,
146+ NicknameGenerationError,
147+ )
148+from lp.testing import TestCaseWithFactory
149+
150+
151+class TestNicknameGeneration(TestCaseWithFactory):
152+
153+ layer = DatabaseFunctionalLayer
154+
155+ def test_rejects_invalid_emails(self):
156+ # generate_nick rejects invalid email addresses
157+ self.assertRaises(
158+ NicknameGenerationError,
159+ generate_nick,
160+ 'foo@example@com')
161+
162+ def test_uses_email_address(self):
163+ # generate_nick uses the first part of the email address to create
164+ # the nick.
165+ nick = generate_nick('bar@example.com')
166+ self.assertEqual('bar', nick)
167+
168+ def test_handles_symbols(self):
169+ # If an email starts with symbols, generate_nick still creates a
170+ # valid nick that doesn't start with symbols.
171+ nicks = [generate_nick(email) for email in [
172+ '---bar@example.com',
173+ 'foo.bar@example.com',
174+ 'foo-bar@example.com',
175+ 'foo+bar@example.com',
176+ ]]
177+ self.assertEqual(
178+ ['bar', 'foo-bar', 'foo-bar', 'foo+bar'],
179+ nicks)
180+
181+ def test_enforces_minimum_length(self):
182+ # Nicks must be a minimum length. generate_nick enforces this by
183+ # adding random suffixes to the required length.
184+ # The nick 'i' isn't used, so we know any additional prefi
185+ person = getUtility(IPersonSet).getByName('i')
186+ self.assertIs(None, person)
187+ nick = generate_nick('i@example.com')
188+ self.assertEqual('i-5', nick)
189+
190+ def test_can_create_noncolliding_nicknames(self):
191+ # Given the same email address, generate_nick doesn't recreate the
192+ # same nick once that nick is used.
193+ self._useNicknames(['bar'])
194+ nick = generate_nick('bar@example.com')
195+ self.assertEqual('bar-c', nick)
196+
197+ # If we used the previously created nick and get another bar@ email
198+ # address, another new nick is generated.
199+ self._useNicknames(['bar-c'])
200+ nick = generate_nick('bar@example.com')
201+ self.assertEqual('a-bar', nick)
202+
203+ def _useNicknames(self, nicknames):
204+ # Helper method to consume a nickname
205+ for nick in nicknames:
206+ self.factory.makePerson(name=nick)