Merge lp:~jcsackett/launchpad/dont-expose-domains into lp:launchpad

Proposed by j.c.sackett
Status: Merged
Approved by: j.c.sackett
Approved revision: no longer in the source branch.
Merged at revision: 13114
Proposed branch: lp:~jcsackett/launchpad/dont-expose-domains
Merge into: lp:launchpad
Diff against target: 206 lines (+75/-103)
3 files modified
lib/lp/registry/doc/nickname.txt (+0/-96)
lib/lp/registry/model/person.py (+1/-7)
lib/lp/registry/tests/test_nickname.py (+74/-0)
To merge this branch: bzr merge lp:~jcsackett/launchpad/dont-expose-domains
Reviewer Review Type Date Requested Status
Benji York (community) code Approve
Review via email: mp+62023@code.launchpad.net

Commit message

[r=benji][bug=228355] Removes the use of email domain information in the creation of nicks in generate_nick

Description of the change

Summary
=======
Even though users can set lp to not show their email address, if their id has been automatically generated the email can still essentially be divulged b/c it's used to create the idea. (e.g. <email address hidden> can become bob-gmail if bob is already taken or bob-gmail-com).

This code removes the domain bit from the nick generation, as there is already a random suffix generation piece as a final fall back. If bob is already taken as a username, than the next bob@somedomain becomes bob-c or similar.

Preimplementation
=================
Spoke with Curtis Hovey

Implementation
==============
lib/lp/registry/model/person.py
-------------------------------
Removed the portion of generate_nick that used the email domain to create nicks.

lib/lp/registry/tests/test_nickname.py
lib/lp/registry/doc/nickname.txt
--------------------------------
Removed the doc tests and replaced it with unittests, which are more appropriate.

Tests
=====
bin/test -t test_nickname

QA
==
Create a user on qastaging with a an email that could collide with an existing user (e.g. <email address hidden>). The domain should not be part of the generated nick.

Lint
====

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/registry/model/person.py
  lib/lp/registry/tests/test_nickname.py

To post a comment you must log in.
Revision history for this message
Benji York (benji) wrote :

The conversion from doctest to unittest looks like it could use some
more fleshing out.

Two of the scenarios from the doctest that look important aren't
represented in the unittests:

    >>> generate_nick("<email address hidden>")
    'foo-bar'
    >>> generate_nick("<email address hidden>")
    'foo+bar'

The second half of the doctest (starting with "It should be nearly
impossible to..." and going through the testing of
NicknameGenerationError) looks important too and those tests aren't
represented in the unittests.

Here are some smaller things that occurred to me during the review:

The regex in lib/lp/registry/model/person.py doesn't need the second set
of capturing parens now that we're not using the domain name.

These two lines from the end of test_can_create_noncolliding_nicknames
appear to be copypasta (and therefore can be removed):

    self._useNicknames(['bar-c'])
    self.assertEqual('bar-c', nick)

It would be nice if test_enforces_minimum_length asserts that "i" isn't
registered at test start to ensure that we're really seeing
nick-too-short behavior and not nick-already-taken behavior.

Revision history for this message
j.c.sackett (jcsackett) wrote :

> The conversion from doctest to unittest looks like it could use some
> more fleshing out.
>
> Two of the scenarios from the doctest that look important aren't
> represented in the unittests:
>
> >>> generate_nick("<email address hidden>")
> 'foo-bar'
> >>> generate_nick("<email address hidden>")
> 'foo+bar'

Fair, I've updated the symbol handling test to show several versions.

> The second half of the doctest (starting with "It should be nearly
> impossible to..." and going through the testing of
> NicknameGenerationError) looks important too and those tests aren't
> represented in the unittests.

They're not anymore; that part tests the use of email domains as part of what's being used for nick generation to the point where random prefixes and suffixes start being used. Now we skip straight to the randomized stuff, and that's tested in the non-colliding part.

> Here are some smaller things that occurred to me during the review:
>
> The regex in lib/lp/registry/model/person.py doesn't need the second set
> of capturing parens now that we're not using the domain name.

Correct, I've updated the second group to noncapturing.

> These two lines from the end of test_can_create_noncolliding_nicknames
> appear to be copypasta (and therefore can be removed):
>
> self._useNicknames(['bar-c'])
> self.assertEqual('bar-c', nick)

Actually, that was supposed to be a test to show what happened if it got another bar@ email address after one collision. It's been updated.

> It would be nice if test_enforces_minimum_length asserts that "i" isn't
> registered at test start to ensure that we're really seeing
> nick-too-short behavior and not nick-already-taken behavior.

Done.

Revision history for this message
Benji York (benji) wrote :

Thanks for the fixes and for fixing my misconceptions. Some parting
thoughts:

The bit verifying that "i" name hasn't been used could use a comment as
to what's going on. Similarly for the second half of
test_can_create_noncolliding_nicknames where you're verifying that
prefixes are added.

Also, I think the last line of _useNicknames is commented out code that
you meant to remove.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== removed file 'lib/lp/registry/doc/nickname.txt'
--- lib/lp/registry/doc/nickname.txt 2010-10-09 16:36:22 +0000
+++ lib/lp/registry/doc/nickname.txt 1970-01-01 00:00:00 +0000
@@ -1,96 +0,0 @@
1= Nickname =
2
3For each Person we create a unique nickname that must be generated using
4generate_nick.
5
6A valid nick can contain lower case letters, dashes, and numbers,
7must start with a letter or a number, and must be a minimum of
8four characters.
9
10 >>> from lp.registry.model.person import generate_nick
11 >>> generate_nick("foop@example@com")
12 Traceback (most recent call last):
13 ...
14 NicknameGenerationError: foop@example@com is not a valid email address
15 >>> generate_nick("foop@example.com")
16 'foop'
17 >>> generate_nick("bar@example.com")
18 'bar'
19 >>> generate_nick("spam@example.com")
20 'spam'
21 >>> generate_nick("foo.bar@canonical.com")
22 'foo-bar'
23 >>> generate_nick("foo+bar@example.com")
24 'foo+bar'
25 >>> generate_nick("--foo@example.com")
26 'foo'
27
28We already have a salgado, so generate_nick needs to try a little harder.
29
30 >>> generate_nick("salgado@async.co.br")
31 'salgado-async'
32
33And it needs to maintain a minimum length
34
35 >>> generate_nick("i@tv")
36 'i-tv'
37
38 >>> _counter = 0
39 >>> from zope.security.proxy import removeSecurityProxy
40 >>> def new_person(email):
41 ... from lp.registry.interfaces.person import IPersonSet
42 ... from canonical.launchpad.interfaces.emailaddress import IEmailAddressSet
43 ... from lp.registry.interfaces.person import (
44 ... PersonCreationRationale)
45 ... UNKNOWN = PersonCreationRationale.UNKNOWN
46 ... person_set = getUtility(IPersonSet)
47 ... person, ignored = person_set.createPersonAndEmail(
48 ... email, UNKNOWN)
49 ... # Switch the email to avoid conflicts when generating clashing names
50 ... email_address_set = getUtility(IEmailAddressSet)
51 ... global _counter
52 ... for email_address in email_address_set.getByPerson(person):
53 ... removeSecurityProxy(email_address).email = (
54 ... 'nodupe%d@example.com' % _counter)
55 ... _counter += 1
56 ... flush_database_updates()
57 ... assert email_address_set.getByEmail(email) is None, 'Oops'
58 ... return person
59
60The email address is used to generate nicknames. If domain is used to
61attempt to avoid clashes.
62
63 >>> new_person('bongo@example.com').name
64 u'bongo'
65 >>> new_person('bongo@example.com').name
66 u'bongo-example'
67 >>> new_person('bongo@example.com').name
68 u'bongo-example-com'
69
70It should be nearly impossible to make the method fail even in the
71extreme cases where using all components of the domain name fails.
72If nicknames still clash, random suffixes are tried:
73
74 >>> new_person('bongo@example.com').name
75 u'bongo-example-com-c'
76
77Random prefixes are tried:
78
79 >>> new_person('bongo@example.com').name
80 u'q-bongo-example-com'
81
82And random mutations are tried:
83
84 >>> new_person('bongo@example.com').name
85 u'bongo-examale-com'
86
87The only way it can fail is if some idiot admin goes and registers a
88'match-everything' pattern in the blacklist:
89
90 >>> from canonical.database.sqlbase import cursor
91 >>> cur = cursor()
92 >>> cur.execute("INSERT INTO NameBlacklist (regexp) VALUES ('.')")
93 >>> generate_nick("foo@example.com")
94 Traceback (most recent call last):
95 ...
96 NicknameGenerationError: No nickname could be generated...
970
=== modified file 'lib/lp/registry/model/person.py'
--- lib/lp/registry/model/person.py 2011-05-18 03:36:29 +0000
+++ lib/lp/registry/model/person.py 2011-05-24 13:43:18 +0000
@@ -4541,9 +4541,8 @@
4541 raise NicknameGenerationError("%s is not a valid email address"4541 raise NicknameGenerationError("%s is not a valid email address"
4542 % email_addr)4542 % email_addr)
45434543
4544 user, domain = re.match("^(\S+)@(\S+)$", email_addr).groups()4544 user = re.match("^(\S+)@(?:\S+)$", email_addr).groups()[0]
4545 user = user.replace(".", "-").replace("_", "-")4545 user = user.replace(".", "-").replace("_", "-")
4546 domain_parts = domain.split(".")
45474546
4548 person_set = PersonSet()4547 person_set = PersonSet()
45494548
@@ -4561,11 +4560,6 @@
4561 if _valid_nick(generated_nick):4560 if _valid_nick(generated_nick):
4562 return generated_nick4561 return generated_nick
45634562
4564 for domain_part in domain_parts:
4565 generated_nick = sanitize_name(generated_nick + "-" + domain_part)
4566 if _valid_nick(generated_nick):
4567 return generated_nick
4568
4569 # We seed the random number generator so we get consistent results,4563 # We seed the random number generator so we get consistent results,
4570 # making the algorithm repeatable and thus testable.4564 # making the algorithm repeatable and thus testable.
4571 random_state = random.getstate()4565 random_state = random.getstate()
45724566
=== added file 'lib/lp/registry/tests/test_nickname.py'
--- lib/lp/registry/tests/test_nickname.py 1970-01-01 00:00:00 +0000
+++ lib/lp/registry/tests/test_nickname.py 2011-05-24 13:43:18 +0000
@@ -0,0 +1,74 @@
1# Copyright 2011 Canonical Ltd. This software is licensed under the
2# GNU Affero General Public License version 3 (see the file LICENSE).
3
4"""Tests for nickname generation"""
5
6__metaclass__ = type
7
8from zope.component import getUtility
9
10from canonical.testing.layers import DatabaseFunctionalLayer
11from lp.registry.interfaces.person import IPersonSet
12from lp.registry.model.person import (
13 generate_nick,
14 NicknameGenerationError,
15 )
16from lp.testing import TestCaseWithFactory
17
18
19class TestNicknameGeneration(TestCaseWithFactory):
20
21 layer = DatabaseFunctionalLayer
22
23 def test_rejects_invalid_emails(self):
24 # generate_nick rejects invalid email addresses
25 self.assertRaises(
26 NicknameGenerationError,
27 generate_nick,
28 'foo@example@com')
29
30 def test_uses_email_address(self):
31 # generate_nick uses the first part of the email address to create
32 # the nick.
33 nick = generate_nick('bar@example.com')
34 self.assertEqual('bar', nick)
35
36 def test_handles_symbols(self):
37 # If an email starts with symbols, generate_nick still creates a
38 # valid nick that doesn't start with symbols.
39 nicks = [generate_nick(email) for email in [
40 '---bar@example.com',
41 'foo.bar@example.com',
42 'foo-bar@example.com',
43 'foo+bar@example.com',
44 ]]
45 self.assertEqual(
46 ['bar', 'foo-bar', 'foo-bar', 'foo+bar'],
47 nicks)
48
49 def test_enforces_minimum_length(self):
50 # Nicks must be a minimum length. generate_nick enforces this by
51 # adding random suffixes to the required length.
52 # The nick 'i' isn't used, so we know any additional prefi
53 person = getUtility(IPersonSet).getByName('i')
54 self.assertIs(None, person)
55 nick = generate_nick('i@example.com')
56 self.assertEqual('i-5', nick)
57
58 def test_can_create_noncolliding_nicknames(self):
59 # Given the same email address, generate_nick doesn't recreate the
60 # same nick once that nick is used.
61 self._useNicknames(['bar'])
62 nick = generate_nick('bar@example.com')
63 self.assertEqual('bar-c', nick)
64
65 # If we used the previously created nick and get another bar@ email
66 # address, another new nick is generated.
67 self._useNicknames(['bar-c'])
68 nick = generate_nick('bar@example.com')
69 self.assertEqual('a-bar', nick)
70
71 def _useNicknames(self, nicknames):
72 # Helper method to consume a nickname
73 for nick in nicknames:
74 self.factory.makePerson(name=nick)