Merge lp:~vila/uci-engine/more-unique into lp:uci-engine

Proposed by Vincent Ladeuil
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: 866
Merged at revision: 874
Proposed branch: lp:~vila/uci-engine/more-unique
Merge into: lp:uci-engine
Diff against target: 94 lines (+9/-9)
7 files modified
britney_proxy/britney/process_requests.py (+1/-1)
image-builder/imagebuilder/cloud_image.py (+1/-1)
lander/lander/tests/test_service_wrapper.py (+3/-3)
lander/lander/workflow.py (+1/-1)
test_runner/bin/check_worker.py (+1/-1)
test_runner/run-integration.py (+1/-1)
tests/test_ppacreator.py (+1/-1)
To merge this branch: bzr merge lp:~vila/uci-engine/more-unique
Reviewer Review Type Date Requested Status
Joe Talbott (community) Approve
Celso Providelo (community) Approve
PS Jenkins bot (community) continuous-integration Approve
Review via email: mp+240173@code.launchpad.net

Commit message

Switch to uuid4 since uuid1 have been seen colliding in real deployments

Description of the change

uuid1() returned 1c8fb398-5f63-11e4-a0d4-fa163ef04d7 for two different workers:

./ci-airline-tr-rabbit-worker-0-upstart.log:3733:2014-10-29 12:00:01,423 INFO on_message: {"progress_trigger": "health-check-test-runner-1c8fb398-5f63-11e4-a0d4-fa163ef04d7b", "series": "trusty", "ppa_list": [], "package_list": ["libpng"], "image_id": "uci/cloudimg/trusty-amd64.img", "architecture": "amd64", "ticket_id": "test-runner-test"}

./ci-airline-tr-rabbit-worker-17-cron.log:13:2014-10-29 12:00:01,371 INFO STATUS: {u'progress_trigger': u'health-check-test-runner-1c8fb398-5f63-11e4-a0d4-fa163ef04d7b', u'series': u'trusty', u'ppa_list': [], u'package_list': [u'libpng'], u'image_id': u'uci/cloudimg/trusty-amd64.img', u'architecture': u'amd64', u'ticket_id': u'test-runner-test'}

Unlucky ? Can happen only once in a lifetime ?

No.

I've got plenty of occurrences (76) that happened while swift was acting (not sure it's *required* to reproduce though).

Can someone more db/south knowledgeable than me make a followup MP for:

=== modified file 'ticket_system/ticket/migrations/0005_creating_uuids.py'
--- ticket_system/ticket/migrations/0005_creating_uuids.py 2014-07-23 23:14:02 +0000
+++ ticket_system/ticket/migrations/0005_creating_uuids.py 2014-10-30 18:55:23 +0000
@@ -10,7 +10,7 @@
         "Fill 'uuid' on existing tickets."
         import uuid
         for ticket in orm['ticket.Ticket'].objects.all():
- ticket.uuid = str(uuid.uuid1())
+ ticket.uuid = str(uuid.uuid4())
             ticket.save()

     def backwards(self, orm):

We haven't (yet) used several instances for that code so we're safe for all existing tickets I think.

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:864
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/1651/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/1651/rebuild

review: Approve (continuous-integration)
Revision history for this message
Celso Providelo (cprov) wrote :
Download full text (3.9 KiB)

Well, that's unexpected since all bootstack instances seems to have
unique MACs [1] and RTC are synced [2]

UUIDv1 are composed from the generating unit MAC and a .1 ms precision
timestamp [3], there is a theoretic very narrow window for collision
if the MACs are unique, there must be something else going on.

In a pseudo-random scenario (VMs w/o any true-random HW) I'd say there
is more chance for system-wide collision using UUIDv4 than v1.

I don't really see collisions happening in the TS/DB domain, even if
the ticket populating rate was as high as .1ms(it's far from it),
postgres enforces uniqueness, so the ticket creation would fail. From
that time on, we can assume the TS and the GK would be always operated
with unique UUIDs or fail normally with 404. Moreover, soon we will
need to order/recover-timestamp from UUID-named swift containers and
that is only possible with v1.

So, IMO, TS/DB/GK don't have to be patched, since they do not have
problems with v1, then you can safe yourself from the south migration
embarrassment (never edit existing migrations, it default the whole
purpose of having them).

That restrict us to the rabbit/queues environment, where you recently
found that timestamp collision. In that case, presumably, UUIDv1 would
colide as well if the generation happens in the same machine within a
.1 ms interval (quite plausible these days). I think that in this
context, and only this, v4 could actually solve your problem.

I am happy to help you investigate it further and find an appropriate
solution for this.

[1] juju run --all ifconfig | grep HWaddr
[2] juju run --all date
[3] http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_1_.28MAC_address_.26_date-time.29

On Thu, Oct 30, 2014 at 8:00 PM, Vincent Ladeuil <email address hidden> wrote:
> The proposal to merge lp:~vila/uci-engine/more-unique into lp:uci-engine has been updated.
>
> Description changed to:
>
> uuid1() returned 1c8fb398-5f63-11e4-a0d4-fa163ef04d7 for two different workers:
>
> ./ci-airline-tr-rabbit-worker-0-upstart.log:3733:2014-10-29 12:00:01,423 INFO on_message: {"progress_trigger": "health-check-test-runner-1c8fb398-5f63-11e4-a0d4-fa163ef04d7b", "series": "trusty", "ppa_list": [], "package_list": ["libpng"], "image_id": "uci/cloudimg/trusty-amd64.img", "architecture": "amd64", "ticket_id": "test-runner-test"}
>
> ./ci-airline-tr-rabbit-worker-17-cron.log:13:2014-10-29 12:00:01,371 INFO STATUS: {u'progress_trigger': u'health-check-test-runner-1c8fb398-5f63-11e4-a0d4-fa163ef04d7b', u'series': u'trusty', u'ppa_list': [], u'package_list': [u'libpng'], u'image_id': u'uci/cloudimg/trusty-amd64.img', u'architecture': u'amd64', u'ticket_id': u'test-runner-test'}
>
> Unlucky ? Can happen only once in a lifetime ?
>
> No.
>
> I've got plenty of occurrences (76) that happened while swift was acting (not sure it's *required* to reproduce though).
>
> Can someone more db/south knowledgeable than me make a followup MP for:
>
> === modified file 'ticket_system/ticket/migrations/0005_creating_uuids.py'
> --- ticket_system/ticket/migrations/0005_creating_uuids.py 2014-07-23 23:14:02 +0000
> +++ ticket_system/ticket/migrations/0005_creating_uuids.py ...

Read more...

Revision history for this message
Vincent Ladeuil (vila) wrote :
Download full text (5.9 KiB)

> Well, that's unexpected since all bootstack instances seems to have
> unique MACs [1]

I've checked that before proposing this MP with 'juju run --service
ci-airline-tr-rabbit-worker "python -c 'import uuid; print
uuid.uuid1().get_node()'"' (edited for readability):

- MachineId: "4" Stdout: 274973446818410 UnitId: ci-airline-tr-rabbit-worker/0
- MachineId: "5" Stdout: 274973443101009 UnitId: ci-airline-tr-rabbit-worker/1
- MachineId: "6" Stdout: 274973449401742 UnitId: ci-airline-tr-rabbit-worker/2
- MachineId: "7" Stdout: 274973436426391 UnitId: ci-airline-tr-rabbit-worker/3
- MachineId: "8" Stdout: 274973440939877 UnitId: ci-airline-tr-rabbit-worker/4
- MachineId: "9" Stdout: 274973446307917 UnitId: ci-airline-tr-rabbit-worker/5
- MachineId: "10" Stdout: 274973450245143 UnitId: ci-airline-tr-rabbit-worker/6
- MachineId: "11" Stdout: 274973446513838 UnitId: ci-airline-tr-rabbit-worker/7
- MachineId: "12" Stdout: 274973437559701 UnitId: ci-airline-tr-rabbit-worker/8
- MachineId: "13" Stdout: 274973448825124 UnitId: ci-airline-tr-rabbit-worker/9
- MachineId: "14" Stdout: 274973436691689 UnitId: ci-airline-tr-rabbit-worker/10
- MachineId: "15" Stdout: 274973452322400 UnitId: ci-airline-tr-rabbit-worker/11
- MachineId: "16" Stdout: 274973444235950 UnitId: ci-airline-tr-rabbit-worker/12
- MachineId: "17" Stdout: 274973449976901 UnitId: ci-airline-tr-rabbit-worker/13
- MachineId: "18" Stdout: 274973447132163 UnitId: ci-airline-tr-rabbit-worker/14
- MachineId: "19" Stdout: 274973440732073 UnitId: ci-airline-tr-rabbit-worker/15
- MachineId: "20" Stdout: 274973446679760 UnitId: ci-airline-tr-rabbit-worker/16
- MachineId: "21" Stdout: 274973442815065 UnitId: ci-airline-tr-rabbit-worker/17
- MachineId: "22" Stdout: 274973443908206 UnitId: ci-airline-tr-rabbit-worker/18
- MachineId: "23" Stdout: 274973442948691 UnitId: ci-airline-tr-rabbit-worker/19

Ok, host IDs are unique among all the workers.

> and RTC are synced [2]

Well, 'juju run --all date' can only reveal big differences but the
timestamps in the logs are more precise anyway and they do tell that the
clocks are roughly synced (no significant divergence).

So despite wikipedia, your intuition and mine about how this is supposed to
work, real life say it doesn't work that way.

>
> UUIDv1 are composed from the generating unit MAC and a .1 ms precision
> timestamp [3], there is a theoretic very narrow window for collision
> if the MACs are unique, there must be something else going on.

Yup, what can be that something else then ?

Looking at the code:

def uuid1(node=None, clock_seq=None):
    """Generate a UUID from a host ID, sequence number, and the current time.
    If 'node' is not given, getnode() is used to obtain the hardware
    address. If 'clock_seq' is given, it is used as the sequence number;
    otherwise a random 14-bit sequence number is chosen."""

getnode() is used they say but:

    if _uuid_generate_time and node is clock_seq is None:
        _buffer = ctypes.create_string_buffer(16)
        _uuid_generate_time(_buffer)
        return UUID(bytes=_buffer.raw)

No get_node() there and:

    if node is None:
        node = getnode()
    return UUID(fields=(time_low, t...

Read more...

lp:~vila/uci-engine/more-unique updated
865. By Vincent Ladeuil

Revert to uuid1 for gatekeeper.

866. By Vincent Ladeuil

Revert to uuid1 for ticket system.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:866
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/1657/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/1657/rebuild

review: Approve (continuous-integration)
Revision history for this message
Celso Providelo (cprov) :
review: Approve
Revision history for this message
Joe Talbott (joetalbott) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'britney_proxy/britney/process_requests.py'
2--- britney_proxy/britney/process_requests.py 2014-10-29 15:06:59 +0000
3+++ britney_proxy/britney/process_requests.py 2014-10-31 14:27:18 +0000
4@@ -64,7 +64,7 @@
5 # debug purposes).
6 output_queue = params.get('output_queue', self.output_queue)
7 report_queue = params.get('report_queue', None)
8- params = dict(ticket_id=str(uuid.uuid1()),
9+ params = dict(ticket_id=str(uuid.uuid4()),
10 progress_trigger=output_queue,
11 series=series,
12 architecture=architecture,
13
14=== modified file 'image-builder/imagebuilder/cloud_image.py'
15--- image-builder/imagebuilder/cloud_image.py 2014-10-07 10:04:01 +0000
16+++ image-builder/imagebuilder/cloud_image.py 2014-10-31 14:27:18 +0000
17@@ -422,7 +422,7 @@
18 status_cb('Shrinking the converted image...')
19 runcmd('qemu-img convert -O qcow2 -c %s %s' % (
20 image_path, converted_image_path))
21- upload_filename = str(uuid.uuid1()) + '.img'
22+ upload_filename = str(uuid.uuid4()) + '.img'
23 status_cb(
24 'Uploading converted image to glance as %s' % upload_filename)
25 glance.image_create(upload_filename, converted_image_path)
26
27=== modified file 'lander/lander/tests/test_service_wrapper.py'
28--- lander/lander/tests/test_service_wrapper.py 2014-10-27 19:03:16 +0000
29+++ lander/lander/tests/test_service_wrapper.py 2014-10-31 14:27:18 +0000
30@@ -38,9 +38,9 @@
31 patch = mock.patch('ci_utils.amqp_utils.get_config')
32 patch.start()
33 self.addCleanup(patch.stop)
34- patch_uuid1 = mock.patch('uuid.uuid1', return_value='1234567890')
35- patch_uuid1.start()
36- self.addCleanup(patch_uuid1.stop)
37+ patch_uuid4 = mock.patch('uuid.uuid4', return_value='1234567890')
38+ patch_uuid4.start()
39+ self.addCleanup(patch_uuid4.stop)
40
41 self.results = tempfile.mkstemp(prefix='lander-test-')[1]
42 self.addCleanup(os.unlink, self.results)
43
44=== modified file 'lander/lander/workflow.py'
45--- lander/lander/workflow.py 2014-10-23 19:52:38 +0000
46+++ lander/lander/workflow.py 2014-10-31 14:27:18 +0000
47@@ -62,7 +62,7 @@
48
49 def create(ts_url, ticket_data):
50 data = {
51- 'progress_trigger': 'lander-{}'.format(uuid.uuid1()),
52+ 'progress_trigger': 'lander-{}'.format(uuid.uuid4()),
53 'ticket_id': ticket_data['id'],
54 'ts_url': ts_url,
55 'steps': ticket_data['steps'],
56
57=== modified file 'test_runner/bin/check_worker.py'
58--- test_runner/bin/check_worker.py 2014-10-29 15:06:59 +0000
59+++ test_runner/bin/check_worker.py 2014-10-31 14:27:18 +0000
60@@ -74,7 +74,7 @@
61
62
63 def check_worker():
64- queue = 'testrun-test-{}'.format(uuid.uuid1())
65+ queue = 'testrun-test-{}'.format(uuid.uuid4())
66 series = 'trusty'
67 architecture = 'amd64'
68 image_id = image_store.uci_image_name('cloudimg', series, architecture)
69
70=== modified file 'test_runner/run-integration.py'
71--- test_runner/run-integration.py 2014-10-29 15:06:59 +0000
72+++ test_runner/run-integration.py 2014-10-31 14:27:18 +0000
73@@ -57,7 +57,7 @@
74
75
76 def run_test():
77- queue = 'testrun-test-{}'.format(uuid.uuid1())
78+ queue = 'testrun-test-{}'.format(uuid.uuid4())
79 series = 'trusty'
80 architecture = 'amd64'
81 image_id = image_store.uci_image_name('cloudimg', series, architecture)
82
83=== modified file 'tests/test_ppacreator.py'
84--- tests/test_ppacreator.py 2014-10-08 13:06:54 +0000
85+++ tests/test_ppacreator.py 2014-10-31 14:27:18 +0000
86@@ -64,7 +64,7 @@
87 def test_create_ppa(self):
88 """Ensure that ppa's can be created or their name can
89 be retrieved if they are already existing"""
90- unique_id = uuid.uuid1()
91+ unique_id = uuid.uuid4()
92 progress_queue_name = 'progress-trigger'
93 params = dict(
94 name='ppacreator-inttest-{}'.format(unique_id),

Subscribers

People subscribed via source and target branches