> Well, that's unexpected since all bootstack instances seem to have
> unique MACs [1]

I've checked that before proposing this MP with
'juju run --service ci-airline-tr-rabbit-worker "python -c 'import uuid; print uuid.uuid1().get_node()'"'
(edited for readability):

- MachineId: "4"   Stdout: 274973446818410   UnitId: ci-airline-tr-rabbit-worker/0
- MachineId: "5"   Stdout: 274973443101009   UnitId: ci-airline-tr-rabbit-worker/1
- MachineId: "6"   Stdout: 274973449401742   UnitId: ci-airline-tr-rabbit-worker/2
- MachineId: "7"   Stdout: 274973436426391   UnitId: ci-airline-tr-rabbit-worker/3
- MachineId: "8"   Stdout: 274973440939877   UnitId: ci-airline-tr-rabbit-worker/4
- MachineId: "9"   Stdout: 274973446307917   UnitId: ci-airline-tr-rabbit-worker/5
- MachineId: "10"  Stdout: 274973450245143   UnitId: ci-airline-tr-rabbit-worker/6
- MachineId: "11"  Stdout: 274973446513838   UnitId: ci-airline-tr-rabbit-worker/7
- MachineId: "12"  Stdout: 274973437559701   UnitId: ci-airline-tr-rabbit-worker/8
- MachineId: "13"  Stdout: 274973448825124   UnitId: ci-airline-tr-rabbit-worker/9
- MachineId: "14"  Stdout: 274973436691689   UnitId: ci-airline-tr-rabbit-worker/10
- MachineId: "15"  Stdout: 274973452322400   UnitId: ci-airline-tr-rabbit-worker/11
- MachineId: "16"  Stdout: 274973444235950   UnitId: ci-airline-tr-rabbit-worker/12
- MachineId: "17"  Stdout: 274973449976901   UnitId: ci-airline-tr-rabbit-worker/13
- MachineId: "18"  Stdout: 274973447132163   UnitId: ci-airline-tr-rabbit-worker/14
- MachineId: "19"  Stdout: 274973440732073   UnitId: ci-airline-tr-rabbit-worker/15
- MachineId: "20"  Stdout: 274973446679760   UnitId: ci-airline-tr-rabbit-worker/16
- MachineId: "21"  Stdout: 274973442815065   UnitId: ci-airline-tr-rabbit-worker/17
- MachineId: "22"  Stdout: 274973443908206   UnitId: ci-airline-tr-rabbit-worker/18
- MachineId: "23"  Stdout: 274973442948691   UnitId: ci-airline-tr-rabbit-worker/19

Ok, host IDs are unique among all the workers.

> and RTC are synced [2]

Well, 'juju run --all date' can only reveal big differences, but the
timestamps in the logs are more precise anyway and they do tell that
the clocks are roughly synced (no significant divergence).

So despite Wikipedia, your intuition and mine about how this is
supposed to work, real life says it doesn't work that way.

> UUIDv1 are composed from the generating unit's MAC and a .1 ms
> precision timestamp [3], so there is a theoretical, very narrow window
> for collision; if the MACs are unique, there must be something else
> going on.

Yup, so what can that something else be, then? Looking at the code:

    def uuid1(node=None, clock_seq=None):
        """Generate a UUID from a host ID, sequence number, and the current time.
        If 'node' is not given, getnode() is used to obtain the hardware address.
        If 'clock_seq' is given, it is used as the sequence number; otherwise a
        random 14-bit sequence number is chosen."""

getnode() is used, they say, but:

        if _uuid_generate_time and node is clock_seq is None:
            _buffer = ctypes.create_string_buffer(16)
            _uuid_generate_time(_buffer)
            return UUID(bytes=_buffer.raw)

No getnode() call there, and:

        if node is None:
            node = getnode()
        return UUID(fields=(time_low, time_mid, time_hi_version,
                            clock_seq_hi_variant, clock_seq_low, node), version=1)

But that's too late: when we call uuid1() with no arguments, the
_uuid_generate_time branch above has already returned.

So at first glance, I'd say this is a genuine bug in uuid.uuid1():
getnode() is never reached despite node=None being passed, which means
we're only relying on time here, and our clocks are synced enough to
collide.
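For what it's worth, if the analysis above holds, there is an easy way
to force the pure-Python branch: passing the node explicitly makes
'node is clock_seq is None' false, so the _uuid_generate_time shortcut
is skipped and the host ID really ends up in the UUID. A minimal
sketch (untested against our charms, the helper name is mine):

    import uuid

    def uuid1_with_node():
        # Passing node explicitly defeats the 'node is clock_seq is None'
        # test quoted above, so uuid1() falls through to the pure-Python
        # branch that actually embeds the 48-bit host ID.
        return uuid.uuid1(node=uuid.getnode())

That still leaves the 100 ns timestamp plus the random 14-bit clock_seq
as the only protection within a single host, but at least distinct MACs
would matter again across units.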
And *that* (colliding because we're effectively down to just the
clock) I could accept ;)

> In a pseudo-random scenario (VMs w/o any true-random HW) I'd say there
> is more chance for system-wide collision using UUIDv4 than v1.

That's quite extreme indeed, but it is not what we're running into:
we're only relying on time, we happen to have good sync between our
units, and that's enough to run into collisions.

> I don't really see collisions happening in the TS/DB domain, even if
> the ticket populating rate was as high as .1 ms (it's far from it),
> postgres enforces uniqueness, so the ticket creation would fail.

As long as you have a single worker creating the uuids, yes, you're
safe. If you start having more, you may run into the same issue: the
other worker will fail, so you'll have to deal with that failure by
retrying (or just switch to uuid4() to avoid collisions), right? See
the retry sketch at the end of this message.

> From that time on, we can assume the TS and the GK would be always
> operated with unique UUIDs

Only if you have a single worker relying on uuid1().

> or fail normally with 404.

Not sure I follow here: do you mean you let the error propagate to the
user as a 404? I for one would probably be confused by getting a 404
when trying to *create* a new object...

> Moreover, soon we will need to order/recover-timestamp from UUID-named
> swift containers and that is only possible with v1.

Then the sooner the better, but let's find a working uuid scheme first
;) (There is a timestamp-recovery sketch at the end of this message
too.)

> So, IMO, TS/DB/GK don't have to be patched, since they do not have
> problems with v1, then you can save yourself from the south migration
> embarrassment (never edit existing migrations, it defeats the whole
> purpose of having them).

Sure, the patch was mentioned only to highlight where we're using
uuid1(), I had no intention to modify a generated file ;)

> That restricts us to the rabbit/queues environment, where you recently
> found that timestamp collision.

I don't think we should restrict the area that much. I've tested
uuid4() for my issue and it fixes it, so the issue really seems to be
that uuid1() is not good enough despite what you and I may think. Paul
was right, no matter how hard this is to believe: uuid1() collides with
synced clocks, probably because it ignores host IDs.

> I am happy to help you investigate it further and find an appropriate
> solution for this.

Thanks, highly appreciated! Can you have a look at the uuid code and
tell me if you agree or disagree with the analysis above?
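Re: retrying on the TS/DB side, here is a hypothetical sketch of what
"deal with that failure" could look like (the table and column names
are made up, I'm assuming psycopg2, and this is not project code):

    import uuid
    import psycopg2

    def create_ticket(conn, max_attempts=3):
        # Retry on the uniqueness violation postgres raises when two
        # workers happen to generate the same uuid1() for a ticket.
        for _ in range(max_attempts):
            ticket_id = str(uuid.uuid1())
            cur = conn.cursor()
            try:
                cur.execute("INSERT INTO tickets (id) VALUES (%s)",
                            (ticket_id,))
                conn.commit()
                return ticket_id
            except psycopg2.IntegrityError:
                conn.rollback()  # another worker won the race, try again
        raise RuntimeError("gave up generating a unique ticket id")

Or, as said, just switching those call sites to uuid4() makes the race
vanish without the retry dance.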
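And re: ordering/recovering timestamps from UUID-named swift
containers, that part is plain RFC 4122 arithmetic, nothing specific to
our code: UUID.time counts 100 ns intervals since 1582-10-15, and
0x01b21dd213814000 is the Unix epoch expressed in those units (the same
constant uuid.py itself uses). A sketch:

    import datetime
    import uuid

    # Offset of 1970-01-01 from the UUID epoch (1582-10-15), in 100 ns units.
    _UNIX_EPOCH_OFFSET = 0x01b21dd213814000

    def uuid1_to_datetime(u):
        # u.time is the 60-bit timestamp embedded in a version 1 UUID.
        return datetime.datetime.utcfromtimestamp(
            (u.time - _UNIX_EPOCH_OFFSET) / 1e7)

    # Sorting UUID-named containers by creation time then boils down to:
    #   sorted(names, key=lambda n: uuid.UUID(n).time)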