lp:~jameinel/+junk/murmurhash3-go

Created by John A Meinel.
Get this branch:
bzr branch lp:~jameinel/+junk/murmurhash3-go
Only John A Meinel can upload to this branch.

Branch information

Owner:
John A Meinel
Status:
Development

Recent revisions

36. By John A Meinel

Update the Makefile to make it a bit easier to build with gccgo.
It has a bit of hard-coded stuff that should be brought from
the environment, but meh... it works.

35. By John A Meinel

Reverting to the fastest version.
gccgo: 10000 loops in 0.578s, 57791 ns/op

34. By John A Meinel

Moving away from uintptr and switching to a cast to a
'double_block' type that has low and high uint64 members.
Gets us to:
gccgo: 10000 loops in 0.592s, 59201 ns/op

So uintptr is still the fastest, but this is at least close.
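
The commit doesn't show the cast itself, but in Go it presumably looks
something like this sketch (double_block is from the log; blockAt is a
hypothetical name, and this assumes a little-endian machine):

    package main

    import "unsafe"

    // double_block matches the 16-byte block MurmurHash3 x64 consumes:
    // two native-endian uint64 words.
    type double_block struct {
        low, high uint64
    }

    // blockAt reinterprets the 16 bytes at data[off] as a double_block,
    // replacing two separate bounds-checked uint64 loads with one cast.
    func blockAt(data []byte, off int) (uint64, uint64) {
        b := (*double_block)(unsafe.Pointer(&data[off]))
        return b.low, b.high
    }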

33. By John A Meinel

Switching to unsafe at the last moment costs us a bit:
gccgo: 10000 loops in 0.792s, 79154 ns/op

That is probably the bounds-checking overhead.

32. By John A Meinel

We got it all put together.

It does turn out that gccgo creates a faster binary than the 6g compiler.
Here are some numbers:
vanilla: 10000 loops in 9.202s, 920216 ns/op
inline: 10000 loops in 4.422s, 442162 ns/op
risky: 10000 loops in 1.647s, 164672 ns/op
gccgo: 10000 loops in 3.650s, 365037 ns/op

vs
vanilla: 10000 loops in 3.881s, 388128 ns/op
inline: 10000 loops in 3.982s, 398245 ns/op
risky: 10000 loops in 0.580s, 58002 ns/op
gccgo: 10000 loops in 0.575s, 57477 ns/op

vs the C++ program
Time per hash: 57167ns

That puts the C++ version only marginally faster (57.1us vs 57.4us)
Time to see how much we can relax the Go version and keep the
performance.
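
The harness itself isn't in the log, but the "loops in Xs, ns/op" lines
are the sort of thing a minimal manual timing loop would print (timeHash
and its signature are illustrative, not from the branch):

    package main

    import (
        "fmt"
        "time"
    )

    // timeHash runs hash over data n times and prints the totals in the
    // same format as the numbers quoted above.
    func timeHash(label string, hash func([]byte), data []byte, n int) {
        start := time.Now()
        for i := 0; i < n; i++ {
            hash(data)
        }
        elapsed := time.Since(start)
        fmt.Printf("%s: %d loops in %.3fs, %d ns/op\n",
            label, n, elapsed.Seconds(), elapsed.Nanoseconds()/int64(n))
    }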

31. By John A Meinel

Refactor the code to go via interfaces.

This should make it easier to change the speed_murmur3.go driver
so that it can take a flag selecting which implementation to use.
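
The interface isn't shown in the log; its shape is presumably something
like this sketch (Hasher, Hash128, and the flag name are hypothetical):

    package main

    import "flag"

    // Hasher is one murmur3 implementation (vanilla, inline, unsafe, ...),
    // so the driver can treat them interchangeably.
    type Hasher interface {
        Hash128(data []byte) (uint64, uint64)
    }

    // impl would let speed_murmur3.go select an implementation at run
    // time instead of being edited for each experiment.
    var impl = flag.String("impl", "vanilla", "implementation to time")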

30. By John A Meinel

revert previous.

29. By John A Meinel

Using a for loop + range to avoid bounds checking is not a good idea. It slows the inline version from 443us down to 850us.
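
For the record, the rejected approach was presumably shaped like this
(a hypothetical reconstruction): range over the block so the compiler
can elide the per-index bounds checks, at the price of assembling each
word a byte at a time:

    package main

    // loadBlockRange builds the two words of a 16-byte block via range.
    // The range loop lets the compiler drop the bounds check on each
    // access, but the byte-wise assembly costs far more than it saves.
    func loadBlockRange(data []byte, off int) (lo, hi uint64) {
        for i, b := range data[off : off+16] {
            if i < 8 {
                lo |= uint64(b) << (uint(i) * 8)
            } else {
                hi |= uint64(b) << (uint(i-8) * 8)
            }
        }
        return lo, hi
    }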

28. By John A Meinel

Avoid bounds checking by going to uintptr types.
This gives us pointer arithmetic again, which Go otherwise makes
hard to do, and it drops _unsafe to 165us (from about 189us).
My ASM tests seem to indicate that if the compiler generated ROLQ
instructions, it would speed things up by another 60us or so,
which would bring us to 100us vs ~60us for the C++ code.
Probably it all comes down to gcc generating better-optimized
assembly than the Go compiler.
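
The loop itself isn't in the log, but uintptr arithmetic in Go looks
like this sketch (loadBlockUintptr is a hypothetical name; modern Go
tooling warns about holding a uintptr across statements like this):

    package main

    import "unsafe"

    // loadBlockUintptr reads two uint64 words at byte offset off through
    // raw pointer arithmetic, so no slice bounds checks are emitted.
    func loadBlockUintptr(data []byte, off int) (uint64, uint64) {
        p := uintptr(unsafe.Pointer(&data[0])) + uintptr(off)
        lo := *(*uint64)(unsafe.Pointer(p))
        hi := *(*uint64)(unsafe.Pointer(p + 8))
        return lo, hi
    }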

27. By John A Meinel

As expected, the call overhead is much higher than the benefit of
using an assembly instruction.
Assembly shows a benefit of 930us to 871us if you are going to make
function calls anyway. However, doing a CALL rather than inline
shifts takes us from 189us up to 285us for the _unsafe version.
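
The instruction in question is a 64-bit rotate. In Go that is a pair of
shifts; written at the use site it inlines to almost nothing, which is
why routing it through a CALL (even one that executes a real ROLQ)
loses:

    package main

    // rotl64 is the rotate-left MurmurHash3 needs. As inline shifts it
    // is nearly free; behind a CALL, the call overhead outweighs what a
    // single ROLQ instruction saves.
    func rotl64(x uint64, r uint) uint64 {
        return (x << r) | (x >> (64 - r))
    }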

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)
This branch contains Public information. Everyone can see this information.
