lp:~jameinel/+junk/murmurhash3-go

Created by John A Meinel.
Get this branch:
bzr branch lp:~jameinel/+junk/murmurhash3-go
Only John A Meinel can upload to this branch.

Branch information

Owner:
John A Meinel
Status:
Development

Recent revisions

36. By John A Meinel

Update the Makefile to make it a bit easier to build with gccgo.
It has a bit of hard-coded stuff that should be brought from
the environment, but meh... it works.

35. By John A Meinel

Reverting to the fastest version.
gccgo: 10000 loops in 0.578s, 57791 ns/op

34. By John A Meinel

Moving away from uintptr and switching to a cast to a
'double_block' type that has low and high uint64 members.
Gets us to:
gccgo: 10000 loops in 0.592s, 59201 ns/op

So uintptr is still the fastest, but this is at least close.
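
The commit doesn't show the cast itself, but in Go it presumably looks
something like this sketch (double_block is from the log; blockAt is a
hypothetical name, and this assumes a little-endian machine):

    package main

    import "unsafe"

    // double_block matches the 16-byte block MurmurHash3 x64 consumes:
    // two native-endian uint64 words.
    type double_block struct {
        low, high uint64
    }

    // blockAt reinterprets the 16 bytes at data[off] as a double_block,
    // replacing two separate bounds-checked uint64 loads with one cast.
    func blockAt(data []byte, off int) (uint64, uint64) {
        b := (*double_block)(unsafe.Pointer(&data[off]))
        return b.low, b.high
    }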

33. By John A Meinel

Switching to unsafe at the last moment costs us a bit:
gccgo: 10000 loops in 0.792s, 79154 ns/op

That is probably the bounds-checking overhead.

32. By John A Meinel

We got it all put together.

It does turn out that gccgo creates a faster binary than the 6g compiler.
Here are some numbers:
vanilla: 10000 loops in 9.202s, 920216 ns/op
inline: 10000 loops in 4.422s, 442162 ns/op
risky: 10000 loops in 1.647s, 164672 ns/op
gccgo: 10000 loops in 3.650s, 365037 ns/op

vs
vanilla: 10000 loops in 3.881s, 388128 ns/op
inline: 10000 loops in 3.982s, 398245 ns/op
risky: 10000 loops in 0.580s, 58002 ns/op
gccgo: 10000 loops in 0.575s, 57477 ns/op

vs the C++ program
Time per hash: 57167ns

That puts the C++ version only marginally faster (57.1us vs 57.4us)
Time to see how much we can relax the Go version and keep the
performance.
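
The harness itself isn't in the log, but the "loops in Xs, ns/op" lines
are the sort of thing a minimal manual timing loop would print (timeHash
and its signature are illustrative, not from the branch):

    package main

    import (
        "fmt"
        "time"
    )

    // timeHash runs hash over data n times and prints the totals in the
    // same format as the numbers quoted above.
    func timeHash(label string, hash func([]byte), data []byte, n int) {
        start := time.Now()
        for i := 0; i < n; i++ {
            hash(data)
        }
        elapsed := time.Since(start)
        fmt.Printf("%s: %d loops in %.3fs, %d ns/op\n",
            label, n, elapsed.Seconds(), elapsed.Nanoseconds()/int64(n))
    }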

31. By John A Meinel

Refactor the code to go via interfaces.

This should make it easier to change the speed_murmur3.go driver
so that it can take a flag selecting which implementation to use.
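
The interface isn't shown in the log; its shape is presumably something
like this sketch (Hasher, Hash128, and the flag name are hypothetical):

    package main

    import "flag"

    // Hasher is one murmur3 implementation (vanilla, inline, unsafe, ...),
    // so the driver can treat them interchangeably.
    type Hasher interface {
        Hash128(data []byte) (uint64, uint64)
    }

    // impl would let speed_murmur3.go select an implementation at run
    // time instead of being edited for each experiment.
    var impl = flag.String("impl", "vanilla", "implementation to time")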

30. By John A Meinel

revert previous.

29. By John A Meinel

Using a for loop + range to avoid bounds checking is not a good idea. It slows the inline version from 443us down to 850us.
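
For the record, the rejected approach was presumably shaped like this
(a hypothetical reconstruction): range over the block so the compiler
can elide the per-index bounds checks, at the price of assembling each
word a byte at a time:

    package main

    // loadBlockRange builds the two words of a 16-byte block via range.
    // The range loop lets the compiler drop the bounds check on each
    // access, but the byte-wise assembly costs far more than it saves.
    func loadBlockRange(data []byte, off int) (lo, hi uint64) {
        for i, b := range data[off : off+16] {
            if i < 8 {
                lo |= uint64(b) << (uint(i) * 8)
            } else {
                hi |= uint64(b) << (uint(i-8) * 8)
            }
        }
        return lo, hi
    }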

28. By John A Meinel

Avoid bounds checking by going to uintptr types.
This gives us pointer arithmetic again, which Go otherwise makes
hard to do, and it drops _unsafe to 165us (from about 189us).
My ASM tests seem to indicate that if the compiler generated ROLQ
instructions, it would speed things up by another 60us or so,
which would bring us to 100us vs ~60us for the C++ code.
Probably it all comes down to gcc generating better-optimized
assembly than the Go compiler.
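
The loop itself isn't in the log, but uintptr arithmetic in Go looks
like this sketch (loadBlockUintptr is a hypothetical name; modern Go
tooling warns about holding a uintptr across statements like this):

    package main

    import "unsafe"

    // loadBlockUintptr reads two uint64 words at byte offset off through
    // raw pointer arithmetic, so no slice bounds checks are emitted.
    func loadBlockUintptr(data []byte, off int) (uint64, uint64) {
        p := uintptr(unsafe.Pointer(&data[0])) + uintptr(off)
        lo := *(*uint64)(unsafe.Pointer(p))
        hi := *(*uint64)(unsafe.Pointer(p + 8))
        return lo, hi
    }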

27. By John A Meinel

As expected, the call overhead is much higher than the benefit of
using an assembly instruction.
Assembly shows a benefit of 930us to 871us if you are going to make
function calls anyway. However, doing a CALL rather than inline
shifts takes us from 189us up to 285us for the _unsafe version.
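
The instruction in question is a 64-bit rotate. In Go that is a pair of
shifts; written at the use site it inlines to almost nothing, which is
why routing it through a CALL (even one that executes a real ROLQ)
loses:

    package main

    // rotl64 is the rotate-left MurmurHash3 needs. As inline shifts it
    // is nearly free; behind a CALL, the call overhead outweighs what a
    // single ROLQ instruction saves.
    func rotl64(x uint64, r uint) uint64 {
        return (x << r) | (x >> (64 - r))
    }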

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)
This branch contains Public information. Everyone can see this information.
