The regression is there because any call that reads a timer is relatively expensive. Also, when you have inline asm somewhere, it limits what the compiler can do to optimise (think about instruction re-ordering around the timer read).
I'd much prefer to see something based on the Linux perf-events infrastructure instead - it could easily be used to get a much better profile of what's actually going on, instead of just the places we think might be bottlenecks.
I guess you could hook into DTrace on OS X/Solaris, but I don't think anybody really cares :)