intel: Only call clock_gettime once per unreference_final.
Notably when freeing a batchbuffer, we often end up freeing many of the
buffers it points at as well. Avoiding repeated calls brings us a 9% CPU
win for cairo-gl.
intel: Improve bo_references performance by skipping the tree walk.
If the target we're asking about hasn't ever been used as a relocation
target, then it obviously hasn't been used as a target by the batch's reloc
tree. This is the common case for good GL programming where you only map
fresh buffers, and gives us a 5% win in cairo-gl.