Code review comment for lp:~abentley/bzr/fix_get_mtime

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, 2009-09-29 at 03:25 +0000, Aaron Bentley wrote:
> > Creating a path<->id map is expensive on large trees;
>
> Generally, this price will have already been paid.
>
> trans_id_tree_file_id and trans_id_file_id will hit id2path, so a lot
> of the client code that constructs a TreeTransform will hit id2path.
>
> Even if id2path isn't used in the construction of the TreeTransform,
> trans_id_file_id is heavily used in the implementation of PreviewTree,
> so using the tree will generally hit id2path.
>
> I don't understand why there needs to be such a map. Can we not
> retrieve only the data we care about? That would scale with the
> amount of data we're actually showing, which I think is an acceptable
> scaling factor.

Ok, land it.

The map is needed because dirstate is laid out internally to match the
IO pattern of 'status', and thats very heavily biased to paths, to match
disk layout and locality of reference.
So we can answer 'path2id' _very_ fast, but we cannot answer 'id2path'
without scanning the whole dirstate to find out if 'id' exists, and thus
where it is.

-Rob

« Back to merge proposal