Merge lp:~abentley/bzr/fix_get_mtime into lp:~bzr/bzr/trunk-old

Proposed by Aaron Bentley
Status: Merged
Merge reported by: Aaron Bentley
Merged at revision: not available
Proposed branch: lp:~abentley/bzr/fix_get_mtime
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: None lines
To merge this branch: bzr merge lp:~abentley/bzr/fix_get_mtime
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Review via email: mp+10544@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Aaron Bentley (abentley) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

This patch fixes bug 251532 by omitting the path when falling back to
self._transform._tree. The file's path may have changed, and since the
path isn't mandatory, it seems simplest to skip it. It would be
possible to generate the path, but it isn't even needed for some tree
types, so it seems wasteful.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkqO5ScACgkQ0F+nu1YWqI2LXwCeL2T8AA4rEbNhRMKjz1U/W5bn
n20An099tD8dfwVcGIUKTxVwMbUAGGOO
=j3Mm
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

measuring this would be good.... but I suspect there are too many confounding factors to get accurate results today. At least for dirstate trees it can avoid building a fileid<->path mapping, so I would be a lot happier if this passed the right path in, rather than not passing it in. Tree transform should know that the rename has occured and be able to determine this easily.

review: Needs Fixing
Revision history for this message
Aaron Bentley (abentley) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Review: Needs Fixing
> measuring this would be good.... but I suspect there are too many confounding factors to get accurate results today. At least for dirstate trees it can avoid building a fileid<->path mapping, so I would be a lot happier if this passed the right path in, rather than not passing it in.

The only surefire way to get an accurate path is to call Tree.id2path,
so I don't think that would save anything.

> Tree transform should know that the rename has occured and be able to determine this easily.

The TreeTransform would need to check for renames and moves of the
current file and all its parents in order to give accurate results-- not
trivial. Not worth it, IMO.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkqSzYoACgkQ0F+nu1YWqI2mMACfaMzr0Z4akpF/vYFxXbQPoujN
RokAnRpcNyYO0L+j1K9D3wzvFgSDoCsi
=qZys
-----END PGP SIGNATURE-----

Revision history for this message
Robert Collins (lifeless) wrote :

Creating a path<->id map is expensive on large trees; I've found significant benefits in commit etc when enough layers are pushed through to avoid this; not passing in the path will cause that map to be created. My understanding from Martin is that we want to be heading across the board to proving path data as much as possible - to avoid unnecessary work.

I'm not sure why its complete for TreeTransform to calculate the path, surely its just a walk of transaction parents ?

Revision history for this message
Aaron Bentley (abentley) wrote :

> Creating a path<->id map is expensive on large trees;

Generally, this price will have already been paid.

trans_id_tree_file_id and trans_id_file_id will hit id2path, so a lot of the client code that constructs a TreeTransform will hit id2path.

Even if id2path isn't used in the construction of the TreeTransform, trans_id_file_id is heavily used in the implementation of PreviewTree, so using the tree will generally hit id2path.

I don't understand why there needs to be such a map. Can we not retrieve only the data we care about? That would scale with the amount of data we're actually showing, which I think is an acceptable scaling factor.

> I've found significant
> benefits in commit etc when enough layers are pushed through to avoid this;
> not passing in the path will cause that map to be created. My understanding
> from Martin is that we want to be heading across the board to proving path
> data as much as possible - to avoid unnecessary work.

I wouldn't dismiss these benefits, but I don't know that they apply here.

This is a codepath that you will hit only when querying the mtime of an unmodified file. Our diff code doesn't do that; only qbzr's does.

> I'm not sure why its complete for TreeTransform to calculate the path, surely
> its just a walk of transaction parents ?

If the transaction does not create or rename the parents, there's no guarantee we have their names, much less their paths.

Aaron

Revision history for this message
Robert Collins (lifeless) wrote :

On Tue, 2009-09-29 at 03:25 +0000, Aaron Bentley wrote:
> > Creating a path<->id map is expensive on large trees;
>
> Generally, this price will have already been paid.
>
> trans_id_tree_file_id and trans_id_file_id will hit id2path, so a lot
> of the client code that constructs a TreeTransform will hit id2path.
>
> Even if id2path isn't used in the construction of the TreeTransform,
> trans_id_file_id is heavily used in the implementation of PreviewTree,
> so using the tree will generally hit id2path.
>
> I don't understand why there needs to be such a map. Can we not
> retrieve only the data we care about? That would scale with the
> amount of data we're actually showing, which I think is an acceptable
> scaling factor.

Ok, land it.

The map is needed because dirstate is laid out internally to match the
IO pattern of 'status', and thats very heavily biased to paths, to match
disk layout and locality of reference.
So we can answer 'path2id' _very_ fast, but we cannot answer 'id2path'
without scanning the whole dirstate to find out if 'id' exists, and thus
where it is.

-Rob

Updating diff...

An updated diff will be available in a few minutes. Reload to see the changes.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/tests/test_transform.py'
2--- bzrlib/tests/test_transform.py 2009-08-13 01:33:15 +0000
3+++ bzrlib/tests/test_transform.py 2009-08-21 18:07:25 +0000
4@@ -2274,6 +2274,18 @@
5 self.assertEqual(os.stat(limbo_path).st_mtime,
6 preview_tree.get_file_mtime('file-id'))
7
8+ def test_get_file_mtime_renamed(self):
9+ work_tree = self.make_branch_and_tree('tree')
10+ self.build_tree(['tree/file'])
11+ work_tree.add('file', 'file-id')
12+ preview = TransformPreview(work_tree)
13+ self.addCleanup(preview.finalize)
14+ file_trans_id = preview.trans_id_tree_file_id('file-id')
15+ preview.adjust_path('renamed', preview.root, file_trans_id)
16+ preview_tree = preview.get_preview_tree()
17+ preview_mtime = preview_tree.get_file_mtime('file-id', 'renamed')
18+ work_mtime = work_tree.get_file_mtime('file-id', 'file')
19+
20 def test_get_file(self):
21 preview = self.get_empty_preview()
22 preview.new_file('file', preview.root, 'contents', 'file-id')
23
24=== modified file 'bzrlib/transform.py'
25--- bzrlib/transform.py 2009-08-04 11:40:59 +0000
26+++ bzrlib/transform.py 2009-08-21 18:07:25 +0000
27@@ -1883,7 +1883,7 @@
28 def get_file_mtime(self, file_id, path=None):
29 """See Tree.get_file_mtime"""
30 if not self._content_change(file_id):
31- return self._transform._tree.get_file_mtime(file_id, path)
32+ return self._transform._tree.get_file_mtime(file_id)
33 return self._stat_limbo_file(file_id).st_mtime
34
35 def _file_size(self, entry, stat_value):