Merge lp:~craighewetson-deactivatedaccount/bzr-search/index-fileids into lp:bzr-search

Proposed by Craig Hewetson on 2009-09-19
Status: Rejected
Rejected by: Jelmer Vernooij on 2011-09-18
Proposed branch: lp:~craighewetson-deactivatedaccount/bzr-search/index-fileids
Merge into: lp:bzr-search
Diff against target: None lines
To merge this branch: bzr merge lp:~craighewetson-deactivatedaccount/bzr-search/index-fileids
Reviewer         Review Type   Date Requested   Status
Robert Collins   -             2009-09-19       Pending
Review via email: mp+12113@code.launchpad.net

I would like to allow the user to search on paths that "changed" (added, modified, renamed etc) at each revision and not just Messages and Text.

This will make indexing slower, but I believe this feature will work as a nice replacement for when bzr log <dir> is too slow (and therefore useless).

If and when this change gets merged, I can propose a minor change to qbzr's qlog UI to indicate that path searching is also an option.

I'm currently not sure how to handle paths containing whitespace, e.g. "temp/the directory/test".

75. By Craig Hewetson <email address hidden> on 2009-09-20

Replaced whitespace with the % character. This character was chosen because of its use in URLs for this purpose.

Robert Collins (lifeless) wrote :

Hi, I'm not sure this is needed with the 2a format's facilities and
speed.

log -v is _extremely_ fast in that format.

What format are you testing against?

-Rob

Craig Hewetson wrote :

Hi Rob

I'm using pack-0.92. Until bzr 2.0 is released I suppose I can't use the 2a format. The branches I'm working on are not for personal use but are being actively used within the company I work for.

The senior developers are sceptical about using Bazaar, and now that they were "forced" to use it, I got complaints about bzr qlog <dir> taking too long. Also, if I tell them to install a release candidate it won't go down well.

I suppose as soon as bzr 2.0 is released I can upgrade all their Bazaar installations and upgrade our existing shared repository and branches to make use of 2a. I just hope it's not going to be a mission.

I'll ditch this branch in the hope that upgrading to 2a will greatly improve the performance of bzr log <dir>.

Robert Collins (lifeless) wrote :

On Sun, 2009-09-20 at 08:38 +0000, Craig Hewetson wrote:
> Hi Rob
>
> I'm using pack-0.92. Until bzr 2.0 is released I suppose I can't use the 2a format. The branches I'm working on are not for personal use but are being actively used within the company I work for.
>
> The senior developers are sceptical about using Bazaar, and now that they were "forced" to use it, I got complaints about bzr qlog <dir> taking too long. Also, if I tell them to install a release candidate it won't go down well.
>
> I suppose as soon as bzr 2.0 is released I can upgrade all their Bazaar installations and upgrade our existing shared repository and branches to make use of 2a. I just hope it's not going to be a mission.
>
>
> I'll ditch this branch in the hope that upgrading to 2a will greatly improve the performance of bzr log <dir>

What you can do now is on a test machine do some testing:
 - install 2.0.0rc2 (this is what 2.0.0 will be - no further changes are
being made)
 - convert a spare branch
 - test performance
 - *file bugs* in bzr on this.

I think it's a good idea you had to use bzr-search to work around bzr's
performance in this area; however 2a should be capable of being
extremely fast - if it's not, it should be something we can do in a point
release, and I know that folk are interested in this.

-Rob

Craig Hewetson wrote :

> What you can do now is on a test machine do some testing:
> - install 2.0.0rc2 (this is what 2.0.0 will be - no further changes are
> being made)
> - convert a spare branch
> - test performance
> - *file bugs* in bzr on this.

Alrighty, I'll do this.

> I think it's a good idea you had to use bzr-search to work around bzr's
> performance in this area; however 2a should be capable of being
> extremely fast - if it's not, it should be something we can do in a point
> release, and I know that folk are interested in this.

In that case, I'll leave the branch intact for a while.

76. By Craig Hewetson <email address hidden> on 2009-09-21

fixed import

77. By Craig Hewetson <email address hidden> on 2009-09-21

added progress.

Craig Hewetson wrote :

I made a few changes: added progress and "handled" whitespace.
Maybe this indexing feature could be an option, like commits_only.

> What you can do now is on a test machine do some testing:
> - install 2.0.0rc2 (this is what 2.0.0 will be - no further changes are
> being made)
> - convert a spare branch
> - test performance
> - *file bugs* in bzr on this.

I've tested bzr log <dir> on a 2a-formatted branch, and it is slower than on the 0.92-formatted branch.
I'm using Bazaar 2.1dev.
See the details in the 14th comment:
https://bugs.launchpad.net/bzr/+bug/374730

Based on this, I think that this change to bzr-search might be more useful.
To reduce the size of the index, maybe I need to let the user give bzr-search options to index only certain aspects of the history: paths only, text and messages only, or combinations of these.
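The idea of letting the user choose which aspects of the history get indexed could look something like the sketch below. It is only an illustration in Python 3 under stated assumptions: the function `terms_for_revision` and the flags `index_messages` and `index_paths` are hypothetical names, not existing bzr-search options.

```python
def terms_for_revision(message_terms, path_terms,
                       index_messages=True, index_paths=True):
    """Return the terms to index for one revision, honouring both flags."""
    terms = []
    if index_messages:
        terms.extend(message_terms)
    if index_paths:
        terms.extend(path_terms)
    # Skip empty terms, as the existing indexing loop in the diff also does.
    return [term for term in terms if term]


message_terms = ["fixed", "", "import"]
path_terms = ["src/index.py"]

# Paths only: the commit message contributes nothing to the index.
print(terms_for_revision(message_terms, path_terms, index_messages=False))
```

Leaving both flags at their defaults would reproduce the behaviour proposed in this branch, while switching off `index_paths` would keep the index at its current size.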

Robert Collins (lifeless) wrote :

John has just landed some improvements for this (10 times performance
improvement) on bzr.dev.

-Rob

Jelmer Vernooij (jelmer) wrote :

Hi Craig,

Are path based searches still slow for you in current versions of Bazaar?

Craig Hewetson wrote :

I started working on this feature because I wanted a workaround for the poor performance of bzr log <dir>, and the even worse performance of bzr qlog <dir>. The performance of bzr log <dir> has improved (bzr version 2.3.3), but it's still not good enough, probably because we have very deep directory structures in our source repository, like many Java projects.

So I would prefer that this branch be abandoned, because it's probably not a good idea in the first place to implement this workaround in bzr-search.

Jelmer Vernooij (jelmer) wrote :

Thanks Craig, I'll mark it as such. 2.4 should have some improvements that will help with "bzr log -v" if you have a large tree.

Unmerged revisions

77. By Craig Hewetson <email address hidden> on 2009-09-21

added progress.

76. By Craig Hewetson <email address hidden> on 2009-09-21

fixed import

75. By Craig Hewetson <email address hidden> on 2009-09-20

Replaced whitespace with the % character. This character was chosen because of its use in URLs for this purpose.

74. By Craig Hewetson <email address hidden> on 2009-09-19

Indexing on the file ID made no sense.
So it was changed to indexing on the file path. It will index the changes (added, renamed, modified) to file paths.
This allows the user to search on revisions where certain files were "changed".

73. By Craig Hewetson <email address hidden> on 2009-09-18

The file IDs in each revision delta are now indexed, allowing the user to run bzr search on a file ID and, most importantly, allowing qbzr to make use of this feature, so that it would be a lot faster to do a bzr qlog and search for a specific directory.

Preview Diff

=== modified file 'index.py'
--- index.py 2009-06-03 21:44:06 +0000
+++ index.py 2009-09-19 09:42:06 +0000
@@ -752,6 +752,21 @@
 # other filters?
 message_utf8 = revision.message.encode('utf8')
 commit_terms = _tokeniser_re.split(message_utf8)
+ # add fileIDs to commit terms.
+ delta = repository.get_revision_delta(revision.revision_id)
+ for path, id, kind in delta.added:
+     commit_terms.append(path.encode('utf8'))
+
+ for path, id, kind, text_modified, meta_modified in delta.modified:
+     commit_terms.append(path.encode('utf8'))
+
+ for path, id, kind in delta.removed:
+     commit_terms.append(path.encode('utf8'))
+
+ for (oldpath, newpath, id, kind,
+      text_modified, meta_modified) in delta.renamed:
+     commit_terms.append(newpath.encode('utf8'))
+
 for term in commit_terms:
     if not term:
         continue
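The path collection in the diff above can be tried outside bzrlib with a small stand-in. This is a sketch in Python 3 under stated assumptions: `FakeDelta` is a hypothetical substitute for the object returned by `repository.get_revision_delta`, with the tuple layouts copied from the loops in the diff, and `changed_paths` and its `include_old_rename_path` flag are names invented for the example. The flag shows one possible variation (also indexing the old path of a rename), which the branch itself does not do.

```python
from collections import namedtuple

# Hypothetical stand-in for a revision delta; tuple layouts mirror the diff.
FakeDelta = namedtuple("FakeDelta", "added modified removed renamed")


def changed_paths(delta, include_old_rename_path=False):
    """Collect the paths touched by a revision, in the order the diff visits them."""
    paths = []
    for path, _file_id, _kind in delta.added:
        paths.append(path)
    for path, _file_id, _kind, _text_mod, _meta_mod in delta.modified:
        paths.append(path)
    for path, _file_id, _kind in delta.removed:
        paths.append(path)
    for (oldpath, newpath, _file_id, _kind,
         _text_mod, _meta_mod) in delta.renamed:
        if include_old_rename_path:
            # Variation not in the branch: make the old name searchable too.
            paths.append(oldpath)
        paths.append(newpath)
    return paths


delta = FakeDelta(
    added=[("src/new.py", "new-id", "file")],
    modified=[("src/index.py", "index-id", "file", True, False)],
    removed=[("src/old.py", "old-id", "file")],
    renamed=[("docs/a.txt", "docs/b.txt", "doc-id", "file", False, False)],
)
print(changed_paths(delta))
```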
