Merge lp:~doxxx/bzr/572092-ignore-dupes into lp:bzr
| Status: | Merged | ||||
|---|---|---|---|---|---|
| Approved by: | Parth Malwankar on 2010-05-04 | ||||
| Approved revision: | 5203 | ||||
| Merged at revision: | 5207 | ||||
| Proposed branch: | lp:~doxxx/bzr/572092-ignore-dupes | ||||
| Merge into: | lp:bzr | ||||
| Diff against target: |
151 lines (+69/-20) 3 files modified
NEWS (+3/-0) bzrlib/ignores.py (+31/-13) bzrlib/tests/test_ignores.py (+35/-7) |
||||
| To merge this branch: | bzr merge lp:~doxxx/bzr/572092-ignore-dupes | ||||
| Related bugs: |
|
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Parth Malwankar | 2010-05-01 | Approve on 2010-05-04 | |
| Alexander Belchenko | Approve on 2010-05-04 | ||
|
Review via email:
|
|||
Commit Message
(doxxx) 'ignore' command avoids putting duplicate entry into .bzrignore
Description of the Change
Fixes bug 572092 by changing tree_ignores_
| Gordon Tyler (doxxx) wrote : | # |
On 5/1/2010 2:12 AM, Alexander Belchenko wrote:
> Perhaps, reading the file twice from the disk is not best idea, but
> we should hit disk cache easily. And this operation is not
> performance critical one.
My initial implementation just read it into the set, updated the set
with the new patterns and then wrote the set back out to disk. The
problem with that was that it changed the order of the patterns in the file.
I could probably read the file once into a StringIO object and then
parse from that...
| Parth Malwankar (parthm) wrote : | # |
> I could probably read the file once into a StringIO object and then
> parse from that...
StringIO sounds like a good idea to me, in the edge case that
the file gets modified between reads the buffer will be
consistent for the program.
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Parth Malwankar wrote:
>> I could probably read the file once into a StringIO object and then
>> parse from that...
>
> StringIO sounds like a good idea to me, in the edge case that
> the file gets modified between reads the buffer will be
> consistent for the program.
Why not read into a list with a set check:
patterns = []
pattern_set = set()
for line in disk_file:
ignore = XXXX(line)
if ignore not in pattern_set:
pattern_
patterns.
return patterns
You get filtering and still retain the ordering. Also note that you can
have comments in the ignore file, which we would also like to preserve.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
vDQAoMIJXrt0sbT
=pz71
-----END PGP SIGNATURE-----
| Gordon Tyler (doxxx) wrote : | # |
> Why not read into a list with a set check:
>
> patterns = []
> pattern_set = set()
>
> for line in disk_file:
> ignore = XXXX(line)
> if ignore not in pattern_set:
> pattern_
> patterns.
>
> return patterns
I wanted to reuse the parse_ignore_file so that I didn't have to duplicate all the reading and error handling code.
> You get filtering and still retain the ordering. Also note that you can
> have comments in the ignore file, which we would also like to preserve.
Yeah, that was my intention.
| Parth Malwankar (parthm) wrote : | # |
I think assertPatternEquals needs to use sorted list instead
of set for tests to catch duplication error. Apart from that
this looks good to me.
def assertPatternsE
| Gordon Tyler (doxxx) wrote : | # |
Good point and done.
| Alexander Belchenko (bialix) wrote : | # |
Usually we importing StringIO from c-module cStringIO
from cStringIO import StringIO
Is there any special reason why you prefer python implementation?
| Gordon Tyler (doxxx) wrote : | # |
On 5/2/2010 3:28 PM, Alexander Belchenko wrote:
> Usually we importing StringIO from c-module cStringIO
>
> from cStringIO import StringIO
>
> Is there any special reason why you prefer python implementation?
This isn't a particular performance-
StringIO would be okay. I'm assuming there's a cost to import cStringIO?
Otherwise why would the python implementation exist at all?
| Andrew Bennetts (spiv) wrote : | # |
Gordon Tyler wrote:
> On 5/2/2010 3:28 PM, Alexander Belchenko wrote:
> > Usually we importing StringIO from c-module cStringIO
[...]
>
> This isn't a particular performance-
> StringIO would be okay. I'm assuming there's a cost to import cStringIO?
> Otherwise why would the python implementation exist at all?
Because it's a very old module in the standard library, and people know
better now :)
FWIW, in Python 3, there's no cStringIO any more. There's just
io.StringIO (and io.BytesIO), and they are automatically C
implementations.
| Gordon Tyler (doxxx) wrote : | # |
Well, alrighty then. :) cStringIO it is.
| Parth Malwankar (parthm) wrote : | # |
> Usually we importing StringIO from c-module cStringIO
>
> from cStringIO import StringIO
>
> Is there any special reason why you prefer python implementation?
I don't know the impact in this specific situation but
IIUC StringIO works with unicode cStringIO does not.
From the cStringIO docs[1]:
> Unlike the memory files implemented by the StringIO
> module, those provided by this module are not able
> to accept Unicode strings that cannot be encoded
> as plain ASCII strings.
[1] http://
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Gordon Tyler wrote:
> Well, alrighty then. :) cStringIO it is.
Well that and because doing stuff like:
cStringIO.
will give you a pointer to the *raw* unicode bytes. (So on UCS2
versions, you get '\0u\0n\
'\0\0\0u\
However, the really good bit is that cStringIO forces it to be in 8-bit
encoding, which is useful when you want to make sure you aren't
switching between 8-bit strings and Unicode strings accidentally.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkv
ke4AmQE4OT/
=oCIc
-----END PGP SIGNATURE-----
| Gordon Tyler (doxxx) wrote : | # |
On 5/2/2010 10:57 PM, John A Meinel wrote:
> Well that and because doing stuff like:
>
> cStringIO.
>
> will give you a pointer to the *raw* unicode bytes. (So on UCS2
> versions, you get '\0u\0n\
> '\0\0\0u\
>
> However, the really good bit is that cStringIO forces it to be in 8-bit
> encoding, which is useful when you want to make sure you aren't
> switching between 8-bit strings and Unicode strings accidentally.
The new patterns, which may be Unicode strings, are encoded to UTF-8
before being written out to the file.
However, I added some more tests to see how it deals with non-Unix line
endings in the .bzrignore file and it isn't preserving them properly
when writing out the new contents. So back to the drawing board for me.
| Gordon Tyler (doxxx) wrote : | # |
Once Launchpad decides to finish processing my latest push, I believe I have solved the line endings problem and added tests for both that and Unicode handling.
| Gordon Tyler (doxxx) wrote : | # |
Hmmm... Launchpad is still processing my last push after 6 hours. Could somebody poke it with a sharp stick?
| Alexander Belchenko (bialix) wrote : | # |
Gordon Tyler пишет:
> Hmmm... Launchpad is still processing my last push after 6 hours. Could somebody poke it with a sharp stick?
| Gordon Tyler (doxxx) wrote : | # |
Okay, looks like it's updated. Give it an eyeball or two?
| Robert Collins (lifeless) wrote : | # |
Launchpad will be slow for another day - its processing all of
maverick (the release after lucid) at the moment. Just a heads up.
| Parth Malwankar (parthm) wrote : | # |
> Launchpad will be slow for another day - its processing all of
> maverick (the release after lucid) at the moment. Just a heads up.
Thanks for the heads up Robert.
I will mark this as approved and ready to land, and try to land it tomorrow (once branch refresh is more responsive) in case no one has any further points to discuss on this mp.

Perhaps, reading the file twice from the disk is not best idea, but we should hit disk cache easily. And this operation is not performance critical one.
Thanks!