-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Andrew Bennetts wrote: > Review: Approve > I have a couple of comments, but basically: > > vote approve > > John A Meinel wrote: > [...] >> + def _add_text(self, key, parents, text, nostore_sha=None, random_id=False): >> + """See VersionedFiles.add_text().""" >> + self._index._check_write_ok() >> + self._check_add(key, None, random_id, check_content=False) >> + if text.__class__ is not str: >> + raise errors.BzrBadParameterUnicode("text") > > Well... 'text' might not be unicode either ;) > > (I'm not really concerned, but ideally perhaps that exception would take the > param value or type as well as param name...) Sure. It was all that "_check_add" was doing, but _check_add does it by iterating over the list of lines. > > [...] >> @@ -1597,7 +1615,7 @@ >> if refs: >> for ref in refs: >> if ref: >> - raise KnitCorrupt(self, >> + raise errors.KnitCorrupt(self, > > Thanks for fixing this. pyflakes reports two other undefined names in this > file: > > bzrlib/groupcompress.py:1543: undefined name 'progress' > bzrlib/groupcompress.py:1689: undefined name 'RevisionNotPresent' > > Now would be a good time to fix those too :) Done Though of course it would be better to have tests that provoke the behavior.... :) >> === modified file 'bzrlib/knit.py' > [...] >> + line_bytes = ''.join(lines) >> return self._add(key, lines, parents, >> - parent_texts, left_matching_blocks, nostore_sha, random_id) >> + parent_texts, left_matching_blocks, nostore_sha, random_id, >> + line_bytes=line_bytes) > > You pass lines *and* line_bytes? That looks odd. I'm pretty sure it's > intentional, but it's worth pointing this out and explaining the rationale > in _add's docstring, I think. Sure. It is because there are places that want lines (Content objects require them, and our "diff" code requires them as well.) and the commit code would rather have line_bytes (because writing a single string to the gzip code is quite a bit faster than a bunch of short ones). And this way, we have both types accessible when we need them, rather than "re-casting" from whatever we have into the other type, and then multiplying memory *again*. I'll doc it better. > >> + def _add_text(self, key, parents, text, nostore_sha=None, random_id=False): >> + """See VersionedFiles.add_text().""" >> + self._index._check_write_ok() >> + self._check_add(key, None, random_id, check_content=False) >> + if text.__class__ is not str: >> + raise errors.BzrBadParameterUnicode("text") >> + if parents is None: >> + # The caller might pass None if there is no graph data, but kndx >> + # indexes can't directly store that, so we give them >> + # an empty tuple instead. >> + parents = () >> + return self._add(key, None, parents, >> + None, None, nostore_sha, random_id, >> + line_bytes=text) > > Hmm. It would be nice if this wasn't largely the same as _add_text in > groupcompress.py. All but the final "self._add" versus "record = FulltextContentFactory, self._insert...". It didn't seem like quite enough duplication to worry about. If you really prefer I can refactor. > > [...] >> - if lines: >> - if lines[-1][-1] != '\n': >> - # copy the contents of lines. >> + no_eol = False >> + # Note: line_bytes is not modified to add a newline, that is tracked >> + # via the no_eol flag. 'lines' *is* modified, because that is the >> + # general values needed by the Content code. > > Gosh I hate our no_eol flag... :) Yeah, it really wasn't the best way to go about it. The Content code is all sorts of confused because of it. (line-deltas are a diff *with* the final newline always present, etc.) Robert originally wrote Groupcompress with a better method (*always* add a final newline and always strip it back off), but then we ended up going with a binary format that doesn't actually need a trailing newline anymore. > > [...] >> @@ -1920,21 +1949,16 @@ >> function spends less time resizing the final string. >> :return: (len, a StringIO instance with the raw data ready to read.) >> """ >> - # Note: using a string copy here increases memory pressure with e.g. >> - # ISO's, but it is about 3 seconds faster on a 1.2Ghz intel machine >> - # when doing the initial commit of a mozilla tree. RBC 20070921 >> - bytes = ''.join(chain( > > itertools.chain is now unused in this module, delete its import. > Done. > [...] >> + for chunk in chunks: >> + if type(chunk) != str: > > “is not”, seeing as you're touching this line anyway... Done. I cleaned up a few more while I was at it. > >> === modified file 'bzrlib/tests/test_versionedfile.py' > [...] > > The conditionals and repetition here are a bit ugly, but it'll do. Ideally > this would be done with the proper test parameterisation tools, though. > > -Andrew. > > So... self.key_length is handled by the parameterization, but I think I can use 'self.get_simple_key()' which knows about self.key_length and at least lets us hide the fact that it is doing crazy stuff with key width :) I don't know a way to handle 'self.graph' better. If the VF supports graph, then we want to have parents supplied. If the VF *doesn't* then it generates an exception if you supply parents. It at least was how other tests (like test_annotate) use the same 'if self.graph' so I stuck with it. So it is at least better, even if it isn't perfect. I suppose we could add a test permutation that tried to turn all calls to "add_lines" into "_add_text" and run against those. I don't feel it is quite worth that. John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAko/pRAACgkQJdeBCYSNAAPrzACgi5N62aud9OGCpyFmteCYJniD RN0AniqoWoTk7HdFqsP4lfbgNKuSxXW3 =RHWC -----END PGP SIGNATURE-----