Code review comment for lp:~jameinel/bzr/1.16-commit-fulltext

John A Meinel (jameinel) wrote :

This branch adds a new api, VersionedFiles.add_text(). If people really want, I could change it to VF.add_chunks(), but add_text() fits what I needed, and was expedient.

The main effect is to change 'bzr commit' to use file.read() rather than file.readlines(), and then to pass that on to VF.add_text() rather than VF.add_lines().

It also spends a little bit of time to remove some of the bits that were causing us to copy the memory structures a lot. It doesn't completely remove the need for a list of lines during Knit.commit() but it *does* remove the need during --dev6.commit.

To test this, I created a 90MB file, which consists of mostly 20 byte strings with no final newline. I then did:
rm -rf .bzr; bzr init --format=X; bzr add; time bzr commit -Dmemory -m "bigfile"

For --pack-0.92:
pre 469,748 kB, 5.554s
post 360,836 kB, 4.789s

For --development6-rich-root:
pre 589,732 kB, 7.785s
post 348,796 kB, 5.803s

So it is both faster and smaller. Though I still need to explore why --dev6 isn't more memory friendly. It seems to be because of the DeltaIndex structures as part of Groupcompress blocks. It might be worthwhile to optimize for not creating those on the first insert into a new group. (for the 90MB file, it seems to allocate 97MB for the 'loose' index, and then packs that into a 134MB index that has empty slots.)

« Back to merge proposal