Code review comment for lp:~jameinel/bzr/2.1-static-tuple-btree-string-intern

Revision history for this message
John A Meinel (jameinel) wrote :

This is a fairly minor update to my "btree uses StaticTuples", though it is mostly orthogonal.

This changes the code that decides to intern strings. Specifically it:

1) Doesn't intern key bits that start with 'sha1:'.
   a) There are *lots* of these (500k)
   b) They are rarely duplicated (no parent list, and only duplicated in the chk maps)
   c) They are never a sub part of a key. revision_id may show up as (revision_id,) or may show up
      as (file_id, revision_id), sha1: is never used like that [yet]

  This saves us 24 bytes per string in the string interned dict. At 500k sha1's that is 11MB.
  Note that at the moment, we still intern the StaticTuple('sha1:...'). I may want to revisit
  that, since there are no parent lists. However, for something like fetch, we will load the key
  from the index, and then load it again in the chk map. So I think interning is a net win.

2) Does intern some of the 'values' if they indicate the value is for a 'null content' record. We
   get a null content record for all of the entries for directories. And since our upgrade to
   rich-root decided to add an entry for TREE_ROOT for every revision, we have *lots* of these.
   (I think it was something like 1/4th of all entries in the per-file graph was these root keys.)

   The code comment mentions that when loading bzr.dev it saves 1.2MB, and for launchpad it saves
   around 4MB. Which was right about 1.7% of peak memory. Not huge, but a fairly cheap win.

This is a little bit of a layering inversion, because it is adding domain specific logic into btree index, when they should be considered generic containers. However, the memory savings is visible. It also isn't a correctness thing, as it only decides whether we intern() or not.

« Back to merge proposal