I'm no dirstate expert, so depending on how confident you are with this patch you might like to wait for another review before merging.
Overall, I this change seems ok to me. It deserves careful benchmarking of course, but I know you've been doing that :)
It's interesting to me that _build_line_cache actually re-reads the lines from the file. I had expected that maybe we'd keep the lines around from the original read, but perhaps this is just as good. If we mmap'ed the dirstate then certainly re-reading from the mmap rather than keeping a copy would clearly make sense.
+ :param worth_saving_limit: when the exact number of changed entries
+ is known, only bother saving the dirstate if more than this
+ count of entries have changed. -1 means never save.
I'd personally prefer "None means never save", but that's not a big deal.
I believe the dirstate can record changes to tree state, not just cache stat info... so is there a risk with this patch that e.g. "bzr mv foo bar" will go unrecorded because it's under the _worth_saving_limit? I understand the value in not updating the cache for only a small number of changes, but AFAIK we always want to update the dirstate file when actual state is changed. I think the header_too=True flag is used to make sure this works, but that seems like a pretty indirect way of expressing the concept that this is a non-optional update.
I wonder: if you unconditionally set _worth_saving_limit to something very high (e.g. 1000000) does the test suite still pass? My guess is that except perhaps for one or two hyper-sensitive tests it ought to, and it might make for an interesting sanity check.
+ # We only enable smart saving is it hasn't already been disabled
"is" -> "if"
+ if self._use_smart_saving is not False:
+ self._use_smart_saving = True
So if _use_smart_saving is True, set it to True? Is that right? Perhaps you mean "is not None"?
(If you really are just testing for True/False, it's generally a bad idea to test for True or False with "is". Use "if [not] self._use_smart_saving:" instead, which is also easier to read.)
I'm no dirstate expert, so depending on how confident you are with this patch you might like to wait for another review before merging.
Overall, I this change seems ok to me. It deserves careful benchmarking of course, but I know you've been doing that :)
It's interesting to me that _build_line_cache actually re-reads the lines from the file. I had expected that maybe we'd keep the lines around from the original read, but perhaps this is just as good. If we mmap'ed the dirstate then certainly re-reading from the mmap rather than keeping a copy would clearly make sense.
+ :param worth_saving_limit: when the exact number of changed entries
+ is known, only bother saving the dirstate if more than this
+ count of entries have changed. -1 means never save.
I'd personally prefer "None means never save", but that's not a big deal.
I believe the dirstate can record changes to tree state, not just cache stat info... so is there a risk with this patch that e.g. "bzr mv foo bar" will go unrecorded because it's under the _worth_ saving_ limit? I understand the value in not updating the cache for only a small number of changes, but AFAIK we always want to update the dirstate file when actual state is changed. I think the header_too=True flag is used to make sure this works, but that seems like a pretty indirect way of expressing the concept that this is a non-optional update.
I wonder: if you unconditionally set _worth_saving_limit to something very high (e.g. 1000000) does the test suite still pass? My guess is that except perhaps for one or two hyper-sensitive tests it ought to, and it might make for an interesting sanity check.
+ # We only enable smart saving is it hasn't already been disabled
"is" -> "if"
+ if self._use_ smart_saving is not False: smart_saving = True
+ self._use_
So if _use_smart_saving is True, set it to True? Is that right? Perhaps you mean "is not None"?
(If you really are just testing for True/False, it's generally a bad idea to test for True or False with "is". Use "if [not] self._use_ smart_saving: " instead, which is also easier to read.)