-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Ian Clatworthy wrote: > Ian Clatworthy has proposed merging lp:~ian-clatworthy/bzr/faster-log into lp:bzr. > > Requested reviews: > bzr-core (bzr-core) > > This patch makes log --long faster by around 10%. For OpenOffice.org with 260K revisions, the time drops from 50 seconds to 46 seconds. > I'll try to give a bit of a summary as to why I think this is faster, though it would be nice if you would give a similar summary when submitting. 1) Pull things like + levels = rqst.get('levels') + limit = rqst.get('limit') + diff_type = rqst.get('diff_type') Out of the inner loops, and set the variable once at the outside. This also handles stuff like using an if in the inner loop rather than a function call: + if diff_type is None: + diff = None + else: + diff = self._format_diff(rev, rev_id, diff_type) 2) Factor out "custom_properties" into a re-used helper function. 3) Add a new function: format_date_with_offset_in_original_timezone 4) Change log_revision(self, revision) so that rather than calling outfile.write() for every line of the revision, batching it up into a list, and then using ('\n' + indent).join(lines) and writing that out as a single string. IMO the batching of everything for one revision seems just about right. I'm a bit concerned, though, about handling Unicode Author names and UTF-8 'diff' strings. My guess is that this actually breaks it, and we just don't know it yet. Can you please ensure that we have a test with non-ascii file content (even better if it isn't even UTF-8 since we don't assert anything about file content encoding), and then also include a non-ascii Author name. That ensures that we want to output Unicode for the name, and something random for the diff content. 5) I'm a bit curious about this: === modified file 'bzrlib/osutils.py' - --- bzrlib/osutils.py 2009-05-23 04:55:52 +0000 +++ bzrlib/osutils.py 2009-05-25 05:46:01 +0000 @@ -686,6 +686,8 @@ return offset.days * 86400 + offset.seconds weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'] +_default_format_by_weekday_num = [wd + " %Y-%m-%d %H:%M:%S" for wd in weekdays] + ^- Does this mean that we are skipping localization? So someone with de_DE.UTF-8 will no longer see the German short-names for days of the week? +def format_date_with_offset_in_original_timezone(t, offset=0, + _cache=_offset_cache): + """Return a formatted date string in the original timezone. + + This routine may be faster then format_date. + + :param t: Seconds since the epoch. + :param offset: Timezone offset in seconds east of utc. + """ + if offset is None: + offset = 0 + tt = time.gmtime(t + offset) + date_fmt = _default_format_by_weekday_num[tt[6]] + date_str = time.strftime(date_fmt, tt) + offset_str = _cache.get(offset, None) + if offset_str is None: + offset_str = ' %+03d%02d' % (offset / 3600, (offset / 60) % 60) + _cache[offset] = offset_str + return date_str + offset_str + ^- I wonder if it might be faster to include the offset_str inside the 'date_fmt' string, since it would require one less string combination. That said, I wouldn't lose much sleep over it. Though you've obviously spent a lot of time trying to make it fast. I would also say that if you are going to have "if offset is None:" then I would just set the default as "offset=None" rather than "offset=0". John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkoaYB4ACgkQJdeBCYSNAAMilACeOQcJkl4dNXVOqNw0SAvO33sW PzQAn2wc8dTf/3tVG3C25sEV7UbL3uPu =ax/6 -----END PGP SIGNATURE-----