Merge lp:~garyvdm/bzr/Bug427773 into lp:bzr/2.0
Proposed by: Alexander Belchenko
Status: Rejected
Rejected by: Gary van der Merwe
Proposed branch: lp:~garyvdm/bzr/Bug427773
Merge into: lp:bzr/2.0
Diff against target: None lines
To merge this branch: bzr merge lp:~garyvdm/bzr/Bug427773
Related bugs: (none listed)

Reviewer | Review Type | Date Requested | Status
---|---|---|---
bzr-core | | | Pending

Review via email: mp+11645@code.launchpad.net
Commit message
Description of the change
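The headline fix in this proposal (bug #427773, per the NEWS entry in the diff below) is that creating a TreeTransform during ``bzr merge`` could fail, for example when a stale ``limbo`` or ``pending-deletions`` directory was left behind by an earlier interrupted run, and the working tree would then be left locked. A minimal sketch of the acquire/try/unlock pattern involved; ``tree`` and ``make_transform`` are hypothetical stand-ins, not the actual bzrlib merge code:

```python
def transform_with_unlock(tree, make_transform):
    """Create a transform under a write lock, releasing the lock on failure.

    Sketch only: `tree` stands in for a bzrlib working tree and
    `make_transform` for the TreeTransform constructor.
    """
    tree.lock_tree_write()
    try:
        transform = make_transform(tree)
    except BaseException:
        # The fix described by the NEWS entry: if transform creation fails
        # (e.g. a limbo directory already exists), unlock the tree instead
        # of leaving it locked, then let the error propagate.
        tree.unlock()
        raise
    return transform
```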
Preview Diff
1 | === modified file 'Makefile' |
2 | --- Makefile 2009-08-03 20:38:39 +0000 |
3 | +++ Makefile 2009-08-27 00:53:27 +0000 |
4 | @@ -1,4 +1,4 @@ |
5 | -# Copyright (C) 2005, 2006, 2007, 2008 Canonical Ltd |
6 | +# Copyright (C) 2005, 2006, 2007, 2008, 2009 Canonical Ltd |
7 | # |
8 | # This program is free software; you can redistribute it and/or modify |
9 | # it under the terms of the GNU General Public License as published by |
10 | @@ -40,8 +40,6 @@ |
11 | |
12 | check-nodocs: extensions |
13 | $(PYTHON) -Werror -O ./bzr selftest -1v $(tests) |
14 | - @echo "Running all tests with no locale." |
15 | - LC_CTYPE= LANG=C LC_ALL= ./bzr selftest -1v $(tests) 2>&1 | sed -e 's/^/[ascii] /' |
16 | |
17 | # Run Python style checker (apt-get install pyflakes) |
18 | # |
19 | |
20 | === modified file 'NEWS' |
21 | --- NEWS 2009-08-25 05:32:07 +0000 |
22 | +++ NEWS 2009-09-11 14:23:16 +0000 |
23 | @@ -6,6 +6,309 @@ |
24 | .. contents:: List of Releases |
25 | :depth: 1 |
26 | |
27 | +In Development |
28 | +############## |
29 | + |
30 | +Compatibility Breaks |
31 | +******************** |
32 | + |
33 | +New Features |
34 | +************ |
35 | + |
36 | +* Give more control on BZR_PLUGIN_PATH by providing a way to refer to or |
37 | + disable the user, site and core plugin directories. |
38 | + (Vincent Ladeuil, #412930, #316192, #145612) |
39 | + |
40 | +Bug Fixes |
41 | +********* |
42 | + |
43 | +* Bazaar's native protocol code now correctly handles EINTR, which most |
44 | + noticeably occurs if you break in to the debugger while connected to a |
45 | + bzr+ssh server. You can now continue from the debugger (by typing |
46 | + 'c') and the process continues. However, note that pressing C-\ in the |
47 | + shell may still kill the SSH process, which is bug 162509, so you must |
48 | + send a signal to the bzr process specifically, for example by typing |
49 | + ``kill -QUIT PID`` in another shell. (Martin Pool, #341535) |
50 | + |
51 | +* ``bzr check`` in pack-0.92, 1.6 and 1.9 format repositories will no |
52 | + longer report incorrect errors about ``Missing inventory ('TREE_ROOT', ...)`` |
53 | + (Robert Collins, #416732) |
54 | + |
55 | +* ``bzr info -v`` on a 2a format still claimed that it was a "Development |
56 | + format" (John Arbash Meinel, #424392) |
57 | + |
58 | +* ``bzr merge`` and ``bzr remove-tree`` now require --force if pending |
59 | + merges are present in the working tree. |
60 | + (Vincent Ladeuil, #426344) |
61 | + |
62 | +* Clearer message when Bazaar runs out of memory, instead of a ``MemoryError`` |
63 | + traceback. (Martin Pool, #109115) |
64 | + |
65 | +* Conversion to 2a will create a single pack for all the new revisions (as |
66 | + long as it ran without interruption). This improves both ``bzr upgrade`` |
67 | + and ``bzr pull`` or ``bzr merge`` from local branches in older formats. |
68 | + The autopack logic that occurs every 100 revisions during local |
69 | + conversions was not returning that pack's identifier, which resulted in |
70 | + the partial packs created during the conversion not being consolidated |
71 | + at the end of the conversion process. (Robert Collins, #423818) |
72 | + |
73 | +* Don't restrict the command name used to run the test suite. |
74 | + (Vincent Ladeuil, #419950) |
75 | + |
76 | +* Fetches from 2a to 2a are now again requested in 'groupcompress' order. |
77 | + Groups that are seen as 'underutilized' will be repacked on-the-fly. |
78 | + This means that when the source is fully packed, there is minimal |
79 | + overhead during the fetch, but if the source is poorly packed the result |
80 | + is a fairly well packed repository (not as good as 'bzr pack' but |
81 | + good-enough.) (Robert Collins, John Arbash Meinel, #402652) |
82 | + |
83 | +* Network streams now decode adjacent records of the same type into a |
84 | + single stream, reducing layering churn. (Robert Collins) |
85 | + |
86 | +* Prevent some kinds of incomplete data from being committed to a 2a |
87 | + repository, such as revisions without inventories or inventories without |
88 | + chk_bytes root records. |
89 | + (Andrew Bennetts, #423506) |
90 | + |
91 | +* Make sure that we unlock the tree if we fail to create a TreeTransform |
92 | + object when doing a merge, and there is a limbo or pending-deletions |
93 | + directory. (Gary van der Merwe, #427773) |
94 | + |
95 | +Improvements |
96 | +************ |
97 | + |
98 | +Documentation |
99 | +************* |
100 | + |
101 | +* Help on hooks no longer says 'Not deprecated' for hooks that are |
102 | + currently supported. (Ian Clatworthy, #422415) |
103 | + |
104 | +API Changes |
105 | +*********** |
106 | + |
107 | +* ``bzrlib.tests`` now uses ``stopTestRun`` for its ``TestResult`` |
108 | + subclasses - the same as python's unittest module. (Robert Collins) |
109 | + |
110 | +Internals |
111 | +********* |
112 | + |
113 | +* ``BTreeLeafParser.extract_key`` has been tweaked slightly to reduce |
114 | + mallocs while parsing the index (approx 3=>1 mallocs per key read). |
115 | + This results in a 10% speedup while reading an index. |
116 | + (John Arbash Meinel) |
117 | + |
118 | +* The ``bzrlib.lsprof`` module has a new class ``BzrProfiler`` which makes |
119 | + profiling in some situations like callbacks and generators easier. |
120 | + (Robert Collins) |
121 | + |
122 | +Testing |
123 | +******* |
124 | + |
125 | +* Passing ``--lsprof-tests -v`` to bzr selftest will cause lsprof output to |
126 | + be output for every test. Note that this is very verbose! (Robert Collins) |
127 | + |
128 | +* Test parameterisation now does a shallow copy, not a deep copy of the test |
129 | + to be parameterised. This is not expected to break external use of test |
130 | + parameterisation, and is substantially faster. (Robert Collins) |
131 | + |
132 | + |
133 | +bzr 2.0rc2 (not released yet) |
134 | +############################# |
135 | + |
136 | +New Features |
137 | +************ |
138 | + |
139 | +* Added post_commit hook for mutable trees. This allows the keywords |
140 | + plugin to expand keywords on files changed by the commit. |
141 | + (Ian Clatworthy, #408841) |
142 | + |
143 | +Bug Fixes |
144 | +********* |
145 | + |
146 | +* Bazaar's native protocol code now correctly handles EINTR, which most |
147 | + noticeably occurs if you break in to the debugger while connected to a |
148 | + bzr+ssh server. You can now continue from the debugger (by typing |
149 | + 'c') and the process continues. However, note that pressing C-\ in the |
150 | + shell may still kill the SSH process, which is bug 162509, so you must |
151 | + send a signal to the bzr process specifically, for example by typing |
152 | + ``kill -QUIT PID`` in another shell. (Martin Pool, #341535) |
153 | + |
154 | +* ``bzr check`` in pack-0.92, 1.6 and 1.9 format repositories will no |
155 | + longer report incorrect errors about ``Missing inventory ('TREE_ROOT', ...)`` |
156 | + (Robert Collins, #416732) |
157 | + |
158 | +* ``bzr info -v`` on a 2a format still claimed that it was a "Development |
159 | + format" (John Arbash Meinel, #424392) |
160 | + |
161 | +* ``bzr log stacked-branch`` shows the full log including |
162 | + revisions that are in the fallback repository. (Regressed in 2.0rc1). |
163 | + (John Arbash Meinel, #419241) |
164 | + |
165 | +* Clearer message when Bazaar runs out of memory, instead of a ``MemoryError`` |
166 | + traceback. (Martin Pool, #109115) |
167 | + |
168 | +* Conversion to 2a will create a single pack for all the new revisions (as |
169 | + long as it ran without interruption). This improves both ``bzr upgrade`` |
170 | + and ``bzr pull`` or ``bzr merge`` from local branches in older formats. |
171 | + The autopack logic that occurs every 100 revisions during local |
172 | + conversions was not returning that pack's identifier, which resulted in |
173 | + the partial packs created during the conversion not being consolidated |
174 | + at the end of the conversion process. (Robert Collins, #423818) |
175 | + |
176 | +* Fetches from 2a to 2a are now again requested in 'groupcompress' order. |
177 | + Groups that are seen as 'underutilized' will be repacked on-the-fly. |
178 | + This means that when the source is fully packed, there is minimal |
179 | + overhead during the fetch, but if the source is poorly packed the result |
180 | + is a fairly well packed repository (not as good as 'bzr pack' but |
181 | + good-enough.) (Robert Collins, John Arbash Meinel, #402652) |
182 | + |
183 | +* Fix a potential segmentation fault when doing 'log' of a branch that had |
184 | + ghosts in its mainline. (Evaluating None as a tuple is bad.) |
185 | + (John Arbash Meinel, #419241) |
186 | + |
187 | +* ``groupcompress`` sort order is now more stable, rather than relying on |
188 | + ``topo_sort`` ordering. The implementation is now |
189 | + ``KnownGraph.gc_sort``. (John Arbash Meinel) |
190 | + |
191 | +* Local data conversion will generate correct deltas. This is a critical |
192 | + bugfix vs 2.0rc1, and all 2.0rc1 users should upgrade to 2.0rc2 before |
193 | + converting repositories. (Robert Collins, #422849) |
194 | + |
195 | +* Network streams now decode adjacent records of the same type into a |
196 | + single stream, reducing layering churn. (Robert Collins) |
197 | + |
198 | +* Prevent some kinds of incomplete data from being committed to a 2a |
199 | + repository, such as revisions without inventories, a missing chk_bytes |
200 | + record for an inventory, or a missing text referenced by an inventory. |
201 | + (Andrew Bennetts, #423506, #406687) |
202 | + |
203 | +Documentation |
204 | +************* |
205 | + |
206 | +* Fix assertion error about "_remember_remote_is_before" when pushing to |
207 | + older smart servers. |
208 | + (Andrew Bennetts, #418931) |
209 | + |
210 | +* Help on hooks no longer says 'Not deprecated' for hooks that are |
211 | + currently supported. (Ian Clatworthy, #422415) |
212 | + |
213 | +* The main table of contents now provides links to the new Migration Docs |
214 | + and Plugins Guide. (Ian Clatworthy) |
215 | + |
216 | + |
217 | +bzr 2.0rc1 |
218 | +########## |
219 | + |
220 | + |
221 | +:Codename: no worries |
222 | +:2.0rc1: 2009-08-26 |
223 | + |
224 | +This release of Bazaar makes 2a 'brisbane-core' format the |
225 | +default. Most of the work in this release now focuses on bug |
226 | +fixes and stabilization, covering both 2a and previous formats. |
227 | + |
228 | +The Bazaar team decided that 2.0 will be a long-term supported |
229 | +release, with bugfix-only releases based on it continuing for at |
230 | +least six months or until the following stable release (we said |
231 | +that previously, but that's worth repeating). |
232 | + |
233 | +Compatibility Breaks |
234 | +******************** |
235 | + |
236 | +* The default format for bzr is now ``2a``. This format brings many |
237 | + significant performance and size improvements. bzr can pull from |
238 | + any existing repository into a ``2a`` one, but can only transfer |
239 | + into ``rich-root`` repositories from ``2a``. The Upgrade guide |
240 | + has more information about this change. (Robert Collins) |
241 | + |
242 | +* On Windows auto-detection of Putty's plink.exe is disabled. |
243 | + The default SSH client on Windows is paramiko. Users can still force |
244 | + the use of plink by explicitly setting BZR_SSH=plink in the environment. |
245 | + (#414743, Alexander Belchenko) |
246 | + |
247 | +New Features |
248 | +************ |
249 | + |
250 | +* ``bzr branch --switch`` can now switch the checkout in the current directory |
251 | + to the newly created branch. (Lukáš Lalinský) |
252 | + |
253 | +Bug Fixes |
254 | +********* |
255 | + |
256 | +* Further tweaks to handling of ``bzr add`` messages about ignored files. |
257 | + (Jason Spashett, #76616) |
258 | + |
259 | +* Fetches were being requested in 'groupcompress' order, but weren't |
260 | + recombining the groups. Thus they would 'fragment' to get the correct |
261 | + order, but not 'recombine' to actually benefit from it. Until we get |
262 | + recombining to work, switching to 'unordered' fetches avoids the |
263 | + fragmentation. (John Arbash Meinel, #402645) |
264 | + |
265 | +* Fix a pycurl related test failure on karmic by recognizing an error |
266 | + raised by newer versions of pycurl. |
267 | + (Vincent Ladeuil, #306264) |
268 | + |
269 | +* Fix a test failure on karmic by making a locale test more robust. |
270 | + (Vincent Ladeuil, #413514) |
271 | + |
272 | +* Fix IndexError printing CannotBindAddress errors. |
273 | + (Martin Pool, #286871) |
274 | + |
275 | +* Fix "Revision ... not present" errors when upgrading stacked branches, |
276 | + or when doing fetches from a stacked source to a stacked target. |
277 | + (Andrew Bennetts, #399140) |
278 | + |
279 | +* ``bzr branch`` of 2a repositories over HTTP is much faster. bzr now |
280 | + batches together small fetches from 2a repositories, rather than |
281 | + fetching only a few hundred bytes at a time. |
282 | + (Andrew Bennetts, #402657) |
283 | + |
284 | +Improvements |
285 | +************ |
286 | + |
287 | +* A better description of the platform is shown in crash tracebacks, ``bzr |
288 | + --version`` and ``bzr selftest``. |
289 | + (Martin Pool, #409137) |
290 | + |
291 | +* bzr can now (again) capture crash data through the apport library, |
292 | + so that a single human-readable file can be attached to bug reports. |
293 | + This can be disabled by using ``-Dno_apport`` on the command line, or by |
294 | + putting ``no_apport`` into the ``debug_flags`` section of |
295 | + ``bazaar.conf``. |
296 | + (Martin Pool, Robert Collins, #389328) |
297 | + |
298 | +* ``bzr push`` locally on windows will no longer give a locking error with |
299 | + dirstate based formats. (Robert Collins) |
300 | + |
301 | +* ``bzr shelve`` and ``bzr unshelve`` now work on windows. |
302 | + (Robert Collins, #305006) |
303 | + |
304 | +* Commit of specific files no longer prevents using the iter_changes |
305 | + codepath. On 2a repositories, commit of specific files should now be as |
306 | + fast, or slightly faster, than a full commit. (Robert Collins) |
307 | + |
308 | +* The internal core code that handles specific file operations like |
309 | + ``bzr st FILENAME`` or ``bzr commit FILENAME`` has been changed to |
310 | + include the parent directories if they have altered, and when a |
311 | + directory stops being a directory its children are always included. This |
312 | + fixes a number of causes for ``InconsistentDelta`` errors, and permits |
313 | + faster commit of specific paths. (Robert Collins, #347649) |
314 | + |
315 | +Documentation |
316 | +************* |
317 | + |
318 | +* New developer documentation for content filtering. |
319 | + (Martin Pool) |
320 | + |
321 | +API Changes |
322 | +*********** |
323 | + |
324 | +* ``bzrlib.shelf_ui`` has had the ``from_args`` convenience methods of its |
325 | + classes changed to manage lock lifetime of the trees they open in a way |
326 | + consistent with reader-exclusive locks. (Robert Collins, #305006) |
327 | + |
328 | +Testing |
329 | +******* |
330 | |
331 | bzr 1.18.1 NOT RELEASED YET |
332 | ########################### |
333 | @@ -17,17 +320,144 @@ |
334 | conversion will commit too many copies of a file. |
335 | (Martin Pool, #415508) |
336 | |
337 | +Improvements |
338 | +************ |
339 | + |
340 | +* ``bzr push`` locally on windows will no longer give a locking error with |
341 | + dirstate based formats. (Robert Collins) |
342 | + |
343 | +* ``bzr shelve`` and ``bzr unshelve`` now work on windows. |
344 | + (Robert Collins, #305006) |
345 | + |
346 | API Changes |
347 | *********** |
348 | |
349 | +* ``bzrlib.shelf_ui`` has had the ``from_args`` convenience methods of its |
350 | + classes changed to manage lock lifetime of the trees they open in a way |
351 | + consistent with reader-exclusive locks. (Robert Collins, #305006) |
352 | + |
353 | * ``Tree.path_content_summary`` may return a size of None, when called on |
354 | a tree with content filtering where the size of the canonical form |
355 | cannot be cheaply determined. (Martin Pool) |
356 | |
357 | +* When manually creating transport servers in test cases, a new helper |
358 | + ``TestCase.start_server`` that registers a cleanup and starts the server |
359 | + should be used. (Robert Collins) |
360 | |
361 | bzr 1.18 |
362 | ######## |
363 | |
364 | +Compatibility Breaks |
365 | +******************** |
366 | + |
367 | +* Committing directly to a stacked branch from a lightweight checkout will |
368 | + no longer work. In previous versions this would appear to work but would |
369 | + generate repositories with insufficient data to create deltas, leading |
370 | + to later errors when branching or reading from the repository. |
371 | + (Robert Collins, bug #375013) |
372 | + |
373 | +New Features |
374 | +************ |
375 | + |
376 | +Bug Fixes |
377 | +********* |
378 | + |
379 | +* Fetching from 2a branches from a version-2 bzr protocol would fail to |
380 | + copy the internal inventory pages from the CHK store. This cannot happen |
381 | + in normal use as all 2a compatible clients and servers support the |
382 | + version-3 protocol, but it does cause test suite failures when testing |
383 | + downlevel protocol behaviour. (Robert Collins) |
384 | + |
385 | +* Fix a test failure on karmic by making a locale test more robust. |
386 | + (Vincent Ladeuil, #413514) |
387 | + |
388 | +* Fixed "Pack ... already exists" error when running ``bzr pack`` on a |
389 | + fully packed 2a repository. (Andrew Bennetts, #382463) |
390 | + |
391 | +* Further tweaks to handling of ``bzr add`` messages about ignored files. |
392 | + (Jason Spashett, #76616) |
393 | + |
394 | +* Properly handle fetching into a stacked branch while converting the |
395 | + data, especially when there are also ghosts. The code was filling in |
396 | + parent inventories incorrectly, and also not handling when one of the |
397 | + parents was a ghost. (John Arbash Meinel, #402778, #412198) |
398 | + |
399 | +* ``RemoteStreamSource.get_stream_for_missing_keys`` will fetch CHK |
400 | + inventory pages when appropriate (by falling back to the vfs stream |
401 | + source). (Andrew Bennetts, #406686) |
402 | + |
403 | +* StreamSource generates rich roots from non-rich root sources correctly |
404 | + now. (Andrew Bennetts, #368921) |
405 | + |
406 | +* When deciding whether a repository was compatible for upgrading or |
407 | + fetching, we previously incorrectly checked the default repository |
408 | + format for the bzrdir format, rather than the format that was actually |
409 | + present on disk. (Martin Pool, #408824) |
410 | + |
411 | +Improvements |
412 | +************ |
413 | + |
414 | +* A better description of the platform is shown in crash tracebacks, ``bzr |
415 | + --version`` and ``bzr selftest``. |
416 | + (Martin Pool, #409137) |
417 | + |
418 | +* Cross-format fetches (such as between 1.9-rich-root and 2a) via the |
419 | + smart server are more efficient now. They send inventory deltas rather |
420 | + than full inventories. The smart server has two new requests, |
421 | + ``Repository.get_stream_1.19`` and ``Repository.insert_stream_1.19`` to |
422 | + support this. (Andrew Bennetts, #374738, #385826) |
423 | + |
424 | +* Extracting the full ancestry and computing the ``merge_sort`` is now |
425 | + significantly faster. This affects things like ``bzr log -n0``. (For |
426 | + example, ``bzr log -r -10..-1 -n0 bzr.dev`` drops from 2.5s to 1.0s.) |
427 | + (John Arbash Meinel) |
428 | + |
429 | +Documentation |
430 | +************* |
431 | + |
432 | +API Changes |
433 | +*********** |
434 | + |
435 | +Internals |
436 | +********* |
437 | + |
438 | +* ``-Dstrict_locks`` can now be used to check that read and write locks |
439 | + are treated properly w.r.t. exclusivity. (We don't try to take an OS |
440 | + read lock on a file that we already have an OS write lock on.) This is |
441 | + now set by default for all tests, if you have a test which cannot be |
442 | + fixed, you can use ``self.thisFailsStrictLockCheck()`` as a |
443 | + compatibility knob. (John Arbash Meinel) |
444 | + |
445 | +* InterDifferingSerializer is now only used locally. Other fetches that |
446 | + would have used InterDifferingSerializer now use the more network |
447 | + friendly StreamSource, which now automatically does the same |
448 | + transformations as InterDifferingSerializer. (Andrew Bennetts) |
449 | + |
450 | +* ``KnownGraph`` now has a ``.topo_sort`` and ``.merge_sort`` member which |
451 | + are implemented in pyrex and significantly faster. This is exposed along |
452 | + with ``CombinedGraphIndex.find_ancestry()`` as |
453 | + ``VersionedFiles.get_known_graph_ancestry(keys)``. |
454 | + (John Arbash Meinel) |
455 | + |
456 | +* RemoteBranch.open now honours ignore_fallbacks correctly on bzr-v2 |
457 | + protocols. (Robert Collins) |
458 | + |
459 | +* The index code now has some specialized routines to extract the full |
460 | + ancestry of a key in a more efficient manner. |
461 | + ``CombinedGraphIndex.find_ancestry()``. (Time to get ancestry for |
462 | + bzr.dev drops from 1.5s down to 300ms. For OOo from 33s => 10.5s) (John |
463 | + Arbash Meinel) |
464 | + |
465 | +Testing |
466 | +******* |
467 | + |
468 | +* Install the test ssl certificate and key so that installed bzr |
469 | + can run the https tests. (Denys Duchier, #392401) |
470 | + |
471 | + |
472 | +bzr 1.18rc1 |
473 | +########### |
474 | + |
475 | :Codename: little traveller |
476 | :1.18: 2009-08-20 |
477 | :1.18rc1: 2009-08-10 |
478 | @@ -113,6 +543,9 @@ |
479 | running send and similar commands on 2a formats. |
480 | (Martin Pool, #408201) |
481 | |
482 | +* Fix crash in some invocations of ``bzr status`` in format 2a. |
483 | + (Martin Pool, #403523) |
484 | + |
485 | * Fixed export to an existing directory: if the directory is empty the |
486 | export will succeed, otherwise it fails with an error. |
487 | (Alexander Belchenko, #406174) |
488 | @@ -133,7 +566,9 @@ |
489 | * Requests for unknown methods no longer cause the smart server to log |
490 | lots of backtraces about ``UnknownSmartMethod``, ``do_chunk`` or |
491 | ``do_end``. (Andrew Bennetts, #338561) |
492 | - |
493 | + |
494 | +* Shelve will not shelve the initial add of the tree root. (Aaron Bentley) |
495 | + |
496 | * Streaming from bzr servers where there is a chain of stacked branches |
497 | (A stacked on B stacked on C) will now work. (Robert Collins, #406597) |
498 | |
499 | @@ -245,6 +680,17 @@ |
500 | ``countTestsCases``. (Robert Collins) |
501 | |
502 | |
503 | +bzr 1.17.1 (unreleased) |
504 | +####################### |
505 | + |
506 | +Bug Fixes |
507 | +********* |
508 | + |
509 | +* The optional ``_knit_load_data_pyx`` C extension was never being |
510 | + imported. This caused significant slowdowns when reading data from |
511 | + knit format repositories. (Andrew Bennetts, #405653) |
512 | + |
513 | + |
514 | bzr 1.17 "So late it's brunch" 2009-07-20 |
515 | ######################################### |
516 | :Codename: so-late-its-brunch |
517 | @@ -743,6 +1189,9 @@ |
518 | Testing |
519 | ******* |
520 | |
521 | +* ``make check`` no longer repeats the test run in ``LANG=C``. |
522 | + (Martin Pool, #386180) |
523 | + |
524 | * The number of cores is now correctly detected on OSX. (John Szakmeister) |
525 | |
526 | * The number of cores is also detected on Solaris and win32. (Vincent Ladeuil) |
527 | @@ -4723,7 +5172,7 @@ |
528 | checkouts. (Aaron Bentley, #182040) |
529 | |
530 | * Stop polluting /tmp when running selftest. |
531 | - (Vincent Ladeuil, #123623) |
532 | + (Vincent Ladeuil, #123363) |
533 | |
534 | * Switch from NFKC => NFC for normalization checks. NFC allows a few |
535 | more characters which should be considered valid. |
536 | |
537 | === modified file 'bzr' |
538 | --- bzr 2009-07-30 23:54:26 +0000 |
539 | +++ bzr 2009-08-28 05:11:10 +0000 |
540 | @@ -23,7 +23,7 @@ |
541 | import warnings |
542 | |
543 | # update this on each release |
544 | -_script_version = (1, 18, 0) |
545 | +_script_version = (2, 1, 0) |
546 | |
547 | if __doc__ is None: |
548 | print "bzr does not support python -OO." |
549 | |
550 | === modified file 'bzrlib/__init__.py' |
551 | --- bzrlib/__init__.py 2009-08-20 08:38:09 +0000 |
552 | +++ bzrlib/__init__.py 2009-08-30 21:34:42 +0000 |
553 | @@ -50,7 +50,7 @@ |
554 | # Python version 2.0 is (2, 0, 0, 'final', 0)." Additionally we use a |
555 | # releaselevel of 'dev' for unreleased under-development code. |
556 | |
557 | -version_info = (1, 18, 0, 'final', 0) |
558 | +version_info = (2, 1, 0, 'dev', 0) |
559 | |
560 | # API compatibility version: bzrlib is currently API compatible with 1.15. |
561 | api_minimum_version = (1, 17, 0) |
562 | @@ -70,6 +70,8 @@ |
563 | 1.2dev |
564 | >>> print _format_version_tuple((1, 1, 1, 'candidate', 2)) |
565 | 1.1.1rc2 |
566 | + >>> print bzrlib._format_version_tuple((2, 1, 0, 'beta', 1)) |
567 | + 2.1b1 |
568 | >>> print _format_version_tuple((1, 4, 0)) |
569 | 1.4 |
570 | >>> print _format_version_tuple((1, 4)) |
571 | |
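As an aside, the doctest added above pins how version tuples render as strings. A simplified re-implementation consistent with those doctest examples (the real ``_format_version_tuple`` in bzrlib handles more cases; this sketch is illustrative only):

```python
def format_version_tuple(version_info):
    # Matches the doctest examples: (2, 1, 0, 'beta', 1) -> '2.1b1',
    # (1, 1, 1, 'candidate', 2) -> '1.1.1rc2',
    # (1, 2, 0, 'dev', 0) -> '1.2dev', and (1, 4, 0) or (1, 4) -> '1.4'.
    main = '.'.join(str(p) for p in version_info[:3])
    if len(version_info) >= 3 and version_info[2] == 0:
        # Drop a trailing .0 micro version, as the doctest output shows.
        main = '.'.join(str(p) for p in version_info[:2])
    if len(version_info) <= 3:
        return main
    release_level, sub = version_info[3], version_info[4]
    suffix = {'dev': 'dev%d' % sub, 'beta': 'b%d' % sub,
              'candidate': 'rc%d' % sub, 'final': ''}[release_level]
    if release_level == 'dev' and sub == 0:
        suffix = 'dev'
    return main + suffix
```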
572 | === modified file 'bzrlib/_btree_serializer_pyx.pyx' |
573 | --- bzrlib/_btree_serializer_pyx.pyx 2009-06-22 12:52:39 +0000 |
574 | +++ bzrlib/_btree_serializer_pyx.pyx 2009-09-04 21:16:14 +0000 |
575 | @@ -1,4 +1,4 @@ |
576 | -# Copyright (C) 2008 Canonical Ltd |
577 | +# Copyright (C) 2008, 2009 Canonical Ltd |
578 | # |
579 | # This program is free software; you can redistribute it and/or modify |
580 | # it under the terms of the GNU General Public License as published by |
581 | @@ -41,8 +41,11 @@ |
582 | int PyString_AsStringAndSize_ptr(PyObject *, char **buf, Py_ssize_t *len) |
583 | void PyString_InternInPlace(PyObject **) |
584 | int PyTuple_CheckExact(object t) |
585 | + object PyTuple_New(Py_ssize_t n_entries) |
586 | + void PyTuple_SET_ITEM(object, Py_ssize_t offset, object) # steals the ref |
587 | Py_ssize_t PyTuple_GET_SIZE(object t) |
588 | PyObject *PyTuple_GET_ITEM_ptr_object "PyTuple_GET_ITEM" (object tpl, int index) |
589 | + void Py_INCREF(object) |
590 | void Py_DECREF_ptr "Py_DECREF" (PyObject *) |
591 | |
592 | cdef extern from "string.h": |
593 | @@ -140,14 +143,12 @@ |
594 | cdef char *temp_ptr |
595 | cdef int loop_counter |
596 | # keys are tuples |
597 | - loop_counter = 0 |
598 | - key_segments = [] |
599 | - while loop_counter < self.key_length: |
600 | - loop_counter = loop_counter + 1 |
601 | + key = PyTuple_New(self.key_length) |
602 | + for loop_counter from 0 <= loop_counter < self.key_length: |
603 | # grab a key segment |
604 | temp_ptr = <char*>memchr(self._start, c'\0', last - self._start) |
605 | if temp_ptr == NULL: |
606 | - if loop_counter == self.key_length: |
607 | + if loop_counter + 1 == self.key_length: |
608 | # capture to last |
609 | temp_ptr = last |
610 | else: |
611 | @@ -164,8 +165,9 @@ |
612 | temp_ptr - self._start) |
613 | # advance our pointer |
614 | self._start = temp_ptr + 1 |
615 | - PyList_Append(key_segments, key_element) |
616 | - return tuple(key_segments) |
617 | + Py_INCREF(key_element) |
618 | + PyTuple_SET_ITEM(key, loop_counter, key_element) |
619 | + return key |
620 | |
621 | cdef int process_line(self) except -1: |
622 | """Process a line in the bytes.""" |
623 | |
624 | === modified file 'bzrlib/_dirstate_helpers_pyx.pyx' |
625 | --- bzrlib/_dirstate_helpers_pyx.pyx 2009-07-27 05:44:19 +0000 |
626 | +++ bzrlib/_dirstate_helpers_pyx.pyx 2009-08-28 05:00:33 +0000 |
627 | @@ -28,7 +28,7 @@ |
628 | |
629 | from bzrlib import cache_utf8, errors, osutils |
630 | from bzrlib.dirstate import DirState |
631 | -from bzrlib.osutils import pathjoin, splitpath |
632 | +from bzrlib.osutils import parent_directories, pathjoin, splitpath |
633 | |
634 | |
635 | # This is the Windows equivalent of ENOTDIR |
636 | @@ -963,15 +963,21 @@ |
637 | |
638 | cdef class ProcessEntryC: |
639 | |
640 | + cdef int doing_consistency_expansion |
641 | cdef object old_dirname_to_file_id # dict |
642 | cdef object new_dirname_to_file_id # dict |
643 | cdef object last_source_parent |
644 | cdef object last_target_parent |
645 | - cdef object include_unchanged |
646 | + cdef int include_unchanged |
647 | + cdef int partial |
648 | cdef object use_filesystem_for_exec |
649 | cdef object utf8_decode |
650 | cdef readonly object searched_specific_files |
651 | + cdef readonly object searched_exact_paths |
652 | cdef object search_specific_files |
653 | + # The parents up to the root of the paths we are searching. |
654 | + # After all normal paths are returned, these specific items are returned. |
655 | + cdef object search_specific_file_parents |
656 | cdef object state |
657 | # Current iteration variables: |
658 | cdef object current_root |
659 | @@ -989,31 +995,48 @@ |
660 | cdef object current_block_list |
661 | cdef object current_dir_info |
662 | cdef object current_dir_list |
663 | + cdef object _pending_consistent_entries # list |
664 | cdef int path_index |
665 | cdef object root_dir_info |
666 | cdef object bisect_left |
667 | cdef object pathjoin |
668 | cdef object fstat |
669 | + # A set of the ids we've output when doing partial output. |
670 | + cdef object seen_ids |
671 | cdef object sha_file |
672 | |
673 | def __init__(self, include_unchanged, use_filesystem_for_exec, |
674 | search_specific_files, state, source_index, target_index, |
675 | want_unversioned, tree): |
676 | + self.doing_consistency_expansion = 0 |
677 | self.old_dirname_to_file_id = {} |
678 | self.new_dirname_to_file_id = {} |
679 | + # Are we doing a partial iter_changes? |
680 | + self.partial = set(['']).__ne__(search_specific_files) |
681 | # Using a list so that we can access the values and change them in |
682 | # nested scope. Each one is [path, file_id, entry] |
683 | self.last_source_parent = [None, None] |
684 | self.last_target_parent = [None, None] |
685 | - self.include_unchanged = include_unchanged |
686 | + if include_unchanged is None: |
687 | + self.include_unchanged = False |
688 | + else: |
689 | + self.include_unchanged = int(include_unchanged) |
690 | self.use_filesystem_for_exec = use_filesystem_for_exec |
691 | self.utf8_decode = cache_utf8._utf8_decode |
692 | # for all search_indexs in each path at or under each element of |
693 | - # search_specific_files, if the detail is relocated: add the id, and add the |
694 | - # relocated path as one to search if its not searched already. If the |
695 | - # detail is not relocated, add the id. |
696 | + # search_specific_files, if the detail is relocated: add the id, and |
697 | + # add the relocated path as one to search if its not searched already. |
698 | + # If the detail is not relocated, add the id. |
699 | self.searched_specific_files = set() |
700 | + # When we search exact paths without expanding downwards, we record |
701 | + # that here. |
702 | + self.searched_exact_paths = set() |
703 | self.search_specific_files = search_specific_files |
704 | + # The parents up to the root of the paths we are searching. |
705 | + # After all normal paths are returned, these specific items are returned. |
706 | + self.search_specific_file_parents = set() |
707 | + # The ids we've sent out in the delta. |
708 | + self.seen_ids = set() |
709 | self.state = state |
710 | self.current_root = None |
711 | self.current_root_unicode = None |
712 | @@ -1035,26 +1058,30 @@ |
713 | self.current_block_pos = -1 |
714 | self.current_dir_info = None |
715 | self.current_dir_list = None |
716 | + self._pending_consistent_entries = [] |
717 | self.path_index = 0 |
718 | self.root_dir_info = None |
719 | self.bisect_left = bisect.bisect_left |
720 | self.pathjoin = osutils.pathjoin |
721 | self.fstat = os.fstat |
722 | self.sha_file = osutils.sha_file |
723 | + if target_index != 0: |
724 | + # A lot of code in here depends on target_index == 0 |
725 | + raise errors.BzrError('unsupported target index') |
726 | |
727 | cdef _process_entry(self, entry, path_info): |
728 | """Compare an entry and real disk to generate delta information. |
729 | |
730 | :param path_info: top_relpath, basename, kind, lstat, abspath for |
731 | - the path of entry. If None, then the path is considered absent. |
732 | - (Perhaps we should pass in a concrete entry for this ?) |
733 | + the path of entry. If None, then the path is considered absent in |
734 | + the target (Perhaps we should pass in a concrete entry for this ?) |
735 | Basename is returned as a utf8 string because we expect this |
736 | tuple will be ignored, and don't want to take the time to |
737 | decode. |
738 | :return: (iter_changes_result, changed). If the entry has not been |
739 | handled then changed is None. Otherwise it is False if no content |
740 | - or metadata changes have occured, and None if any content or |
741 | - metadata change has occured. If self.include_unchanged is True then |
743 | + or metadata changes have occurred, and True if any content or |
743 | + metadata change has occurred. If self.include_unchanged is True then |
744 | if changed is not None, iter_changes_result will always be a result |
745 | tuple. Otherwise, iter_changes_result is None unless changed is |
746 | True. |
747 | @@ -1099,9 +1126,12 @@ |
748 | else: |
749 | # add the source to the search path to find any children it |
750 | # has. TODO ? : only add if it is a container ? |
751 | - if not osutils.is_inside_any(self.searched_specific_files, |
752 | - source_details[1]): |
753 | + if (not self.doing_consistency_expansion and |
754 | + not osutils.is_inside_any(self.searched_specific_files, |
755 | + source_details[1])): |
756 | self.search_specific_files.add(source_details[1]) |
757 | + # expanding from a user requested path, parent expansion |
758 | + # for delta consistency happens later. |
759 | # generate the old path; this is needed for stating later |
760 | # as well. |
761 | old_path = source_details[1] |
762 | @@ -1180,7 +1210,8 @@ |
763 | file_id = entry[0][2] |
764 | self.old_dirname_to_file_id[old_path] = file_id |
765 | # parent id is the entry for the path in the target tree |
766 | - if old_dirname == self.last_source_parent[0]: |
767 | + if old_basename and old_dirname == self.last_source_parent[0]: |
768 | + # use a cached hit for non-root source entries. |
769 | source_parent_id = self.last_source_parent[1] |
770 | else: |
771 | try: |
772 | @@ -1196,7 +1227,8 @@ |
773 | self.last_source_parent[0] = old_dirname |
774 | self.last_source_parent[1] = source_parent_id |
775 | new_dirname = entry[0][0] |
776 | - if new_dirname == self.last_target_parent[0]: |
777 | + if entry[0][1] and new_dirname == self.last_target_parent[0]: |
778 | + # use a cached hit for non-root target entries. |
779 | target_parent_id = self.last_target_parent[1] |
780 | else: |
781 | try: |
782 | @@ -1313,8 +1345,13 @@ |
783 | # a renamed parent. TODO: handle this efficiently. Its not |
784 | # common case to rename dirs though, so a correct but slow |
785 | # implementation will do. |
786 | - if not osutils.is_inside_any(self.searched_specific_files, target_details[1]): |
787 | + if (not self.doing_consistency_expansion and |
788 | + not osutils.is_inside_any(self.searched_specific_files, |
789 | + target_details[1])): |
790 | self.search_specific_files.add(target_details[1]) |
791 | + # We don't expand the specific files parents list here as |
792 | + # the path is absent in target and won't create a delta with |
793 | + # missing parent. |
794 | elif ((source_minikind == c'r' or source_minikind == c'a') and |
795 | (target_minikind == c'r' or target_minikind == c'a')): |
796 | # neither of the selected trees contain this path, |
797 | @@ -1334,6 +1371,25 @@ |
798 | def iter_changes(self): |
799 | return self |
800 | |
801 | + cdef void _gather_result_for_consistency(self, result): |
802 | + """Check a result we will yield to make sure we are consistent later. |
803 | + |
804 | + This gathers result's parents into a set to output later. |
805 | + |
806 | + :param result: A result tuple. |
807 | + """ |
808 | + if not self.partial or not result[0]: |
809 | + return |
810 | + self.seen_ids.add(result[0]) |
811 | + new_path = result[1][1] |
812 | + if new_path: |
813 | + # Not the root and not a delete: queue up the parents of the path. |
814 | + self.search_specific_file_parents.update( |
815 | + osutils.parent_directories(new_path.encode('utf8'))) |
816 | + # Add the root directory which parent_directories does not |
817 | + # provide. |
818 | + self.search_specific_file_parents.add('') |
819 | + |
820 | cdef void _update_current_block(self): |
821 | if (self.block_index < len(self.state._dirblocks) and |
822 | osutils.is_inside(self.current_root, self.state._dirblocks[self.block_index][0])): |
823 | @@ -1406,8 +1462,11 @@ |
824 | entry = self.root_entries[self.root_entries_pos] |
825 | self.root_entries_pos = self.root_entries_pos + 1 |
826 | result, changed = self._process_entry(entry, self.root_dir_info) |
827 | - if changed is not None and changed or self.include_unchanged: |
828 | - return result |
829 | + if changed is not None: |
830 | + if changed: |
831 | + self._gather_result_for_consistency(result) |
832 | + if changed or self.include_unchanged: |
833 | + return result |
834 | # Have we finished the prior root, or never started one ? |
835 | if self.current_root is None: |
836 | # TODO: the pending list should be lexically sorted? the |
837 | @@ -1416,12 +1475,12 @@ |
838 | self.current_root = self.search_specific_files.pop() |
839 | except KeyError: |
840 | raise StopIteration() |
841 | - self.current_root_unicode = self.current_root.decode('utf8') |
842 | self.searched_specific_files.add(self.current_root) |
843 | # process the entries for this containing directory: the rest will be |
844 | # found by their parents recursively. |
845 | self.root_entries = self.state._entries_for_path(self.current_root) |
846 | self.root_entries_len = len(self.root_entries) |
847 | + self.current_root_unicode = self.current_root.decode('utf8') |
848 | self.root_abspath = self.tree.abspath(self.current_root_unicode) |
849 | try: |
850 | root_stat = os.lstat(self.root_abspath) |
851 | @@ -1458,6 +1517,8 @@ |
852 | result, changed = self._process_entry(entry, self.root_dir_info) |
853 | if changed is not None: |
854 | path_handled = -1 |
855 | + if changed: |
856 | + self._gather_result_for_consistency(result) |
857 | if changed or self.include_unchanged: |
858 | return result |
859 | # handle unversioned specified paths: |
860 | @@ -1476,7 +1537,8 @@ |
861 | ) |
862 | # If we reach here, the outer flow continues, which enters into the |
863 | # per-root setup logic. |
864 | - if self.current_dir_info is None and self.current_block is None: |
865 | + if (self.current_dir_info is None and self.current_block is None and not |
866 | + self.doing_consistency_expansion): |
867 | # setup iteration of this root: |
868 | self.current_dir_list = None |
869 | if self.root_dir_info and self.root_dir_info[2] == 'tree-reference': |
870 | @@ -1606,6 +1668,8 @@ |
871 | # advance the entry only, after processing. |
872 | result, changed = self._process_entry(current_entry, None) |
873 | if changed is not None: |
874 | + if changed: |
875 | + self._gather_result_for_consistency(result) |
876 | if changed or self.include_unchanged: |
877 | return result |
878 | self.block_index = self.block_index + 1 |
879 | @@ -1618,6 +1682,15 @@ |
880 | # More supplied paths to process |
881 | self.current_root = None |
882 | return self._iter_next() |
883 | + # Start expanding more conservatively, adding paths the user may not |
884 | + # have intended but required for consistent deltas. |
885 | + self.doing_consistency_expansion = 1 |
886 | + if not self._pending_consistent_entries: |
887 | + self._pending_consistent_entries = self._next_consistent_entries() |
888 | + while self._pending_consistent_entries: |
889 | + result, changed = self._pending_consistent_entries.pop() |
890 | + if changed is not None: |
891 | + return result |
892 | raise StopIteration() |
893 | |
894 | cdef object _maybe_tree_ref(self, current_path_info): |
895 | @@ -1705,6 +1778,8 @@ |
896 | current_path_info) |
897 | if changed is not None: |
898 | path_handled = -1 |
899 | + if not changed and not self.include_unchanged: |
900 | + changed = None |
901 | # >- loop control starts here: |
902 | # >- entry |
903 | if advance_entry and current_entry is not None: |
904 | @@ -1726,7 +1801,7 @@ |
905 | except UnicodeDecodeError: |
906 | raise errors.BadFilenameEncoding( |
907 | current_path_info[0], osutils._fs_enc) |
908 | - if result is not None: |
909 | + if changed is not None: |
910 | raise AssertionError( |
911 | "result is not None: %r" % result) |
912 | result = (None, |
913 | @@ -1737,6 +1812,7 @@ |
914 | (None, self.utf8_decode(current_path_info[1])[0]), |
915 | (None, current_path_info[2]), |
916 | (None, new_executable)) |
917 | + changed = True |
918 | # dont descend into this unversioned path if it is |
919 | # a dir |
920 | if current_path_info[2] in ('directory'): |
921 | @@ -1755,9 +1831,12 @@ |
922 | current_path_info) |
923 | else: |
924 | current_path_info = None |
925 | - if result is not None: |
926 | + if changed is not None: |
927 | # Found a result on this pass, yield it |
928 | - return result |
929 | + if changed: |
930 | + self._gather_result_for_consistency(result) |
931 | + if changed or self.include_unchanged: |
932 | + return result |
933 | if self.current_block is not None: |
934 | self.block_index = self.block_index + 1 |
935 | self._update_current_block() |
936 | @@ -1769,3 +1848,123 @@ |
937 | self.current_dir_list = self.current_dir_info[1] |
938 | except StopIteration: |
939 | self.current_dir_info = None |
940 | + |
941 | + cdef object _next_consistent_entries(self): |
942 | + """Grabs the next specific file parent case to consider. |
943 | + |
944 | + :return: A list of the results, each of which is as for _process_entry. |
945 | + """ |
946 | + results = [] |
947 | + while self.search_specific_file_parents: |
948 | + # Process the parent directories for the paths we were iterating. |
949 | + # Even in extremely large trees this should be modest, so currently |
950 | + # no attempt is made to optimise. |
951 | + path_utf8 = self.search_specific_file_parents.pop() |
952 | + if path_utf8 in self.searched_exact_paths: |
953 | + # We've examined this path. |
954 | + continue |
955 | + if osutils.is_inside_any(self.searched_specific_files, path_utf8): |
956 | + # We've examined this path. |
957 | + continue |
958 | + path_entries = self.state._entries_for_path(path_utf8) |
959 | + # We need either one or two entries. If the path in |
960 | + # self.target_index has moved (so the entry in source_index is in |
961 | + # 'ar') then we need to also look for the entry for this path in |
962 | + # self.source_index, to output the appropriate delete-or-rename. |
963 | + selected_entries = [] |
964 | + found_item = False |
965 | + for candidate_entry in path_entries: |
966 | + # Find entries present in target at this path: |
967 | + if candidate_entry[1][self.target_index][0] not in 'ar': |
968 | + found_item = True |
969 | + selected_entries.append(candidate_entry) |
970 | + # Find entries present in source at this path: |
971 | + elif (self.source_index is not None and |
972 | + candidate_entry[1][self.source_index][0] not in 'ar'): |
973 | + found_item = True |
974 | + if candidate_entry[1][self.target_index][0] == 'a': |
975 | + # Deleted, emit it here. |
976 | + selected_entries.append(candidate_entry) |
977 | + else: |
978 | + # renamed, emit it when we process the directory it |
979 | + # ended up at. |
980 | + self.search_specific_file_parents.add( |
981 | + candidate_entry[1][self.target_index][1]) |
982 | + if not found_item: |
983 | + raise AssertionError( |
984 | + "Missing entry for specific path parent %r, %r" % ( |
985 | + path_utf8, path_entries)) |
986 | + path_info = self._path_info(path_utf8, path_utf8.decode('utf8')) |
987 | + for entry in selected_entries: |
988 | + if entry[0][2] in self.seen_ids: |
989 | + continue |
990 | + result, changed = self._process_entry(entry, path_info) |
991 | + if changed is None: |
992 | + raise AssertionError( |
993 | + "Got entry<->path mismatch for specific path " |
994 | + "%r entry %r path_info %r " % ( |
995 | + path_utf8, entry, path_info)) |
996 | + # Only include changes - we're outside the users requested |
997 | + # expansion. |
998 | + if changed: |
999 | + self._gather_result_for_consistency(result) |
1000 | + if (result[6][0] == 'directory' and |
1001 | + result[6][1] != 'directory'): |
1002 | + # This stopped being a directory, the old children have |
1003 | + # to be included. |
1004 | + if entry[1][self.source_index][0] == 'r': |
1005 | + # renamed, take the source path |
1006 | + entry_path_utf8 = entry[1][self.source_index][1] |
1007 | + else: |
1008 | + entry_path_utf8 = path_utf8 |
1009 | + initial_key = (entry_path_utf8, '', '') |
1010 | + block_index, _ = self.state._find_block_index_from_key( |
1011 | + initial_key) |
1012 | + if block_index == 0: |
1013 | + # The children of the root are in block index 1. |
1014 | + block_index = block_index + 1 |
1015 | + current_block = None |
1016 | + if block_index < len(self.state._dirblocks): |
1017 | + current_block = self.state._dirblocks[block_index] |
1018 | + if not osutils.is_inside( |
1019 | + entry_path_utf8, current_block[0]): |
1020 | + # No entries for this directory at all. |
1021 | + current_block = None |
1022 | + if current_block is not None: |
1023 | + for entry in current_block[1]: |
1024 | + if entry[1][self.source_index][0] in 'ar': |
1025 | + # Not in the source tree, so doesn't have to be |
1026 | + # included. |
1027 | + continue |
1028 | + # Path of the entry itself. |
1029 | + self.search_specific_file_parents.add( |
1030 | + self.pathjoin(*entry[0][:2])) |
1031 | + if changed or self.include_unchanged: |
1032 | + results.append((result, changed)) |
1033 | + self.searched_exact_paths.add(path_utf8) |
1034 | + return results |
1035 | + |
1036 | + cdef object _path_info(self, utf8_path, unicode_path): |
1037 | + """Generate path_info for unicode_path. |
1038 | + |
1039 | + :return: None if unicode_path does not exist, or a path_info tuple. |
1040 | + """ |
1041 | + abspath = self.tree.abspath(unicode_path) |
1042 | + try: |
1043 | + stat = os.lstat(abspath) |
1044 | + except OSError, e: |
1045 | + if e.errno == errno.ENOENT: |
1046 | + # the path does not exist. |
1047 | + return None |
1048 | + else: |
1049 | + raise |
1050 | + utf8_basename = utf8_path.rsplit('/', 1)[-1] |
1051 | + dir_info = (utf8_path, utf8_basename, |
1052 | + osutils.file_kind_from_stat_mode(stat.st_mode), stat, |
1053 | + abspath) |
1054 | + if dir_info[2] == 'directory': |
1055 | + if self.tree._directory_is_tree_reference( |
1056 | + unicode_path): |
1057 | + self.root_dir_info = self.root_dir_info[:2] + \ |
1058 | + ('tree-reference',) + self.root_dir_info[3:] |
1059 | + return dir_info |
1060 | |
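The ``ProcessEntryC`` changes above teach the C path of ``iter_changes`` to handle partial (specific-file) requests while still emitting the parent and child entries needed for a consistent delta. For orientation, here is a sketch of consuming iter_changes-style result tuples, using the field positions visible in the code above (``result[0]`` is the file id, ``result[1]`` the old/new path pair, ``result[6]`` the old/new kind pair); the calling convention is illustrative, not a bzrlib API reference:

```python
def summarize_changes(results):
    # `results`: iterable of iter_changes-style tuples, as produced by the
    # ProcessEntry implementations in this diff.
    for result in results:
        file_id = result[0]              # id of the changed file (None if unversioned)
        old_path, new_path = result[1]   # None on add / delete respectively
        old_kind, new_kind = result[6]   # e.g. ('directory', 'file')
        if old_path is None:
            print 'added %s (%s)' % (new_path, file_id)
        elif new_path is None:
            print 'deleted %s (%s)' % (old_path, file_id)
        elif old_kind != new_kind:
            print 'kind change %s: %s -> %s' % (new_path, old_kind, new_kind)
        else:
            print 'modified %s' % new_path
```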
1061 | === modified file 'bzrlib/_known_graph_py.py' |
1062 | --- bzrlib/_known_graph_py.py 2009-07-08 20:58:10 +0000 |
1063 | +++ bzrlib/_known_graph_py.py 2009-09-07 14:19:05 +0000 |
1064 | @@ -18,6 +18,7 @@ |
1065 | """ |
1066 | |
1067 | from bzrlib import ( |
1068 | + errors, |
1069 | revision, |
1070 | ) |
1071 | |
1072 | @@ -40,6 +41,18 @@ |
1073 | self.parent_keys, self.child_keys) |
1074 | |
1075 | |
1076 | +class _MergeSortNode(object): |
1077 | + """Information about a specific node in the merge graph.""" |
1078 | + |
1079 | + __slots__ = ('key', 'merge_depth', 'revno', 'end_of_merge') |
1080 | + |
1081 | + def __init__(self, key, merge_depth, revno, end_of_merge): |
1082 | + self.key = key |
1083 | + self.merge_depth = merge_depth |
1084 | + self.revno = revno |
1085 | + self.end_of_merge = end_of_merge |
1086 | + |
1087 | + |
1088 | class KnownGraph(object): |
1089 | """This is a class which assumes we already know the full graph.""" |
1090 | |
1091 | @@ -84,6 +97,10 @@ |
1092 | return [node for node in self._nodes.itervalues() |
1093 | if not node.parent_keys] |
1094 | |
1095 | + def _find_tips(self): |
1096 | + return [node for node in self._nodes.itervalues() |
1097 | + if not node.child_keys] |
1098 | + |
1099 | def _find_gdfo(self): |
1100 | nodes = self._nodes |
1101 | known_parent_gdfos = {} |
1102 | @@ -171,3 +188,119 @@ |
1103 | self._known_heads[heads_key] = heads |
1104 | return heads |
1105 | |
1106 | + def topo_sort(self): |
1107 | + """Return the nodes in topological order. |
1108 | + |
1109 | + All parents must occur before all children. |
1110 | + """ |
1111 | + for node in self._nodes.itervalues(): |
1112 | + if node.gdfo is None: |
1113 | + raise errors.GraphCycleError(self._nodes) |
1114 | + pending = self._find_tails() |
1115 | + pending_pop = pending.pop |
1116 | + pending_append = pending.append |
1117 | + |
1118 | + topo_order = [] |
1119 | + topo_order_append = topo_order.append |
1120 | + |
1121 | + num_seen_parents = dict.fromkeys(self._nodes, 0) |
1122 | + while pending: |
1123 | + node = pending_pop() |
1124 | + if node.parent_keys is not None: |
1125 | + # We don't include ghost parents |
1126 | + topo_order_append(node.key) |
1127 | + for child_key in node.child_keys: |
1128 | + child_node = self._nodes[child_key] |
1129 | + seen_parents = num_seen_parents[child_key] + 1 |
1130 | + if seen_parents == len(child_node.parent_keys): |
1131 | + # All parents have been processed, enqueue this child |
1132 | + pending_append(child_node) |
1133 | + # This has been queued up, stop tracking it |
1134 | + del num_seen_parents[child_key] |
1135 | + else: |
1136 | + num_seen_parents[child_key] = seen_parents |
1137 | + # We started from the parents, so we don't need to do anymore work |
1138 | + return topo_order |
1139 | + |
1140 | + def gc_sort(self): |
1141 | + """Return a reverse topological ordering which is 'stable'. |
1142 | + |
1143 | + There are a few constraints: |
1144 | + 1) Reverse topological (all children before all parents) |
1145 | + 2) Grouped by prefix |
1146 | + 3) 'stable' sorting, so that we get the same result, independent of |
1147 | + machine, or extra data. |
1148 | + To do this, we use the same basic algorithm as topo_sort, but when we |
1149 | + aren't sure what node to access next, we sort them lexicographically. |
1150 | + """ |
1151 | + tips = self._find_tips() |
1152 | + # Split the tips based on prefix |
1153 | + prefix_tips = {} |
1154 | + for node in tips: |
1155 | + if node.key.__class__ is str or len(node.key) == 1: |
1156 | + prefix = '' |
1157 | + else: |
1158 | + prefix = node.key[0] |
1159 | + prefix_tips.setdefault(prefix, []).append(node) |
1160 | + |
1161 | + num_seen_children = dict.fromkeys(self._nodes, 0) |
1162 | + |
1163 | + result = [] |
1164 | + for prefix in sorted(prefix_tips): |
1165 | + pending = sorted(prefix_tips[prefix], key=lambda n:n.key, |
1166 | + reverse=True) |
1167 | + while pending: |
1168 | + node = pending.pop() |
1169 | + if node.parent_keys is None: |
1170 | + # Ghost node, skip it |
1171 | + continue |
1172 | + result.append(node.key) |
1173 | + for parent_key in sorted(node.parent_keys, reverse=True): |
1174 | + parent_node = self._nodes[parent_key] |
1175 | + seen_children = num_seen_children[parent_key] + 1 |
1176 | + if seen_children == len(parent_node.child_keys): |
1177 | + # All children have been processed, enqueue this parent |
1178 | + pending.append(parent_node) |
1179 | + # This has been queued up, stop tracking it |
1180 | + del num_seen_children[parent_key] |
1181 | + else: |
1182 | + num_seen_children[parent_key] = seen_children |
1183 | + return result |
1184 | + |
1185 | + def merge_sort(self, tip_key): |
1186 | + """Compute the merge sorted graph output.""" |
1187 | + from bzrlib import tsort |
1188 | + as_parent_map = dict((node.key, node.parent_keys) |
1189 | + for node in self._nodes.itervalues() |
1190 | + if node.parent_keys is not None) |
1191 | + # We intentionally always generate revnos and never force the |
1192 | + # mainline_revisions |
1193 | + # Strip the sequence_number that merge_sort generates |
1194 | + return [_MergeSortNode(key, merge_depth, revno, end_of_merge) |
1195 | + for _, key, merge_depth, revno, end_of_merge |
1196 | + in tsort.merge_sort(as_parent_map, tip_key, |
1197 | + mainline_revisions=None, |
1198 | + generate_revno=True)] |
1199 | + |
1200 | + def get_parent_keys(self, key): |
1201 | + """Get the parents for a key |
1202 | + |
1203 | + Returns a list containing the parent keys. If the key is a ghost, |
1204 | + None is returned. A KeyError will be raised if the key is not in |
1205 | + the graph. |
1206 | + |
1207 | + :param key: Key to check (eg revision_id) |
1208 | + :return: A list of parents |
1209 | + """ |
1210 | + return self._nodes[key].parent_keys |
1211 | + |
1212 | + def get_child_keys(self, key): |
1213 | + """Get the children for a key |
1214 | + |
1215 | + Returns a list containing the child keys. A KeyError will be raised |
1216 | + if the key is not in the graph. |
1217 | + |
1218 | + :param keys: Key to check (eg revision_id) |
1219 | + :return: A list of children |
1220 | + """ |
1221 | + return self._nodes[key].child_keys |
1222 | |
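The new ``topo_sort``, ``gc_sort``, ``merge_sort``, ``get_parent_keys`` and ``get_child_keys`` members can be exercised directly against a parent map. A small usage sketch against the pure-Python implementation above (assuming the constructor accepts a ``{key: parent_keys}`` dict, as the ``_initialize_nodes(parent_map)`` call in the Pyrex version suggests):

```python
from bzrlib._known_graph_py import KnownGraph

# A tiny history: B and C both branch from A, D merges them back together.
parent_map = {
    'A': [],
    'B': ['A'],
    'C': ['A'],
    'D': ['B', 'C'],
}
kg = KnownGraph(parent_map)
print kg.topo_sort()        # parents before children, e.g. ['A', 'C', 'B', 'D']
print kg.gc_sort()          # stable reverse-topological order
for node in kg.merge_sort('D'):
    # merge_depth/revno/end_of_merge as computed by tsort.merge_sort
    print node.key, node.merge_depth, node.revno, node.end_of_merge
print kg.get_parent_keys('D')   # ['B', 'C']
```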
1223 | === modified file 'bzrlib/_known_graph_pyx.pyx' |
1224 | --- bzrlib/_known_graph_pyx.pyx 2009-07-14 16:10:32 +0000 |
1225 | +++ bzrlib/_known_graph_pyx.pyx 2009-09-07 14:19:05 +0000 |
1226 | @@ -25,11 +25,18 @@ |
1227 | ctypedef struct PyObject: |
1228 | pass |
1229 | |
1230 | + int PyString_CheckExact(object) |
1231 | + |
1232 | + int PyObject_RichCompareBool(object, object, int) |
1233 | + int Py_LT |
1234 | + |
1235 | + int PyTuple_CheckExact(object) |
1236 | object PyTuple_New(Py_ssize_t n) |
1237 | Py_ssize_t PyTuple_GET_SIZE(object t) |
1238 | PyObject * PyTuple_GET_ITEM(object t, Py_ssize_t o) |
1239 | void PyTuple_SET_ITEM(object t, Py_ssize_t o, object v) |
1240 | |
1241 | + int PyList_CheckExact(object) |
1242 | Py_ssize_t PyList_GET_SIZE(object l) |
1243 | PyObject * PyList_GET_ITEM(object l, Py_ssize_t o) |
1244 | int PyList_SetItem(object l, Py_ssize_t o, object l) except -1 |
1245 | @@ -44,8 +51,9 @@ |
1246 | |
1247 | void Py_INCREF(object) |
1248 | |
1249 | +import gc |
1250 | |
1251 | -from bzrlib import revision |
1252 | +from bzrlib import errors, revision |
1253 | |
1254 | cdef object NULL_REVISION |
1255 | NULL_REVISION = revision.NULL_REVISION |
1256 | @@ -59,10 +67,9 @@ |
1257 | cdef object children |
1258 | cdef public long gdfo |
1259 | cdef int seen |
1260 | + cdef object extra |
1261 | |
1262 | def __init__(self, key): |
1263 | - cdef int i |
1264 | - |
1265 | self.key = key |
1266 | self.parents = None |
1267 | |
1268 | @@ -70,6 +77,7 @@ |
1269 | # Greatest distance from origin |
1270 | self.gdfo = -1 |
1271 | self.seen = 0 |
1272 | + self.extra = None |
1273 | |
1274 | property child_keys: |
1275 | def __get__(self): |
1276 | @@ -80,6 +88,18 @@ |
1277 | PyList_Append(keys, child.key) |
1278 | return keys |
1279 | |
1280 | + property parent_keys: |
1281 | + def __get__(self): |
1282 | + if self.parents is None: |
1283 | + return None |
1284 | + |
1285 | + cdef _KnownGraphNode parent |
1286 | + |
1287 | + keys = [] |
1288 | + for parent in self.parents: |
1289 | + PyList_Append(keys, parent.key) |
1290 | + return keys |
1291 | + |
1292 | cdef clear_references(self): |
1293 | self.parents = None |
1294 | self.children = None |
1295 | @@ -107,17 +127,66 @@ |
1296 | return <_KnownGraphNode>temp_node |
1297 | |
1298 | |
1299 | -cdef _KnownGraphNode _get_parent(parents, Py_ssize_t pos): |
1300 | +cdef _KnownGraphNode _get_tuple_node(tpl, Py_ssize_t pos): |
1301 | cdef PyObject *temp_node |
1302 | - cdef _KnownGraphNode node |
1303 | |
1304 | - temp_node = PyTuple_GET_ITEM(parents, pos) |
1305 | + temp_node = PyTuple_GET_ITEM(tpl, pos) |
1306 | return <_KnownGraphNode>temp_node |
1307 | |
1308 | |
1309 | -# TODO: slab allocate all _KnownGraphNode objects. |
1310 | -# We already know how many we are going to need, except for a couple of |
1311 | -# ghosts that could be allocated on demand. |
1312 | +def get_key(node): |
1313 | + cdef _KnownGraphNode real_node |
1314 | + real_node = node |
1315 | + return real_node.key |
1316 | + |
1317 | + |
1318 | +cdef object _sort_list_nodes(object lst_or_tpl, int reverse): |
1319 | + """Sort a list of _KnownGraphNode objects. |
1320 | + |
1321 | + If lst_or_tpl is a list, it is allowed to mutate in place. It may also |
1322	 | +    If lst_or_tpl is a list, it may be mutated in place. The input may
1323	 | +    also be returned unchanged if everything is already sorted.
1324 | + cdef _KnownGraphNode node1, node2 |
1325 | + cdef int do_swap, is_tuple |
1326 | + cdef Py_ssize_t length |
1327 | + |
1328 | + is_tuple = PyTuple_CheckExact(lst_or_tpl) |
1329 | + if not (is_tuple or PyList_CheckExact(lst_or_tpl)): |
1330 | + raise TypeError('lst_or_tpl must be a list or tuple.') |
1331 | + length = len(lst_or_tpl) |
1332 | + if length == 0 or length == 1: |
1333 | + return lst_or_tpl |
1334 | + if length == 2: |
1335 | + if is_tuple: |
1336 | + node1 = _get_tuple_node(lst_or_tpl, 0) |
1337 | + node2 = _get_tuple_node(lst_or_tpl, 1) |
1338 | + else: |
1339 | + node1 = _get_list_node(lst_or_tpl, 0) |
1340 | + node2 = _get_list_node(lst_or_tpl, 1) |
1341 | + if reverse: |
1342 | + do_swap = PyObject_RichCompareBool(node1.key, node2.key, Py_LT) |
1343 | + else: |
1344 | + do_swap = PyObject_RichCompareBool(node2.key, node1.key, Py_LT) |
1345 | + if not do_swap: |
1346 | + return lst_or_tpl |
1347 | + if is_tuple: |
1348 | + return (node2, node1) |
1349 | + else: |
1350 | + # Swap 'in-place', since lists are mutable |
1351 | + Py_INCREF(node1) |
1352 | + PyList_SetItem(lst_or_tpl, 1, node1) |
1353 | + Py_INCREF(node2) |
1354 | + PyList_SetItem(lst_or_tpl, 0, node2) |
1355 | + return lst_or_tpl |
1356 | + # For all other sizes, we just use 'sorted()' |
1357 | + if is_tuple: |
1358 | + # Note that sorted() is just list(iterable).sort() |
1359 | + lst_or_tpl = list(lst_or_tpl) |
1360 | + lst_or_tpl.sort(key=get_key, reverse=reverse) |
1361 | + return lst_or_tpl |
1362 | + |
1363 | + |
1364 | +cdef class _MergeSorter |
1365 | |
1366 | cdef class KnownGraph: |
1367 | """This is a class which assumes we already know the full graph.""" |
1368 | @@ -136,6 +205,9 @@ |
1369 | # Maps {sorted(revision_id, revision_id): heads} |
1370 | self._known_heads = {} |
1371 | self.do_cache = int(do_cache) |
1372 | + # TODO: consider disabling gc since we are allocating a lot of nodes |
1373	 | +        #       that won't be collectable anyway. Real-world testing has not
1374	 | +        #       shown a specific impact yet.
1375 | self._initialize_nodes(parent_map) |
1376 | self._find_gdfo() |
1377 | |
1378 | @@ -183,11 +255,16 @@ |
1379 | parent_keys = <object>temp_parent_keys |
1380 | num_parent_keys = len(parent_keys) |
1381 | node = self._get_or_create_node(key) |
1382 | - # We know how many parents, so we could pre allocate an exact sized |
1383 | - # tuple here |
1384	 | +            # We know how many parents there are, so we preallocate the tuple
1385 | parent_nodes = PyTuple_New(num_parent_keys) |
1386 | - # We use iter here, because parent_keys maybe be a list or tuple |
1387 | for pos2 from 0 <= pos2 < num_parent_keys: |
1388	 | +                # Note: it costs us 10ms out of 40ms to look up all of these
1389 | + # parents, it doesn't seem to be an allocation overhead, |
1390 | + # but rather a lookup overhead. There doesn't seem to be |
1391 | + # a way around it, and that is one reason why |
1392 | + # KnownGraphNode maintains a direct pointer to the parent |
1393 | + # node. |
1394 | + # We use [] because parent_keys may be a tuple or list |
1395 | parent_node = self._get_or_create_node(parent_keys[pos2]) |
1396 | # PyTuple_SET_ITEM will steal a reference, so INCREF first |
1397 | Py_INCREF(parent_node) |
1398 | @@ -209,6 +286,19 @@ |
1399 | PyList_Append(tails, node) |
1400 | return tails |
1401 | |
1402 | + def _find_tips(self): |
1403 | + cdef PyObject *temp_node |
1404 | + cdef _KnownGraphNode node |
1405 | + cdef Py_ssize_t pos |
1406 | + |
1407 | + tips = [] |
1408 | + pos = 0 |
1409 | + while PyDict_Next(self._nodes, &pos, NULL, &temp_node): |
1410 | + node = <_KnownGraphNode>temp_node |
1411 | + if PyList_GET_SIZE(node.children) == 0: |
1412 | + PyList_Append(tips, node) |
1413 | + return tips |
1414 | + |
1415 | def _find_gdfo(self): |
1416 | cdef _KnownGraphNode node |
1417 | cdef _KnownGraphNode child |
1418 | @@ -315,7 +405,7 @@ |
1419 | continue |
1420 | if node.parents is not None and PyTuple_GET_SIZE(node.parents) > 0: |
1421 | for pos from 0 <= pos < PyTuple_GET_SIZE(node.parents): |
1422 | - parent_node = _get_parent(node.parents, pos) |
1423 | + parent_node = _get_tuple_node(node.parents, pos) |
1424 | last_item = last_item + 1 |
1425 | if last_item < PyList_GET_SIZE(pending): |
1426 | Py_INCREF(parent_node) # SetItem steals a ref |
1427 | @@ -335,3 +425,454 @@ |
1428 | if self.do_cache: |
1429 | PyDict_SetItem(self._known_heads, heads_key, heads) |
1430 | return heads |
1431 | + |
1432 | + def topo_sort(self): |
1433 | + """Return the nodes in topological order. |
1434 | + |
1435 | + All parents must occur before all children. |
1436 | + """ |
1437 | + # This is, for the most part, the same iteration order that we used for |
1438	 | +        #       _find_gdfo; consider finding a way to remove the duplication.
1439 | + # In general, we find the 'tails' (nodes with no parents), and then |
1440 | + # walk to the children. For children that have all of their parents |
1441 | + # yielded, we queue up the child to be yielded as well. |
1442 | + cdef _KnownGraphNode node |
1443 | + cdef _KnownGraphNode child |
1444 | + cdef PyObject *temp |
1445 | + cdef Py_ssize_t pos |
1446 | + cdef int replace |
1447 | + cdef Py_ssize_t last_item |
1448 | + |
1449 | + pending = self._find_tails() |
1450 | + if PyList_GET_SIZE(pending) == 0 and len(self._nodes) > 0: |
1451 | + raise errors.GraphCycleError(self._nodes) |
1452 | + |
1453 | + topo_order = [] |
1454 | + |
1455 | + last_item = PyList_GET_SIZE(pending) - 1 |
1456 | + while last_item >= 0: |
1457 | + # Avoid pop followed by push, instead, peek, and replace |
1458 | + # timing shows this is 930ms => 770ms for OOo |
1459 | + node = _get_list_node(pending, last_item) |
1460 | + last_item = last_item - 1 |
1461 | + if node.parents is not None: |
1462 | + # We don't include ghost parents |
1463 | + PyList_Append(topo_order, node.key) |
1464 | + for pos from 0 <= pos < PyList_GET_SIZE(node.children): |
1465 | + child = _get_list_node(node.children, pos) |
1466 | + if child.gdfo == -1: |
1467 | + # We know we have a graph cycle because a node has a parent |
1468 | + # which we couldn't find |
1469 | + raise errors.GraphCycleError(self._nodes) |
1470 | + child.seen = child.seen + 1 |
1471 | + if child.seen == PyTuple_GET_SIZE(child.parents): |
1472 | + # All parents of this child have been yielded, queue this |
1473 | + # one to be yielded as well |
1474 | + last_item = last_item + 1 |
1475 | + if last_item < PyList_GET_SIZE(pending): |
1476 | + Py_INCREF(child) # SetItem steals a ref |
1477 | + PyList_SetItem(pending, last_item, child) |
1478 | + else: |
1479 | + PyList_Append(pending, child) |
1480 | + # We have queued this node, we don't need to track it |
1481 | + # anymore |
1482 | + child.seen = 0 |
1483	 | +        # We started from the parents, so we don't need to do any more work
1484 | + return topo_order |
1485 | + |
1486 | + def gc_sort(self): |
1487 | + """Return a reverse topological ordering which is 'stable'. |
1488 | + |
1489 | + There are a few constraints: |
1490 | + 1) Reverse topological (all children before all parents) |
1491 | + 2) Grouped by prefix |
1492	 | +        3) 'stable' sorting, so that we get the same result independent of
1493	 | +           machine or extra data.
1494	 | +        To do this, we use the same basic algorithm as topo_sort, but when we
1495	 | +        aren't sure which node to visit next, we sort the candidates lexicographically.
1496 | + """ |
1497 | + cdef PyObject *temp |
1498 | + cdef Py_ssize_t pos, last_item |
1499 | + cdef _KnownGraphNode node, node2, parent_node |
1500 | + |
1501 | + tips = self._find_tips() |
1502 | + # Split the tips based on prefix |
1503 | + prefix_tips = {} |
1504 | + for pos from 0 <= pos < PyList_GET_SIZE(tips): |
1505 | + node = _get_list_node(tips, pos) |
1506 | + if PyString_CheckExact(node.key) or len(node.key) == 1: |
1507 | + prefix = '' |
1508 | + else: |
1509 | + prefix = node.key[0] |
1510 | + temp = PyDict_GetItem(prefix_tips, prefix) |
1511 | + if temp == NULL: |
1512 | + prefix_tips[prefix] = [node] |
1513 | + else: |
1514 | + tip_nodes = <object>temp |
1515 | + PyList_Append(tip_nodes, node) |
1516 | + |
1517 | + result = [] |
1518 | + for prefix in sorted(prefix_tips): |
1519 | + temp = PyDict_GetItem(prefix_tips, prefix) |
1520 | + assert temp != NULL |
1521 | + tip_nodes = <object>temp |
1522 | + pending = _sort_list_nodes(tip_nodes, 1) |
1523 | + last_item = PyList_GET_SIZE(pending) - 1 |
1524 | + while last_item >= 0: |
1525 | + node = _get_list_node(pending, last_item) |
1526 | + last_item = last_item - 1 |
1527 | + if node.parents is None: |
1528 | + # Ghost |
1529 | + continue |
1530 | + PyList_Append(result, node.key) |
1531 | + # Sorting the parent keys isn't strictly necessary for stable |
1532 | + # sorting of a given graph. But it does help minimize the |
1533	 | +                # differences between graphs.
1534 | + # For bzr.dev ancestry: |
1535 | + # 4.73ms no sort |
1536 | + # 7.73ms RichCompareBool sort |
1537 | + parents = _sort_list_nodes(node.parents, 1) |
1538 | + for pos from 0 <= pos < len(parents): |
1539 | + if PyTuple_CheckExact(parents): |
1540 | + parent_node = _get_tuple_node(parents, pos) |
1541 | + else: |
1542 | + parent_node = _get_list_node(parents, pos) |
1543 | + # TODO: GraphCycle detection |
1544 | + parent_node.seen = parent_node.seen + 1 |
1545 | + if (parent_node.seen |
1546 | + == PyList_GET_SIZE(parent_node.children)): |
1547 | + # All children have been processed, queue up this |
1548 | + # parent |
1549 | + last_item = last_item + 1 |
1550 | + if last_item < PyList_GET_SIZE(pending): |
1551 | + Py_INCREF(parent_node) # SetItem steals a ref |
1552 | + PyList_SetItem(pending, last_item, parent_node) |
1553 | + else: |
1554 | + PyList_Append(pending, parent_node) |
1555 | + parent_node.seen = 0 |
1556 | + return result |
1557 | + |
1558 | + def merge_sort(self, tip_key): |
1559 | + """Compute the merge sorted graph output.""" |
1560 | + cdef _MergeSorter sorter |
1561 | + |
1562 | + # TODO: consider disabling gc since we are allocating a lot of nodes |
1563	 | +        #       that won't be collectable anyway. Real-world testing has not
1564	 | +        #       shown a specific impact yet.
1565 | + sorter = _MergeSorter(self, tip_key) |
1566 | + return sorter.topo_order() |
1567 | + |
1568 | + def get_parent_keys(self, key): |
1569 | + """Get the parents for a key |
1570 | + |
1571	 | +        Returns a list containing the parent keys. If the key is a ghost,
1572 | + None is returned. A KeyError will be raised if the key is not in |
1573 | + the graph. |
1574 | + |
1575	 | +        :param key: The key to check (e.g. a revision_id)
1576	 | +        :return: A list of parent keys
1577 | + """ |
1578 | + return self._nodes[key].parent_keys |
1579 | + |
1580 | + def get_child_keys(self, key): |
1581 | + """Get the children for a key |
1582 | + |
1583	 | +        Returns a list containing the child keys. A KeyError will be raised
1584 | + if the key is not in the graph. |
1585 | + |
1586	 | +        :param key: The key to check (e.g. a revision_id)
1587	 | +        :return: A list of child keys
1588 | + """ |
1589 | + return self._nodes[key].child_keys |
1590 | + |
1591 | + |
1592 | +cdef class _MergeSortNode: |
1593 | + """Tracks information about a node during the merge_sort operation.""" |
1594 | + |
1595 | + # Public api |
1596 | + cdef public object key |
1597 | + cdef public long merge_depth |
1598	 | +    cdef public object end_of_merge # True/False: is this the end of the current merge?
1599 | + |
1600 | + # Private api, used while computing the information |
1601 | + cdef _KnownGraphNode left_parent |
1602 | + cdef _KnownGraphNode left_pending_parent |
1603 | + cdef object pending_parents # list of _KnownGraphNode for non-left parents |
1604 | + cdef long _revno_first |
1605 | + cdef long _revno_second |
1606 | + cdef long _revno_last |
1607 | + # TODO: turn these into flag/bit fields rather than individual members |
1608 | + cdef int is_first_child # Is this the first child? |
1609 | + cdef int seen_by_child # A child node has seen this parent |
1610	 | +    cdef int completed # Fully processed
1611 | + |
1612 | + def __init__(self, key): |
1613 | + self.key = key |
1614 | + self.merge_depth = -1 |
1615 | + self.left_parent = None |
1616 | + self.left_pending_parent = None |
1617 | + self.pending_parents = None |
1618 | + self._revno_first = -1 |
1619 | + self._revno_second = -1 |
1620 | + self._revno_last = -1 |
1621 | + self.is_first_child = 0 |
1622 | + self.seen_by_child = 0 |
1623 | + self.completed = 0 |
1624 | + |
1625 | + def __repr__(self): |
1626 | + return '%s(%s depth:%s rev:%s,%s,%s first:%s seen:%s)' % ( |
1627 | + self.__class__.__name__, self.key, |
1628 | + self.merge_depth, |
1629 | + self._revno_first, self._revno_second, self._revno_last, |
1630 | + self.is_first_child, self.seen_by_child) |
1631 | + |
1632 | + cdef int has_pending_parents(self): |
1633 | + if self.left_pending_parent is not None or self.pending_parents: |
1634 | + return 1 |
1635 | + return 0 |
1636 | + |
1637 | + cdef object _revno(self): |
1638 | + if self._revno_first == -1: |
1639 | + if self._revno_second != -1: |
1640 | + raise RuntimeError('Something wrong with: %s' % (self,)) |
1641 | + return (self._revno_last,) |
1642 | + else: |
1643 | + return (self._revno_first, self._revno_second, self._revno_last) |
1644 | + |
1645 | + property revno: |
1646 | + def __get__(self): |
1647 | + return self._revno() |
1648 | + |
1649 | + |
1650 | +cdef class _MergeSorter: |
1651 | + """This class does the work of computing the merge_sort ordering. |
1652 | + |
1653 | + We have some small advantages, in that we get all the extra information |
1654	 | +    that KnownGraph knows, such as the child lists.
1655 | + """ |
1656 | + |
1657 | + # Current performance numbers for merge_sort(bzr_dev_parent_map): |
1658 | + # 302ms tsort.merge_sort() |
1659 | + # 91ms graph.KnownGraph().merge_sort() |
1660 | + # 40ms kg.merge_sort() |
1661 | + |
1662 | + cdef KnownGraph graph |
1663 | + cdef object _depth_first_stack # list |
1664 | + cdef Py_ssize_t _last_stack_item # offset to last item on stack |
1665 | + # cdef object _ms_nodes # dict of key => _MergeSortNode |
1666 | + cdef object _revno_to_branch_count # {revno => num child branches} |
1667 | + cdef object _scheduled_nodes # List of nodes ready to be yielded |
1668 | + |
1669 | + def __init__(self, known_graph, tip_key): |
1670 | + cdef _KnownGraphNode node |
1671 | + |
1672 | + self.graph = known_graph |
1673 | + # self._ms_nodes = {} |
1674 | + self._revno_to_branch_count = {} |
1675 | + self._depth_first_stack = [] |
1676 | + self._last_stack_item = -1 |
1677 | + self._scheduled_nodes = [] |
1678 | + if (tip_key is not None and tip_key != NULL_REVISION |
1679 | + and tip_key != (NULL_REVISION,)): |
1680 | + node = self.graph._nodes[tip_key] |
1681 | + self._push_node(node, 0) |
1682 | + |
1683 | + cdef _MergeSortNode _get_ms_node(self, _KnownGraphNode node): |
1684 | + cdef PyObject *temp_node |
1685 | + cdef _MergeSortNode ms_node |
1686 | + |
1687 | + if node.extra is None: |
1688 | + ms_node = _MergeSortNode(node.key) |
1689 | + node.extra = ms_node |
1690 | + else: |
1691 | + ms_node = <_MergeSortNode>node.extra |
1692 | + return ms_node |
1693 | + |
1694 | + cdef _push_node(self, _KnownGraphNode node, long merge_depth): |
1695 | + cdef _KnownGraphNode parent_node |
1696 | + cdef _MergeSortNode ms_node, ms_parent_node |
1697 | + cdef Py_ssize_t pos |
1698 | + |
1699 | + ms_node = self._get_ms_node(node) |
1700 | + ms_node.merge_depth = merge_depth |
1701 | + if node.parents is None: |
1702 | + raise RuntimeError('ghost nodes should not be pushed' |
1703 | + ' onto the stack: %s' % (node,)) |
1704 | + if PyTuple_GET_SIZE(node.parents) > 0: |
1705 | + parent_node = _get_tuple_node(node.parents, 0) |
1706 | + ms_node.left_parent = parent_node |
1707 | + if parent_node.parents is None: # left-hand ghost |
1708 | + ms_node.left_pending_parent = None |
1709 | + ms_node.left_parent = None |
1710 | + else: |
1711 | + ms_node.left_pending_parent = parent_node |
1712 | + if PyTuple_GET_SIZE(node.parents) > 1: |
1713 | + ms_node.pending_parents = [] |
1714 | + for pos from 1 <= pos < PyTuple_GET_SIZE(node.parents): |
1715 | + parent_node = _get_tuple_node(node.parents, pos) |
1716 | + if parent_node.parents is None: # ghost |
1717 | + continue |
1718 | + PyList_Append(ms_node.pending_parents, parent_node) |
1719 | + |
1720 | + ms_node.is_first_child = 1 |
1721 | + if ms_node.left_parent is not None: |
1722 | + ms_parent_node = self._get_ms_node(ms_node.left_parent) |
1723 | + if ms_parent_node.seen_by_child: |
1724 | + ms_node.is_first_child = 0 |
1725 | + ms_parent_node.seen_by_child = 1 |
1726 | + self._last_stack_item = self._last_stack_item + 1 |
1727 | + if self._last_stack_item < PyList_GET_SIZE(self._depth_first_stack): |
1728 | + Py_INCREF(node) # SetItem steals a ref |
1729 | + PyList_SetItem(self._depth_first_stack, self._last_stack_item, |
1730 | + node) |
1731 | + else: |
1732 | + PyList_Append(self._depth_first_stack, node) |
1733 | + |
1734 | + cdef _pop_node(self): |
1735 | + cdef PyObject *temp |
1736 | + cdef _MergeSortNode ms_node, ms_parent_node, ms_prev_node |
1737 | + cdef _KnownGraphNode node, parent_node, prev_node |
1738 | + |
1739 | + node = _get_list_node(self._depth_first_stack, self._last_stack_item) |
1740 | + ms_node = <_MergeSortNode>node.extra |
1741 | + self._last_stack_item = self._last_stack_item - 1 |
1742 | + if ms_node.left_parent is not None: |
1743 | + # Assign the revision number from the left-hand parent |
1744 | + ms_parent_node = <_MergeSortNode>ms_node.left_parent.extra |
1745 | + if ms_node.is_first_child: |
1746 | + # First child just increments the final digit |
1747 | + ms_node._revno_first = ms_parent_node._revno_first |
1748 | + ms_node._revno_second = ms_parent_node._revno_second |
1749 | + ms_node._revno_last = ms_parent_node._revno_last + 1 |
1750 | + else: |
1751 | + # Not the first child, make a new branch |
1752 | + # (mainline_revno, branch_count, 1) |
1753 | + if ms_parent_node._revno_first == -1: |
1754 | + # Mainline ancestor, the increment is on the last digit |
1755 | + base_revno = ms_parent_node._revno_last |
1756 | + else: |
1757 | + base_revno = ms_parent_node._revno_first |
1758 | + temp = PyDict_GetItem(self._revno_to_branch_count, |
1759 | + base_revno) |
1760 | + if temp == NULL: |
1761 | + branch_count = 1 |
1762 | + else: |
1763 | + branch_count = (<object>temp) + 1 |
1764 | + PyDict_SetItem(self._revno_to_branch_count, base_revno, |
1765 | + branch_count) |
1766 | + ms_node._revno_first = base_revno |
1767 | + ms_node._revno_second = branch_count |
1768 | + ms_node._revno_last = 1 |
1769 | + else: |
1770 | + temp = PyDict_GetItem(self._revno_to_branch_count, 0) |
1771 | + if temp == NULL: |
1772 | + # The first root node doesn't have a 3-digit revno |
1773 | + root_count = 0 |
1774 | + ms_node._revno_first = -1 |
1775 | + ms_node._revno_second = -1 |
1776 | + ms_node._revno_last = 1 |
1777 | + else: |
1778 | + root_count = (<object>temp) + 1 |
1779 | + ms_node._revno_first = 0 |
1780 | + ms_node._revno_second = root_count |
1781 | + ms_node._revno_last = 1 |
1782 | + PyDict_SetItem(self._revno_to_branch_count, 0, root_count) |
1783 | + ms_node.completed = 1 |
1784 | + if PyList_GET_SIZE(self._scheduled_nodes) == 0: |
1785 | + # The first scheduled node is always the end of merge |
1786 | + ms_node.end_of_merge = True |
1787 | + else: |
1788 | + prev_node = _get_list_node(self._scheduled_nodes, |
1789 | + PyList_GET_SIZE(self._scheduled_nodes) - 1) |
1790 | + ms_prev_node = <_MergeSortNode>prev_node.extra |
1791 | + if ms_prev_node.merge_depth < ms_node.merge_depth: |
1792 | + # The previously pushed node is to our left, so this is the end |
1793 | + # of this right-hand chain |
1794 | + ms_node.end_of_merge = True |
1795 | + elif (ms_prev_node.merge_depth == ms_node.merge_depth |
1796 | + and prev_node not in node.parents): |
1797 | + # The next node is not a direct parent of this node |
1798 | + ms_node.end_of_merge = True |
1799 | + else: |
1800 | + ms_node.end_of_merge = False |
1801 | + PyList_Append(self._scheduled_nodes, node) |
1802 | + |
1803 | + cdef _schedule_stack(self): |
1804 | + cdef _KnownGraphNode last_node, next_node |
1805 | + cdef _MergeSortNode ms_node, ms_last_node, ms_next_node |
1806 | + cdef long next_merge_depth |
1807 | + ordered = [] |
1808 | + while self._last_stack_item >= 0: |
1809 | + # Peek at the last item on the stack |
1810 | + last_node = _get_list_node(self._depth_first_stack, |
1811 | + self._last_stack_item) |
1812 | + if last_node.gdfo == -1: |
1813 | + # if _find_gdfo skipped a node, that means there is a graph |
1814 | + # cycle, error out now |
1815 | + raise errors.GraphCycleError(self.graph._nodes) |
1816 | + ms_last_node = <_MergeSortNode>last_node.extra |
1817 | + if not ms_last_node.has_pending_parents(): |
1818 | + # Processed all parents, pop this node |
1819 | + self._pop_node() |
1820 | + continue |
1821 | + while ms_last_node.has_pending_parents(): |
1822 | + if ms_last_node.left_pending_parent is not None: |
1823 | + # recurse depth first into the primary parent |
1824 | + next_node = ms_last_node.left_pending_parent |
1825 | + ms_last_node.left_pending_parent = None |
1826 | + else: |
1827 | + # place any merges in right-to-left order for scheduling |
1828 | + # which gives us left-to-right order after we reverse |
1829 | + # the scheduled queue. |
1830 | + # Note: This has the effect of allocating common-new |
1831 | + # revisions to the right-most subtree rather than the |
1832	 | +                #       leftmost, which will display nicely (you get
1833 | + # smaller trees at the top of the combined merge). |
1834 | + next_node = ms_last_node.pending_parents.pop() |
1835 | + ms_next_node = self._get_ms_node(next_node) |
1836 | + if ms_next_node.completed: |
1837 | + # this parent was completed by a child on the |
1838 | + # call stack. skip it. |
1839 | + continue |
1840 | + # otherwise transfer it from the source graph into the |
1841 | + # top of the current depth first search stack. |
1842 | + |
1843 | + if next_node is ms_last_node.left_parent: |
1844 | + next_merge_depth = ms_last_node.merge_depth |
1845 | + else: |
1846 | + next_merge_depth = ms_last_node.merge_depth + 1 |
1847 | + self._push_node(next_node, next_merge_depth) |
1848 | + # and do not continue processing parents until this 'call' |
1849 | + # has recursed. |
1850 | + break |
1851 | + |
1852 | + cdef topo_order(self): |
1853 | + cdef _MergeSortNode ms_node |
1854 | + cdef _KnownGraphNode node |
1855 | + cdef Py_ssize_t pos |
1856 | + cdef PyObject *temp_key, *temp_node |
1857 | + |
1858 | + # Note: allocating a _MergeSortNode and deallocating it for all nodes |
1859 | + # costs approx 8.52ms (21%) of the total runtime |
1860 | + # We might consider moving the attributes into the base |
1861 | + # KnownGraph object. |
1862 | + self._schedule_stack() |
1863 | + |
1864 | + # We've set up the basic schedule, now we can continue processing the |
1865 | + # output. |
1866 | + # Note: This final loop costs us 40.0ms => 28.8ms (11ms, 25%) on |
1867 | + # bzr.dev, to convert the internal Object representation into a |
1868 | + # Tuple representation... |
1869 | + # 2ms is walking the data and computing revno tuples |
1870 | + # 7ms is computing the return tuple |
1871 | + # 4ms is PyList_Append() |
1872 | + ordered = [] |
1873 | + # output the result in reverse order, and separate the generated info |
1874 | + for pos from PyList_GET_SIZE(self._scheduled_nodes) > pos >= 0: |
1875 | + node = _get_list_node(self._scheduled_nodes, pos) |
1876 | + ms_node = <_MergeSortNode>node.extra |
1877 | + PyList_Append(ordered, ms_node) |
1878 | + node.extra = None |
1879 | + # Clear out the scheduled nodes now that we're done |
1880 | + self._scheduled_nodes = [] |
1881 | + return ordered |
1882 | |
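The comments in topo_sort describe the scheme: find the tails (nodes with no parents), walk to the children, and queue a child once all of its parents have been yielded. A pure-Python sketch of that pending-stack iteration, assuming every parent is present in parent_map (no ghosts), could read:

    def topo_sort(parent_map):
        """Pending-stack topological sort, parents before children.

        A sketch of the approach described in the comments above; assumes
        every parent appears in parent_map (no ghosts).
        """
        children = dict((key, []) for key in parent_map)
        num_seen = dict((key, 0) for key in parent_map)
        for key, parents in parent_map.items():
            for parent in parents:
                children[parent].append(key)
        # Tails are the nodes with no parents; they can be emitted first.
        pending = [key for key, parents in parent_map.items() if not parents]
        if not pending and parent_map:
            raise ValueError('graph contains a cycle')
        order = []
        while pending:
            node = pending.pop()
            order.append(node)
            for child in children[node]:
                num_seen[child] += 1
                if num_seen[child] == len(parent_map[child]):
                    # All parents emitted; the child is now safe to emit.
                    pending.append(child)
        if len(order) != len(parent_map):
            raise ValueError('graph contains a cycle')
        return order

    print(topo_sort({'a': (), 'b': ('a',), 'c': ('a', 'b')}))   # ['a', 'b', 'c']

The real implementation avoids pop-then-push by peeking at the last stack slot and overwriting it in place, which is where its speedup over the naive version comes from.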
1883 | === modified file 'bzrlib/annotate.py' |
1884 | --- bzrlib/annotate.py 2009-07-08 17:09:03 +0000 |
1885 | +++ bzrlib/annotate.py 2009-08-17 18:52:01 +0000 |
1886 | @@ -188,6 +188,10 @@ |
1887 | # or something. |
1888 | last_revision = current_rev.revision_id |
1889 | # XXX: Partially Cloned from branch, uses the old_get_graph, eep. |
1890 | + # XXX: The main difficulty is that we need to inject a single new node |
1891 | + # (current_rev) into the graph before it gets numbered, etc. |
1892 | + # Once KnownGraph gets an 'add_node()' function, we can use |
1893 | + # VF.get_known_graph_ancestry(). |
1894 | graph = repository.get_graph() |
1895 | revision_graph = dict(((key, value) for key, value in |
1896 | graph.iter_ancestry(current_rev.parent_ids) if value is not None)) |
1897 | |
1898 | === modified file 'bzrlib/branch.py' |
1899 | --- bzrlib/branch.py 2009-08-04 04:36:34 +0000 |
1900 | +++ bzrlib/branch.py 2009-08-19 18:04:49 +0000 |
1901 | @@ -446,15 +446,11 @@ |
1902 | # start_revision_id. |
1903 | if self._merge_sorted_revisions_cache is None: |
1904 | last_revision = self.last_revision() |
1905 | - graph = self.repository.get_graph() |
1906 | - parent_map = dict(((key, value) for key, value in |
1907 | - graph.iter_ancestry([last_revision]) if value is not None)) |
1908 | - revision_graph = repository._strip_NULL_ghosts(parent_map) |
1909 | - revs = tsort.merge_sort(revision_graph, last_revision, None, |
1910 | - generate_revno=True) |
1911 | - # Drop the sequence # before caching |
1912 | - self._merge_sorted_revisions_cache = [r[1:] for r in revs] |
1913 | - |
1914 | + last_key = (last_revision,) |
1915 | + known_graph = self.repository.revisions.get_known_graph_ancestry( |
1916 | + [last_key]) |
1917 | + self._merge_sorted_revisions_cache = known_graph.merge_sort( |
1918 | + last_key) |
1919 | filtered = self._filter_merge_sorted_revisions( |
1920 | self._merge_sorted_revisions_cache, start_revision_id, |
1921 | stop_revision_id, stop_rule) |
1922 | @@ -470,27 +466,34 @@ |
1923 | """Iterate over an inclusive range of sorted revisions.""" |
1924 | rev_iter = iter(merge_sorted_revisions) |
1925 | if start_revision_id is not None: |
1926 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
1927 | + for node in rev_iter: |
1928 | + rev_id = node.key[-1] |
1929 | if rev_id != start_revision_id: |
1930 | continue |
1931 | else: |
1932 | # The decision to include the start or not |
1933 | # depends on the stop_rule if a stop is provided |
1934 | - rev_iter = chain( |
1935 | - iter([(rev_id, depth, revno, end_of_merge)]), |
1936 | - rev_iter) |
1937 | + # so pop this node back into the iterator |
1938 | + rev_iter = chain(iter([node]), rev_iter) |
1939 | break |
1940 | if stop_revision_id is None: |
1941 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
1942 | - yield rev_id, depth, revno, end_of_merge |
1943 | + # Yield everything |
1944 | + for node in rev_iter: |
1945 | + rev_id = node.key[-1] |
1946 | + yield (rev_id, node.merge_depth, node.revno, |
1947 | + node.end_of_merge) |
1948 | elif stop_rule == 'exclude': |
1949 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
1950 | + for node in rev_iter: |
1951 | + rev_id = node.key[-1] |
1952 | if rev_id == stop_revision_id: |
1953 | return |
1954 | - yield rev_id, depth, revno, end_of_merge |
1955 | + yield (rev_id, node.merge_depth, node.revno, |
1956 | + node.end_of_merge) |
1957 | elif stop_rule == 'include': |
1958 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
1959 | - yield rev_id, depth, revno, end_of_merge |
1960 | + for node in rev_iter: |
1961 | + rev_id = node.key[-1] |
1962 | + yield (rev_id, node.merge_depth, node.revno, |
1963 | + node.end_of_merge) |
1964 | if rev_id == stop_revision_id: |
1965 | return |
1966 | elif stop_rule == 'with-merges': |
1967 | @@ -499,10 +502,12 @@ |
1968 | left_parent = stop_rev.parent_ids[0] |
1969 | else: |
1970 | left_parent = _mod_revision.NULL_REVISION |
1971 | - for rev_id, depth, revno, end_of_merge in rev_iter: |
1972 | + for node in rev_iter: |
1973 | + rev_id = node.key[-1] |
1974 | if rev_id == left_parent: |
1975 | return |
1976 | - yield rev_id, depth, revno, end_of_merge |
1977 | + yield (rev_id, node.merge_depth, node.revno, |
1978 | + node.end_of_merge) |
1979 | else: |
1980 | raise ValueError('invalid stop_rule %r' % stop_rule) |
1981 | |
1982 | @@ -1147,6 +1152,9 @@ |
1983 | revision_id: if not None, the revision history in the new branch will |
1984 | be truncated to end with revision_id. |
1985 | """ |
1986 | + if (repository_policy is not None and |
1987 | + repository_policy.requires_stacking()): |
1988 | + to_bzrdir._format.require_stacking(_skip_repo=True) |
1989 | result = to_bzrdir.create_branch() |
1990 | result.lock_write() |
1991 | try: |
1992 | @@ -2064,7 +2072,7 @@ |
1993 | BranchFormat.register_format(__format6) |
1994 | BranchFormat.register_format(__format7) |
1995 | BranchFormat.register_format(__format8) |
1996 | -BranchFormat.set_default_format(__format6) |
1997 | +BranchFormat.set_default_format(__format7) |
1998 | _legacy_formats = [BzrBranchFormat4(), |
1999 | ] |
2000 | network_format_registry.register( |
2001 | |
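With merge_sort nodes flowing through _filter_merge_sorted_revisions, the stop_rule handling reduces to where the cut falls relative to the stop revision: 'exclude' stops before yielding it, 'include' yields it and then stops. A simplified generator sketch, ignoring merge depth, revnos and the 'with-merges' rule, shows the difference:

    def filter_until_stop(rev_ids, stop_revision_id, stop_rule):
        """Sketch of the 'exclude' vs 'include' stop rules (illustrative)."""
        for rev_id in rev_ids:
            if stop_rule == 'exclude':
                if rev_id == stop_revision_id:
                    return                # cut *before* the stop revision
                yield rev_id
            elif stop_rule == 'include':
                yield rev_id
                if rev_id == stop_revision_id:
                    return                # cut *after* the stop revision
            else:
                raise ValueError('invalid stop_rule %r' % (stop_rule,))

    merge_sorted = ['rev-3', 'rev-2', 'rev-1']    # newest first
    assert list(filter_until_stop(merge_sorted, 'rev-2', 'exclude')) == ['rev-3']
    assert list(filter_until_stop(merge_sorted, 'rev-2', 'include')) == ['rev-3', 'rev-2']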
2002 | === modified file 'bzrlib/btree_index.py' |
2003 | --- bzrlib/btree_index.py 2009-07-01 10:51:47 +0000 |
2004 | +++ bzrlib/btree_index.py 2009-08-17 22:11:06 +0000 |
2005 | @@ -586,13 +586,19 @@ |
2006 | class _LeafNode(object): |
2007 | """A leaf node for a serialised B+Tree index.""" |
2008 | |
2009 | - __slots__ = ('keys',) |
2010 | + __slots__ = ('keys', 'min_key', 'max_key') |
2011 | |
2012 | def __init__(self, bytes, key_length, ref_list_length): |
2013 | """Parse bytes to create a leaf node object.""" |
2014 | # splitlines mangles the \r delimiters.. don't use it. |
2015 | - self.keys = dict(_btree_serializer._parse_leaf_lines(bytes, |
2016 | - key_length, ref_list_length)) |
2017 | + key_list = _btree_serializer._parse_leaf_lines(bytes, |
2018 | + key_length, ref_list_length) |
2019 | + if key_list: |
2020 | + self.min_key = key_list[0][0] |
2021 | + self.max_key = key_list[-1][0] |
2022 | + else: |
2023 | + self.min_key = self.max_key = None |
2024 | + self.keys = dict(key_list) |
2025 | |
2026 | |
2027 | class _InternalNode(object): |
2028 | @@ -1039,6 +1045,39 @@ |
2029 | output.append(cur_out) |
2030 | return output |
2031 | |
2032 | + def _walk_through_internal_nodes(self, keys): |
2033 | + """Take the given set of keys, and find the corresponding LeafNodes. |
2034 | + |
2035 | + :param keys: An unsorted iterable of keys to search for |
2036	 | +        :return: (nodes, keys_at_index)
2037 | + nodes is a dict mapping {index: LeafNode} |
2038 | + keys_at_index is a list of tuples of [(index, [keys for Leaf])] |
2039 | + """ |
2040 | + # 6 seconds spent in miss_torture using the sorted() line. |
2041 | + # Even with out of order disk IO it seems faster not to sort it when |
2042 | + # large queries are being made. |
2043 | + keys_at_index = [(0, sorted(keys))] |
2044 | + |
2045 | + for row_pos, next_row_start in enumerate(self._row_offsets[1:-1]): |
2046 | + node_indexes = [idx for idx, s_keys in keys_at_index] |
2047 | + nodes = self._get_internal_nodes(node_indexes) |
2048 | + |
2049 | + next_nodes_and_keys = [] |
2050 | + for node_index, sub_keys in keys_at_index: |
2051 | + node = nodes[node_index] |
2052 | + positions = self._multi_bisect_right(sub_keys, node.keys) |
2053 | + node_offset = next_row_start + node.offset |
2054 | + next_nodes_and_keys.extend([(node_offset + pos, s_keys) |
2055 | + for pos, s_keys in positions]) |
2056 | + keys_at_index = next_nodes_and_keys |
2057 | + # We should now be at the _LeafNodes |
2058 | + node_indexes = [idx for idx, s_keys in keys_at_index] |
2059 | + |
2060 | + # TODO: We may *not* want to always read all the nodes in one |
2061 | + # big go. Consider setting a max size on this. |
2062 | + nodes = self._get_leaf_nodes(node_indexes) |
2063 | + return nodes, keys_at_index |
2064 | + |
2065 | def iter_entries(self, keys): |
2066 | """Iterate over keys within the index. |
2067 | |
2068 | @@ -1082,32 +1121,7 @@ |
2069 | needed_keys = keys |
2070 | if not needed_keys: |
2071 | return |
2072 | - # 6 seconds spent in miss_torture using the sorted() line. |
2073 | - # Even with out of order disk IO it seems faster not to sort it when |
2074 | - # large queries are being made. |
2075 | - needed_keys = sorted(needed_keys) |
2076 | - |
2077 | - nodes_and_keys = [(0, needed_keys)] |
2078 | - |
2079 | - for row_pos, next_row_start in enumerate(self._row_offsets[1:-1]): |
2080 | - node_indexes = [idx for idx, s_keys in nodes_and_keys] |
2081 | - nodes = self._get_internal_nodes(node_indexes) |
2082 | - |
2083 | - next_nodes_and_keys = [] |
2084 | - for node_index, sub_keys in nodes_and_keys: |
2085 | - node = nodes[node_index] |
2086 | - positions = self._multi_bisect_right(sub_keys, node.keys) |
2087 | - node_offset = next_row_start + node.offset |
2088 | - next_nodes_and_keys.extend([(node_offset + pos, s_keys) |
2089 | - for pos, s_keys in positions]) |
2090 | - nodes_and_keys = next_nodes_and_keys |
2091 | - # We should now be at the _LeafNodes |
2092 | - node_indexes = [idx for idx, s_keys in nodes_and_keys] |
2093 | - |
2094 | - # TODO: We may *not* want to always read all the nodes in one |
2095 | - # big go. Consider setting a max size on this. |
2096 | - |
2097 | - nodes = self._get_leaf_nodes(node_indexes) |
2098 | + nodes, nodes_and_keys = self._walk_through_internal_nodes(needed_keys) |
2099 | for node_index, sub_keys in nodes_and_keys: |
2100 | if not sub_keys: |
2101 | continue |
2102 | @@ -1120,6 +1134,133 @@ |
2103 | else: |
2104 | yield (self, next_sub_key, value) |
2105 | |
2106 | + def _find_ancestors(self, keys, ref_list_num, parent_map, missing_keys): |
2107 | + """Find the parent_map information for the set of keys. |
2108 | + |
2109 | + This populates the parent_map dict and missing_keys set based on the |
2110	 | +        queried keys. It can also fill out an arbitrary number of parents that
2111 | + it finds while searching for the supplied keys. |
2112 | + |
2113 | + It is unlikely that you want to call this directly. See |
2114 | + "CombinedGraphIndex.find_ancestry()" for a more appropriate API. |
2115 | + |
2116	 | +        :param keys: The keys whose ancestry we want to return.
2117 | + Every key will either end up in 'parent_map' or 'missing_keys'. |
2118 | + :param ref_list_num: This index in the ref_lists is the parents we |
2119 | + care about. |
2120 | + :param parent_map: {key: parent_keys} for keys that are present in this |
2121	 | +            index. This may contain more entries than were in 'keys'; the extras
2122	 | +            are reachable ancestors of the keys requested.
2123 | + :param missing_keys: keys which are known to be missing in this index. |
2124 | + This may include parents that were not directly requested, but we |
2125 | + were able to determine that they are not present in this index. |
2126	 | +        :return: search_keys, the parents that were found but not queried to
2127	 | +            know if they are missing or present. Callers can re-query this index
2128	 | +            for those keys, and they will be placed into parent_map or missing_keys.
2129 | + """ |
2130 | + if not self.key_count(): |
2131 | + # We use key_count() to trigger reading the root node and |
2132 | + # determining info about this BTreeGraphIndex |
2133 | + # If we don't have any keys, then everything is missing |
2134 | + missing_keys.update(keys) |
2135 | + return set() |
2136 | + if ref_list_num >= self.node_ref_lists: |
2137 | + raise ValueError('No ref list %d, index has %d ref lists' |
2138 | + % (ref_list_num, self.node_ref_lists)) |
2139 | + |
2140 | + # The main trick we are trying to accomplish is that when we find a |
2141 | + # key listing its parents, we expect that the parent key is also likely |
2142	 | +        # to sit on the same page, allowing us to expand parents quickly
2143 | + # without suffering the full stack of bisecting, etc. |
2144 | + nodes, nodes_and_keys = self._walk_through_internal_nodes(keys) |
2145 | + |
2146 | + # These are parent keys which could not be immediately resolved on the |
2147 | + # page where the child was present. Note that we may already be |
2148 | + # searching for that key, and it may actually be present [or known |
2149 | + # missing] on one of the other pages we are reading. |
2150 | + # TODO: |
2151 | + # We could try searching for them in the immediate previous or next |
2152 | + # page. If they occur "later" we could put them in a pending lookup |
2153 | + # set, and then for each node we read thereafter we could check to |
2154 | + # see if they are present. |
2155 | + # However, we don't know the impact of keeping this list of things |
2156	 | +        #   that we're going to search for every node we come across from here on
2157 | + # out. |
2158 | + # It doesn't handle the case when the parent key is missing on a |
2159 | + # page that we *don't* read. So we already have to handle being |
2160 | + # re-entrant for that. |
2161 | + # Since most keys contain a date string, they are more likely to be |
2162 | + # found earlier in the file than later, but we would know that right |
2163 | + # away (key < min_key), and wouldn't keep searching it on every other |
2164 | + # page that we read. |
2165 | + # Mostly, it is an idea, one which should be benchmarked. |
2166 | + parents_not_on_page = set() |
2167 | + |
2168 | + for node_index, sub_keys in nodes_and_keys: |
2169 | + if not sub_keys: |
2170 | + continue |
2171 | + # sub_keys is all of the keys we are looking for that should exist |
2172 | + # on this page, if they aren't here, then they won't be found |
2173 | + node = nodes[node_index] |
2174 | + node_keys = node.keys |
2175 | + parents_to_check = set() |
2176 | + for next_sub_key in sub_keys: |
2177 | + if next_sub_key not in node_keys: |
2178 | + # This one is just not present in the index at all |
2179 | + missing_keys.add(next_sub_key) |
2180 | + else: |
2181 | + value, refs = node_keys[next_sub_key] |
2182 | + parent_keys = refs[ref_list_num] |
2183 | + parent_map[next_sub_key] = parent_keys |
2184 | + parents_to_check.update(parent_keys) |
2185 | + # Don't look for things we've already found |
2186 | + parents_to_check = parents_to_check.difference(parent_map) |
2187 | + # this can be used to test the benefit of having the check loop |
2188 | + # inlined. |
2189 | + # parents_not_on_page.update(parents_to_check) |
2190 | + # continue |
2191 | + while parents_to_check: |
2192 | + next_parents_to_check = set() |
2193 | + for key in parents_to_check: |
2194 | + if key in node_keys: |
2195 | + value, refs = node_keys[key] |
2196 | + parent_keys = refs[ref_list_num] |
2197 | + parent_map[key] = parent_keys |
2198 | + next_parents_to_check.update(parent_keys) |
2199 | + else: |
2200 | + # This parent either is genuinely missing, or should be |
2201 | + # found on another page. Perf test whether it is better |
2202 | + # to check if this node should fit on this page or not. |
2203 | + # in the 'everything-in-one-pack' scenario, this *not* |
2204 | + # doing the check is 237ms vs 243ms. |
2205 | + # So slightly better, but I assume the standard 'lots |
2206 | + # of packs' is going to show a reasonable improvement |
2207 | + # from the check, because it avoids 'going around |
2208 | + # again' for everything that is in another index |
2209 | + # parents_not_on_page.add(key) |
2210 | + # Missing for some reason |
2211 | + if key < node.min_key: |
2212 | + # in the case of bzr.dev, 3.4k/5.3k misses are |
2213 | + # 'earlier' misses (65%) |
2214 | + parents_not_on_page.add(key) |
2215 | + elif key > node.max_key: |
2216 | + # This parent key would be present on a different |
2217 | + # LeafNode |
2218 | + parents_not_on_page.add(key) |
2219 | + else: |
2220 | + # assert key != node.min_key and key != node.max_key |
2221 | + # If it was going to be present, it would be on |
2222 | + # *this* page, so mark it missing. |
2223 | + missing_keys.add(key) |
2224 | + parents_to_check = next_parents_to_check.difference(parent_map) |
2225 | + # Might want to do another .difference() from missing_keys |
2226 | + # parents_not_on_page could have been found on a different page, or be |
2227 | + # known to be missing. So cull out everything that has already been |
2228 | + # found. |
2229 | + search_keys = parents_not_on_page.difference( |
2230 | + parent_map).difference(missing_keys) |
2231 | + return search_keys |
2232 | + |
2233 | def iter_entries_prefix(self, keys): |
2234 | """Iterate over keys within the index using prefix matching. |
2235 | |
2236 | |
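The key observation in _find_ancestors is that a leaf page's min_key and max_key bound everything the page could hold, so a parent key that sorts inside those bounds but is absent from the page is definitively missing from this index, while one outside the bounds may simply live on another page. A small sketch of that classification, with hypothetical names, is:

    def classify_parent(key, page_keys, page_min, page_max,
                        missing_keys, parents_not_on_page):
        """Sketch of the leaf-page bounds check used by _find_ancestors."""
        if key in page_keys:
            return 'present'
        if key < page_min or key > page_max:
            parents_not_on_page.add(key)   # may exist on another leaf
            return 'elsewhere'
        missing_keys.add(key)              # would have been on this page
        return 'missing'

    missing, elsewhere = set(), set()
    page = {('2009-b',): 'value'}
    print(classify_parent(('2009-a',), page, ('2009-b',), ('2009-x',),
                          missing, elsewhere))   # 'elsewhere': sorts before the page
    print(classify_parent(('2009-c',), page, ('2009-b',), ('2009-x',),
                          missing, elsewhere))   # 'missing': inside the page bounds

Only the 'elsewhere' keys need a further round trip, which is why the loop returns them as search_keys for the caller to re-query.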
2237 | === modified file 'bzrlib/builtins.py' |
2238 | --- bzrlib/builtins.py 2009-07-27 06:22:57 +0000 |
2239 | +++ bzrlib/builtins.py 2009-09-08 16:45:11 +0000 |
2240 | @@ -120,6 +120,15 @@ |
2241 | |
2242 | |
2243 | def _get_one_revision_tree(command_name, revisions, branch=None, tree=None): |
2244 | + """Get a revision tree. Not suitable for commands that change the tree. |
2245 | + |
2246 | + Specifically, the basis tree in dirstate trees is coupled to the dirstate |
2247 | + and doing a commit/uncommit/pull will at best fail due to changing the |
2248 | + basis revision data. |
2249 | + |
2250	 | +    If tree is passed in, it should already be locked, for lifetime management
2251	 | +    of the tree's internal cached state.
2252 | + """ |
2253 | if branch is None: |
2254 | branch = tree.branch |
2255 | if revisions is None: |
2256 | @@ -452,8 +461,8 @@ |
2257 | raise errors.BzrCommandError("You cannot remove the working tree" |
2258 | " of a remote path") |
2259 | if not force: |
2260 | - # XXX: What about pending merges ? -- vila 20090629 |
2261 | - if working.has_changes(working.basis_tree()): |
2262 | + if (working.has_changes(working.basis_tree()) |
2263 | + or len(working.get_parent_ids()) > 1): |
2264 | raise errors.UncommittedChanges(working) |
2265 | |
2266 | working_path = working.bzrdir.root_transport.base |
2267 | @@ -603,6 +612,9 @@ |
2268 | branches that will be merged later (without showing the two different |
2269 | adds as a conflict). It is also useful when merging another project |
2270 | into a subdirectory of this one. |
2271 | + |
2272 | + Any files matching patterns in the ignore list will not be added |
2273 | + unless they are explicitly mentioned. |
2274 | """ |
2275 | takes_args = ['file*'] |
2276 | takes_options = [ |
2277 | @@ -616,7 +628,7 @@ |
2278 | help='Lookup file ids from this tree.'), |
2279 | ] |
2280 | encoding_type = 'replace' |
2281 | - _see_also = ['remove'] |
2282 | + _see_also = ['remove', 'ignore'] |
2283 | |
2284 | def run(self, file_list, no_recurse=False, dry_run=False, verbose=False, |
2285 | file_ids_from=None): |
2286 | @@ -654,14 +666,6 @@ |
2287 | for path in ignored[glob]: |
2288 | self.outf.write("ignored %s matching \"%s\"\n" |
2289 | % (path, glob)) |
2290 | - else: |
2291 | - match_len = 0 |
2292 | - for glob, paths in ignored.items(): |
2293 | - match_len += len(paths) |
2294 | - self.outf.write("ignored %d file(s).\n" % match_len) |
2295 | - self.outf.write("If you wish to add ignored files, " |
2296 | - "please add them explicitly by name. " |
2297 | - "(\"bzr ignored\" gives a list)\n") |
2298 | |
2299 | |
2300 | class cmd_mkdir(Command): |
2301 | @@ -1172,6 +1176,9 @@ |
2302 | help='Hard-link working tree files where possible.'), |
2303 | Option('no-tree', |
2304 | help="Create a branch without a working-tree."), |
2305 | + Option('switch', |
2306 | + help="Switch the checkout in the current directory " |
2307 | + "to the new branch."), |
2308 | Option('stacked', |
2309 | help='Create a stacked branch referring to the source branch. ' |
2310 | 'The new branch will depend on the availability of the source ' |
2311 | @@ -1188,9 +1195,9 @@ |
2312 | |
2313 | def run(self, from_location, to_location=None, revision=None, |
2314 | hardlink=False, stacked=False, standalone=False, no_tree=False, |
2315 | - use_existing_dir=False): |
2316 | + use_existing_dir=False, switch=False): |
2317 | + from bzrlib import switch as _mod_switch |
2318 | from bzrlib.tag import _merge_tags_if_possible |
2319 | - |
2320 | accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch( |
2321 | from_location) |
2322 | if (accelerator_tree is not None and |
2323 | @@ -1250,6 +1257,12 @@ |
2324 | except (errors.NotStacked, errors.UnstackableBranchFormat, |
2325 | errors.UnstackableRepositoryFormat), e: |
2326 | note('Branched %d revision(s).' % branch.revno()) |
2327 | + if switch: |
2328 | + # Switch to the new branch |
2329 | + wt, _ = WorkingTree.open_containing('.') |
2330 | + _mod_switch.switch(wt.bzrdir, branch) |
2331 | + note('Switched to branch: %s', |
2332 | + urlutils.unescape_for_display(branch.base, 'utf-8')) |
2333 | finally: |
2334 | br_from.unlock() |
2335 | |
2336 | @@ -3025,6 +3038,10 @@ |
2337 | raise errors.BzrCommandError("empty commit message specified") |
2338 | return my_message |
2339 | |
2340 | + # The API permits a commit with a filter of [] to mean 'select nothing' |
2341 | + # but the command line should not do that. |
2342 | + if not selected_list: |
2343 | + selected_list = None |
2344 | try: |
2345 | tree.commit(message_callback=get_message, |
2346 | specific_files=selected_list, |
2347 | @@ -3365,6 +3382,8 @@ |
2348 | Option('lsprof-timed', |
2349 | help='Generate lsprof output for benchmarked' |
2350 | ' sections of code.'), |
2351 | + Option('lsprof-tests', |
2352 | + help='Generate lsprof output for each test.'), |
2353 | Option('cache-dir', type=str, |
2354 | help='Cache intermediate benchmark output in this ' |
2355 | 'directory.'), |
2356 | @@ -3411,7 +3430,7 @@ |
2357 | first=False, list_only=False, |
2358 | randomize=None, exclude=None, strict=False, |
2359 | load_list=None, debugflag=None, starting_with=None, subunit=False, |
2360 | - parallel=None): |
2361 | + parallel=None, lsprof_tests=False): |
2362 | from bzrlib.tests import selftest |
2363 | import bzrlib.benchmarks as benchmarks |
2364 | from bzrlib.benchmarks import tree_creator |
2365 | @@ -3451,6 +3470,7 @@ |
2366 | "transport": transport, |
2367 | "test_suite_factory": test_suite_factory, |
2368 | "lsprof_timed": lsprof_timed, |
2369 | + "lsprof_tests": lsprof_tests, |
2370 | "bench_history": benchfile, |
2371 | "matching_tests_first": first, |
2372 | "list_only": list_only, |
2373 | @@ -3633,13 +3653,14 @@ |
2374 | verified = 'inapplicable' |
2375 | tree = WorkingTree.open_containing(directory)[0] |
2376 | |
2377 | - # die as quickly as possible if there are uncommitted changes |
2378 | try: |
2379 | basis_tree = tree.revision_tree(tree.last_revision()) |
2380 | except errors.NoSuchRevision: |
2381 | basis_tree = tree.basis_tree() |
2382 | + |
2383 | + # die as quickly as possible if there are uncommitted changes |
2384 | if not force: |
2385 | - if tree.has_changes(basis_tree): |
2386 | + if tree.has_changes(basis_tree) or len(tree.get_parent_ids()) > 1: |
2387 | raise errors.UncommittedChanges(tree) |
2388 | |
2389 | view_info = _get_view_info_for_change_reporter(tree) |
2390 | @@ -5627,8 +5648,12 @@ |
2391 | if writer is None: |
2392 | writer = bzrlib.option.diff_writer_registry.get() |
2393 | try: |
2394 | - Shelver.from_args(writer(sys.stdout), revision, all, file_list, |
2395 | - message, destroy=destroy).run() |
2396 | + shelver = Shelver.from_args(writer(sys.stdout), revision, all, |
2397 | + file_list, message, destroy=destroy) |
2398 | + try: |
2399 | + shelver.run() |
2400 | + finally: |
2401 | + shelver.work_tree.unlock() |
2402 | except errors.UserAbort: |
2403 | return 0 |
2404 | |
2405 | @@ -5673,7 +5698,11 @@ |
2406 | |
2407 | def run(self, shelf_id=None, action='apply'): |
2408 | from bzrlib.shelf_ui import Unshelver |
2409 | - Unshelver.from_args(shelf_id, action).run() |
2410 | + unshelver = Unshelver.from_args(shelf_id, action) |
2411 | + try: |
2412 | + unshelver.run() |
2413 | + finally: |
2414 | + unshelver.tree.unlock() |
2415 | |
2416 | |
2417 | class cmd_clean_tree(Command): |
2418 | |
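Two of the builtins changes ('bzr remove-tree' and 'bzr merge') tighten the same safety check: refuse to proceed when the working tree has uncommitted changes or pending merges, that is more than one parent id, unless --force is given. A hedged sketch of the combined condition, with illustrative names rather than bzrlib's real API:

    def check_safe_to_proceed(has_changes, parent_ids, force):
        """Sketch: uncommitted changes or pending merges require --force."""
        if not force and (has_changes or len(parent_ids) > 1):
            raise RuntimeError('uncommitted changes or pending merges; '
                               'use --force to override')

    check_safe_to_proceed(False, ['parent-1'], force=False)       # clean tree: ok
    try:
        check_safe_to_proceed(False, ['p1', 'p2'], force=False)   # pending merge
    except RuntimeError as e:
        print(e)
    check_safe_to_proceed(True, ['p1', 'p2'], force=True)         # forced: ok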
2419 | === modified file 'bzrlib/bzrdir.py' |
2420 | --- bzrlib/bzrdir.py 2009-08-04 14:48:59 +0000 |
2421 | +++ bzrlib/bzrdir.py 2009-08-21 02:10:06 +0000 |
2422 | @@ -129,9 +129,16 @@ |
2423 | return True |
2424 | |
2425 | def check_conversion_target(self, target_format): |
2426 | + """Check that a bzrdir as a whole can be converted to a new format.""" |
2427 | + # The only current restriction is that the repository content can be |
2428 | + # fetched compatibly with the target. |
2429 | target_repo_format = target_format.repository_format |
2430 | - source_repo_format = self._format.repository_format |
2431 | - source_repo_format.check_conversion_target(target_repo_format) |
2432 | + try: |
2433 | + self.open_repository()._format.check_conversion_target( |
2434 | + target_repo_format) |
2435 | + except errors.NoRepositoryPresent: |
2436 | + # No repo, no problem. |
2437 | + pass |
2438 | |
2439 | @staticmethod |
2440 | def _check_supported(format, allow_unsupported, |
2441 | @@ -3039,7 +3046,8 @@ |
2442 | new is _mod_branch.BzrBranchFormat8): |
2443 | branch_converter = _mod_branch.Converter7to8() |
2444 | else: |
2445 | - raise errors.BadConversionTarget("No converter", new) |
2446 | + raise errors.BadConversionTarget("No converter", new, |
2447 | + branch._format) |
2448 | branch_converter.convert(branch) |
2449 | branch = self.bzrdir.open_branch() |
2450 | old = branch._format.__class__ |
2451 | @@ -3548,6 +3556,10 @@ |
2452 | if self._require_stacking: |
2453 | raise |
2454 | |
2455 | + def requires_stacking(self): |
2456 | + """Return True if this policy requires stacking.""" |
2457 | + return self._stack_on is not None and self._require_stacking |
2458 | + |
2459 | def _get_full_stack_on(self): |
2460 | """Get a fully-qualified URL for the stack_on location.""" |
2461 | if self._stack_on is None: |
2462 | @@ -3860,11 +3872,11 @@ |
2463 | # The following format should be an alias for the rich root equivalent |
2464 | # of the default format |
2465 | format_registry.register_metadir('default-rich-root', |
2466 | - 'bzrlib.repofmt.pack_repo.RepositoryFormatKnitPack4', |
2467 | - help='Default format, rich root variant. (needed for bzr-svn and bzr-git).', |
2468 | - branch_format='bzrlib.branch.BzrBranchFormat6', |
2469 | - tree_format='bzrlib.workingtree.WorkingTreeFormat4', |
2470 | + 'bzrlib.repofmt.groupcompress_repo.RepositoryFormat2a', |
2471 | + branch_format='bzrlib.branch.BzrBranchFormat7', |
2472 | + tree_format='bzrlib.workingtree.WorkingTreeFormat6', |
2473 | alias=True, |
2474 | - ) |
2475 | + help='Same as 2a.') |
2476 | + |
2477 | # The current format that is made on 'bzr init'. |
2478 | -format_registry.set_default('pack-0.92') |
2479 | +format_registry.set_default('2a') |
2480 | |
2481 | === modified file 'bzrlib/commands.py' |
2482 | --- bzrlib/commands.py 2009-06-19 09:06:56 +0000 |
2483 | +++ bzrlib/commands.py 2009-09-09 17:51:19 +0000 |
2484 | @@ -1028,13 +1028,13 @@ |
2485 | ret = apply_coveraged(opt_coverage_dir, run, *run_argv) |
2486 | else: |
2487 | ret = run(*run_argv) |
2488 | - if 'memory' in debug.debug_flags: |
2489 | - trace.debug_memory('Process status after command:', short=False) |
2490 | return ret or 0 |
2491 | finally: |
2492 | # reset, in case we may do other commands later within the same |
2493 | # process. Commands that want to execute sub-commands must propagate |
2494 | # --verbose in their own way. |
2495 | + if 'memory' in debug.debug_flags: |
2496 | + trace.debug_memory('Process status after command:', short=False) |
2497 | option._verbosity_level = saved_verbosity_level |
2498 | |
2499 | |
2500 | |
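The commands.py hunk moves the --Dmemory report from the success path into the finally block, so it runs even when the command raises. A minimal sketch of that control flow, with illustrative stand-ins rather than bzrlib's real API:

    def run_with_memory_report(run, debug_flags, report):
        """Sketch: the memory report fires even when run() raises."""
        try:
            return run() or 0
        finally:
            if 'memory' in debug_flags:
                report('Process status after command:')

    def echo(message):
        print(message)

    run_with_memory_report(lambda: 0, set(['memory']), echo)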
2501 | === modified file 'bzrlib/commit.py' |
2502 | --- bzrlib/commit.py 2009-08-25 05:09:42 +0000 |
2503 | +++ bzrlib/commit.py 2009-08-28 05:00:33 +0000 |
2504 | @@ -209,7 +209,8 @@ |
2505 | :param timestamp: if not None, seconds-since-epoch for a |
2506 | postdated/predated commit. |
2507 | |
2508 | - :param specific_files: If true, commit only those files. |
2509 | + :param specific_files: If not None, commit only those files. An empty |
2510 | + list means 'commit no files'. |
2511 | |
2512 | :param rev_id: If set, use this as the new revision id. |
2513 | Useful for test or import commands that need to tightly |
2514 | @@ -264,6 +265,8 @@ |
2515 | self.master_locked = False |
2516 | self.recursive = recursive |
2517 | self.rev_id = None |
2518 | + # self.specific_files is None to indicate no filter, or any iterable to |
2519 | + # indicate a filter - [] means no files at all, as per iter_changes. |
2520 | if specific_files is not None: |
2521 | self.specific_files = sorted( |
2522 | minimum_path_selection(specific_files)) |
2523 | @@ -285,7 +288,6 @@ |
2524 | # the command line parameters, and the repository has fast delta |
2525 | # generation. See bug 347649. |
2526 | self.use_record_iter_changes = ( |
2527 | - not self.specific_files and |
2528 | not self.exclude and |
2529 | not self.branch.repository._format.supports_tree_reference and |
2530 | (self.branch.repository._format.fast_deltas or |
2531 | @@ -333,7 +335,7 @@ |
2532 | self._gather_parents() |
2533 | # After a merge, a selected file commit is not supported. |
2534 | # See 'bzr help merge' for an explanation as to why. |
2535 | - if len(self.parents) > 1 and self.specific_files: |
2536 | + if len(self.parents) > 1 and self.specific_files is not None: |
2537 | raise errors.CannotCommitSelectedFileMerge(self.specific_files) |
2538 | # Excludes are a form of selected file commit. |
2539 | if len(self.parents) > 1 and self.exclude: |
2540 | @@ -619,12 +621,13 @@ |
2541 | """Update the commit builder with the data about what has changed. |
2542 | """ |
2543 | exclude = self.exclude |
2544 | - specific_files = self.specific_files or [] |
2545 | + specific_files = self.specific_files |
2546 | mutter("Selecting files for commit with filter %s", specific_files) |
2547 | |
2548 | self._check_strict() |
2549 | if self.use_record_iter_changes: |
2550 | - iter_changes = self.work_tree.iter_changes(self.basis_tree) |
2551 | + iter_changes = self.work_tree.iter_changes(self.basis_tree, |
2552 | + specific_files=specific_files) |
2553 | iter_changes = self._filter_iter_changes(iter_changes) |
2554 | for file_id, path, fs_hash in self.builder.record_iter_changes( |
2555 | self.work_tree, self.basis_revid, iter_changes): |
2556 | |
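The commit.py changes pin down the specific_files contract: None means 'no filter', while any iterable, including the empty list, selects exactly those paths, so [] commits nothing. A standalone sketch of filtering simplified (file_id, path) changes under that contract:

    def iter_changes_filtered(changes, specific_files):
        """Sketch of the specific_files contract: None = no filter,
        [] = select nothing. The (file_id, path) pairs are simplified
        stand-ins for real iter_changes tuples."""
        if specific_files is None:
            for change in changes:
                yield change
            return
        wanted = set(specific_files)
        for change in changes:
            file_id, path = change
            if path in wanted:
                yield change

    changes = [('id-a', 'a.txt'), ('id-b', 'b.txt')]
    assert len(list(iter_changes_filtered(changes, None))) == 2
    assert list(iter_changes_filtered(changes, [])) == []
    assert list(iter_changes_filtered(changes, ['b.txt'])) == [('id-b', 'b.txt')]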
2557 | === modified file 'bzrlib/config.py' |
2558 | --- bzrlib/config.py 2009-07-02 08:59:16 +0000 |
2559 | +++ bzrlib/config.py 2009-08-20 04:53:23 +0000 |
2560 | @@ -821,6 +821,29 @@ |
2561 | return osutils.pathjoin(config_dir(), 'ignore') |
2562 | |
2563 | |
2564 | +def crash_dir(): |
2565 | + """Return the directory name to store crash files. |
2566 | + |
2567 | + This doesn't implicitly create it. |
2568 | + |
2569 | + On Windows it's in the config directory; elsewhere in the XDG cache directory. |
2570 | + """ |
2571 | + if sys.platform == 'win32': |
2572 | + return osutils.pathjoin(config_dir(), 'Crash') |
2573 | + else: |
2574 | + return osutils.pathjoin(xdg_cache_dir(), 'crash') |
2575 | + |
2576 | + |
2577 | +def xdg_cache_dir(): |
2578 | + # See http://standards.freedesktop.org/basedir-spec/latest/ar01s03.html |
2579 | + # Possibly this should be different on Windows? |
2580	 | +    e = os.environ.get('XDG_CACHE_HOME', None)
2581 | + if e: |
2582 | + return e |
2583 | + else: |
2584 | + return os.path.expanduser('~/.cache') |
2585 | + |
2586 | + |
2587 | def _auto_user_id(): |
2588 | """Calculate automatic user identification. |
2589 | |
2590 | |
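The freedesktop basedir spec referenced in the comment names the override variable XDG_CACHE_HOME, defaulting to ~/.cache when unset. A self-contained sketch of the per-platform resolution, using a hypothetical stand-in for config_dir(), is:

    import os
    import sys

    def cache_dir_sketch():
        # Per the basedir spec, XDG_CACHE_HOME overrides ~/.cache.
        return (os.environ.get('XDG_CACHE_HOME')
                or os.path.expanduser('~/.cache'))

    def crash_dir_sketch(win32_config_dir='~/bazaar/2.0'):
        # win32_config_dir is a hypothetical stand-in for config_dir().
        if sys.platform == 'win32':
            return os.path.join(os.path.expanduser(win32_config_dir), 'Crash')
        return os.path.join(cache_dir_sketch(), 'crash')

    print(crash_dir_sketch())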
2591 | === added file 'bzrlib/crash.py' |
2592 | --- bzrlib/crash.py 1970-01-01 00:00:00 +0000 |
2593 | +++ bzrlib/crash.py 2009-08-20 05:47:53 +0000 |
2594 | @@ -0,0 +1,163 @@ |
2595 | +# Copyright (C) 2009 Canonical Ltd |
2596 | +# |
2597 | +# This program is free software; you can redistribute it and/or modify |
2598 | +# it under the terms of the GNU General Public License as published by |
2599 | +# the Free Software Foundation; either version 2 of the License, or |
2600 | +# (at your option) any later version. |
2601 | +# |
2602 | +# This program is distributed in the hope that it will be useful, |
2603 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
2604 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
2605 | +# GNU General Public License for more details. |
2606 | +# |
2607 | +# You should have received a copy of the GNU General Public License |
2608 | +# along with this program; if not, write to the Free Software |
2609 | +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
2610 | + |
2611 | + |
2612 | +"""Handling and reporting crashes. |
2613 | +""" |
2614 | + |
2615 | +# for interactive testing, try the 'bzr assert-fail' command |
2616 | +# or see http://code.launchpad.net/~mbp/bzr/bzr-fail |
2617 | + |
2618 | +import os |
2619 | +import platform |
2620 | +import pprint |
2621 | +import sys |
2622 | +import time |
2623 | +from StringIO import StringIO |
2624 | + |
2625 | +import bzrlib |
2626 | +from bzrlib import ( |
2627 | + config, |
2628 | + debug, |
2629 | + osutils, |
2630 | + plugin, |
2631 | + trace, |
2632 | + ) |
2633 | + |
2634 | + |
2635 | +def report_bug(exc_info, stderr): |
2636 | + if 'no_apport' not in debug.debug_flags: |
2637 | + try: |
2638 | + report_bug_to_apport(exc_info, stderr) |
2639 | + return |
2640 | + except ImportError, e: |
2641 | + trace.mutter("couldn't find apport bug-reporting library: %s" % e) |
2642 | + pass |
2643 | + except Exception, e: |
2644 | + # this should only happen if apport is installed but it didn't |
2645 | + # work, eg because of an io error writing the crash file |
2646 | + sys.stderr.write("bzr: failed to report crash using apport:\n " |
2647 | + " %r\n" % e) |
2648 | + pass |
2649 | + report_bug_legacy(exc_info, stderr) |
2650 | + |
2651 | + |
2652 | +def report_bug_legacy(exc_info, err_file): |
2653 | + """Report a bug by just printing a message to the user.""" |
2654 | + trace.print_exception(exc_info, err_file) |
2655 | + err_file.write('\n') |
2656 | + err_file.write('bzr %s on python %s (%s)\n' % \ |
2657 | + (bzrlib.__version__, |
2658 | + bzrlib._format_version_tuple(sys.version_info), |
2659 | + platform.platform(aliased=1))) |
2660 | + err_file.write('arguments: %r\n' % sys.argv) |
2661 | + err_file.write( |
2662 | + 'encoding: %r, fsenc: %r, lang: %r\n' % ( |
2663 | + osutils.get_user_encoding(), sys.getfilesystemencoding(), |
2664 | + os.environ.get('LANG'))) |
2665 | + err_file.write("plugins:\n") |
2666 | + err_file.write(_format_plugin_list()) |
2667 | + err_file.write( |
2668 | + "\n\n" |
2669 | + "*** Bazaar has encountered an internal error. This probably indicates a\n" |
2670 | + " bug in Bazaar. You can help us fix it by filing a bug report at\n" |
2671 | + " https://bugs.launchpad.net/bzr/+filebug\n" |
2672 | + " including this traceback and a description of the problem.\n" |
2673 | + ) |
2674 | + |
2675 | + |
2676 | +def report_bug_to_apport(exc_info, stderr): |
2677 | + """Report a bug to apport for optional automatic filing. |
2678 | + """ |
2679 | + # this is based on apport_package_hook.py, but omitting some of the |
2680 | + # Ubuntu-specific policy about what to report and when |
2681 | + |
2682 | + # if this fails its caught at a higher level; we don't want to open the |
2683 | + # crash file unless apport can be loaded. |
2684 | + import apport |
2685 | + |
2686 | + crash_file = _open_crash_file() |
2687 | + try: |
2688 | + _write_apport_report_to_file(exc_info, crash_file) |
2689 | + finally: |
2690 | + crash_file.close() |
2691 | + |
2692 | + stderr.write("bzr: ERROR: %s.%s: %s\n" |
2693 | + "\n" |
2694 | + "*** Bazaar has encountered an internal error. This probably indicates a\n" |
2695 | + " bug in Bazaar. You can help us fix it by filing a bug report at\n" |
2696 | + " https://bugs.launchpad.net/bzr/+filebug\n" |
2697 | + " attaching the crash file\n" |
2698 | + " %s\n" |
2699 | + " and including a description of the problem.\n" |
2700 | + "\n" |
2701 | + " The crash file is plain text and you can inspect or edit it to remove\n" |
2702 | + " private information.\n" |
2703 | + % (exc_info[0].__module__, exc_info[0].__name__, exc_info[1], |
2704 | + crash_file.name)) |
2705 | + |
2706 | + |
2707 | +def _write_apport_report_to_file(exc_info, crash_file): |
2708 | + import traceback |
2709 | + from apport.report import Report |
2710 | + |
2711 | + exc_type, exc_object, exc_tb = exc_info |
2712 | + |
2713 | + pr = Report() |
2714 | + # add_proc_info gives you the memory map of the process: this seems rarely |
2715 | + # useful for Bazaar and it does make the report harder to scan, though it |
2716 | + # does tell you what binary modules are loaded. |
2717 | + # pr.add_proc_info() |
2718 | + pr.add_user_info() |
2719 | + pr['CommandLine'] = pprint.pformat(sys.argv) |
2720 | + pr['BzrVersion'] = bzrlib.__version__ |
2721 | + pr['PythonVersion'] = bzrlib._format_version_tuple(sys.version_info) |
2722 | + pr['Platform'] = platform.platform(aliased=1) |
2723 | + pr['UserEncoding'] = osutils.get_user_encoding() |
2724 | + pr['FileSystemEncoding'] = sys.getfilesystemencoding() |
2725 | + pr['Locale'] = os.environ.get('LANG') |
2726 | + pr['BzrPlugins'] = _format_plugin_list() |
2727 | + pr['PythonLoadedModules'] = _format_module_list() |
2728 | + pr['BzrDebugFlags'] = pprint.pformat(debug.debug_flags) |
2729 | + |
2730 | + tb_file = StringIO() |
2731 | + traceback.print_exception(exc_type, exc_object, exc_tb, file=tb_file) |
2732 | + pr['Traceback'] = tb_file.getvalue() |
2733 | + |
2734 | + pr.write(crash_file) |
2735 | + |
2736 | + |
2737 | +def _open_crash_file(): |
2738 | + crash_dir = config.crash_dir() |
2739 | + # user-readable only, just in case the contents are sensitive. |
2740 | + if not osutils.isdir(crash_dir): |
2741 | + os.makedirs(crash_dir, mode=0700) |
2742 | + filename = 'bzr-%s-%s.crash' % ( |
2743 | + osutils.compact_date(time.time()), |
2744 | + os.getpid(),) |
2745 | + return open(osutils.pathjoin(crash_dir, filename), 'wt') |
2746 | + |
2747 | + |
2748 | +def _format_plugin_list(): |
2749 | + plugin_lines = [] |
2750 | + for name, a_plugin in sorted(plugin.plugins().items()): |
2751 | + plugin_lines.append(" %-20s %s [%s]" % |
2752 | + (name, a_plugin.path(), a_plugin.__version__)) |
2753 | + return '\n'.join(plugin_lines) |
2754 | + |
2755 | + |
2756 | +def _format_module_list(): |
2757 | + return pprint.pformat(sys.modules) |
2758 | |
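To see how the new module is driven, here is a hedged sketch of a top-level handler; the demo exception is artificial, and ``report_bug`` takes exactly the ``(exc_info, stderr)`` pair defined above. When apport is unavailable, or ``-Dno_apport`` is set, the legacy plain-text report is printed instead of writing a crash file:

    import sys
    from bzrlib import crash

    try:
        raise AssertionError("demo internal error")
    except Exception:
        # report_bug tries apport first, then falls back to the legacy
        # text report on ImportError or any apport failure.
        crash.report_bug(sys.exc_info(), sys.stderr)
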
2759 | === modified file 'bzrlib/dirstate.py' |
2760 | --- bzrlib/dirstate.py 2009-07-27 05:44:19 +0000 |
2761 | +++ bzrlib/dirstate.py 2009-08-28 05:00:33 +0000 |
2762 | @@ -3166,15 +3166,18 @@ |
2763 | |
2764 | __slots__ = ["old_dirname_to_file_id", "new_dirname_to_file_id", |
2765 | "last_source_parent", "last_target_parent", "include_unchanged", |
2766 | - "use_filesystem_for_exec", "utf8_decode", "searched_specific_files", |
2767 | - "search_specific_files", "state", "source_index", "target_index", |
2768 | - "want_unversioned", "tree"] |
2769 | + "partial", "use_filesystem_for_exec", "utf8_decode", |
2770 | + "searched_specific_files", "search_specific_files", |
2771 | + "searched_exact_paths", "search_specific_file_parents", "seen_ids", |
2772 | + "state", "source_index", "target_index", "want_unversioned", "tree"] |
2773 | |
2774 | def __init__(self, include_unchanged, use_filesystem_for_exec, |
2775 | search_specific_files, state, source_index, target_index, |
2776 | want_unversioned, tree): |
2777 | self.old_dirname_to_file_id = {} |
2778 | self.new_dirname_to_file_id = {} |
2779 | + # Are we doing a partial iter_changes? |
2780 | + self.partial = search_specific_files != set(['']) |
2781 | # Using a list so that we can access the values and change them in |
2782 | # nested scope. Each one is [path, file_id, entry] |
2783 | self.last_source_parent = [None, None] |
2784 | @@ -3183,14 +3186,25 @@ |
2785 | self.use_filesystem_for_exec = use_filesystem_for_exec |
2786 | self.utf8_decode = cache_utf8._utf8_decode |
2787 | # for all search_indexs in each path at or under each element of |
2788 | - # search_specific_files, if the detail is relocated: add the id, and add the |
2789 | - # relocated path as one to search if its not searched already. If the |
2790 | - # detail is not relocated, add the id. |
2791 | + # search_specific_files, if the detail is relocated: add the id, and |
2792 | + # add the relocated path as one to search if its not searched already. |
2793 | + # If the detail is not relocated, add the id. |
2794 | self.searched_specific_files = set() |
2795 | + # When we search exact paths without expanding downwards, we record |
2796 | + # that here. |
2797 | + self.searched_exact_paths = set() |
2798 | self.search_specific_files = search_specific_files |
2799 | + # The parents up to the root of the paths we are searching. |
2800 | + # After all normal paths are returned, these specific items are returned. |
2801 | + self.search_specific_file_parents = set() |
2802 | + # The ids we've sent out in the delta. |
2803 | + self.seen_ids = set() |
2804 | self.state = state |
2805 | self.source_index = source_index |
2806 | self.target_index = target_index |
2807 | + if target_index != 0: |
2808 | + # A lot of code in here depends on target_index == 0 |
2809 | + raise errors.BzrError('unsupported target index') |
2810 | self.want_unversioned = want_unversioned |
2811 | self.tree = tree |
2812 | |
2813 | @@ -3198,15 +3212,15 @@ |
2814 | """Compare an entry and real disk to generate delta information. |
2815 | |
2816 | :param path_info: top_relpath, basename, kind, lstat, abspath for |
2817 | - the path of entry. If None, then the path is considered absent. |
2818 | - (Perhaps we should pass in a concrete entry for this ?) |
2819 | + the path of entry. If None, then the path is considered absent in |
2820 | + the target (Perhaps we should pass in a concrete entry for this ?) |
2821 | Basename is returned as a utf8 string because we expect this |
2822 | tuple will be ignored, and don't want to take the time to |
2823 | decode. |
2824 | :return: (iter_changes_result, changed). If the entry has not been |
2825 | handled then changed is None. Otherwise it is False if no content |
2826 | - or metadata changes have occured, and None if any content or |
2827 | - metadata change has occured. If self.include_unchanged is True then |
2828 | + or metadata changes have occurred, and True if any content or |
2829 | + metadata change has occurred. If self.include_unchanged is True then |
2830 | if changed is not None, iter_changes_result will always be a result |
2831 | tuple. Otherwise, iter_changes_result is None unless changed is |
2832 | True. |
2833 | @@ -3463,6 +3477,25 @@ |
2834 | def __iter__(self): |
2835 | return self |
2836 | |
2837 | + def _gather_result_for_consistency(self, result): |
2838 | + """Check a result we will yield to make sure we are consistent later. |
2839 | + |
2840 | + This gathers result's parents into a set to output later. |
2841 | + |
2842 | + :param result: A result tuple. |
2843 | + """ |
2844 | + if not self.partial or not result[0]: |
2845 | + return |
2846 | + self.seen_ids.add(result[0]) |
2847 | + new_path = result[1][1] |
2848 | + if new_path: |
2849 | + # Not the root and not a delete: queue up the parents of the path. |
2850 | + self.search_specific_file_parents.update( |
2851 | + osutils.parent_directories(new_path.encode('utf8'))) |
2852 | + # Add the root directory which parent_directories does not |
2853 | + # provide. |
2854 | + self.search_specific_file_parents.add('') |
2855 | + |
2856 | def iter_changes(self): |
2857 | """Iterate over the changes.""" |
2858 | utf8_decode = cache_utf8._utf8_decode |
2859 | @@ -3546,6 +3579,8 @@ |
2860 | result, changed = _process_entry(entry, root_dir_info) |
2861 | if changed is not None: |
2862 | path_handled = True |
2863 | + if changed: |
2864 | + self._gather_result_for_consistency(result) |
2865 | if changed or self.include_unchanged: |
2866 | yield result |
2867 | if self.want_unversioned and not path_handled and root_dir_info: |
2868 | @@ -3664,6 +3699,8 @@ |
2869 | # advance the entry only, after processing. |
2870 | result, changed = _process_entry(current_entry, None) |
2871 | if changed is not None: |
2872 | + if changed: |
2873 | + self._gather_result_for_consistency(result) |
2874 | if changed or self.include_unchanged: |
2875 | yield result |
2876 | block_index +=1 |
2877 | @@ -3702,6 +3739,8 @@ |
2878 | # no path is fine: the per entry code will handle it. |
2879 | result, changed = _process_entry(current_entry, current_path_info) |
2880 | if changed is not None: |
2881 | + if changed: |
2882 | + self._gather_result_for_consistency(result) |
2883 | if changed or self.include_unchanged: |
2884 | yield result |
2885 | elif (current_entry[0][1] != current_path_info[1] |
2886 | @@ -3723,6 +3762,8 @@ |
2887 | # advance the entry only, after processing. |
2888 | result, changed = _process_entry(current_entry, None) |
2889 | if changed is not None: |
2890 | + if changed: |
2891 | + self._gather_result_for_consistency(result) |
2892 | if changed or self.include_unchanged: |
2893 | yield result |
2894 | advance_path = False |
2895 | @@ -3730,6 +3771,8 @@ |
2896 | result, changed = _process_entry(current_entry, current_path_info) |
2897 | if changed is not None: |
2898 | path_handled = True |
2899 | + if changed: |
2900 | + self._gather_result_for_consistency(result) |
2901 | if changed or self.include_unchanged: |
2902 | yield result |
2903 | if advance_entry and current_entry is not None: |
2904 | @@ -3795,6 +3838,124 @@ |
2905 | current_dir_info = dir_iterator.next() |
2906 | except StopIteration: |
2907 | current_dir_info = None |
2908 | + for result in self._iter_specific_file_parents(): |
2909 | + yield result |
2910 | + |
2911 | + def _iter_specific_file_parents(self): |
2912 | + """Iter over the specific file parents.""" |
2913 | + while self.search_specific_file_parents: |
2914 | + # Process the parent directories for the paths we were iterating. |
2915 | + # Even in extremely large trees this should be modest, so currently |
2916 | + # no attempt is made to optimise. |
2917 | + path_utf8 = self.search_specific_file_parents.pop() |
2918 | + if osutils.is_inside_any(self.searched_specific_files, path_utf8): |
2919 | + # We've examined this path. |
2920 | + continue |
2921 | + if path_utf8 in self.searched_exact_paths: |
2922 | + # We've examined this path. |
2923 | + continue |
2924 | + path_entries = self.state._entries_for_path(path_utf8) |
2925 | + # We need either one or two entries. If the path in |
2926 | + # self.target_index has moved (so the entry in source_index is in |
2927 | + # 'ar') then we need to also look for the entry for this path in |
2928 | + # self.source_index, to output the appropriate delete-or-rename. |
2929 | + selected_entries = [] |
2930 | + found_item = False |
2931 | + for candidate_entry in path_entries: |
2932 | + # Find entries present in target at this path: |
2933 | + if candidate_entry[1][self.target_index][0] not in 'ar': |
2934 | + found_item = True |
2935 | + selected_entries.append(candidate_entry) |
2936 | + # Find entries present in source at this path: |
2937 | + elif (self.source_index is not None and |
2938 | + candidate_entry[1][self.source_index][0] not in 'ar'): |
2939 | + found_item = True |
2940 | + if candidate_entry[1][self.target_index][0] == 'a': |
2941 | + # Deleted, emit it here. |
2942 | + selected_entries.append(candidate_entry) |
2943 | + else: |
2944 | + # renamed, emit it when we process the directory it |
2945 | + # ended up at. |
2946 | + self.search_specific_file_parents.add( |
2947 | + candidate_entry[1][self.target_index][1]) |
2948 | + if not found_item: |
2949 | + raise AssertionError( |
2950 | + "Missing entry for specific path parent %r, %r" % ( |
2951 | + path_utf8, path_entries)) |
2952 | + path_info = self._path_info(path_utf8, path_utf8.decode('utf8')) |
2953 | + for entry in selected_entries: |
2954 | + if entry[0][2] in self.seen_ids: |
2955 | + continue |
2956 | + result, changed = self._process_entry(entry, path_info) |
2957 | + if changed is None: |
2958 | + raise AssertionError( |
2959 | + "Got entry<->path mismatch for specific path " |
2960 | + "%r entry %r path_info %r " % ( |
2961 | + path_utf8, entry, path_info)) |
2962 | +                # Only include changes - we're outside the user's requested
2963 | + # expansion. |
2964 | + if changed: |
2965 | + self._gather_result_for_consistency(result) |
2966 | + if (result[6][0] == 'directory' and |
2967 | + result[6][1] != 'directory'): |
2968 | + # This stopped being a directory, the old children have |
2969 | + # to be included. |
2970 | + if entry[1][self.source_index][0] == 'r': |
2971 | + # renamed, take the source path |
2972 | + entry_path_utf8 = entry[1][self.source_index][1] |
2973 | + else: |
2974 | + entry_path_utf8 = path_utf8 |
2975 | + initial_key = (entry_path_utf8, '', '') |
2976 | + block_index, _ = self.state._find_block_index_from_key( |
2977 | + initial_key) |
2978 | + if block_index == 0: |
2979 | + # The children of the root are in block index 1. |
2980 | + block_index +=1 |
2981 | + current_block = None |
2982 | + if block_index < len(self.state._dirblocks): |
2983 | + current_block = self.state._dirblocks[block_index] |
2984 | + if not osutils.is_inside( |
2985 | + entry_path_utf8, current_block[0]): |
2986 | + # No entries for this directory at all. |
2987 | + current_block = None |
2988 | + if current_block is not None: |
2989 | + for entry in current_block[1]: |
2990 | + if entry[1][self.source_index][0] in 'ar': |
2991 | + # Not in the source tree, so doesn't have to be |
2992 | + # included. |
2993 | + continue |
2994 | + # Path of the entry itself. |
2995 | + |
2996 | + self.search_specific_file_parents.add( |
2997 | + osutils.pathjoin(*entry[0][:2])) |
2998 | + if changed or self.include_unchanged: |
2999 | + yield result |
3000 | + self.searched_exact_paths.add(path_utf8) |
3001 | + |
3002 | + def _path_info(self, utf8_path, unicode_path): |
3003 | + """Generate path_info for unicode_path. |
3004 | + |
3005 | + :return: None if unicode_path does not exist, or a path_info tuple. |
3006 | + """ |
3007 | + abspath = self.tree.abspath(unicode_path) |
3008 | + try: |
3009 | + stat = os.lstat(abspath) |
3010 | + except OSError, e: |
3011 | + if e.errno == errno.ENOENT: |
3012 | + # the path does not exist. |
3013 | + return None |
3014 | + else: |
3015 | + raise |
3016 | + utf8_basename = utf8_path.rsplit('/', 1)[-1] |
3017 | + dir_info = (utf8_path, utf8_basename, |
3018 | + osutils.file_kind_from_stat_mode(stat.st_mode), stat, |
3019 | + abspath) |
3020 | + if dir_info[2] == 'directory': |
3021 | + if self.tree._directory_is_tree_reference( |
3022 | + unicode_path): |
3023 | +                dir_info = dir_info[:2] + \
3024 | +                    ('tree-reference',) + dir_info[3:]
3025 | + return dir_info |
3026 | |
3027 | |
3028 | # Try to load the compiled form if possible |
3029 | |
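The dirstate hunks above add a ``partial`` mode: when the search is narrower than the whole tree, ``_gather_result_for_consistency`` records the parents of every changed path, and ``_iter_specific_file_parents`` emits those parents after the normal results so the delta stays self-consistent. A sketch of how a caller would observe this (the checkout path and selected path are hypothetical; ``change[0]`` is the file id, ``change[1]`` the (old, new) path pair and ``change[6]`` the (old, new) kind pair, matching the ``result`` indexing in the code above):

    from bzrlib import workingtree

    tree = workingtree.WorkingTree.open('.')  # assumes a bzr checkout
    basis = tree.basis_tree()
    tree.lock_read()
    basis.lock_read()
    try:
        for change in tree.iter_changes(basis,
                                        specific_files=['dir/sub']):
            # Changed entries under 'dir/sub' come first; changed
            # parent directories are yielded at the end.
            print change[0], change[1][1], change[6][1]
    finally:
        basis.unlock()
        tree.unlock()
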
3030 | === modified file 'bzrlib/errors.py' |
3031 | --- bzrlib/errors.py 2009-07-14 21:07:36 +0000 |
3032 | +++ bzrlib/errors.py 2009-08-30 21:34:42 +0000 |
3033 | @@ -793,6 +793,12 @@ |
3034 | |
3035 | |
3036 | class IncompatibleRepositories(BzrError): |
3037 | + """Report an error that two repositories are not compatible. |
3038 | + |
3039 | + Note that the source and target repositories are permitted to be strings: |
3040 | + this exception is thrown from the smart server and may refer to a |
3041 | + repository the client hasn't opened. |
3042 | + """ |
3043 | |
3044 | _fmt = "%(target)s\n" \ |
3045 | "is not compatible with\n" \ |
3046 | @@ -2006,12 +2012,14 @@ |
3047 | |
3048 | class BadConversionTarget(BzrError): |
3049 | |
3050 | - _fmt = "Cannot convert to format %(format)s. %(problem)s" |
3051 | + _fmt = "Cannot convert from format %(from_format)s to format %(format)s." \ |
3052 | + " %(problem)s" |
3053 | |
3054 | - def __init__(self, problem, format): |
3055 | + def __init__(self, problem, format, from_format=None): |
3056 | BzrError.__init__(self) |
3057 | self.problem = problem |
3058 | self.format = format |
3059 | + self.from_format = from_format or '(unspecified)' |
3060 | |
3061 | |
3062 | class NoDiffFound(BzrError): |
3063 | @@ -2918,8 +2926,9 @@ |
3064 | _fmt = 'Cannot bind address "%(host)s:%(port)i": %(orig_error)s.' |
3065 | |
3066 | def __init__(self, host, port, orig_error): |
3067 | + # nb: in python2.4 socket.error doesn't have a useful repr |
3068 | BzrError.__init__(self, host=host, port=port, |
3069 | - orig_error=orig_error[1]) |
3070 | + orig_error=repr(orig_error.args)) |
3071 | |
3072 | |
3073 | class UnknownRules(BzrError): |
3074 | |
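A small sketch of the widened ``BadConversionTarget`` signature; real callers pass format objects, plain strings are used here purely for illustration:

    from bzrlib import errors

    e = errors.BadConversionTarget('Does not support rich roots.',
                                   format='2a', from_format='pack-0.92')
    print e
    # Cannot convert from format pack-0.92 to format 2a. Does not
    # support rich roots.

When ``from_format`` is omitted it renders as ``(unspecified)``, so existing two-argument call sites keep working.
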
3075 | === modified file 'bzrlib/fetch.py' |
3076 | --- bzrlib/fetch.py 2009-07-09 08:59:51 +0000 |
3077 | +++ bzrlib/fetch.py 2009-08-07 04:29:36 +0000 |
3078 | @@ -25,16 +25,21 @@ |
3079 | |
3080 | import operator |
3081 | |
3082 | +from bzrlib.lazy_import import lazy_import |
3083 | +lazy_import(globals(), """ |
3084 | +from bzrlib import ( |
3085 | + tsort, |
3086 | + versionedfile, |
3087 | + ) |
3088 | +""") |
3089 | import bzrlib |
3090 | from bzrlib import ( |
3091 | errors, |
3092 | symbol_versioning, |
3093 | ) |
3094 | from bzrlib.revision import NULL_REVISION |
3095 | -from bzrlib.tsort import topo_sort |
3096 | from bzrlib.trace import mutter |
3097 | import bzrlib.ui |
3098 | -from bzrlib.versionedfile import FulltextContentFactory |
3099 | |
3100 | |
3101 | class RepoFetcher(object): |
3102 | @@ -213,11 +218,9 @@ |
3103 | |
3104 | def _find_root_ids(self, revs, parent_map, graph): |
3105 | revision_root = {} |
3106 | - planned_versions = {} |
3107 | for tree in self.iter_rev_trees(revs): |
3108 | revision_id = tree.inventory.root.revision |
3109 | root_id = tree.get_root_id() |
3110 | - planned_versions.setdefault(root_id, []).append(revision_id) |
3111 | revision_root[revision_id] = root_id |
3112 | # Find out which parents we don't already know root ids for |
3113 | parents = set() |
3114 | @@ -229,7 +232,7 @@ |
3115 | for tree in self.iter_rev_trees(parents): |
3116 | root_id = tree.get_root_id() |
3117 | revision_root[tree.get_revision_id()] = root_id |
3118 | - return revision_root, planned_versions |
3119 | + return revision_root |
3120 | |
3121 | def generate_root_texts(self, revs): |
3122 | """Generate VersionedFiles for all root ids. |
3123 | @@ -238,9 +241,8 @@ |
3124 | """ |
3125 | graph = self.source.get_graph() |
3126 | parent_map = graph.get_parent_map(revs) |
3127 | - rev_order = topo_sort(parent_map) |
3128 | - rev_id_to_root_id, root_id_to_rev_ids = self._find_root_ids( |
3129 | - revs, parent_map, graph) |
3130 | + rev_order = tsort.topo_sort(parent_map) |
3131 | + rev_id_to_root_id = self._find_root_ids(revs, parent_map, graph) |
3132 | root_id_order = [(rev_id_to_root_id[rev_id], rev_id) for rev_id in |
3133 | rev_order] |
3134 | # Guaranteed stable, this groups all the file id operations together |
3135 | @@ -249,20 +251,93 @@ |
3136 | # yet, and are unlikely to in non-rich-root environments anyway. |
3137 | root_id_order.sort(key=operator.itemgetter(0)) |
3138 | # Create a record stream containing the roots to create. |
3139 | - def yield_roots(): |
3140 | - for key in root_id_order: |
3141 | - root_id, rev_id = key |
3142 | - rev_parents = parent_map[rev_id] |
3143 | - # We drop revision parents with different file-ids, because |
3144 | - # that represents a rename of the root to a different location |
3145 | - # - its not actually a parent for us. (We could look for that |
3146 | - # file id in the revision tree at considerably more expense, |
3147 | - # but for now this is sufficient (and reconcile will catch and |
3148 | - # correct this anyway). |
3149 | - # When a parent revision is a ghost, we guess that its root id |
3150 | - # was unchanged (rather than trimming it from the parent list). |
3151 | - parent_keys = tuple((root_id, parent) for parent in rev_parents |
3152 | - if parent != NULL_REVISION and |
3153 | - rev_id_to_root_id.get(parent, root_id) == root_id) |
3154 | - yield FulltextContentFactory(key, parent_keys, None, '') |
3155 | - return [('texts', yield_roots())] |
3156 | + from bzrlib.graph import FrozenHeadsCache |
3157 | + graph = FrozenHeadsCache(graph) |
3158 | + new_roots_stream = _new_root_data_stream( |
3159 | + root_id_order, rev_id_to_root_id, parent_map, self.source, graph) |
3160 | + return [('texts', new_roots_stream)] |
3161 | + |
3162 | + |
3163 | +def _new_root_data_stream( |
3164 | + root_keys_to_create, rev_id_to_root_id_map, parent_map, repo, graph=None): |
3165 | + """Generate a texts substream of synthesised root entries. |
3166 | + |
3167 | + Used in fetches that do rich-root upgrades. |
3168 | + |
3169 | + :param root_keys_to_create: iterable of (root_id, rev_id) pairs describing |
3170 | + the root entries to create. |
3171 | + :param rev_id_to_root_id_map: dict of known rev_id -> root_id mappings for |
3172 | + calculating the parents. If a parent rev_id is not found here then it |
3173 | + will be recalculated. |
3174 | + :param parent_map: a parent map for all the revisions in |
3175 | + root_keys_to_create. |
3176 | + :param graph: a graph to use instead of repo.get_graph(). |
3177 | + """ |
3178 | + for root_key in root_keys_to_create: |
3179 | + root_id, rev_id = root_key |
3180 | + parent_keys = _parent_keys_for_root_version( |
3181 | + root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph) |
3182 | + yield versionedfile.FulltextContentFactory( |
3183 | + root_key, parent_keys, None, '') |
3184 | + |
3185 | + |
3186 | +def _parent_keys_for_root_version( |
3187 | + root_id, rev_id, rev_id_to_root_id_map, parent_map, repo, graph=None): |
3188 | + """Get the parent keys for a given root id. |
3189 | + |
3190 | + A helper function for _new_root_data_stream. |
3191 | + """ |
3192 | + # Include direct parents of the revision, but only if they used the same |
3193 | + # root_id and are heads. |
3194 | + rev_parents = parent_map[rev_id] |
3195 | + parent_ids = [] |
3196 | + for parent_id in rev_parents: |
3197 | + if parent_id == NULL_REVISION: |
3198 | + continue |
3199 | + if parent_id not in rev_id_to_root_id_map: |
3200 | + # We probably didn't read this revision, go spend the extra effort |
3201 | + # to actually check |
3202 | + try: |
3203 | + tree = repo.revision_tree(parent_id) |
3204 | + except errors.NoSuchRevision: |
3205 | + # Ghost, fill out rev_id_to_root_id in case we encounter this |
3206 | + # again. |
3207 | + # But set parent_root_id to None since we don't really know |
3208 | + parent_root_id = None |
3209 | + else: |
3210 | + parent_root_id = tree.get_root_id() |
3211 | + rev_id_to_root_id_map[parent_id] = None |
3212 | + # XXX: why not: |
3213 | + # rev_id_to_root_id_map[parent_id] = parent_root_id |
3214 | + # memory consumption maybe? |
3215 | + else: |
3216 | + parent_root_id = rev_id_to_root_id_map[parent_id] |
3217 | + if root_id == parent_root_id: |
3218 | + # With stacking we _might_ want to refer to a non-local revision, |
3219 | + # but this code path only applies when we have the full content |
3220 | + # available, so ghosts really are ghosts, not just the edge of |
3221 | + # local data. |
3222 | + parent_ids.append(parent_id) |
3223 | + else: |
3224 | + # root_id may be in the parent anyway. |
3225 | + try: |
3226 | + tree = repo.revision_tree(parent_id) |
3227 | + except errors.NoSuchRevision: |
3228 | + # ghost, can't refer to it. |
3229 | + pass |
3230 | + else: |
3231 | + try: |
3232 | + parent_ids.append(tree.inventory[root_id].revision) |
3233 | + except errors.NoSuchId: |
3234 | + # not in the tree |
3235 | + pass |
3236 | + # Drop non-head parents |
3237 | + if graph is None: |
3238 | + graph = repo.get_graph() |
3239 | + heads = graph.heads(parent_ids) |
3240 | + selected_ids = [] |
3241 | + for parent_id in parent_ids: |
3242 | + if parent_id in heads and parent_id not in selected_ids: |
3243 | + selected_ids.append(parent_id) |
3244 | + parent_keys = [(root_id, parent_id) for parent_id in selected_ids] |
3245 | + return parent_keys |
3246 | |
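The fetch.py hunk moves ``tsort`` and ``versionedfile`` behind ``lazy_import`` so they are not imported at startup. A sketch of the pattern as used above (the ``order_revisions`` function is hypothetical):

    from bzrlib.lazy_import import lazy_import
    lazy_import(globals(), """
    from bzrlib import (
        tsort,
        versionedfile,
        )
    """)

    def order_revisions(parent_map):
        # The first attribute access replaces the placeholder binding
        # in globals() with the real bzrlib.tsort module.
        return tsort.topo_sort(parent_map)
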
3247 | === modified file 'bzrlib/graph.py' |
3248 | --- bzrlib/graph.py 2009-08-04 04:36:34 +0000 |
3249 | +++ bzrlib/graph.py 2009-08-17 18:36:14 +0000 |
3250 | @@ -21,7 +21,6 @@ |
3251 | errors, |
3252 | revision, |
3253 | trace, |
3254 | - tsort, |
3255 | ) |
3256 | from bzrlib.symbol_versioning import deprecated_function, deprecated_in |
3257 | |
3258 | @@ -926,6 +925,7 @@ |
3259 | An ancestor may sort after a descendant if the relationship is not |
3260 | visible in the supplied list of revisions. |
3261 | """ |
3262 | + from bzrlib import tsort |
3263 | sorter = tsort.TopoSorter(self.get_parent_map(revisions)) |
3264 | return sorter.iter_topo_order() |
3265 | |
3266 | |
3267 | === modified file 'bzrlib/groupcompress.py' |
3268 | --- bzrlib/groupcompress.py 2009-08-04 04:36:34 +0000 |
3269 | +++ bzrlib/groupcompress.py 2009-09-09 13:05:33 +0000 |
3270 | @@ -33,7 +33,6 @@ |
3271 | pack, |
3272 | trace, |
3273 | ) |
3274 | -from bzrlib.graph import Graph |
3275 | from bzrlib.btree_index import BTreeBuilder |
3276 | from bzrlib.lru_cache import LRUSizeCache |
3277 | from bzrlib.tsort import topo_sort |
3278 | @@ -45,12 +44,15 @@ |
3279 | VersionedFiles, |
3280 | ) |
3281 | |
3282 | +# Minimum number of uncompressed bytes to try to fetch at once when retrieving
3283 | +# groupcompress blocks. |
3284 | +BATCH_SIZE = 2**16 |
3285 | + |
3286 | _USE_LZMA = False and (pylzma is not None) |
3287 | |
3288 | # osutils.sha_string('') |
3289 | _null_sha1 = 'da39a3ee5e6b4b0d3255bfef95601890afd80709' |
3290 | |
3291 | - |
3292 | def sort_gc_optimal(parent_map): |
3293 | """Sort and group the keys in parent_map into groupcompress order. |
3294 | |
3295 | @@ -62,16 +64,15 @@ |
3296 | # groupcompress ordering is approximately reverse topological, |
3297 | # properly grouped by file-id. |
3298 | per_prefix_map = {} |
3299 | - for item in parent_map.iteritems(): |
3300 | - key = item[0] |
3301 | + for key, value in parent_map.iteritems(): |
3302 | if isinstance(key, str) or len(key) == 1: |
3303 | prefix = '' |
3304 | else: |
3305 | prefix = key[0] |
3306 | try: |
3307 | - per_prefix_map[prefix].append(item) |
3308 | + per_prefix_map[prefix][key] = value |
3309 | except KeyError: |
3310 | - per_prefix_map[prefix] = [item] |
3311 | + per_prefix_map[prefix] = {key: value} |
3312 | |
3313 | present_keys = [] |
3314 | for prefix in sorted(per_prefix_map): |
3315 | @@ -456,7 +457,6 @@ |
3316 | # There are code paths that first extract as fulltext, and then |
3317 | # extract as storage_kind (smart fetch). So we don't break the |
3318 | # refcycle here, but instead in manager.get_record_stream() |
3319 | - # self._manager = None |
3320 | if storage_kind == 'fulltext': |
3321 | return self._bytes |
3322 | else: |
3323 | @@ -468,6 +468,14 @@ |
3324 | class _LazyGroupContentManager(object): |
3325 | """This manages a group of _LazyGroupCompressFactory objects.""" |
3326 | |
3327 | + _max_cut_fraction = 0.75 # We allow a block to be trimmed to 75% of |
3328 | + # current size, and still be considered |
3329 | +                            # reusable
3330 | + _full_block_size = 4*1024*1024 |
3331 | + _full_mixed_block_size = 2*1024*1024 |
3332 | + _full_enough_block_size = 3*1024*1024 # size at which we won't repack |
3333 | + _full_enough_mixed_block_size = 2*768*1024 # 1.5MB |
3334 | + |
3335 | def __init__(self, block): |
3336 | self._block = block |
3337 | # We need to preserve the ordering |
3338 | @@ -545,22 +553,23 @@ |
3339 | # time (self._block._content) is a little expensive. |
3340 | self._block._ensure_content(self._last_byte) |
3341 | |
3342 | - def _check_rebuild_block(self): |
3343 | + def _check_rebuild_action(self): |
3344 | """Check to see if our block should be repacked.""" |
3345 | total_bytes_used = 0 |
3346 | last_byte_used = 0 |
3347 | for factory in self._factories: |
3348 | total_bytes_used += factory._end - factory._start |
3349 | - last_byte_used = max(last_byte_used, factory._end) |
3350 | - # If we are using most of the bytes from the block, we have nothing |
3351 | - # else to check (currently more that 1/2) |
3352 | + if last_byte_used < factory._end: |
3353 | + last_byte_used = factory._end |
3354 | + # If we are using more than half of the bytes from the block, we have |
3355 | + # nothing else to check |
3356 | if total_bytes_used * 2 >= self._block._content_length: |
3357 | - return |
3358 | - # Can we just strip off the trailing bytes? If we are going to be |
3359 | - # transmitting more than 50% of the front of the content, go ahead |
3360 | + return None, last_byte_used, total_bytes_used |
3361 | + # We are using less than 50% of the content. Is the content we are |
3362 | + # using at the beginning of the block? If so, we can just trim the |
3363 | + # tail, rather than rebuilding from scratch. |
3364 | if total_bytes_used * 2 > last_byte_used: |
3365 | - self._trim_block(last_byte_used) |
3366 | - return |
3367 | + return 'trim', last_byte_used, total_bytes_used |
3368 | |
3369 | # We are using a small amount of the data, and it isn't just packed |
3370 | # nicely at the front, so rebuild the content. |
3371 | @@ -573,7 +582,80 @@ |
3372 | # expanding many deltas into fulltexts, as well. |
3373 | # If we build a cheap enough 'strip', then we could try a strip, |
3374 | # if that expands the content, we then rebuild. |
3375 | - self._rebuild_block() |
3376 | + return 'rebuild', last_byte_used, total_bytes_used |
3377 | + |
3378 | + def check_is_well_utilized(self): |
3379 | + """Is the current block considered 'well utilized'? |
3380 | + |
3381 | + This heuristic asks if the current block considers itself to be a fully |
3382 | + developed group, rather than just a loose collection of data. |
3383 | + """ |
3384 | + if len(self._factories) == 1: |
3385 | + # A block of length 1 could be improved by combining with other |
3386 | + # groups - don't look deeper. Even larger than max size groups |
3387 | + # could compress well with adjacent versions of the same thing. |
3388 | + return False |
3389 | + action, last_byte_used, total_bytes_used = self._check_rebuild_action() |
3390 | + block_size = self._block._content_length |
3391 | + if total_bytes_used < block_size * self._max_cut_fraction: |
3392 | + # This block wants to trim itself small enough that we want to |
3393 | + # consider it under-utilized. |
3394 | + return False |
3395 | + # TODO: This code is meant to be the twin of _insert_record_stream's |
3396 | + # 'start_new_block' logic. It would probably be better to factor |
3397 | + # out that logic into a shared location, so that it stays |
3398 | + # together better |
3399 | + # We currently assume a block is properly utilized whenever it is >75% |
3400 | + # of the size of a 'full' block. In normal operation, a block is |
3401 | + # considered full when it hits 4MB of same-file content. So any block |
3402 | + # >3MB is 'full enough'. |
3403 | + # The only time this isn't true is when a given block has large-object |
3404 | + # content. (a single file >4MB, etc.) |
3405 | + # Under these circumstances, we allow a block to grow to |
3406 | + # 2 x largest_content. Which means that if a given block had a large |
3407 | + # object, it may actually be under-utilized. However, given that this |
3408 | + # is 'pack-on-the-fly' it is probably reasonable to not repack large |
3409 | + # content blobs on-the-fly. Note that because we return False for all |
3410 | + # 1-item blobs, we will repack them; we may wish to reevaluate our |
3411 | + # treatment of large object blobs in the future. |
3412 | + if block_size >= self._full_enough_block_size: |
3413 | + return True |
3414 | + # If a block is <3MB, it still may be considered 'full' if it contains |
3415 | + # mixed content. The current rule is 2MB of mixed content is considered |
3416 | + # full. So check to see if this block contains mixed content, and |
3417 | + # set the threshold appropriately. |
3418 | + common_prefix = None |
3419 | + for factory in self._factories: |
3420 | + prefix = factory.key[:-1] |
3421 | + if common_prefix is None: |
3422 | + common_prefix = prefix |
3423 | + elif prefix != common_prefix: |
3424 | + # Mixed content, check the size appropriately |
3425 | + if block_size >= self._full_enough_mixed_block_size: |
3426 | + return True |
3427 | + break |
3428 | + # The content failed both the mixed check and the single-content check |
3429 | + # so obviously it is not fully utilized |
3430 | + # TODO: there is one other constraint that isn't being checked |
3431 | + # namely, that the entries in the block are in the appropriate |
3432 | + # order. For example, you could insert the entries in exactly |
3433 | + # reverse groupcompress order, and we would think that is ok. |
3434 | + # (all the right objects are in one group, and it is fully |
3435 | + # utilized, etc.) For now, we assume that case is rare, |
3436 | + # especially since we should always fetch in 'groupcompress' |
3437 | + # order. |
3438 | + return False |
3439 | + |
3440 | + def _check_rebuild_block(self): |
3441 | + action, last_byte_used, total_bytes_used = self._check_rebuild_action() |
3442 | + if action is None: |
3443 | + return |
3444 | + if action == 'trim': |
3445 | + self._trim_block(last_byte_used) |
3446 | + elif action == 'rebuild': |
3447 | + self._rebuild_block() |
3448 | + else: |
3449 | + raise ValueError('unknown rebuild action: %r' % (action,)) |
3450 | |
3451 | def _wire_bytes(self): |
3452 | """Return a byte stream suitable for transmitting over the wire.""" |
3453 | @@ -975,23 +1057,139 @@ |
3454 | versioned_files.stream.close() |
3455 | |
3456 | |
3457 | +class _BatchingBlockFetcher(object): |
3458 | + """Fetch group compress blocks in batches. |
3459 | + |
3460 | + :ivar total_bytes: int of expected number of bytes needed to fetch the |
3461 | + currently pending batch. |
3462 | + """ |
3463 | + |
3464 | + def __init__(self, gcvf, locations): |
3465 | + self.gcvf = gcvf |
3466 | + self.locations = locations |
3467 | + self.keys = [] |
3468 | + self.batch_memos = {} |
3469 | + self.memos_to_get = [] |
3470 | + self.total_bytes = 0 |
3471 | + self.last_read_memo = None |
3472 | + self.manager = None |
3473 | + |
3474 | + def add_key(self, key): |
3475 | + """Add another to key to fetch. |
3476 | + |
3477 | + :return: The estimated number of bytes needed to fetch the batch so |
3478 | + far. |
3479 | + """ |
3480 | + self.keys.append(key) |
3481 | + index_memo, _, _, _ = self.locations[key] |
3482 | + read_memo = index_memo[0:3] |
3483 | + # Three possibilities for this read_memo: |
3484 | + # - it's already part of this batch; or |
3485 | + # - it's not yet part of this batch, but is already cached; or |
3486 | + # - it's not yet part of this batch and will need to be fetched. |
3487 | + if read_memo in self.batch_memos: |
3488 | + # This read memo is already in this batch. |
3489 | + return self.total_bytes |
3490 | + try: |
3491 | + cached_block = self.gcvf._group_cache[read_memo] |
3492 | + except KeyError: |
3493 | + # This read memo is new to this batch, and the data isn't cached |
3494 | + # either. |
3495 | + self.batch_memos[read_memo] = None |
3496 | + self.memos_to_get.append(read_memo) |
3497 | + byte_length = read_memo[2] |
3498 | + self.total_bytes += byte_length |
3499 | + else: |
3500 | + # This read memo is new to this batch, but cached. |
3501 | + # Keep a reference to the cached block in batch_memos because it's |
3502 | + # certain that we'll use it when this batch is processed, but |
3503 | + # there's a risk that it would fall out of _group_cache between now |
3504 | + # and then. |
3505 | + self.batch_memos[read_memo] = cached_block |
3506 | + return self.total_bytes |
3507 | + |
3508 | + def _flush_manager(self): |
3509 | + if self.manager is not None: |
3510 | + for factory in self.manager.get_record_stream(): |
3511 | + yield factory |
3512 | + self.manager = None |
3513 | + self.last_read_memo = None |
3514 | + |
3515 | + def yield_factories(self, full_flush=False): |
3516 | + """Yield factories for keys added since the last yield. They will be |
3517 | + returned in the order they were added via add_key. |
3518 | + |
3519 | +        :param full_flush: by default, some results may be held back
3520 | +            because they could be part of the next batch. If full_flush is
3521 | +            True, then all results are returned.
3522 | + """ |
3523 | + if self.manager is None and not self.keys: |
3524 | + return |
3525 | + # Fetch all memos in this batch. |
3526 | + blocks = self.gcvf._get_blocks(self.memos_to_get) |
3527 | + # Turn blocks into factories and yield them. |
3528 | + memos_to_get_stack = list(self.memos_to_get) |
3529 | + memos_to_get_stack.reverse() |
3530 | + for key in self.keys: |
3531 | + index_memo, _, parents, _ = self.locations[key] |
3532 | + read_memo = index_memo[:3] |
3533 | + if self.last_read_memo != read_memo: |
3534 | + # We are starting a new block. If we have a |
3535 | + # manager, we have found everything that fits for |
3536 | + # now, so yield records |
3537 | + for factory in self._flush_manager(): |
3538 | + yield factory |
3539 | + # Now start a new manager. |
3540 | + if memos_to_get_stack and memos_to_get_stack[-1] == read_memo: |
3541 | + # The next block from _get_blocks will be the block we |
3542 | + # need. |
3543 | + block_read_memo, block = blocks.next() |
3544 | + if block_read_memo != read_memo: |
3545 | + raise AssertionError( |
3546 | + "block_read_memo out of sync with read_memo" |
3547 | +                            " (%r != %r)" % (block_read_memo, read_memo))
3548 | + self.batch_memos[read_memo] = block |
3549 | + memos_to_get_stack.pop() |
3550 | + else: |
3551 | + block = self.batch_memos[read_memo] |
3552 | + self.manager = _LazyGroupContentManager(block) |
3553 | + self.last_read_memo = read_memo |
3554 | + start, end = index_memo[3:5] |
3555 | + self.manager.add_factory(key, parents, start, end) |
3556 | + if full_flush: |
3557 | + for factory in self._flush_manager(): |
3558 | + yield factory |
3559 | + del self.keys[:] |
3560 | + self.batch_memos.clear() |
3561 | + del self.memos_to_get[:] |
3562 | + self.total_bytes = 0 |
3563 | + |
3564 | + |
3565 | class GroupCompressVersionedFiles(VersionedFiles): |
3566 | """A group-compress based VersionedFiles implementation.""" |
3567 | |
3568 | - def __init__(self, index, access, delta=True): |
3569 | + def __init__(self, index, access, delta=True, _unadded_refs=None): |
3570 | """Create a GroupCompressVersionedFiles object. |
3571 | |
3572 | :param index: The index object storing access and graph data. |
3573 | :param access: The access object storing raw data. |
3574 | :param delta: Whether to delta compress or just entropy compress. |
3575 | + :param _unadded_refs: private parameter, don't use. |
3576 | """ |
3577 | self._index = index |
3578 | self._access = access |
3579 | self._delta = delta |
3580 | - self._unadded_refs = {} |
3581 | + if _unadded_refs is None: |
3582 | + _unadded_refs = {} |
3583 | + self._unadded_refs = _unadded_refs |
3584 | self._group_cache = LRUSizeCache(max_size=50*1024*1024) |
3585 | self._fallback_vfs = [] |
3586 | |
3587 | + def without_fallbacks(self): |
3588 | + """Return a clone of this object without any fallbacks configured.""" |
3589 | + return GroupCompressVersionedFiles(self._index, self._access, |
3590 | + self._delta, _unadded_refs=dict(self._unadded_refs)) |
3591 | + |
3592 | def add_lines(self, key, parents, lines, parent_texts=None, |
3593 | left_matching_blocks=None, nostore_sha=None, random_id=False, |
3594 | check_content=True): |
3595 | @@ -1099,6 +1297,22 @@ |
3596 | self._check_lines_not_unicode(lines) |
3597 | self._check_lines_are_lines(lines) |
3598 | |
3599 | + def get_known_graph_ancestry(self, keys): |
3600 | + """Get a KnownGraph instance with the ancestry of keys.""" |
3601 | + # Note that this is identical to |
3602 | + # KnitVersionedFiles.get_known_graph_ancestry, but they don't share |
3603 | + # ancestry. |
3604 | + parent_map, missing_keys = self._index.find_ancestry(keys) |
3605 | + for fallback in self._fallback_vfs: |
3606 | + if not missing_keys: |
3607 | + break |
3608 | + (f_parent_map, f_missing_keys) = fallback._index.find_ancestry( |
3609 | + missing_keys) |
3610 | + parent_map.update(f_parent_map) |
3611 | + missing_keys = f_missing_keys |
3612 | + kg = _mod_graph.KnownGraph(parent_map) |
3613 | + return kg |
3614 | + |
3615 | def get_parent_map(self, keys): |
3616 | """Get a map of the graph parents of keys. |
3617 | |
3618 | @@ -1131,26 +1345,42 @@ |
3619 | missing.difference_update(set(new_result)) |
3620 | return result, source_results |
3621 | |
3622 | - def _get_block(self, index_memo): |
3623 | - read_memo = index_memo[0:3] |
3624 | - # get the group: |
3625 | - try: |
3626 | - block = self._group_cache[read_memo] |
3627 | - except KeyError: |
3628 | - # read the group |
3629 | - zdata = self._access.get_raw_records([read_memo]).next() |
3630 | - # decompress - whole thing - this is not a bug, as it |
3631 | - # permits caching. We might want to store the partially |
3632 | - # decompresed group and decompress object, so that recent |
3633 | - # texts are not penalised by big groups. |
3634 | - block = GroupCompressBlock.from_bytes(zdata) |
3635 | - self._group_cache[read_memo] = block |
3636 | - # cheapo debugging: |
3637 | - # print len(zdata), len(plain) |
3638 | - # parse - requires split_lines, better to have byte offsets |
3639 | - # here (but not by much - we only split the region for the |
3640 | - # recipe, and we often want to end up with lines anyway. |
3641 | - return block |
3642 | + def _get_blocks(self, read_memos): |
3643 | + """Get GroupCompressBlocks for the given read_memos. |
3644 | + |
3645 | + :returns: a series of (read_memo, block) pairs, in the order they were |
3646 | + originally passed. |
3647 | + """ |
3648 | + cached = {} |
3649 | + for read_memo in read_memos: |
3650 | + try: |
3651 | + block = self._group_cache[read_memo] |
3652 | + except KeyError: |
3653 | + pass |
3654 | + else: |
3655 | + cached[read_memo] = block |
3656 | + not_cached = [] |
3657 | + not_cached_seen = set() |
3658 | + for read_memo in read_memos: |
3659 | + if read_memo in cached: |
3660 | + # Don't fetch what we already have |
3661 | + continue |
3662 | + if read_memo in not_cached_seen: |
3663 | + # Don't try to fetch the same data twice |
3664 | + continue |
3665 | + not_cached.append(read_memo) |
3666 | + not_cached_seen.add(read_memo) |
3667 | + raw_records = self._access.get_raw_records(not_cached) |
3668 | + for read_memo in read_memos: |
3669 | + try: |
3670 | + yield read_memo, cached[read_memo] |
3671 | + except KeyError: |
3672 | + # Read the block, and cache it. |
3673 | + zdata = raw_records.next() |
3674 | + block = GroupCompressBlock.from_bytes(zdata) |
3675 | + self._group_cache[read_memo] = block |
3676 | + cached[read_memo] = block |
3677 | + yield read_memo, block |
3678 | |
3679 | def get_missing_compression_parent_keys(self): |
3680 | """Return the keys of missing compression parents. |
3681 | @@ -1322,55 +1552,35 @@ |
3682 | unadded_keys, source_result) |
3683 | for key in missing: |
3684 | yield AbsentContentFactory(key) |
3685 | - manager = None |
3686 | - last_read_memo = None |
3687 | - # TODO: This works fairly well at batching up existing groups into a |
3688 | - # streamable format, and possibly allowing for taking one big |
3689 | - # group and splitting it when it isn't fully utilized. |
3690 | - # However, it doesn't allow us to find under-utilized groups and |
3691 | - # combine them into a bigger group on the fly. |
3692 | - # (Consider the issue with how chk_map inserts texts |
3693 | - # one-at-a-time.) This could be done at insert_record_stream() |
3694 | - # time, but it probably would decrease the number of |
3695 | - # bytes-on-the-wire for fetch. |
3696 | + # Batch up as many keys as we can until either: |
3697 | + # - we encounter an unadded ref, or |
3698 | + # - we run out of keys, or |
3699 | + # - the total bytes to retrieve for this batch > BATCH_SIZE |
3700 | + batcher = _BatchingBlockFetcher(self, locations) |
3701 | for source, keys in source_keys: |
3702 | if source is self: |
3703 | for key in keys: |
3704 | if key in self._unadded_refs: |
3705 | - if manager is not None: |
3706 | - for factory in manager.get_record_stream(): |
3707 | - yield factory |
3708 | - last_read_memo = manager = None |
3709 | + # Flush batch, then yield unadded ref from |
3710 | + # self._compressor. |
3711 | + for factory in batcher.yield_factories(full_flush=True): |
3712 | + yield factory |
3713 | bytes, sha1 = self._compressor.extract(key) |
3714 | parents = self._unadded_refs[key] |
3715 | yield FulltextContentFactory(key, parents, sha1, bytes) |
3716 | - else: |
3717 | - index_memo, _, parents, (method, _) = locations[key] |
3718 | - read_memo = index_memo[0:3] |
3719 | - if last_read_memo != read_memo: |
3720 | - # We are starting a new block. If we have a |
3721 | - # manager, we have found everything that fits for |
3722 | - # now, so yield records |
3723 | - if manager is not None: |
3724 | - for factory in manager.get_record_stream(): |
3725 | - yield factory |
3726 | - # Now start a new manager |
3727 | - block = self._get_block(index_memo) |
3728 | - manager = _LazyGroupContentManager(block) |
3729 | - last_read_memo = read_memo |
3730 | - start, end = index_memo[3:5] |
3731 | - manager.add_factory(key, parents, start, end) |
3732 | + continue |
3733 | + if batcher.add_key(key) > BATCH_SIZE: |
3734 | + # Ok, this batch is big enough. Yield some results. |
3735 | + for factory in batcher.yield_factories(): |
3736 | + yield factory |
3737 | else: |
3738 | - if manager is not None: |
3739 | - for factory in manager.get_record_stream(): |
3740 | - yield factory |
3741 | - last_read_memo = manager = None |
3742 | + for factory in batcher.yield_factories(full_flush=True): |
3743 | + yield factory |
3744 | for record in source.get_record_stream(keys, ordering, |
3745 | include_delta_closure): |
3746 | yield record |
3747 | - if manager is not None: |
3748 | - for factory in manager.get_record_stream(): |
3749 | - yield factory |
3750 | + for factory in batcher.yield_factories(full_flush=True): |
3751 | + yield factory |
3752 | |
3753 | def get_sha1s(self, keys): |
3754 | """See VersionedFiles.get_sha1s().""" |
3755 | @@ -1449,6 +1659,7 @@ |
3756 | block_length = None |
3757 | # XXX: TODO: remove this, it is just for safety checking for now |
3758 | inserted_keys = set() |
3759 | + reuse_this_block = reuse_blocks |
3760 | for record in stream: |
3761 | # Raise an error when a record is missing. |
3762 | if record.storage_kind == 'absent': |
3763 | @@ -1462,10 +1673,20 @@ |
3764 | if reuse_blocks: |
3765 | # If the reuse_blocks flag is set, check to see if we can just |
3766 | # copy a groupcompress block as-is. |
3767 | + # We only check on the first record (groupcompress-block) not |
3768 | + # on all of the (groupcompress-block-ref) entries. |
3769 | +                # The reuse_this_block flag is then kept until a new block starts.
3770 | + if record.storage_kind == 'groupcompress-block': |
3771 | + # Check to see if we really want to re-use this block |
3772 | + insert_manager = record._manager |
3773 | + reuse_this_block = insert_manager.check_is_well_utilized() |
3774 | + else: |
3775 | + reuse_this_block = False |
3776 | + if reuse_this_block: |
3777 | + # We still want to reuse this block |
3778 | if record.storage_kind == 'groupcompress-block': |
3779 | # Insert the raw block into the target repo |
3780 | insert_manager = record._manager |
3781 | - insert_manager._check_rebuild_block() |
3782 | bytes = record._manager._block.to_bytes() |
3783 | _, start, length = self._access.add_raw_records( |
3784 | [(None, len(bytes))], bytes)[0] |
3785 | @@ -1476,6 +1697,11 @@ |
3786 | 'groupcompress-block-ref'): |
3787 | if insert_manager is None: |
3788 | raise AssertionError('No insert_manager set') |
3789 | + if insert_manager is not record._manager: |
3790 | + raise AssertionError('insert_manager does not match' |
3791 | + ' the current record, we cannot be positive' |
3792 | + ' that the appropriate content was inserted.' |
3793 | + ) |
3794 | value = "%d %d %d %d" % (block_start, block_length, |
3795 | record._start, record._end) |
3796 | nodes = [(record.key, value, (record.parents,))] |
3797 | @@ -1593,7 +1819,7 @@ |
3798 | |
3799 | def __init__(self, graph_index, is_locked, parents=True, |
3800 | add_callback=None, track_external_parent_refs=False, |
3801 | - inconsistency_fatal=True): |
3802 | + inconsistency_fatal=True, track_new_keys=False): |
3803 | """Construct a _GCGraphIndex on a graph_index. |
3804 | |
3805 | :param graph_index: An implementation of bzrlib.index.GraphIndex. |
3806 | @@ -1619,7 +1845,8 @@ |
3807 | self._is_locked = is_locked |
3808 | self._inconsistency_fatal = inconsistency_fatal |
3809 | if track_external_parent_refs: |
3810 | - self._key_dependencies = knit._KeyRefs() |
3811 | + self._key_dependencies = knit._KeyRefs( |
3812 | + track_new_keys=track_new_keys) |
3813 | else: |
3814 | self._key_dependencies = None |
3815 | |
3816 | @@ -1679,10 +1906,14 @@ |
3817 | result.append((key, value)) |
3818 | records = result |
3819 | key_dependencies = self._key_dependencies |
3820 | - if key_dependencies is not None and self._parents: |
3821 | - for key, value, refs in records: |
3822 | - parents = refs[0] |
3823 | - key_dependencies.add_references(key, parents) |
3824 | + if key_dependencies is not None: |
3825 | + if self._parents: |
3826 | + for key, value, refs in records: |
3827 | + parents = refs[0] |
3828 | + key_dependencies.add_references(key, parents) |
3829 | + else: |
3830 | + for key, value, refs in records: |
3831 | +                    key_dependencies.add_key(key)
3832 | self._add_callback(records) |
3833 | |
3834 | def _check_read(self): |
3835 | @@ -1719,6 +1950,10 @@ |
3836 | if missing_keys: |
3837 | raise errors.RevisionNotPresent(missing_keys.pop(), self) |
3838 | |
3839 | + def find_ancestry(self, keys): |
3840 | + """See CombinedGraphIndex.find_ancestry""" |
3841 | + return self._graph_index.find_ancestry(keys, 0) |
3842 | + |
3843 | def get_parent_map(self, keys): |
3844 | """Get a map of the parents of keys. |
3845 | |
3846 | @@ -1741,7 +1976,7 @@ |
3847 | """Return the keys of missing parents.""" |
3848 | # Copied from _KnitGraphIndex.get_missing_parents |
3849 | # We may have false positives, so filter those out. |
3850 | - self._key_dependencies.add_keys( |
3851 | + self._key_dependencies.satisfy_refs_for_keys( |
3852 | self.get_parent_map(self._key_dependencies.get_unsatisfied_refs())) |
3853 | return frozenset(self._key_dependencies.get_unsatisfied_refs()) |
3854 | |
3855 | @@ -1801,17 +2036,17 @@ |
3856 | |
3857 | This allows this _GCGraphIndex to keep track of any missing |
3858 | compression parents we may want to have filled in to make those |
3859 | - indices valid. |
3860 | + indices valid. It also allows _GCGraphIndex to track any new keys. |
3861 | |
3862 | :param graph_index: A GraphIndex |
3863 | """ |
3864 | - if self._key_dependencies is not None: |
3865 | - # Add parent refs from graph_index (and discard parent refs that |
3866 | - # the graph_index has). |
3867 | - add_refs = self._key_dependencies.add_references |
3868 | - for node in graph_index.iter_all_entries(): |
3869 | - add_refs(node[1], node[3][0]) |
3870 | - |
3871 | + key_dependencies = self._key_dependencies |
3872 | + if key_dependencies is None: |
3873 | + return |
3874 | + for node in graph_index.iter_all_entries(): |
3875 | + # Add parent refs from graph_index (and discard parent refs |
3876 | + # that the graph_index has). |
3877 | + key_dependencies.add_references(node[1], node[3][0]) |
3878 | |
3879 | |
3880 | from bzrlib._groupcompress_py import ( |
3881 | |
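Consolidating the ``get_record_stream`` hunks above, the batching flow for locally stored keys takes roughly this shape (``stream_batched`` is a hypothetical name for the fragment, not a real method; ``keys`` and ``locations`` come from the surrounding method as shown in the diff):

    def stream_batched(self, keys, locations):
        batcher = _BatchingBlockFetcher(self, locations)
        for key in keys:
            # add_key returns the estimated bytes pending in this
            # batch; already-cached blocks are tracked in batch_memos
            # but add nothing to the total.
            if batcher.add_key(key) > BATCH_SIZE:  # BATCH_SIZE = 2**16
                for factory in batcher.yield_factories():
                    yield factory
        # Flush the remainder, including a partially filled manager.
        for factory in batcher.yield_factories(full_flush=True):
            yield factory
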
3882 | === modified file 'bzrlib/help_topics/en/configuration.txt' |
3883 | --- bzrlib/help_topics/en/configuration.txt 2009-06-26 18:13:41 +0000 |
3884 | +++ bzrlib/help_topics/en/configuration.txt 2009-08-21 09:19:11 +0000 |
3885 | @@ -63,6 +63,60 @@ |
3886 | ~~~~~~~~~~~~~~~ |
3887 | |
3888 | The path to the plugins directory that Bazaar should use. |
3889 | +If not set, Bazaar will search for plugins in: |
3890 | + |
3891 | +* the user specific plugin directory (containing the ``user`` plugins), |
3892 | + |
3893 | +* the bzrlib directory (containing the ``core`` plugins), |
3894 | + |
3895 | +* the site specific plugin directory if applicable (containing |
3896 | + the ``site`` plugins). |
3897 | + |
3898 | +If ``BZR_PLUGIN_PATH`` is set in any fashion, it will change the
3899 | +way the plugins are searched.
3900 | + |
3901 | +As with the ``PATH`` variable, if multiple directories are
3902 | +specified in ``BZR_PLUGIN_PATH`` they should be separated by the
3903 | +appropriate platform-specific character (':' on Unix/Linux/etc,
3904 | +';' on Windows).
3905 | + |
3906 | +By default, if ``BZR_PLUGIN_PATH`` is set, it replaces searching
3907 | +in ``user``. However, Bazaar will continue to search in ``core`` and
3908 | +``site`` unless they are explicitly removed.
3909 | + |
3910 | +If you need to change the order or remove one of these |
3911 | +directories, you should use special values: |
3912 | + |
3913 | +* ``-user``, ``-core``, ``-site`` will remove the corresponding |
3914 | + path from the default values, |
3915 | + |
3916 | +* ``+user``, ``+core``, ``+site`` will add the corresponding path |
3917 | + before the remaining default values (and also remove it from |
3918 | + the default values). |
3919 | + |
3920	| +Note that the special values 'user', 'core' and 'site' should be
3921	| +used literally; they will be substituted by the corresponding
3922	| +platform-specific values.
3923 | + |
3924 | +Examples: |
3925 | +^^^^^^^^^ |
3926 | + |
3927	| +The examples below use ':' as the separator; Windows users
3928	| +should use ';'.
3929 | + |
3930 | +Overriding the default user plugin directory: |
3931 | +``BZR_PLUGIN_PATH='/path/to/my/other/plugins'`` |
3932 | + |
3933 | +Disabling the site directory while retaining the user directory: |
3934 | +``BZR_PLUGIN_PATH='-site:+user'`` |
3935 | + |
3936 | +Disabling all plugins (better achieved with --no-plugins): |
3937 | +``BZR_PLUGIN_PATH='-user:-core:-site'`` |
3938 | + |
3939 | +Overriding the default site plugin directory: |
3940	| +``BZR_PLUGIN_PATH='/path/to/my/site/plugins:-site:+user'``
3941 | + |
3942 | + |
3943 | |
3944 | BZRPATH |
3945 | ~~~~~~~ |
3946 | |
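To make the token semantics above concrete, here is a hedged sketch of how a ``BZR_PLUGIN_PATH`` value could be expanded into an ordered search list. The directory constants and the helper name are illustrative assumptions, not bzrlib's actual implementation:

    # Illustrative sketch only: expand a BZR_PLUGIN_PATH-style value into
    # an ordered list of plugin directories.  DEFAULTS holds made-up paths.
    import os

    DEFAULTS = {'user': '/home/me/.bazaar/plugins',           # assumed
                'core': '/usr/lib/pymodules/bzrlib/plugins',  # assumed
                'site': '/usr/lib/pymodules/site-plugins'}    # assumed

    def expand_plugin_path(value):
        remaining = ['user', 'core', 'site']    # default search order
        explicit = []
        for token in value.split(os.pathsep):
            if token.startswith('-'):           # '-site': drop a default
                if token[1:] in remaining:
                    remaining.remove(token[1:])
            elif token.startswith('+'):         # '+user': pin it here
                if token[1:] in remaining:
                    remaining.remove(token[1:])
                    explicit.append(DEFAULTS[token[1:]])
            elif token:                         # a literal directory
                explicit.append(token)
        # an explicit path replaces 'user' unless 'user' was pinned
        if explicit and 'user' in remaining:
            remaining.remove('user')
        return explicit + [DEFAULTS[name] for name in remaining]

    print expand_plugin_path('-site:+user')
    # ['/home/me/.bazaar/plugins', '/usr/lib/pymodules/bzrlib/plugins']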
3947 | === modified file 'bzrlib/help_topics/en/debug-flags.txt' |
3948 | --- bzrlib/help_topics/en/debug-flags.txt 2009-08-04 04:36:34 +0000 |
3949 | +++ bzrlib/help_topics/en/debug-flags.txt 2009-08-20 05:02:45 +0000 |
3950 | @@ -12,18 +12,26 @@ |
3951 | operations. |
3952 | -Dfetch Trace history copying between repositories. |
3953 | -Dfilters Emit information for debugging content filtering. |
3954 | +-Dforceinvdeltas Force use of inventory deltas during generic streaming fetch. |
3955 | -Dgraph Trace graph traversal. |
3956 | -Dhashcache Log every time a working file is read to determine its hash. |
3957 | -Dhooks Trace hook execution. |
3958 | -Dhpss Trace smart protocol requests and responses. |
3959 | -Dhpssdetail More hpss details. |
3960 | -Dhpssvfs Traceback on vfs access to Remote objects. |
3961 | --Dhttp Trace http connections, requests and responses |
3962 | +-Dhttp Trace http connections, requests and responses. |
3963 | -Dindex Trace major index operations. |
3964 | -Dknit Trace knit operations. |
3965 | +-Dstrict_locks Trace when OS locks are potentially used in a non-portable |
3966 | + manner. |
3967 | -Dlock Trace when lockdir locks are taken or released. |
3968 | -Dprogress Trace progress bar operations. |
3969 | -Dmerge Emit information for debugging merges. |
3970 | +-Dno_apport Don't use apport to report crashes. |
3971 | -Dunlock Some errors during unlock are treated as warnings. |
3972 | -Dpack Emit information about pack operations. |
3973 | -Dsftp Trace SFTP internals. |
3974 | +-Dstream Trace fetch streams. |
3975 | +-DIDS_never Never use InterDifferingSerializer when fetching. |
3976 | +-DIDS_always Always use InterDifferingSerializer to fetch if appropriate |
3977 | + for the format, even for non-local fetches. |
3978 | |
3979 | === modified file 'bzrlib/hooks.py' |
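These flags are normally given on the command line (for example ``bzr -Dhpss st``), but they can also be toggled from Python. A small hedged sketch, assuming the usual ``bzrlib.debug.debug_flags`` set:

    # Sketch: debug_flags is a plain set of strings consulted by bzrlib.
    from bzrlib import debug

    debug.debug_flags.add('strict_locks')  # make mixed OS locks hard errors
    debug.debug_flags.add('hpss')          # trace smart protocol traffic
    if 'hpss' in debug.debug_flags:
        print 'smart protocol tracing enabled'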
3980 | --- bzrlib/hooks.py 2009-06-10 03:31:01 +0000 |
3981 | +++ bzrlib/hooks.py 2009-09-01 12:29:54 +0000 |
3982 | @@ -219,9 +219,7 @@ |
3983 | strings.append('Introduced in: %s' % introduced_string) |
3984 | if self.deprecated: |
3985 | deprecated_string = _format_version_tuple(self.deprecated) |
3986 | - else: |
3987 | - deprecated_string = 'Not deprecated' |
3988 | - strings.append('Deprecated in: %s' % deprecated_string) |
3989 | + strings.append('Deprecated in: %s' % deprecated_string) |
3990 | strings.append('') |
3991 | strings.extend(textwrap.wrap(self.__doc__, |
3992 | break_long_words=False)) |
3993 | |
3994 | === modified file 'bzrlib/index.py' |
3995 | --- bzrlib/index.py 2009-07-01 10:53:08 +0000 |
3996 | +++ bzrlib/index.py 2009-08-17 22:11:06 +0000 |
3997 | @@ -333,6 +333,22 @@ |
3998 | if combine_backing_indices is not None: |
3999 | self._combine_backing_indices = combine_backing_indices |
4000 | |
4001 | + def find_ancestry(self, keys, ref_list_num): |
4002 | + """See CombinedGraphIndex.find_ancestry()""" |
4003 | + pending = set(keys) |
4004 | + parent_map = {} |
4005 | + missing_keys = set() |
4006 | + while pending: |
4007 | + next_pending = set() |
4008 | + for _, key, value, ref_lists in self.iter_entries(pending): |
4009 | + parent_keys = ref_lists[ref_list_num] |
4010 | + parent_map[key] = parent_keys |
4011 | + next_pending.update([p for p in parent_keys if p not in |
4012 | + parent_map]) |
4013 | + missing_keys.update(pending.difference(parent_map)) |
4014 | + pending = next_pending |
4015 | + return parent_map, missing_keys |
4016 | + |
4017 | |
4018 | class GraphIndex(object): |
4019 | """An index for data with embedded graphs. |
4020 | @@ -702,6 +718,23 @@ |
4021 | # the last thing looked up was a terminal element |
4022 | yield (self, ) + key_dict |
4023 | |
4024 | + def _find_ancestors(self, keys, ref_list_num, parent_map, missing_keys): |
4025 | + """See BTreeIndex._find_ancestors.""" |
4026	| +        # The API can be implemented as a trivial overlay on top of
4027	| +        # iter_entries; it is not an efficient implementation, but it
4028	| +        # at least gets the job done.
4029 | + found_keys = set() |
4030 | + search_keys = set() |
4031 | + for index, key, value, refs in self.iter_entries(keys): |
4032 | + parent_keys = refs[ref_list_num] |
4033 | + found_keys.add(key) |
4034 | + parent_map[key] = parent_keys |
4035 | + search_keys.update(parent_keys) |
4036 | + # Figure out what, if anything, was missing |
4037 | + missing_keys.update(set(keys).difference(found_keys)) |
4038 | + search_keys = search_keys.difference(parent_map) |
4039 | + return search_keys |
4040 | + |
4041 | def key_count(self): |
4042 | """Return an estimate of the number of keys in this index. |
4043 | |
4044 | @@ -1297,6 +1330,69 @@ |
4045 | except errors.NoSuchFile: |
4046 | self._reload_or_raise() |
4047 | |
4048 | + def find_ancestry(self, keys, ref_list_num): |
4049 | + """Find the complete ancestry for the given set of keys. |
4050 | + |
4051 | + Note that this is a whole-ancestry request, so it should be used |
4052 | + sparingly. |
4053 | + |
4054 | + :param keys: An iterable of keys to look for |
4055 | + :param ref_list_num: The reference list which references the parents |
4056 | + we care about. |
4057 | + :return: (parent_map, missing_keys) |
4058 | + """ |
4059 | + missing_keys = set() |
4060 | + parent_map = {} |
4061 | + keys_to_lookup = set(keys) |
4062 | + generation = 0 |
4063 | + while keys_to_lookup: |
4064 | + # keys that *all* indexes claim are missing, stop searching them |
4065 | + generation += 1 |
4066 | + all_index_missing = None |
4067 | + # print 'gen\tidx\tsub\tn_keys\tn_pmap\tn_miss' |
4068 | + # print '%4d\t\t\t%4d\t%5d\t%5d' % (generation, len(keys_to_lookup), |
4069 | + # len(parent_map), |
4070 | + # len(missing_keys)) |
4071 | + for index_idx, index in enumerate(self._indices): |
4072 | + # TODO: we should probably be doing something with |
4073 | + # 'missing_keys' since we've already determined that |
4074 | + # those revisions have not been found anywhere |
4075 | + index_missing_keys = set() |
4076 | + # Find all of the ancestry we can from this index |
4077 | + # keep looking until the search_keys set is empty, which means |
4078 | + # things we didn't find should be in index_missing_keys |
4079 | + search_keys = keys_to_lookup |
4080 | + sub_generation = 0 |
4081 | + # print ' \t%2d\t\t%4d\t%5d\t%5d' % ( |
4082 | + # index_idx, len(search_keys), |
4083 | + # len(parent_map), len(index_missing_keys)) |
4084 | + while search_keys: |
4085 | + sub_generation += 1 |
4086 | + # TODO: ref_list_num should really be a parameter, since |
4087 | + # CombinedGraphIndex does not know what the ref lists |
4088 | + # mean. |
4089 | + search_keys = index._find_ancestors(search_keys, |
4090 | + ref_list_num, parent_map, index_missing_keys) |
4091 | + # print ' \t \t%2d\t%4d\t%5d\t%5d' % ( |
4092 | + # sub_generation, len(search_keys), |
4093 | + # len(parent_map), len(index_missing_keys)) |
4094 | + # Now set whatever was missing to be searched in the next index |
4095 | + keys_to_lookup = index_missing_keys |
4096 | + if all_index_missing is None: |
4097 | + all_index_missing = set(index_missing_keys) |
4098 | + else: |
4099 | + all_index_missing.intersection_update(index_missing_keys) |
4100 | + if not keys_to_lookup: |
4101 | + break |
4102 | + if all_index_missing is None: |
4103 | + # There were no indexes, so all search keys are 'missing' |
4104 | + missing_keys.update(keys_to_lookup) |
4105 | + keys_to_lookup = None |
4106 | + else: |
4107 | + missing_keys.update(all_index_missing) |
4108 | + keys_to_lookup.difference_update(all_index_missing) |
4109 | + return parent_map, missing_keys |
4110 | + |
4111 | def key_count(self): |
4112 | """Return an estimate of the number of keys in this index. |
4113 | |
4114 | |
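The generational loop in CombinedGraphIndex.find_ancestry above repeatedly asks each backing index for whatever ancestry it can supply, then carries the still-missing keys to the next index. The per-index step is the _find_ancestors contract; a standalone sketch of it, using a plain dictionary as a stand-in index:

    # Standalone sketch of the _find_ancestors contract: resolve what one
    # index knows, record the rest as missing, return new keys to search.
    def find_ancestors(index, keys, parent_map, missing_keys):
        search_keys = set()
        found = set()
        for key in keys:
            if key in index:              # index: dict key -> parent tuple
                found.add(key)
                parent_map[key] = index[key]
                search_keys.update(index[key])
        missing_keys.update(set(keys) - found)
        return search_keys - set(parent_map)

    index = {'C': ('B',), 'B': ('A',), 'A': ()}
    parent_map, missing = {}, set()
    pending = set(['C'])
    while pending:                        # mirrors the inner while loop above
        pending = find_ancestors(index, pending, parent_map, missing)
    print sorted(parent_map.items())  # [('A', ()), ('B', ('A',)), ('C', ('B',))]
    print missing                     # set([])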
4115 | === modified file 'bzrlib/inventory.py' |
4116 | --- bzrlib/inventory.py 2009-08-05 02:30:59 +0000 |
4117 | +++ bzrlib/inventory.py 2009-08-30 23:51:10 +0000 |
4118 | @@ -437,7 +437,13 @@ |
4119 | self.text_id is not None): |
4120 | checker._report_items.append('directory {%s} has text in revision {%s}' |
4121 | % (self.file_id, rev_id)) |
4122 | - # Directories are stored as ''. |
4123	| +        # In non-rich-root repositories we do not expect a file graph for the
4124 | + # root. |
4125 | + if self.name == '' and not checker.rich_roots: |
4126 | + return |
4127 | + # Directories are stored as an empty file, but the file should exist |
4128 | + # to provide a per-fileid log. The hash of every directory content is |
4129 | + # "da..." below (the sha1sum of ''). |
4130 | checker.add_pending_item(rev_id, |
4131 | ('texts', self.file_id, self.revision), 'text', |
4132 | 'da39a3ee5e6b4b0d3255bfef95601890afd80709') |
4133 | @@ -743,6 +749,9 @@ |
4134 | """ |
4135 | return self.has_id(file_id) |
4136 | |
4137 | + def has_filename(self, filename): |
4138 | + return bool(self.path2id(filename)) |
4139 | + |
4140 | def id2path(self, file_id): |
4141 | """Return as a string the path to file_id. |
4142 | |
4143 | @@ -751,6 +760,8 @@ |
4144 | >>> e = i.add(InventoryFile('foo-id', 'foo.c', parent_id='src-id')) |
4145 | >>> print i.id2path('foo-id') |
4146 | src/foo.c |
4147 | + |
4148 | + :raises NoSuchId: If file_id is not present in the inventory. |
4149 | """ |
4150 | # get all names, skipping root |
4151 | return '/'.join(reversed( |
4152 | @@ -1363,9 +1374,6 @@ |
4153 | yield ie |
4154 | file_id = ie.parent_id |
4155 | |
4156 | - def has_filename(self, filename): |
4157 | - return bool(self.path2id(filename)) |
4158 | - |
4159 | def has_id(self, file_id): |
4160 | return (file_id in self._byid) |
4161 | |
4162 | |
4163 | === modified file 'bzrlib/inventory_delta.py' |
4164 | --- bzrlib/inventory_delta.py 2009-04-02 05:53:12 +0000 |
4165 | +++ bzrlib/inventory_delta.py 2009-08-13 00:20:29 +0000 |
4166 | @@ -29,6 +29,25 @@ |
4167 | from bzrlib import inventory |
4168 | from bzrlib.revision import NULL_REVISION |
4169 | |
4170 | +FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)' |
4171 | + |
4172 | + |
4173 | +class InventoryDeltaError(errors.BzrError): |
4174 | + """An error when serializing or deserializing an inventory delta.""" |
4175 | + |
4176 | + # Most errors when serializing and deserializing are due to bugs, although |
4177 | + # damaged input (i.e. a bug in a different process) could cause |
4178 | + # deserialization errors too. |
4179 | + internal_error = True |
4180 | + |
4181 | + |
4182 | +class IncompatibleInventoryDelta(errors.BzrError): |
4183 | + """The delta could not be deserialised because its contents conflict with |
4184 | + the allow_versioned_root or allow_tree_references flags of the |
4185 | + deserializer. |
4186 | + """ |
4187 | + internal_error = False |
4188 | + |
4189 | |
4190 | def _directory_content(entry): |
4191 | """Serialize the content component of entry which is a directory. |
4192 | @@ -49,7 +68,7 @@ |
4193 | exec_bytes = '' |
4194 | size_exec_sha = (entry.text_size, exec_bytes, entry.text_sha1) |
4195 | if None in size_exec_sha: |
4196 | - raise errors.BzrError('Missing size or sha for %s' % entry.file_id) |
4197 | + raise InventoryDeltaError('Missing size or sha for %s' % entry.file_id) |
4198 | return "file\x00%d\x00%s\x00%s" % size_exec_sha |
4199 | |
4200 | |
4201 | @@ -60,7 +79,7 @@ |
4202 | """ |
4203 | target = entry.symlink_target |
4204 | if target is None: |
4205 | - raise errors.BzrError('Missing target for %s' % entry.file_id) |
4206 | + raise InventoryDeltaError('Missing target for %s' % entry.file_id) |
4207 | return "link\x00%s" % target.encode('utf8') |
4208 | |
4209 | |
4210 | @@ -71,7 +90,8 @@ |
4211 | """ |
4212 | tree_revision = entry.reference_revision |
4213 | if tree_revision is None: |
4214 | - raise errors.BzrError('Missing reference revision for %s' % entry.file_id) |
4215 | + raise InventoryDeltaError( |
4216 | + 'Missing reference revision for %s' % entry.file_id) |
4217 | return "tree\x00%s" % tree_revision |
4218 | |
4219 | |
4220 | @@ -115,11 +135,8 @@ |
4221 | return result |
4222 | |
4223 | |
4224 | - |
4225 | class InventoryDeltaSerializer(object): |
4226 | - """Serialize and deserialize inventory deltas.""" |
4227 | - |
4228 | - FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)' |
4229 | + """Serialize inventory deltas.""" |
4230 | |
4231 | def __init__(self, versioned_root, tree_references): |
4232 | """Create an InventoryDeltaSerializer. |
4233 | @@ -141,6 +158,9 @@ |
4234 | def delta_to_lines(self, old_name, new_name, delta_to_new): |
4235 | """Return a line sequence for delta_to_new. |
4236 | |
4237	| +        The versioned_root and tree_references flags are taken from the
4238	| +        values passed to the constructor.
4239 | + |
4240 | :param old_name: A UTF8 revision id for the old inventory. May be |
4241 | NULL_REVISION if there is no older inventory and delta_to_new |
4242 | includes the entire inventory contents. |
4243 | @@ -150,15 +170,20 @@ |
4244 | takes. |
4245 | :return: The serialized delta as lines. |
4246 | """ |
4247 | + if type(old_name) is not str: |
4248 | + raise TypeError('old_name should be str, got %r' % (old_name,)) |
4249 | + if type(new_name) is not str: |
4250 | + raise TypeError('new_name should be str, got %r' % (new_name,)) |
4251 | lines = ['', '', '', '', ''] |
4252 | to_line = self._delta_item_to_line |
4253 | for delta_item in delta_to_new: |
4254 | - lines.append(to_line(delta_item)) |
4255 | - if lines[-1].__class__ != str: |
4256 | - raise errors.BzrError( |
4257 | + line = to_line(delta_item, new_name) |
4258 | + if line.__class__ != str: |
4259 | + raise InventoryDeltaError( |
4260 | 'to_line generated non-str output %r' % lines[-1]) |
4261 | + lines.append(line) |
4262 | lines.sort() |
4263 | - lines[0] = "format: %s\n" % InventoryDeltaSerializer.FORMAT_1 |
4264 | + lines[0] = "format: %s\n" % FORMAT_1 |
4265 | lines[1] = "parent: %s\n" % old_name |
4266 | lines[2] = "version: %s\n" % new_name |
4267 | lines[3] = "versioned_root: %s\n" % self._serialize_bool( |
4268 | @@ -173,7 +198,7 @@ |
4269 | else: |
4270 | return "false" |
4271 | |
4272 | - def _delta_item_to_line(self, delta_item): |
4273 | + def _delta_item_to_line(self, delta_item, new_version): |
4274 | """Convert delta_item to a line.""" |
4275 | oldpath, newpath, file_id, entry = delta_item |
4276 | if newpath is None: |
4277 | @@ -188,6 +213,10 @@ |
4278 | oldpath_utf8 = 'None' |
4279 | else: |
4280 | oldpath_utf8 = '/' + oldpath.encode('utf8') |
4281 | + if newpath == '/': |
4282 | + raise AssertionError( |
4283 | + "Bad inventory delta: '/' is not a valid newpath " |
4284 | + "(should be '') in delta item %r" % (delta_item,)) |
4285 | # TODO: Test real-world utf8 cache hit rate. It may be a win. |
4286 | newpath_utf8 = '/' + newpath.encode('utf8') |
4287 | # Serialize None as '' |
4288 | @@ -196,58 +225,78 @@ |
4289 | last_modified = entry.revision |
4290 | # special cases for / |
4291 | if newpath_utf8 == '/' and not self._versioned_root: |
4292 | - if file_id != 'TREE_ROOT': |
4293 | - raise errors.BzrError( |
4294 | - 'file_id %s is not TREE_ROOT for /' % file_id) |
4295 | - if last_modified is not None: |
4296 | - raise errors.BzrError( |
4297 | - 'Version present for / in %s' % file_id) |
4298 | - last_modified = NULL_REVISION |
4299	| +            # This is an entry for the root, and this serializer does not
4300 | + # support versioned roots. So this must be an unversioned |
4301 | + # root, i.e. last_modified == new revision. Otherwise, this |
4302 | + # delta is invalid. |
4303 | + # Note: the non-rich-root repositories *can* have roots with |
4304 | + # file-ids other than TREE_ROOT, e.g. repo formats that use the |
4305 | + # xml5 serializer. |
4306 | + if last_modified != new_version: |
4307 | + raise InventoryDeltaError( |
4308 | + 'Version present for / in %s (%s != %s)' |
4309 | + % (file_id, last_modified, new_version)) |
4310 | if last_modified is None: |
4311 | - raise errors.BzrError("no version for fileid %s" % file_id) |
4312 | + raise InventoryDeltaError("no version for fileid %s" % file_id) |
4313 | content = self._entry_to_content[entry.kind](entry) |
4314 | return ("%s\x00%s\x00%s\x00%s\x00%s\x00%s\n" % |
4315 | (oldpath_utf8, newpath_utf8, file_id, parent_id, last_modified, |
4316 | content)) |
4317 | |
4318 | + |
4319 | +class InventoryDeltaDeserializer(object): |
4320 | + """Deserialize inventory deltas.""" |
4321 | + |
4322 | + def __init__(self, allow_versioned_root=True, allow_tree_references=True): |
4323 | + """Create an InventoryDeltaDeserializer. |
4324 | + |
4325	| +        :param allow_versioned_root: If True, deltas with a versioned
4326	| +            root entry may be deserialised (roots can have any fileid).
4327	| +        :param allow_tree_references: If True, support tree-reference entries.
4328 | + """ |
4329 | + self._allow_versioned_root = allow_versioned_root |
4330 | + self._allow_tree_references = allow_tree_references |
4331 | + |
4332 | def _deserialize_bool(self, value): |
4333 | if value == "true": |
4334 | return True |
4335 | elif value == "false": |
4336 | return False |
4337 | else: |
4338 | - raise errors.BzrError("value %r is not a bool" % (value,)) |
4339 | + raise InventoryDeltaError("value %r is not a bool" % (value,)) |
4340 | |
4341 | def parse_text_bytes(self, bytes): |
4342 | """Parse the text bytes of a serialized inventory delta. |
4343 | |
4344	| +        If the delta declares a versioned root or tree references that
4345	| +        this deserializer was configured to disallow,
4346	| +        IncompatibleInventoryDelta is raised.
4347 | + |
4348 | :param bytes: The bytes to parse. This can be obtained by calling |
4349 | delta_to_lines and then doing ''.join(delta_lines). |
4350 | - :return: (parent_id, new_id, inventory_delta) |
4351 | + :return: (parent_id, new_id, versioned_root, tree_references, |
4352 | + inventory_delta) |
4353 | """ |
4354 | + if bytes[-1:] != '\n': |
4355 | + last_line = bytes.rsplit('\n', 1)[-1] |
4356 | + raise InventoryDeltaError('last line not empty: %r' % (last_line,)) |
4357 | lines = bytes.split('\n')[:-1] # discard the last empty line |
4358 | - if not lines or lines[0] != 'format: %s' % InventoryDeltaSerializer.FORMAT_1: |
4359 | - raise errors.BzrError('unknown format %r' % lines[0:1]) |
4360 | + if not lines or lines[0] != 'format: %s' % FORMAT_1: |
4361 | + raise InventoryDeltaError('unknown format %r' % lines[0:1]) |
4362 | if len(lines) < 2 or not lines[1].startswith('parent: '): |
4363 | - raise errors.BzrError('missing parent: marker') |
4364 | + raise InventoryDeltaError('missing parent: marker') |
4365 | delta_parent_id = lines[1][8:] |
4366 | if len(lines) < 3 or not lines[2].startswith('version: '): |
4367 | - raise errors.BzrError('missing version: marker') |
4368 | + raise InventoryDeltaError('missing version: marker') |
4369 | delta_version_id = lines[2][9:] |
4370 | if len(lines) < 4 or not lines[3].startswith('versioned_root: '): |
4371 | - raise errors.BzrError('missing versioned_root: marker') |
4372 | + raise InventoryDeltaError('missing versioned_root: marker') |
4373 | delta_versioned_root = self._deserialize_bool(lines[3][16:]) |
4374 | if len(lines) < 5 or not lines[4].startswith('tree_references: '): |
4375 | - raise errors.BzrError('missing tree_references: marker') |
4376 | + raise InventoryDeltaError('missing tree_references: marker') |
4377 | delta_tree_references = self._deserialize_bool(lines[4][17:]) |
4378 | - if delta_versioned_root != self._versioned_root: |
4379 | - raise errors.BzrError( |
4380 | - "serialized versioned_root flag is wrong: %s" % |
4381 | - (delta_versioned_root,)) |
4382 | - if delta_tree_references != self._tree_references: |
4383 | - raise errors.BzrError( |
4384 | - "serialized tree_references flag is wrong: %s" % |
4385 | - (delta_tree_references,)) |
4386 | + if (not self._allow_versioned_root and delta_versioned_root): |
4387 | + raise IncompatibleInventoryDelta("versioned_root not allowed") |
4388 | result = [] |
4389 | seen_ids = set() |
4390 | line_iter = iter(lines) |
4391 | @@ -258,33 +307,58 @@ |
4392 | content) = line.split('\x00', 5) |
4393 | parent_id = parent_id or None |
4394 | if file_id in seen_ids: |
4395 | - raise errors.BzrError( |
4396 | + raise InventoryDeltaError( |
4397 | "duplicate file id in inventory delta %r" % lines) |
4398 | seen_ids.add(file_id) |
4399 | - if newpath_utf8 == '/' and not delta_versioned_root and ( |
4400 | - last_modified != 'null:' or file_id != 'TREE_ROOT'): |
4401 | - raise errors.BzrError("Versioned root found: %r" % line) |
4402 | - elif last_modified[-1] == ':': |
4403 | - raise errors.BzrError('special revisionid found: %r' % line) |
4404 | - if not delta_tree_references and content.startswith('tree\x00'): |
4405 | - raise errors.BzrError("Tree reference found: %r" % line) |
4406 | - content_tuple = tuple(content.split('\x00')) |
4407 | - entry = _parse_entry( |
4408 | - newpath_utf8, file_id, parent_id, last_modified, content_tuple) |
4409 | + if (newpath_utf8 == '/' and not delta_versioned_root and |
4410 | + last_modified != delta_version_id): |
4411	| +                # Delta claims not to have a versioned root, yet here's
4412 | + # a root entry with a non-default version. |
4413 | + raise InventoryDeltaError("Versioned root found: %r" % line) |
4414 | + elif newpath_utf8 != 'None' and last_modified[-1] == ':': |
4415 | + # Deletes have a last_modified of null:, but otherwise special |
4416 | + # revision ids should not occur. |
4417 | + raise InventoryDeltaError('special revisionid found: %r' % line) |
4418 | + if content.startswith('tree\x00'): |
4419 | + if delta_tree_references is False: |
4420 | + raise InventoryDeltaError( |
4421 | + "Tree reference found (but header said " |
4422 | + "tree_references: false): %r" % line) |
4423 | + elif not self._allow_tree_references: |
4424 | + raise IncompatibleInventoryDelta( |
4425 | + "Tree reference not allowed") |
4426 | if oldpath_utf8 == 'None': |
4427 | oldpath = None |
4428 | + elif oldpath_utf8[:1] != '/': |
4429 | + raise InventoryDeltaError( |
4430 | + "oldpath invalid (does not start with /): %r" |
4431 | + % (oldpath_utf8,)) |
4432 | else: |
4433 | + oldpath_utf8 = oldpath_utf8[1:] |
4434 | oldpath = oldpath_utf8.decode('utf8') |
4435 | if newpath_utf8 == 'None': |
4436 | newpath = None |
4437 | + elif newpath_utf8[:1] != '/': |
4438 | + raise InventoryDeltaError( |
4439 | + "newpath invalid (does not start with /): %r" |
4440 | + % (newpath_utf8,)) |
4441 | else: |
4442 | + # Trim leading slash |
4443 | + newpath_utf8 = newpath_utf8[1:] |
4444 | newpath = newpath_utf8.decode('utf8') |
4445 | + content_tuple = tuple(content.split('\x00')) |
4446 | + if content_tuple[0] == 'deleted': |
4447 | + entry = None |
4448 | + else: |
4449 | + entry = _parse_entry( |
4450 | + newpath, file_id, parent_id, last_modified, content_tuple) |
4451 | delta_item = (oldpath, newpath, file_id, entry) |
4452 | result.append(delta_item) |
4453 | - return delta_parent_id, delta_version_id, result |
4454 | - |
4455 | - |
4456 | -def _parse_entry(utf8_path, file_id, parent_id, last_modified, content): |
4457 | + return (delta_parent_id, delta_version_id, delta_versioned_root, |
4458 | + delta_tree_references, result) |
4459 | + |
4460 | + |
4461 | +def _parse_entry(path, file_id, parent_id, last_modified, content): |
4462 | entry_factory = { |
4463 | 'dir': _dir_to_entry, |
4464 | 'file': _file_to_entry, |
4465 | @@ -292,8 +366,10 @@ |
4466 | 'tree': _tree_to_entry, |
4467 | } |
4468 | kind = content[0] |
4469 | - path = utf8_path[1:].decode('utf8') |
4470 | + if path.startswith('/'): |
4471 | + raise AssertionError |
4472 | name = basename(path) |
4473 | return entry_factory[content[0]]( |
4474 | content, name, parent_id, file_id, last_modified) |
4475 | |
4476 | + |
4477 | |
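Taken together, these hunks split the old combined class into a serializer and a deserializer and make the wire-format checks stricter. The header those checks enforce is five labelled lines followed by sorted entry lines; a standalone sketch of just the header validation (mirroring, not reusing, parse_text_bytes, and omitting its per-line startswith checks for brevity):

    # Standalone sketch of the header checks performed by parse_text_bytes.
    FORMAT_1 = 'bzr inventory delta v1 (bzr 1.14)'

    def parse_header(bytes):
        if bytes[-1:] != '\n':
            raise ValueError('last line not empty')
        lines = bytes.split('\n')[:-1]
        if not lines or lines[0] != 'format: %s' % FORMAT_1:
            raise ValueError('unknown format %r' % lines[0:1])
        parent = lines[1][len('parent: '):]
        version = lines[2][len('version: '):]
        versioned_root = lines[3][len('versioned_root: '):] == 'true'
        tree_references = lines[4][len('tree_references: '):] == 'true'
        return parent, version, versioned_root, tree_references

    header = ('format: %s\n' % FORMAT_1 +
              'parent: null:\n'
              'version: rev-1\n'
              'versioned_root: false\n'
              'tree_references: false\n')
    print parse_header(header)   # ('null:', 'rev-1', False, False)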
4478 | === modified file 'bzrlib/knit.py' |
4479 | --- bzrlib/knit.py 2009-08-04 04:36:34 +0000 |
4480 | +++ bzrlib/knit.py 2009-09-09 13:05:33 +0000 |
4481 | @@ -1190,6 +1190,19 @@ |
4482 | generator = _VFContentMapGenerator(self, [key]) |
4483 | return generator._get_content(key) |
4484 | |
4485 | + def get_known_graph_ancestry(self, keys): |
4486 | + """Get a KnownGraph instance with the ancestry of keys.""" |
4487 | + parent_map, missing_keys = self._index.find_ancestry(keys) |
4488 | + for fallback in self._fallback_vfs: |
4489 | + if not missing_keys: |
4490 | + break |
4491 | + (f_parent_map, f_missing_keys) = fallback._index.find_ancestry( |
4492 | + missing_keys) |
4493 | + parent_map.update(f_parent_map) |
4494 | + missing_keys = f_missing_keys |
4495 | + kg = _mod_graph.KnownGraph(parent_map) |
4496 | + return kg |
4497 | + |
4498 | def get_parent_map(self, keys): |
4499 | """Get a map of the graph parents of keys. |
4500 | |
4501 | @@ -1505,10 +1518,10 @@ |
4502 | if source is parent_maps[0]: |
4503 | # this KnitVersionedFiles |
4504 | records = [(key, positions[key][1]) for key in keys] |
4505 | - for key, raw_data, sha1 in self._read_records_iter_raw(records): |
4506 | + for key, raw_data in self._read_records_iter_unchecked(records): |
4507 | (record_details, index_memo, _) = positions[key] |
4508 | yield KnitContentFactory(key, global_map[key], |
4509 | - record_details, sha1, raw_data, self._factory.annotated, None) |
4510 | + record_details, None, raw_data, self._factory.annotated, None) |
4511 | else: |
4512 | vf = self._fallback_vfs[parent_maps.index(source) - 1] |
4513 | for record in vf.get_record_stream(keys, ordering, |
4514 | @@ -1583,6 +1596,13 @@ |
4515 | # key = basis_parent, value = index entry to add |
4516 | buffered_index_entries = {} |
4517 | for record in stream: |
4518 | + kind = record.storage_kind |
4519 | + if kind.startswith('knit-') and kind.endswith('-gz'): |
4520 | + # Check that the ID in the header of the raw knit bytes matches |
4521 | + # the record metadata. |
4522 | + raw_data = record._raw_record |
4523 | + df, rec = self._parse_record_header(record.key, raw_data) |
4524 | + df.close() |
4525 | buffered = False |
4526 | parents = record.parents |
4527 | if record.storage_kind in delta_types: |
4528 | @@ -2560,6 +2580,33 @@ |
4529 | except KeyError: |
4530 | raise RevisionNotPresent(key, self) |
4531 | |
4532 | + def find_ancestry(self, keys): |
4533 | + """See CombinedGraphIndex.find_ancestry()""" |
4534 | + prefixes = set(key[:-1] for key in keys) |
4535 | + self._load_prefixes(prefixes) |
4536 | + result = {} |
4537 | + parent_map = {} |
4538 | + missing_keys = set() |
4539 | + pending_keys = list(keys) |
4540 | + # This assumes that keys will not reference parents in a different |
4541 | + # prefix, which is accurate so far. |
4542 | + while pending_keys: |
4543 | + key = pending_keys.pop() |
4544 | + if key in parent_map: |
4545 | + continue |
4546 | + prefix = key[:-1] |
4547 | + try: |
4548 | + suffix_parents = self._kndx_cache[prefix][0][key[-1]][4] |
4549 | + except KeyError: |
4550 | + missing_keys.add(key) |
4551 | + else: |
4552 | + parent_keys = tuple([prefix + (suffix,) |
4553 | + for suffix in suffix_parents]) |
4554 | + parent_map[key] = parent_keys |
4555 | + pending_keys.extend([p for p in parent_keys |
4556 | + if p not in parent_map]) |
4557 | + return parent_map, missing_keys |
4558 | + |
4559 | def get_parent_map(self, keys): |
4560 | """Get a map of the parents of keys. |
4561 | |
4562 | @@ -2737,9 +2784,20 @@ |
4563 | |
4564 | class _KeyRefs(object): |
4565 | |
4566 | - def __init__(self): |
4567 | + def __init__(self, track_new_keys=False): |
4568 | # dict mapping 'key' to 'set of keys referring to that key' |
4569 | self.refs = {} |
4570 | + if track_new_keys: |
4571 | + # set remembering all new keys |
4572 | + self.new_keys = set() |
4573 | + else: |
4574 | + self.new_keys = None |
4575 | + |
4576 | + def clear(self): |
4577 | + if self.refs: |
4578 | + self.refs.clear() |
4579 | + if self.new_keys: |
4580 | + self.new_keys.clear() |
4581 | |
4582 | def add_references(self, key, refs): |
4583 | # Record the new references |
4584 | @@ -2752,19 +2810,28 @@ |
4585 | # Discard references satisfied by the new key |
4586 | self.add_key(key) |
4587 | |
4588 | + def get_new_keys(self): |
4589 | + return self.new_keys |
4590 | + |
4591 | def get_unsatisfied_refs(self): |
4592 | return self.refs.iterkeys() |
4593 | |
4594 | - def add_key(self, key): |
4595 | + def _satisfy_refs_for_key(self, key): |
4596 | try: |
4597 | del self.refs[key] |
4598 | except KeyError: |
4599 | # No keys depended on this key. That's ok. |
4600 | pass |
4601 | |
4602 | - def add_keys(self, keys): |
4603 | + def add_key(self, key): |
4604 | + # satisfy refs for key, and remember that we've seen this key. |
4605 | + self._satisfy_refs_for_key(key) |
4606 | + if self.new_keys is not None: |
4607 | + self.new_keys.add(key) |
4608 | + |
4609 | + def satisfy_refs_for_keys(self, keys): |
4610 | for key in keys: |
4611 | - self.add_key(key) |
4612 | + self._satisfy_refs_for_key(key) |
4613 | |
4614 | def get_referrers(self): |
4615 | result = set() |
4616 | @@ -2932,7 +2999,7 @@ |
4617 | # If updating this, you should also update |
4618 | # groupcompress._GCGraphIndex.get_missing_parents |
4619 | # We may have false positives, so filter those out. |
4620 | - self._key_dependencies.add_keys( |
4621 | + self._key_dependencies.satisfy_refs_for_keys( |
4622 | self.get_parent_map(self._key_dependencies.get_unsatisfied_refs())) |
4623 | return frozenset(self._key_dependencies.get_unsatisfied_refs()) |
4624 | |
4625 | @@ -3049,6 +3116,10 @@ |
4626 | options.append('no-eol') |
4627 | return options |
4628 | |
4629 | + def find_ancestry(self, keys): |
4630 | + """See CombinedGraphIndex.find_ancestry()""" |
4631 | + return self._graph_index.find_ancestry(keys, 0) |
4632 | + |
4633 | def get_parent_map(self, keys): |
4634 | """Get a map of the parents of keys. |
4635 | |
4636 | |
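get_known_graph_ancestry above cascades through the fallback versioned files: whatever keys the primary index cannot resolve are retried against each fallback in turn, and only the final leftovers stay missing. A standalone sketch of that cascade with dictionaries standing in for indices:

    # Standalone sketch of the fallback cascade in get_known_graph_ancestry.
    def find_in_one(index, keys):
        found, missing, pending = {}, set(), set(keys)
        while pending:
            key = pending.pop()
            if key in index:              # index: dict key -> parent tuple
                found[key] = index[key]
                pending.update(p for p in index[key] if p not in found)
            else:
                missing.add(key)
        return found, missing

    def cascade_find_ancestry(indices, keys):
        parent_map, missing = {}, set(keys)
        for index in indices:             # primary first, then fallbacks
            if not missing:
                break
            found, missing = find_in_one(index, missing)
            parent_map.update(found)
        return parent_map, missing

    primary = {'C': ('B',)}
    fallback = {'B': ('A',), 'A': ()}
    result_map, still_missing = cascade_find_ancestry([primary, fallback], ['C'])
    print sorted(result_map.items())  # [('A', ()), ('B', ('A',)), ('C', ('B',))]
    print still_missing               # set([])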
4637 | === modified file 'bzrlib/lock.py' |
4638 | --- bzrlib/lock.py 2009-07-20 08:47:58 +0000 |
4639 | +++ bzrlib/lock.py 2009-07-31 16:51:48 +0000 |
4640 | @@ -190,6 +190,13 @@ |
4641 | if self.filename in _fcntl_WriteLock._open_locks: |
4642 | self._clear_f() |
4643 | raise errors.LockContention(self.filename) |
4644 | + if self.filename in _fcntl_ReadLock._open_locks: |
4645 | + if 'strict_locks' in debug.debug_flags: |
4646 | + self._clear_f() |
4647 | + raise errors.LockContention(self.filename) |
4648 | + else: |
4649 | + trace.mutter('Write lock taken w/ an open read lock on: %s' |
4650 | + % (self.filename,)) |
4651 | |
4652 | self._open(self.filename, 'rb+') |
4653 | # reserve a slot for this lock - even if the lockf call fails, |
4654 | @@ -220,6 +227,14 @@ |
4655 | def __init__(self, filename): |
4656 | super(_fcntl_ReadLock, self).__init__() |
4657 | self.filename = osutils.realpath(filename) |
4658 | + if self.filename in _fcntl_WriteLock._open_locks: |
4659 | + if 'strict_locks' in debug.debug_flags: |
4660 | + # We raise before calling _open so we don't need to |
4661 | + # _clear_f |
4662 | + raise errors.LockContention(self.filename) |
4663 | + else: |
4664 | + trace.mutter('Read lock taken w/ an open write lock on: %s' |
4665 | + % (self.filename,)) |
4666 | _fcntl_ReadLock._open_locks.setdefault(self.filename, 0) |
4667 | _fcntl_ReadLock._open_locks[self.filename] += 1 |
4668 | self._open(filename, 'rb') |
4669 | @@ -418,15 +433,15 @@ |
4670 | DWORD, # dwFlagsAndAttributes |
4671 | HANDLE # hTemplateFile |
4672 | )((_function_name, ctypes.windll.kernel32)) |
4673 | - |
4674 | + |
4675 | INVALID_HANDLE_VALUE = -1 |
4676 | - |
4677 | + |
4678 | GENERIC_READ = 0x80000000 |
4679 | GENERIC_WRITE = 0x40000000 |
4680 | FILE_SHARE_READ = 1 |
4681 | OPEN_ALWAYS = 4 |
4682 | FILE_ATTRIBUTE_NORMAL = 128 |
4683 | - |
4684 | + |
4685 | ERROR_ACCESS_DENIED = 5 |
4686 | ERROR_SHARING_VIOLATION = 32 |
4687 | |
4688 | |
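Both lock.py hunks gate the new check the same way: an incompatible read/write combination is fatal only when -Dstrict_locks is active, and is otherwise just noted in the trace file. A standalone sketch of that gating pattern (stand-in names, not the bzrlib lock classes):

    # Standalone sketch of the strict_locks gating used in the hunks above.
    debug_flags = set()      # stand-in for bzrlib.debug.debug_flags

    def check_lock_mix(filename, have_conflicting_lock):
        if not have_conflicting_lock:
            return
        if 'strict_locks' in debug_flags:
            raise RuntimeError('LockContention: %s' % filename)
        print 'mutter: non-portable lock mix on %s' % filename

    check_lock_mix('repo/lock', True)       # only logs
    debug_flags.add('strict_locks')
    try:
        check_lock_mix('repo/lock', True)   # now a hard error
    except RuntimeError, e:
        print e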
4689 | === modified file 'bzrlib/lsprof.py' |
4690 | --- bzrlib/lsprof.py 2009-03-08 06:18:06 +0000 |
4691 | +++ bzrlib/lsprof.py 2009-08-24 21:05:09 +0000 |
4692 | @@ -13,45 +13,74 @@ |
4693 | |
4694 | __all__ = ['profile', 'Stats'] |
4695 | |
4696 | -_g_threadmap = {} |
4697 | - |
4698 | - |
4699 | -def _thread_profile(f, *args, **kwds): |
4700 | - # we lose the first profile point for a new thread in order to trampoline |
4701 | - # a new Profile object into place |
4702 | - global _g_threadmap |
4703 | - thr = thread.get_ident() |
4704 | - _g_threadmap[thr] = p = Profiler() |
4705 | - # this overrides our sys.setprofile hook: |
4706 | - p.enable(subcalls=True, builtins=True) |
4707 | - |
4708 | - |
4709 | def profile(f, *args, **kwds): |
4710 | """Run a function profile. |
4711 | |
4712 | Exceptions are not caught: If you need stats even when exceptions are to be |
4713 | - raised, passing in a closure that will catch the exceptions and transform |
4714 | - them appropriately for your driver function. |
4715 | + raised, pass in a closure that will catch the exceptions and transform them |
4716 | + appropriately for your driver function. |
4717 | |
4718 | :return: The functions return value and a stats object. |
4719 | """ |
4720 | - global _g_threadmap |
4721 | - p = Profiler() |
4722 | - p.enable(subcalls=True) |
4723 | - threading.setprofile(_thread_profile) |
4724 | + profiler = BzrProfiler() |
4725 | + profiler.start() |
4726 | try: |
4727 | ret = f(*args, **kwds) |
4728 | finally: |
4729 | - p.disable() |
4730 | - for pp in _g_threadmap.values(): |
4731 | + stats = profiler.stop() |
4732 | + return ret, stats |
4733 | + |
4734 | + |
4735 | +class BzrProfiler(object): |
4736 | + """Bzr utility wrapper around Profiler. |
4737 | + |
4738 | + For most uses the module level 'profile()' function will be suitable. |
4739	| +    However, when a simple wrapped function isn't available, profiling
4740	| +    may be easier to accomplish using this class.
4741 | + |
4742 | + To use it, create a BzrProfiler and call start() on it. Some arbitrary |
4743 | + time later call stop() to stop profiling and retrieve the statistics |
4744 | + from the code executed in the interim. |
4745 | + """ |
4746 | + |
4747 | + def start(self): |
4748 | + """Start profiling. |
4749 | + |
4750 | + This hooks into threading and will record all calls made until |
4751 | + stop() is called. |
4752 | + """ |
4753 | + self._g_threadmap = {} |
4754 | + self.p = Profiler() |
4755 | + self.p.enable(subcalls=True) |
4756 | + threading.setprofile(self._thread_profile) |
4757 | + |
4758 | + def stop(self): |
4759 | + """Stop profiling. |
4760 | + |
4761 | + This unhooks from threading and cleans up the profiler, returning |
4762 | + the gathered Stats object. |
4763 | + |
4764 | + :return: A bzrlib.lsprof.Stats object. |
4765 | + """ |
4766 | + self.p.disable() |
4767 | + for pp in self._g_threadmap.values(): |
4768 | pp.disable() |
4769 | threading.setprofile(None) |
4770 | + p = self.p |
4771 | + self.p = None |
4772 | + threads = {} |
4773 | + for tid, pp in self._g_threadmap.items(): |
4774 | + threads[tid] = Stats(pp.getstats(), {}) |
4775 | + self._g_threadmap = None |
4776 | + return Stats(p.getstats(), threads) |
4777 | |
4778 | - threads = {} |
4779 | - for tid, pp in _g_threadmap.items(): |
4780 | - threads[tid] = Stats(pp.getstats(), {}) |
4781 | - _g_threadmap = {} |
4782 | - return ret, Stats(p.getstats(), threads) |
4783 | + def _thread_profile(self, f, *args, **kwds): |
4784 | + # we lose the first profile point for a new thread in order to |
4785 | + # trampoline a new Profile object into place |
4786 | + thr = thread.get_ident() |
4787 | + self._g_threadmap[thr] = p = Profiler() |
4788 | + # this overrides our sys.setprofile hook: |
4789 | + p.enable(subcalls=True, builtins=True) |
4790 | |
4791 | |
4792 | class Stats(object): |
4793 | |
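The new BzrProfiler docstring spells out the calling pattern; a hedged usage sketch (the Stats.sort call is assumed from the wider bzrlib.lsprof module, which this diff does not show):

    # Sketch of the start()/stop() pattern described above.
    from bzrlib import lsprof

    profiler = lsprof.BzrProfiler()
    profiler.start()
    try:
        total = sum(i * i for i in xrange(100000))  # code being profiled
    finally:
        stats = profiler.stop()    # a bzrlib.lsprof.Stats object
    stats.sort()                   # assumed Stats API; see bzrlib.lsprof

For the common case of profiling a single callable, the module-level profile() shown at the top of the hunk remains the simpler entry point: ret, stats = lsprof.profile(f, *args).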
4794 | === modified file 'bzrlib/mail_client.py' |
4795 | --- bzrlib/mail_client.py 2009-06-10 03:56:49 +0000 |
4796 | +++ bzrlib/mail_client.py 2009-09-02 08:26:27 +0000 |
4797 | @@ -424,6 +424,10 @@ |
4798 | |
4799 | _client_commands = ['emacsclient'] |
4800 | |
4801 | + def __init__(self, config): |
4802 | + super(EmacsMail, self).__init__(config) |
4803 | + self.elisp_tmp_file = None |
4804 | + |
4805 | def _prepare_send_function(self): |
4806 | """Write our wrapper function into a temporary file. |
4807 | |
4808 | @@ -500,6 +504,7 @@ |
4809 | if attach_path is not None: |
4810 | # Do not create a file if there is no attachment |
4811 | elisp = self._prepare_send_function() |
4812 | + self.elisp_tmp_file = elisp |
4813 | lmmform = '(load "%s")' % elisp |
4814 | mmform = '(bzr-add-mime-att "%s")' % \ |
4815 | self._encode_path(attach_path, 'attachment') |
4816 | |
4817 | === modified file 'bzrlib/merge.py' |
4818 | --- bzrlib/merge.py 2009-07-02 13:07:14 +0000 |
4819 | +++ bzrlib/merge.py 2009-09-11 13:32:55 +0000 |
4820 | @@ -64,8 +64,12 @@ |
4821 | |
4822 | |
4823 | def transform_tree(from_tree, to_tree, interesting_ids=None): |
4824 | - merge_inner(from_tree.branch, to_tree, from_tree, ignore_zero=True, |
4825 | - interesting_ids=interesting_ids, this_tree=from_tree) |
4826 | + from_tree.lock_tree_write() |
4827 | + try: |
4828 | + merge_inner(from_tree.branch, to_tree, from_tree, ignore_zero=True, |
4829 | + interesting_ids=interesting_ids, this_tree=from_tree) |
4830 | + finally: |
4831 | + from_tree.unlock() |
4832 | |
4833 | |
4834 | class Merger(object): |
4835 | @@ -102,6 +106,17 @@ |
4836 | self._is_criss_cross = None |
4837 | self._lca_trees = None |
4838 | |
4839 | + def cache_trees_with_revision_ids(self, trees): |
4840 | + """Cache any tree in trees if it has a revision_id.""" |
4841 | + for maybe_tree in trees: |
4842 | + if maybe_tree is None: |
4843 | + continue |
4844 | + try: |
4845 | + rev_id = maybe_tree.get_revision_id() |
4846 | + except AttributeError: |
4847 | + continue |
4848 | + self._cached_trees[rev_id] = maybe_tree |
4849 | + |
4850 | @property |
4851 | def revision_graph(self): |
4852 | if self._revision_graph is None: |
4853 | @@ -598,19 +613,21 @@ |
4854 | self.this_tree.lock_tree_write() |
4855 | self.base_tree.lock_read() |
4856 | self.other_tree.lock_read() |
4857 | - self.tt = TreeTransform(self.this_tree, self.pb) |
4858 | try: |
4859 | - self.pp.next_phase() |
4860 | - self._compute_transform() |
4861 | - self.pp.next_phase() |
4862 | - results = self.tt.apply(no_conflicts=True) |
4863 | - self.write_modified(results) |
4864 | + self.tt = TreeTransform(self.this_tree, self.pb) |
4865 | try: |
4866 | - self.this_tree.add_conflicts(self.cooked_conflicts) |
4867 | - except UnsupportedOperation: |
4868 | - pass |
4869 | + self.pp.next_phase() |
4870 | + self._compute_transform() |
4871 | + self.pp.next_phase() |
4872 | + results = self.tt.apply(no_conflicts=True) |
4873 | + self.write_modified(results) |
4874 | + try: |
4875 | + self.this_tree.add_conflicts(self.cooked_conflicts) |
4876 | + except UnsupportedOperation: |
4877 | + pass |
4878 | + finally: |
4879 | + self.tt.finalize() |
4880 | finally: |
4881 | - self.tt.finalize() |
4882 | self.other_tree.unlock() |
4883 | self.base_tree.unlock() |
4884 | self.this_tree.unlock() |
4885 | @@ -1516,6 +1533,7 @@ |
4886 | get_revision_id = getattr(base_tree, 'get_revision_id', None) |
4887 | if get_revision_id is None: |
4888 | get_revision_id = base_tree.last_revision |
4889 | + merger.cache_trees_with_revision_ids([other_tree, base_tree, this_tree]) |
4890 | merger.set_base_revision(get_revision_id(), this_branch) |
4891 | return merger.do_merge() |
4892 | |
4893 | |
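cache_trees_with_revision_ids relies on duck typing: only trees that expose get_revision_id (revision trees) are cached, while working trees and None placeholders are silently skipped. A standalone illustration of that filter:

    # Standalone illustration of the duck-typed caching in the hunk above.
    class RevisionTreeStandIn(object):
        def __init__(self, rev_id):
            self._rev_id = rev_id
        def get_revision_id(self):
            return self._rev_id

    class WorkingTreeStandIn(object):
        pass                      # no get_revision_id -> never cached

    cached_trees = {}
    for tree in [RevisionTreeStandIn('rev-1'), WorkingTreeStandIn(), None]:
        if tree is None:
            continue
        try:
            rev_id = tree.get_revision_id()
        except AttributeError:
            continue
        cached_trees[rev_id] = tree
    print cached_trees.keys()     # ['rev-1']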
4894 | === modified file 'bzrlib/missing.py' |
4895 | --- bzrlib/missing.py 2009-03-23 14:59:43 +0000 |
4896 | +++ bzrlib/missing.py 2009-08-17 18:52:01 +0000 |
4897 | @@ -138,31 +138,13 @@ |
4898 | if not ancestry: #Empty ancestry, no need to do any work |
4899 | return [] |
4900 | |
4901 | - mainline_revs, rev_nos, start_rev_id, end_rev_id = log._get_mainline_revs( |
4902 | - branch, None, tip_revno) |
4903 | - if not mainline_revs: |
4904 | - return [] |
4905 | - |
4906 | - # This asks for all mainline revisions, which is size-of-history and |
4907 | - # should be addressed (but currently the only way to get correct |
4908 | - # revnos). |
4909 | - |
4910 | - # mainline_revisions always includes an extra revision at the |
4911 | - # beginning, so don't request it. |
4912 | - parent_map = dict(((key, value) for key, value |
4913 | - in graph.iter_ancestry(mainline_revs[1:]) |
4914 | - if value is not None)) |
4915 | - # filter out ghosts; merge_sort errors on ghosts. |
4916 | - # XXX: is this needed here ? -- vila080910 |
4917 | - rev_graph = _mod_repository._strip_NULL_ghosts(parent_map) |
4918 | - # XXX: what if rev_graph is empty now ? -- vila080910 |
4919 | - merge_sorted_revisions = tsort.merge_sort(rev_graph, tip, |
4920 | - mainline_revs, |
4921 | - generate_revno=True) |
4922 | + merge_sorted_revisions = branch.iter_merge_sorted_revisions() |
4923 | # Now that we got the correct revnos, keep only the relevant |
4924 | # revisions. |
4925 | merge_sorted_revisions = [ |
4926 | - (s, revid, n, d, e) for s, revid, n, d, e in merge_sorted_revisions |
4927 | + # log.reverse_by_depth expects seq_num to be present, but it is |
4928 | + # stripped by iter_merge_sorted_revisions() |
4929 | + (0, revid, n, d, e) for revid, n, d, e in merge_sorted_revisions |
4930 | if revid in ancestry] |
4931 | if not backward: |
4932 | merge_sorted_revisions = log.reverse_by_depth(merge_sorted_revisions) |
4933 | |
4934 | === modified file 'bzrlib/mutabletree.py' |
4935 | --- bzrlib/mutabletree.py 2009-07-10 08:33:11 +0000 |
4936 | +++ bzrlib/mutabletree.py 2009-09-07 23:14:05 +0000 |
4937 | @@ -226,6 +226,9 @@ |
4938 | revprops=revprops, |
4939 | possible_master_transports=possible_master_transports, |
4940 | *args, **kwargs) |
4941 | + post_hook_params = PostCommitHookParams(self) |
4942 | + for hook in MutableTree.hooks['post_commit']: |
4943 | + hook(post_hook_params) |
4944 | return committed_id |
4945 | |
4946 | def _gather_kinds(self, files, kinds): |
4947 | @@ -581,14 +584,32 @@ |
4948 | self.create_hook(hooks.HookPoint('start_commit', |
4949 | "Called before a commit is performed on a tree. The start commit " |
4950 | "hook is able to change the tree before the commit takes place. " |
4951 | - "start_commit is called with the bzrlib.tree.MutableTree that the " |
4952 | - "commit is being performed on.", (1, 4), None)) |
4953 | + "start_commit is called with the bzrlib.mutabletree.MutableTree " |
4954 | + "that the commit is being performed on.", (1, 4), None)) |
4955 | + self.create_hook(hooks.HookPoint('post_commit', |
4956 | + "Called after a commit is performed on a tree. The hook is " |
4957 | + "called with a bzrlib.mutabletree.PostCommitHookParams object. " |
4958 | + "The mutable tree the commit was performed on is available via " |
4959 | + "the mutable_tree attribute of that object.", (2, 0), None)) |
4960 | |
4961 | |
4962 | # install the default hooks into the MutableTree class. |
4963 | MutableTree.hooks = MutableTreeHooks() |
4964 | |
4965 | |
4966 | +class PostCommitHookParams(object): |
4967 | + """Parameters for the post_commit hook. |
4968 | + |
4969 | + To access the parameters, use the following attributes: |
4970 | + |
4971 | + * mutable_tree - the MutableTree object |
4972 | + """ |
4973 | + |
4974 | + def __init__(self, mutable_tree): |
4975 | + """Create the parameters for the post_commit hook.""" |
4976 | + self.mutable_tree = mutable_tree |
4977 | + |
4978 | + |
4979 | class _FastPath(object): |
4980 | """A path object with fast accessors for things like basename.""" |
4981 | |
4982 | |
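The new post_commit hook point is registered like any other bzrlib hook; a hedged sketch (install_named_hook is the standard registration call, and the hook body treats the tree as a working tree with a basedir, which is an assumption):

    # Sketch: registering a post_commit hook on MutableTree.
    from bzrlib import mutabletree

    def report_commit(params):
        # params is a PostCommitHookParams; the tree is params.mutable_tree
        print 'post_commit fired for %s' % params.mutable_tree.basedir

    mutabletree.MutableTree.hooks.install_named_hook(
        'post_commit', report_commit, 'post_commit demo')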
4983 | === modified file 'bzrlib/plugin.py' |
4984 | --- bzrlib/plugin.py 2009-03-24 01:53:42 +0000 |
4985 | +++ bzrlib/plugin.py 2009-09-04 15:36:48 +0000 |
4986 | @@ -52,12 +52,16 @@ |
4987 | from bzrlib import plugins as _mod_plugins |
4988 | """) |
4989 | |
4990 | -from bzrlib.symbol_versioning import deprecated_function |
4991 | +from bzrlib.symbol_versioning import ( |
4992 | + deprecated_function, |
4993 | + deprecated_in, |
4994 | + ) |
4995 | |
4996 | |
4997 | DEFAULT_PLUGIN_PATH = None |
4998 | _loaded = False |
4999 | |
5000 | +@deprecated_function(deprecated_in((2, 0, 0))) |
The diff has been truncated for viewing.
I've specially prepared a cherry-picked patch for the 2.0 branch: https://code.launchpad.net/~garyvdm/bzr/427773-2.0-unlock-when-limbo/+merge/11631