OverflowError calling fp.read on large repository

Bug #303538 reported by davide
Affects   Status         Importance   Assigned to   Milestone
Bazaar    Fix Released   High         Unassigned

Bug Description

While trying to branch or check out my image repository, I got the following error. Yes, it's big, but no, it's not huge (20 GB in total).

$ bzr branch file:///home/anna/Pictures pic.bzr
bzr: ERROR: exceptions.OverflowError: long int too large to convert to int

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 834, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 790, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 492, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 927, in run
    hardlink=hardlink)
  File "/usr/lib/python2.5/site-packages/bzrlib/bzrdir.py", line 941, in sprout
    revision_id=revision_id)
  File "/usr/lib/python2.5/site-packages/bzrlib/decorators.py", line 127, in read_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/repository.py", line 1036, in sprout
    dest_repo.fetch(self, revision_id=revision_id)
  File "/usr/lib/python2.5/site-packages/bzrlib/repository.py", line 949, in fetch
    return inter.fetch(revision_id=revision_id, pb=pb, find_ghosts=find_ghosts)
  File "/usr/lib/python2.5/site-packages/bzrlib/decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/repository.py", line 2759, in fetch
    revision_ids).pack()
  File "/usr/lib/python2.5/site-packages/bzrlib/repofmt/pack_repo.py", line 589, in pack
    return self._create_pack_from_packs()
  File "/usr/lib/python2.5/site-packages/bzrlib/repofmt/pack_repo.py", line 722, in _create_pack_from_packs
    self._copy_text_texts()
  File "/usr/lib/python2.5/site-packages/bzrlib/repofmt/pack_repo.py", line 686, in _copy_text_texts
    self.new_pack.text_index, readv_group_iter, total_items))
  File "/usr/lib/python2.5/site-packages/bzrlib/repofmt/pack_repo.py", line 807, in _copy_nodes_graph
    write_index, output_lines, pb, readv_group_iter, total_items):
  File "/usr/lib/python2.5/site-packages/bzrlib/repofmt/pack_repo.py", line 830, in _do_copy_nodes_graph
    izip(reader.iter_records(), node_vector):
  File "/usr/lib/python2.5/site-packages/bzrlib/pack.py", line 272, in _iter_records
    for record in self._iter_record_objects():
  File "/usr/lib/python2.5/site-packages/bzrlib/pack.py", line 277, in _iter_record_objects
    record_kind = self.reader_func(1)
  File "/usr/lib/python2.5/site-packages/bzrlib/pack.py", line 218, in reader_func
    return self._source.read(length)
  File "/usr/lib/python2.5/site-packages/bzrlib/pack.py", line 177, in read
    self._next()
  File "/usr/lib/python2.5/site-packages/bzrlib/pack.py", line 172, in _next
    length, data = self.readv_result.next()
  File "/usr/lib/python2.5/site-packages/bzrlib/transport/__init__.py", line 721, in _seek_and_read
    data = fp.read(c_offset.length)
OverflowError: long int too large to convert to int

bzr 1.3.1 on python 2.5.2.final.0 (linux2)
arguments: ['/usr/bin/bzr', 'branch', 'file:///home/anna/Pictures', 'pic.bzr']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  launchpad /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

Martin Pool (mbp) wrote:

Can you please try upgrading to a later version of bzr, as this bug may already have been fixed. See http://bazaar-vcs.org/Download

Changed in bzr:
importance: Undecided → High
status: New → Incomplete
davide (davide-del-vento) wrote:

Thanks for your answer. I apologize for forgetting to mention that the version I'm using is the latest one available from the official Ubuntu Hardy (LTS) repositories. In fact, I was quite surprised to find that there had been practically no updates since Hardy was released!

I manually installed the latest version and (without recreating the repository!!) got the following:

$ ~/Desktop/bzr-1.9/bzr branch file:///home/anna/Pictures/ pic.bzr-repo
bzr: ERROR: exceptions.OverflowError: long int too large to convert to int

Traceback (most recent call last):
  File "/home/anna/Desktop/bzr-1.9/bzrlib/commands.py", line 893, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/commands.py", line 839, in run_bzr
    ret = run(*run_argv)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/commands.py", line 539, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/builtins.py", line 980, in run
    force_new_repo=standalone)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/bzrdir.py", line 1113, in sprout
    result_repo.fetch(source_repository, revision_id=revision_id)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repository.py", line 989, in fetch
    find_ghosts=find_ghosts)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/decorators.py", line 192, in write_locked
    result = unbound(self, *args, **kwargs)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repository.py", line 2849, in fetch
    return self._pack(self.source, self.target, revision_ids)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repository.py", line 2856, in _pack
    revision_ids).pack()
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repofmt/pack_repo.py", line 605, in pack
    return self._create_pack_from_packs()
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repofmt/pack_repo.py", line 740, in _create_pack_from_packs
    self._copy_text_texts()
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repofmt/pack_repo.py", line 704, in _copy_text_texts
    self.new_pack.text_index, readv_group_iter, total_items))
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repofmt/pack_repo.py", line 825, in _copy_nodes_graph
    write_index, output_lines, pb, readv_group_iter, total_items):
  File "/home/anna/Desktop/bzr-1.9/bzrlib/repofmt/pack_repo.py", line 848, in _do_copy_nodes_graph
    izip(reader.iter_records(), node_vector):
  File "/home/anna/Desktop/bzr-1.9/bzrlib/pack.py", line 272, in _iter_records
    for record in self._iter_record_objects():
  File "/home/anna/Desktop/bzr-1.9/bzrlib/pack.py", line 277, in _iter_record_objects
    record_kind = self.reader_func(1)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/pack.py", line 218, in reader_func
    return self._source.read(length)
  File "/home/anna/Desktop/bzr-1.9/bzrlib/pack.py", line 177, in read
    self._next()
  File "/home/anna/Desktop/bzr-1.9/bzrlib/pack.py", line 172, in _next
    length, data = self.readv_result.next()
  File "/home/anna/Desktop/bzr-1.9/bzrlib/transport/__init__.py", line 680, in _seek_and_read
    data = fp.read(c_offset.length)
OverflowError: long int too large to convert to int

bzr 1.9 on python 2.5.2 (linux2)
arguments: ['/home/anna/Desktop/...


Martin Pool (mbp)
Changed in bzr:
status: Incomplete → Confirmed
John A Meinel (jameinel) wrote:

We have fixed the code that does smart fetches so that it never fetches more than 5MB at a time, but IIRC we don't have any code that limits local fetching.

So it seems we are trying to copy more than 2GB of data in a single read + write, and a read size that large can no longer be converted to the integer argument fp.read() expects.
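To illustrate the size limit involved, here is a minimal Python sketch. It assumes the overflow comes from a size argument held in a signed 32-bit integer, which is the usual reading of "long int too large to convert to int". The helper `read_range_bounded` and its 10MB default chunk are hypothetical, not bzrlib code; it only shows how to keep every single fp.read() call below that limit. It still assembles the whole range in memory, so it does nothing about RAM usage.

```python
import io

# Largest value of a signed 32-bit C integer. Assumption: this is the limit
# fp.read() hit in the traceback ("long int too large to convert to int").
INT32_MAX = 2**31 - 1


def read_range_bounded(fp, offset, length, chunk_size=10 * 1024 * 1024):
    """Return `length` bytes starting at `offset`, never passing fp.read()
    a size larger than `chunk_size` (hypothetical helper, 10MB default)."""
    if not 0 < chunk_size <= INT32_MAX:
        raise ValueError("chunk_size must be between 1 and INT32_MAX")
    fp.seek(offset)
    pieces = []
    remaining = length
    while remaining > 0:
        # Each call asks for at most chunk_size bytes.
        data = fp.read(min(remaining, chunk_size))
        if not data:
            raise EOFError("short read: %d bytes missing" % remaining)
        pieces.append(data)
        remaining -= len(data)
    return b"".join(pieces)


if __name__ == "__main__":
    buf = io.BytesIO(b"0123456789" * 3)
    print(read_range_bounded(buf, 5, 12, chunk_size=4))  # prints b'567890123456'
```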

As a side note, it appears that you are not using a shared repository, which means we have to copy all 20GB of history around. If you had run "bzr init-repo" at the beginning, we would only need to create new working trees or very lightweight branches, etc.

As a quick workaround, you could do:

  # run these from the directory that contains existing_branch
  bzr init-repo .
  rm -rf .bzr/repository
  mv existing_branch/.bzr/repository .bzr
  touch .bzr/repository/shared-storage

Note that you only want to do this if you truly *don't* have a shared repository already; otherwise the rm step deletes the existing shared repo.

Anyway, the simple fix is to pass "max_size=XXXX" to Transport._coalesce_offsets. At the moment we use "0" as the default value, which means unlimited. We could make 0 == 2GB, or, more reasonably, enforce a hard max-size of 10MB or so. 10MB should be a reasonable buffer size: it doesn't eat all available RAM, and it still avoids most of the round trips.
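A simplified Python sketch of the max_size idea. `CoalescedRange`, `coalesce_offsets` and `seek_and_read` are hypothetical names loosely modeled on the traceback; this is not bzrlib's actual Transport._coalesce_offsets or _seek_and_read. The sketch assumes the requested offsets are already sorted and merges only exactly adjacent ranges (no fudge factor). A max_size of 0 keeps the current unlimited behaviour.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Iterable, Iterator, List, Optional, Tuple


@dataclass
class CoalescedRange:
    """One physical read covering several adjacent logical requests."""
    start: int
    length: int
    # (offset relative to start, length) of each original request
    ranges: List[Tuple[int, int]] = field(default_factory=list)


def coalesce_offsets(offsets: Iterable[Tuple[int, int]],
                     max_size: int = 0) -> Iterator[CoalescedRange]:
    """Merge sorted, exactly adjacent (offset, length) requests.

    max_size == 0 means unlimited; otherwise a merged range never grows
    past max_size bytes (a single oversized request still gets its own range).
    """
    current: Optional[CoalescedRange] = None
    for offset, length in offsets:
        if (current is not None
                and offset == current.start + current.length
                and (max_size == 0 or current.length + length <= max_size)):
            # Adjacent and still under the cap: extend the current range.
            current.ranges.append((offset - current.start, length))
            current.length += length
            continue
        if current is not None:
            yield current
        current = CoalescedRange(offset, length, [(0, length)])
    if current is not None:
        yield current


def seek_and_read(fp: BinaryIO, offsets: Iterable[Tuple[int, int]],
                  max_size: int = 0) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) per request, with one fp.read() per merged range."""
    for c in coalesce_offsets(offsets, max_size):
        fp.seek(c.start)
        data = fp.read(c.length)
        if len(data) != c.length:
            raise EOFError("short read at offset %d" % c.start)
        # Slice the single buffer back into the original requests.
        for rel, length in c.ranges:
            yield c.start + rel, data[rel:rel + length]


if __name__ == "__main__":
    requests = [(0, 10), (10, 10), (20, 10), (40, 5)]
    print([(c.start, c.length) for c in coalesce_offsets(requests, max_size=20)])
    # prints [(0, 20), (20, 10), (40, 5)]
    buf = io.BytesIO(b"abcdefghijklmnopqrstuvwxyz" * 2)
    for offset, data in seek_and_read(buf, requests, max_size=20):
        print(offset, data)
```

With max_size set to 10MB, a long run of adjacent records is split into reads of at most 10MB each (unless a single record is itself larger), so no individual fp.read() call gets anywhere near the 2GB boundary.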

As an example, see the attached patch.

Martin Pool (mbp) wrote:
Changed in bzr:
status: Confirmed → Fix Released