Merge lp:~verterok/ubuntuone-client/fix-987376 into lp:ubuntuone-client

Proposed by Guillermo Gonzalez on 2012-04-24
Status: Work in progress
Proposed branch: lp:~verterok/ubuntuone-client/fix-987376
Merge into: lp:ubuntuone-client
Diff against target: 250 lines (+76/-39)
2 files modified
tests/syncdaemon/test_tritcask.py (+40/-13)
ubuntuone/syncdaemon/tritcask.py (+36/-26)
To merge this branch: bzr merge lp:~verterok/ubuntuone-client/fix-987376
Reviewer                        Review Type   Date Requested   Status
Facundo Batista                               2012-04-24       Approve on 2012-05-10
Manuel de la Peña (community)                                  Approve on 2012-05-07
Review via email: mp+103273@code.launchpad.net

Commit message

Only use mmap for files of size < 2**31, to avoid hitting address space limits, and fall back to standard IO for larger files.

Description of the change

Only use mmap for files of size < 2**31, to avoid hitting address space limits, and fall back to standard IO for larger files.
I'll be working on limiting the file growth in a following branch, to fix Bug #987382.
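
The idea described above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the code in this branch: `MAX_MMAP_SIZE`, `open_for_reading` and `read_all` are hypothetical names, and the threshold simply mirrors the 2**31 limit from the commit message.

```python
import mmap
import os
import tempfile

# Hypothetical threshold mirroring the commit message: mmap only below 2**31.
MAX_MMAP_SIZE = 2**31 - 1


def open_for_reading(path, max_mmap_size=MAX_MMAP_SIZE):
    """Return (fd, reader): reader is an mmap for small files, else fd itself."""
    fd = open(path, 'rb')
    size = os.fstat(fd.fileno()).st_size
    # An empty file cannot be mapped, so it also takes the standard IO path.
    if 0 < size < max_mmap_size:
        return fd, mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ)
    return fd, fd


def read_all(path, max_mmap_size=MAX_MMAP_SIZE):
    """Read the whole file through whichever reader was selected."""
    fd, reader = open_for_reading(path, max_mmap_size)
    try:
        reader.seek(0)
        return reader.read()
    finally:
        if reader is not fd:
            reader.close()
        fd.close()


# Demo: a tiny file reads identically through both paths.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'foobar')
demo_path = tmp.name
via_mmap = read_all(demo_path)
via_file = read_all(demo_path, max_mmap_size=1)  # forces the standard IO path
os.remove(demo_path)
```

Because both an mmap object and a plain file object expose `seek` and `read`, the caller does not need to know which one it received, which is the same property the new `read(fd)` signature in the diff relies on.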

Facundo Batista (facundo) wrote :

Thanks!

review: Approve
Manuel de la Peña (mandel) wrote :

The following happens on Windows:

[ERROR]
Traceback (most recent call last):
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\tests\syncdaemon\test_tritcask.py", line 214, in test_iter_entries_bad_crc
    for i, entry in enumerate(db.live_file.iter_entries()):
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\ubuntuone\syncdaemon\tritcask.py", line 257, in iter_entries
    for entry in self._iter_entries(fmmap):
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\ubuntuone\syncdaemon\tritcask.py", line 266, in _iter_entries
    entry, new_pos = self.read(fd)
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\ubuntuone\syncdaemon\tritcask.py", line 323, in read
    data = fd.read(key_sz+value_sz)
exceptions.OverflowError: Python int too large to convert to C long

tests.syncdaemon.test_tritcask.DataFileTest.test_iter_entries_bad_crc
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\tests\syncdaemon\test_tritcask.py", line 214, in test_iter_entries_bad_crc
    for i, entry in enumerate(db.live_file.iter_entries()):
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\ubuntuone\syncdaemon\tritcask.py", line 252, in iter_entries
    for entry in self._iter_entries(self.fd):
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\ubuntuone\syncdaemon\tritcask.py", line 266, in _iter_entries
    entry, new_pos = self.read(fd)
  File "C:\Users\Mandel\Projects\Canonical\ubuntuone-client\fix-987376\ubuntuone\syncdaemon\tritcask.py", line 323, in read
    data = fd.read(key_sz+value_sz)
exceptions.MemoryError:

tests.syncdaemon.test_tritcask.NoMmapDataFileTest.test_iter_entries_bad_crc

review: Needs Fixing
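
One plausible reading of the two tracebacks, offered as an assumption rather than a confirmed diagnosis: the test writes random bytes into the file, the reader interprets them as an entry header, and the resulting `key_sz` and `value_sz` are arbitrarily large, so `fd.read(key_sz+value_sz)` asks for more than fits in a C long (32 bits on Windows) or more than can be allocated. The Python 3 sketch below uses a hypothetical header layout (`>III`, three unsigned 32-bit fields) that is not taken from tritcask.py, purely to show the effect.

```python
import struct

# Hypothetical header layout, for illustration only: three big-endian
# unsigned 32-bit integers (key size, value size, row type).
header_struct = struct.Struct('>III')

# Stand-in for random garbage sitting where a header is expected.
garbage = b'\xff\xee\xdd\xcc\xbb\xaa\x99\x88\x00\x00\x00\x01'

key_sz, value_sz, row_type = header_struct.unpack(garbage)
requested = key_sz + value_sz  # what a reader would pass to fd.read()

# 2**31 - 1 is the largest value a signed 32-bit C long can hold.
exceeds_c_long = requested > 2**31 - 1
```

Under this reading, corrupting only the last few bytes of a valid entry keeps the header intact, so the sizes stay sane and only the CRC check fails.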
Guillermo Gonzalez (verterok) wrote :

Thanks for spotting that one.
It's fixed and pushed.

Manuel de la Peña (mandel) wrote :

Everything works on all currently supported platforms!

review: Approve
Ubuntu One Auto Pilot (otto-pilot) wrote :

The attempt to merge lp:~verterok/ubuntuone-client/fix-987376 into lp:ubuntuone-client failed. Below is the output from the failed tests.

/usr/bin/gnome-autogen.sh
checking for autoconf >= 2.53...
  testing autoconf2.50... not found.
  testing autoconf... found 2.68
checking for automake >= 1.10...
  testing automake-1.11... found 1.11.3
checking for libtool >= 1.5...
  testing libtoolize... found 2.4.2
checking for intltool >= 0.30...
  testing intltoolize... found 0.50.2
checking for pkg-config >= 0.14.0...
  testing pkg-config... found 0.26
checking for gtk-doc >= 1.0...
  testing gtkdocize... found 1.18
Checking for required M4 macros...
Checking for forbidden M4 macros...
Processing ./configure.ac
Running libtoolize...
libtoolize: putting auxiliary files in `.'.
libtoolize: copying file `./ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
Running intltoolize...
Running gtkdocize...
Running aclocal-1.11...
Running autoconf...
Running autoheader...
Running automake-1.11...
Running ./configure --enable-gtk-doc --enable-debug ...
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking for library containing strerror... none required
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking dependency style of gcc... (cached) gcc3
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to con...

Facundo Batista (facundo) wrote :

Tested again latest changes.

review: Approve
Ubuntu One Auto Pilot (otto-pilot) wrote :

The attempt to merge lp:~verterok/ubuntuone-client/fix-987376 into lp:ubuntuone-client failed. Below is the output from the failed tests.

/usr/bin/gnome-autogen.sh
checking for autoconf >= 2.53...
  testing autoconf2.50... not found.
  testing autoconf... found 2.68
checking for automake >= 1.10...
  testing automake-1.11... found 1.11.3
checking for libtool >= 1.5...
  testing libtoolize... found 2.4.2
checking for intltool >= 0.30...
  testing intltoolize... found 0.50.2
checking for pkg-config >= 0.14.0...
  testing pkg-config... found 0.26
checking for gtk-doc >= 1.0...
  testing gtkdocize... found 1.18
Checking for required M4 macros...
Checking for forbidden M4 macros...
Processing ./configure.ac
Running libtoolize...
libtoolize: putting auxiliary files in `.'.
libtoolize: copying file `./ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
Running intltoolize...
Running gtkdocize...
Running aclocal-1.11...
Running autoconf...
Running autoheader...
Running automake-1.11...
Running ./configure --enable-gtk-doc --enable-debug ...
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking for library containing strerror... none required
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking dependency style of gcc... (cached) gcc3
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to con...

Unmerged revisions

1234. By Guillermo Gonzalez on 2012-05-08

fix tests

1233. By Guillermo Gonzalez on 2012-05-04

fix bad_crc tests to run on windows.

1232. By Guillermo Gonzalez on 2012-04-25

don't iterate the generator in DataFile.iter_entries, just return it.

1231. By Guillermo Gonzalez on 2012-04-24

fix max mmaped file limit

1230. By Guillermo Gonzalez on 2012-04-24

only use mmap in data files <= 2**32, to avoid hitting address space limits, and fallback to standard IO.

Preview Diff

=== modified file 'tests/syncdaemon/test_tritcask.py'
--- tests/syncdaemon/test_tritcask.py 2012-04-09 20:07:05 +0000
+++ tests/syncdaemon/test_tritcask.py 2012-05-08 14:03:17 +0000
@@ -205,8 +205,8 @@
             db.put(i, 'foo%d' % (i,), 'bar%s' % (i,))
         # write a different value -> random bytes
         # now write some garbage to the end of file
-        db.live_file.fd.write(os.urandom(100))
-        db.live_file.fd.flush()
+        db.live_file.fd.seek(db.live_file.fd.tell()-3)
+        db.live_file.fd.write(os.urandom(3))
         # and add 10 new entries
         for i in range(10, 20):
             db.put(i, 'foo%d' % (i,), 'bar%s' % (i,))
@@ -290,7 +290,7 @@
         with contextlib.closing(fmap):
             current_pos = 0
             (crc32, tstamp, key_sz, value_sz, row_type,
-             key, value, pos), new_pos = data_file.read(fmap, current_pos)
+             key, value, pos), new_pos = data_file.read(fmap)
             current_pos = new_pos
             self.assertEqual(crc32_size+header_size+key_sz+value_sz, new_pos)
             self.assertEqual(orig_tstamp, tstamp)
@@ -299,7 +299,7 @@
             self.assertEqual('bar', value)
             self.assertEqual(0, row_type)
             (crc32, tstamp, key_sz, value_sz, row_type,
-             key, value, pos), new_pos = data_file.read(fmap, current_pos)
+             key, value, pos), new_pos = data_file.read(fmap)
             self.assertEqual(
                 crc32_size+header_size+key_sz+value_sz+current_pos, new_pos)
             self.assertEqual(tstamp1, tstamp)
@@ -323,7 +323,7 @@
             fd.flush()
             fmap = mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ)
             with contextlib.closing(fmap):
-                self.assertRaises(BadCrc, data_file.read, fmap, 0)
+                self.assertRaises(BadCrc, data_file.read, fmap)
 
     def test_read_bad_header(self):
         """Test for read method with a bad header/unpack error."""
@@ -340,7 +340,14 @@
             fd.flush()
             fmap = mmap.mmap(fd.fileno(), 0, access=mmap.ACCESS_READ)
             with contextlib.closing(fmap):
-                self.assertRaises(BadHeader, data_file.read, fmap, 0)
+                self.assertRaises(BadHeader, data_file.read, fmap)
+
+
+class NoMmapDataFileTest(DataFileTest):
+
+    def setUp(self):
+        self.patch(DataFile, 'max_mmap_size', 1)
+        return super(NoMmapDataFileTest, self).setUp()
 
 
 class TempDataFileTest(DataFileTest):
@@ -466,10 +473,9 @@
         db = Tritcask(self.base_dir)
         for i in range(10):
             db.put(i, 'foo%d' % (i,), 'bar%s' % (i,))
-        # write a different value -> random bytes
-        # now write some garbage to the end of file
-        db.live_file.fd.write(os.urandom(100))
-        db.live_file.fd.flush()
+        # write a different value to the last item.
+        db.live_file.fd.seek(db.live_file.fd.tell()-3)
+        db.live_file.fd.write(os.urandom(3))
         # and add 10 new entries
         for i in range(10, 20):
             db.put(i, 'foo%d' % (i,), 'bar%s' % (i,))
@@ -567,6 +573,25 @@
         self.assertEqual(data_file.hint_size, len("some data"))
 
 
+class NoMmapImmutableDataFileTest(ImmutableDataFileTest):
+
+    def setUp(self):
+        self.patch(ImmutableDataFile, 'max_mmap_size', 1)
+        return super(NoMmapImmutableDataFileTest, self).setUp()
+
+    def test__open(self):
+        """Test the _open private method."""
+        new_file = DataFile(self.base_dir)
+        # write some data
+        new_file.fd.write('foo')
+        immutable_file = new_file.make_immutable()
+        self.assertTrue(immutable_file.fd is not None)
+        self.assertTrue(immutable_file.fmmap is None)
+        # check that the file is opened only for read
+        self.assertRaises(IOError, immutable_file.fd.write, 'foo')
+        immutable_file.close()
+
+
 class DeadDataFileTest(ImmutableDataFileTest):
     """Tests for DeadDataFile."""
 
@@ -1277,9 +1302,10 @@
         for i in range(10):
             db.put(i, 'foo%d' % (i,), 'bar%s' % (i,))
         self.assertFalse(db.should_rotate())
-        # write a different value -> random bytes
-        # now write some garbage to the end of file
-        db.live_file.fd.write(os.urandom(100))
+        # write a different value to the last item.
+        # write a different value to the last item.
+        db.live_file.fd.seek(db.live_file.fd.tell()-5)
+        db.live_file.fd.write(os.urandom(5))
         db.live_file.fd.flush()
         db.shutdown()
         called = []
@@ -1885,3 +1911,4 @@
         """Test that the initial value is > 0."""
         timer = WindowsTimer()
         self.assertTrue(int(timer.time()) > 0)
+
 
=== modified file 'ubuntuone/syncdaemon/tritcask.py'
--- ubuntuone/syncdaemon/tritcask.py 2012-04-09 20:07:05 +0000
+++ ubuntuone/syncdaemon/tritcask.py 2012-05-08 14:03:17 +0000
@@ -74,10 +74,6 @@
 VERSION = 'v1'
 FILE_SUFFIX = '.tritcask-%s.data' % VERSION
 
-EXTRA_SEEK = False
-if sys.platform == 'win32':
-    EXTRA_SEEK = True
-
 logger = logging.getLogger('ubuntuone.SyncDaemon.tritcask')
 
 
@@ -182,6 +178,7 @@
     """Class that encapsulates data file handling."""
 
     last_generated_id = 0
+    max_mmap_size = int(2**31-1) # cause 2**31 it's a long in 32bits.
 
     def __init__(self, base_path, filename=None):
         """Create a DataFile instance.
@@ -250,18 +247,23 @@
             self.fd = None
 
     def iter_entries(self):
-        """Return a generator for the entries in the file."""
-        fmmap = mmap.mmap(self.fd.fileno(), 0, access=mmap.ACCESS_READ)
-        with contextlib.closing(fmmap):
-            for entry in self._iter_mmaped_entries(fmmap):
+        """Return a generator for the entries in the file using mmap."""
+        if self.size >= self.max_mmap_size:
+            for entry in self._iter_entries(self.fd):
                 yield entry
+        else:
+            fmmap = mmap.mmap(self.fd.fileno(), 0, access=mmap.ACCESS_READ)
+            with contextlib.closing(fmmap):
+                for entry in self._iter_entries(fmmap):
+                    yield entry
 
-    def _iter_mmaped_entries(self, fmmap):
+    def _iter_entries(self, fd):
         """Return a generator for the entries in the mmaped file."""
         current_pos = 0
+        fd.seek(current_pos)
         while True:
             try:
-                entry, new_pos = self.read(fmmap, current_pos)
+                entry, new_pos = self.read(fd)
                 current_pos = new_pos
                 yield entry
             except EOFError:
@@ -294,10 +296,8 @@
         tstamp = timestamp()
         header = header_struct.pack(tstamp, key_sz, value_sz, row_type)
         crc32 = crc32_struct.pack(zlib.crc32(header + key + value))
-        if EXTRA_SEEK:
-            # seek to end of file even if we are in append mode, but py2.x IO
-            # in win32 is really buggy, see: http://bugs.python.org/issue3207
-            self.fd.seek(0, os.SEEK_END)
+        # always go to the EOF before write, as we aren't using mmap any more.
+        self.fd.seek(0, os.SEEK_END)
         self.fd.write(crc32 + header)
         self.fd.write(key)
         value_pos = self.fd.tell()
@@ -305,12 +305,13 @@
         self.fd.flush()
         return tstamp, value_pos, value_sz
 
-    def read(self, fmmap, current_pos):
+    def read(self, fd):
         """Read a single entry from the current position."""
-        crc32_bytes = fmmap[current_pos:current_pos + crc32_size]
-        current_pos += crc32_size
-        header = fmmap[current_pos:current_pos + header_size]
-        current_pos += header_size
+        current_pos = fd.tell()
+        data = fd.read(crc32_size+header_size)
+        current_pos += crc32_size+header_size
+        crc32_bytes = data[:crc32_size]
+        header = data[crc32_size:]
         if header == '' or crc32_bytes == '':
             # reached EOF
             raise EOFError
@@ -319,10 +320,11 @@
             tstamp, key_sz, value_sz, row_type = header_struct.unpack(header)
         except struct.error, e:
             raise BadHeader(e)
-        key = fmmap[current_pos:current_pos + key_sz]
+        data = fd.read(key_sz+value_sz)
+        key = data[:key_sz]
         current_pos += key_sz
         value_pos = current_pos
-        value = fmmap[current_pos:current_pos + value_sz]
+        value = data[key_sz:]
         current_pos += value_sz
         # verify the crc32 of the data
         if zlib.crc32(header + key + value) == crc32:
@@ -370,8 +372,9 @@
 
     def _open(self):
         self.fd = open(self.filename, 'rb')
-        fmmap = mmap.mmap(self.fd.fileno(), 0, access=mmap.ACCESS_READ)
-        self.fmmap = fmmap
+        if self.size < self.max_mmap_size:
+            fmmap = mmap.mmap(self.fd.fileno(), 0, access=mmap.ACCESS_READ)
+            self.fmmap = fmmap
 
     def close(self):
         """Close the file descriptor and mmap."""
@@ -387,13 +390,20 @@
 
     def iter_entries(self):
         """Return a generator for the entries in the mmaped file."""
-        for entry in self._iter_mmaped_entries(self.fmmap):
-            yield entry
+        fd = self.fd
+        if self.fmmap is not None:
+            fd = self.fmmap
+        return self._iter_entries(fd)
 
     def __getitem__(self, item):
         """__getitem__ to support slicing and *only* slicing."""
         if isinstance(item, slice):
-            return self.fmmap[item]
+            # if we have an mmap, use it.
+            if self.fmmap is not None:
+                return self.fmmap[item]
+            else:
+                self.fd.seek(item.start)
+                return self.fd.read(item.stop - item.start)
         else:
             raise ValueError('Only slice is supported')
 
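
The `__getitem__` change at the end of the diff serves a slice with `seek` and `read` when no mmap is available. A minimal Python 3 sketch of that idea follows. `FileSlicer` is a hypothetical class, not part of tritcask.py, and it normalizes the slice with `slice.indices` so that open-ended slices such as `[:3]` also work, which the diff itself does not attempt (there, `item.start` and `item.stop` are used directly).

```python
import io
import os


class FileSlicer:
    """Hypothetical wrapper giving a binary file object slice-only access."""

    def __init__(self, fd):
        self.fd = fd

    def __getitem__(self, item):
        if not isinstance(item, slice):
            raise ValueError('Only slice is supported')
        # Find the file size so None and negative bounds can be resolved.
        self.fd.seek(0, os.SEEK_END)
        size = self.fd.tell()
        start, stop, step = item.indices(size)
        if step != 1:
            raise ValueError('Only contiguous slices are supported')
        self.fd.seek(start)
        return self.fd.read(max(0, stop - start))


# An in-memory file stands in for a data file that is too large to mmap.
data = FileSlicer(io.BytesIO(b'0123456789'))
```

With this wrapper, `data[2:5]` seeks to offset 2 and reads 3 bytes, mirroring what an mmap slice would return for the same bounds.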
