Merge lp:~jderose/dmedia/document-filestore-design into lp:dmedia
- document-filestore-design
- Merge into trunk
Proposed by
Jason Gerard DeRose
Status: | Merged |
---|---|
Approved by: | Jason Gerard DeRose |
Approved revision: | 172 |
Merged at revision: | 160 |
Proposed branch: | lp:~jderose/dmedia/document-filestore-design |
Merge into: | lp:dmedia |
Diff against target: |
810 lines (+438/-143) 3 files modified
dmedia/filestore.py (+289/-77) dmedia/schema.py (+2/-2) dmedia/tests/test_filestore.py (+147/-64) |
To merge this branch: | bzr merge lp:~jderose/dmedia/document-filestore-design |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
dmedia Dev | Pending | ||
Review via email: mp+49763@code.launchpad.net |
Commit message
Description of the change
Mostly additional docstrings and cosmetic changes.
* Documented key FileStore design decisions in filestore.py module docstring
* Reordered FileStore methods into a bit more logical order, save with FileStore method tests
* Added docstrings for FileStore methods that lacked them
* Added tests for FileStore.exists(), open(), and remove() methods
* Added new FileStore.verify() method and test
* Other misc cleanup
To post a comment you must log in.
Revision history for this message
David Jordan (dmj726) wrote : | # |
Revision history for this message
Jason Gerard DeRose (jderose) wrote : | # |
Thanks for the review!
Preview Diff
[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1 | === modified file 'dmedia/filestore.py' |
2 | --- dmedia/filestore.py 2011-02-13 06:49:38 +0000 |
3 | +++ dmedia/filestore.py 2011-02-15 03:58:48 +0000 |
4 | @@ -21,20 +21,134 @@ |
5 | # with `dmedia`. If not, see <http://www.gnu.org/licenses/>. |
6 | |
7 | """ |
8 | -Store media files in a special layout according to their content hash. |
9 | - |
10 | -Security note: this module must be carefully designed to prevent path traversal! |
11 | -Two lines of defense are used: |
12 | - |
13 | - * `safe_b32()` and `safe_ext()` validate the (assumed) untrusted *chash*, |
14 | - *quickid*, and *ext* values |
15 | - |
16 | - * `FileStore.join()` and `FileStore.create_parent()` check the paths they |
17 | - create to insure that the path did not traverse outside of the file store |
18 | - base directory |
19 | - |
20 | -Either should fully prevent path traversal but are used together for extra |
21 | -safety. |
22 | +Store files in a special layout according to their content-hash. |
23 | + |
24 | +The `FileStore` is the heart of dmedia. Files are assigned a canonical name |
25 | +based on the file's content-hash, and are placed in a special layout within the |
26 | +`FileStore` base directory. |
27 | + |
28 | +The files in a `FileStore` are read-only... they must be as modifying a file |
29 | +will change its content-hash. The only way to modify a file is to copy the |
30 | +original to a temporary file, modify it, and then place the new file into the |
31 | +`FileStore`. This might seem like an unreasonable restriction, but it perfectly |
32 | +captures the use case dmedia is concerned with... a distributed library of media |
33 | +files. |
34 | + |
35 | +On the content-creation side, non-destructive editing is certainly the best |
36 | +practice, especially in professional use cases. On the content consumption |
37 | +side, modifying a file is rather rare. And the somewhat common use case -- |
38 | +modifying a file for the sake of updating metadata (say, EXIF) -- can instead be |
39 | +accomplished by updating metadata in the corresponding CouchDB document. |
40 | + |
41 | +Importantly, without the read-only restriction, it would be impossible to make a |
42 | +distributed file system whose file operations remain robust and atomic in the |
43 | +face of arbitrary and prolonged network outages. True to its CouchDB |
44 | +foundations, dmedia is designing with the assumption that network connectivity |
45 | +is the exception rather than the rule. |
46 | + |
47 | +Please read on for the rationale of some key `FileStore` design decisions... |
48 | + |
49 | + |
50 | +Design Decision: base32-encoded content-hash |
51 | +============================================ |
52 | + |
53 | +The `FileStore` layout was designed to allow the canonical filename to be |
54 | +constructed from the content-hash in the simplest way possible, without |
55 | +requiring any special decoding or encoding. For this reason, the content-hash |
56 | +(as stored in CouchDB) is base32-encoded. |
57 | + |
58 | +Base32-encoding was chosen because: |
59 | + |
60 | + 1. It's more compact than base16/hex |
61 | + |
62 | + 2. It can be used to name files on case *insensitive* filesystems (whereas |
63 | + base64-encoding cannot) |
64 | + |
65 | +Inside the `FileStore`, the first 2 characters of the content-hash are used for |
66 | +the subdirectory name, and the remaining characters for the filename within that |
67 | +subdirectory. For example: |
68 | + |
69 | +>>> from os import path |
70 | +>>> chash = 'ZR765XWSF6S7JQHLUI4GCG5BHGPE252O' |
71 | +>>> path.join('/foo', chash[:2], chash[2:]) |
72 | +'/foo/ZR/765XWSF6S7JQHLUI4GCG5BHGPE252O' |
73 | + |
74 | + |
75 | +Design Decision: canonical filenames have file extensions |
76 | +========================================================= |
77 | + |
78 | +Strictly speaking, there is no technical reason to include a file extension on |
79 | +the canonical filenames. However, there are some practical reasons that make |
80 | +including the file extension worthwhile, despite additional complexity it adds |
81 | +to the `FileStore` API. |
82 | + |
83 | +Most importantly, it allows files in a `FileStore` layout to be served with the |
84 | +correct Content-Type by a vanilla web-server. A key design goal was to be able |
85 | +to point, say, Apache at a dmedia `FileStore` directory have a useful dmedia |
86 | +file server without requiring special Apache plugins for dmedia integration. |
87 | + |
88 | +It also provides broader software compatibility as many applications and |
89 | +libraries do rely on the file extension for type determination. And the file |
90 | +extension is helpful for developers, as a bit of intelligible information in |
91 | +canonical filename will make the layout easier to explore, aid debugging. |
92 | + |
93 | +The current `FileStore` always includes the file extension on the canonical name |
94 | +when the extension is provided by the calling code. However, the API is |
95 | +designed to accommodate `FileStore` implementations that do not include the |
96 | +file extension. The API is also designed so that the calling code isn't |
97 | +required to provide the file extension... say, if the extension was ever removed |
98 | +from the CouchDB schema. |
99 | + |
100 | +To accomplish this, files are identified by the content-hash and extension |
101 | +together, and the extension is optional, defaulting to ``None``. This is the |
102 | +typical calling signature: |
103 | + |
104 | +>>> def canonical(chash, ext=None): |
105 | +... pass |
106 | + |
107 | +For example: |
108 | + |
109 | +>>> FileStore.relpath('ZR765XWSF6S7JQHLUI4GCG5BHGPE252O') |
110 | +('ZR', '765XWSF6S7JQHLUI4GCG5BHGPE252O') |
111 | +>>> FileStore.relpath('ZR765XWSF6S7JQHLUI4GCG5BHGPE252O', 'mov') |
112 | +('ZR', '765XWSF6S7JQHLUI4GCG5BHGPE252O.mov') |
113 | + |
114 | + |
115 | +Design Decision: security good, path traversals bad |
116 | +=================================================== |
117 | + |
118 | +The `FileStore` is probably the most security sensitive part of dmedia in that |
119 | +untrusted data (content-hash, file extension) is used to construct paths on the |
120 | +filesystem. This means that the `FileStore` must be carefully designed to |
121 | +prevent path traversal attacks (aka directory traversal attacks). |
122 | + |
123 | +Two lines of defense are used. First, the content-hash and file extension are |
124 | +validated with the following functions: |
125 | + |
126 | + * `safe_b32()` - validates the content-hash |
127 | + |
128 | + * `safe_ext()` - validates the file extension |
129 | + |
130 | +Second, there are methods that ensure that paths constructed relative to the |
131 | +`FileStore` base directory cannot be outside of the base directory: |
132 | + |
133 | + * `FileStore.check_path()` - ensures that a path is inside the base |
134 | + directory |
135 | + |
136 | + * `FileStore.join()` - creates a path relative to the base directory, |
137 | + ensures resulting path is inside the base directory |
138 | + |
139 | + * `FileStore.create_parent()` - creates a file's parent directory only if |
140 | + that parent directory is inside the base directory |
141 | + |
142 | +Each line of defense is designed to fully prevent path traversals, assumes the |
143 | +other defense doesn't exist or will fail. Together, they should provide a |
144 | +strong defense against path traversal attacks. |
145 | + |
146 | +If you discover any security vulnerability in dmedia, please immediately file a |
147 | +bug: |
148 | + |
149 | + https://bugs.launchpad.net/dmedia/+filebug |
150 | """ |
151 | |
152 | import os |
153 | @@ -77,13 +191,11 @@ |
154 | ... |
155 | AmbiguousPath: '/foo/../root' resolves to '/root' |
156 | |
157 | - |
158 | Otherwise *pathname* is returned unchanged: |
159 | |
160 | >>> safe_path('/foo/bar') |
161 | '/foo/bar' |
162 | |
163 | - |
164 | Also see `safe_open()`. |
165 | """ |
166 | if path.abspath(pathname) != pathname: |
167 | @@ -106,7 +218,6 @@ |
168 | ... |
169 | AmbiguousPath: '/foo/../root' resolves to '/root' |
170 | |
171 | - |
172 | Otherwise returns a ``file`` instance created with ``open()``. |
173 | """ |
174 | return open(safe_path(filename), mode) |
175 | @@ -285,6 +396,13 @@ |
176 | |
177 | |
178 | def pack_leaves(leaves, digest_bytes=20): |
179 | + """ |
180 | + Pack leaves together into a ``bytes`` instance for CouchDB attachment. |
181 | + |
182 | + :param leaves: a ``list`` containing content-hash of each leaf in the file |
183 | + (content-hash is binary digest, not base32-encoded) |
184 | + :param digest_bytes: digest size in bytes; default is 20 (160 bits) |
185 | + """ |
186 | for (i, leaf) in enumerate(leaves): |
187 | if len(leaf) != digest_bytes: |
188 | raise ValueError('digest_bytes=%d, but len(leaves[%d]) is %d' % ( |
189 | @@ -295,6 +413,13 @@ |
190 | |
191 | |
192 | def unpack_leaves(data, digest_bytes=20): |
193 | + """ |
194 | + Unpack binary *data* into a list of leaf digests. |
195 | + |
196 | + :param data: a ``bytes`` instance containing the packed leaf digests |
197 | + :param digest_bytes: digest size in bytes; default is 20 (160 bits) |
198 | + """ |
199 | + assert isinstance(data, bytes) |
200 | if len(data) % digest_bytes != 0: |
201 | raise ValueError( |
202 | 'len(data)=%d, not multiple of digest_bytes=%d' % ( |
203 | @@ -349,7 +474,6 @@ |
204 | return False |
205 | |
206 | |
207 | - |
208 | class FileStore(object): |
209 | """ |
210 | Arranges files in a special layout according to their content-hash. |
211 | @@ -429,52 +553,8 @@ |
212 | def __repr__(self): |
213 | return '%s(%r)' % (self.__class__.__name__, self.base) |
214 | |
215 | - @staticmethod |
216 | - def relpath(chash, ext=None): |
217 | - """ |
218 | - Relative path of file with *chash*, ending with *ext*. |
219 | - |
220 | - For example: |
221 | - |
222 | - >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
223 | - ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
224 | - |
225 | - Or with the file extension *ext*: |
226 | - |
227 | - >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov') |
228 | - ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov') |
229 | - |
230 | - Also see `FileStore.reltmp()`. |
231 | - """ |
232 | - chash = safe_b32(chash) |
233 | - dname = chash[:2] |
234 | - fname = chash[2:] |
235 | - if ext: |
236 | - return (dname, '.'.join((fname, safe_ext(ext)))) |
237 | - return (dname, fname) |
238 | - |
239 | - @staticmethod |
240 | - def reltmp(chash, ext=None): |
241 | - """ |
242 | - Relative path of temporary file with *chash*, ending with *ext*. |
243 | - |
244 | - For example: |
245 | - |
246 | - >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
247 | - ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
248 | - |
249 | - Or with the file extension *ext*: |
250 | - |
251 | - >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov') |
252 | - ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov') |
253 | - |
254 | - Also see `FileStore.relpath()`. |
255 | - """ |
256 | - chash = safe_b32(chash) |
257 | - if ext: |
258 | - return (TRANSFERS_DIR, '.'.join([chash, safe_ext(ext)])) |
259 | - return (TRANSFERS_DIR, chash) |
260 | - |
261 | + ############################################ |
262 | + # Methods to prevent path traversals attacks |
263 | def check_path(self, pathname): |
264 | """ |
265 | Verify that *pathname* in inside this filestore base directory. |
266 | @@ -500,7 +580,6 @@ |
267 | >>> fs.join('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW') #doctest: +ELLIPSIS |
268 | '/tmp/store.../NW/BNVXVK5DQGIOW7MYR4K3KA5K22W7NW' |
269 | |
270 | - |
271 | However, a `FileStoreTraversal` is raised if *parts* cause a path |
272 | traversal outside of the `FileStore` base directory: |
273 | |
274 | @@ -509,7 +588,6 @@ |
275 | ... |
276 | FileStoreTraversal: '/tmp/ssh' outside base '/tmp/store...' |
277 | |
278 | - |
279 | Or Likewise if an absolute path is included in *parts*: |
280 | |
281 | >>> fs.join('NW', '/etc', 'ssh') #doctest: +ELLIPSIS |
282 | @@ -517,7 +595,6 @@ |
283 | ... |
284 | FileStoreTraversal: '/etc/ssh' outside base '/tmp/store...' |
285 | |
286 | - |
287 | Also see `FileStore.create_parent()`. |
288 | """ |
289 | fullpath = path.join(self.base, *parts) |
290 | @@ -536,7 +613,6 @@ |
291 | ... |
292 | FileStoreTraversal: '/foo/my.ogv' outside base '/tmp/store...' |
293 | |
294 | - |
295 | It also protects against malicious filenames like this: |
296 | |
297 | >>> fs.create_parent('/foo/../bar/my.ogv') #doctest: +ELLIPSIS |
298 | @@ -544,7 +620,6 @@ |
299 | ... |
300 | FileStoreTraversal: '/bar/my.ogv' outside base '/tmp/store...' |
301 | |
302 | - |
303 | If doesn't already exists, the directory containing *filename* is |
304 | created. Returns the directory containing *filename*. |
305 | |
306 | @@ -556,6 +631,36 @@ |
307 | os.makedirs(containing) |
308 | return containing |
309 | |
310 | + |
311 | + ################################################# |
312 | + # Methods for working with files in the FileStore |
313 | + @staticmethod |
314 | + def relpath(chash, ext=None): |
315 | + """ |
316 | + Relative path of file with *chash*, ending with *ext*. |
317 | + |
318 | + For example: |
319 | + |
320 | + >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
321 | + ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
322 | + |
323 | + Or with the file extension *ext*: |
324 | + |
325 | + >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov') |
326 | + ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov') |
327 | + |
328 | + Also see `FileStore.reltmp()`. |
329 | + |
330 | + :param chash: base32-encoded content-hash |
331 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
332 | + """ |
333 | + chash = safe_b32(chash) |
334 | + dname = chash[:2] |
335 | + fname = chash[2:] |
336 | + if ext: |
337 | + return (dname, '.'.join((fname, safe_ext(ext)))) |
338 | + return (dname, fname) |
339 | + |
340 | def path(self, chash, ext=None, create=False): |
341 | """ |
342 | Returns path of file with content-hash *chash* and extension *ext*. |
343 | @@ -566,15 +671,18 @@ |
344 | >>> fs.path('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') #doctest: +ELLIPSIS |
345 | '/tmp/store.../NW/BNVXVK5DQGIOW7MYR4K3KA5K22W7NW' |
346 | |
347 | - |
348 | Or with a file extension: |
349 | |
350 | >>> fs.path('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', 'txt') #doctest: +ELLIPSIS |
351 | '/tmp/store.../NW/BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.txt' |
352 | |
353 | - |
354 | If called with ``create=True``, the parent directory is created with |
355 | `FileStore.create_parent()`. |
356 | + |
357 | + :param chash: base32-encoded content-hash |
358 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
359 | + :param create: if ``True``, create parent directory if it does not |
360 | + already exist; default is ``False`` |
361 | """ |
362 | filename = self.join(*self.relpath(chash, ext)) |
363 | if create: |
364 | @@ -584,23 +692,81 @@ |
365 | def exists(self, chash, ext=None): |
366 | """ |
367 | Return ``True`` if a file with *chash* and *ext* exists. |
368 | + |
369 | + :param chash: base32-encoded content-hash |
370 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
371 | """ |
372 | return path.isfile(self.path(chash, ext)) |
373 | |
374 | def open(self, chash, ext=None): |
375 | """ |
376 | Open the file with *chash* and *ext* in ``'rb'`` mode. |
377 | + |
378 | + :param chash: base32-encoded content-hash |
379 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
380 | """ |
381 | return open(self.path(chash, ext), 'rb') |
382 | |
383 | + def verify(self, chash, ext=None): |
384 | + """ |
385 | + Verify integrity of file with *chash* and *ext*. |
386 | + |
387 | + If the file's content-hash does not equal *chash*, an `IntegrityError` |
388 | + is raised. |
389 | + |
390 | + Otherwise, the open ``file`` is returned after calling ``file.seek(0)`` |
391 | + to put read position back at the start of the file. |
392 | + |
393 | + :param chash: base32-encoded content-hash |
394 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
395 | + """ |
396 | + src_fp = self.open(chash, ext) |
397 | + h = HashList(src_fp) |
398 | + got = h.run() |
399 | + if got != chash: |
400 | + raise IntegrityError(got=got, expected=chash, filename=src_fp.name) |
401 | + src_fp.seek(0) |
402 | + return src_fp |
403 | + |
404 | def remove(self, chash, ext=None): |
405 | """ |
406 | Delete file with *chash* and *ext* from underlying filesystem. |
407 | + |
408 | + :param chash: base32-encoded content-hash |
409 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
410 | """ |
411 | filename = self.path(chash, ext) |
412 | log.info('Deleting file %r from %r', filename, self) |
413 | os.remove(filename) |
414 | |
415 | + |
416 | + ########################################################### |
417 | + # Methods for working with temporary files in the FileStore |
418 | + @staticmethod |
419 | + def reltmp(chash, ext=None): |
420 | + """ |
421 | + Relative path of temporary file with *chash*, ending with *ext*. |
422 | + |
423 | + For example: |
424 | + |
425 | + >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
426 | + ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
427 | + |
428 | + Or with the file extension *ext*: |
429 | + |
430 | + >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov') |
431 | + ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov') |
432 | + |
433 | + Also see `FileStore.relpath()`. |
434 | + |
435 | + :param chash: base32-encoded content-hash |
436 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
437 | + """ |
438 | + chash = safe_b32(chash) |
439 | + if ext: |
440 | + return (TRANSFERS_DIR, '.'.join([chash, safe_ext(ext)])) |
441 | + return (TRANSFERS_DIR, chash) |
442 | + |
443 | def tmp(self, chash, ext=None, create=False): |
444 | """ |
445 | Returns path of temporary file with *chash*, ending with *ext*. |
446 | @@ -612,15 +778,18 @@ |
447 | >>> fs.tmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') #doctest: +ELLIPSIS |
448 | '/tmp/store.../transfers/NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW' |
449 | |
450 | - |
451 | Or with a file extension: |
452 | |
453 | >>> fs.tmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', 'txt') #doctest: +ELLIPSIS |
454 | '/tmp/store.../transfers/NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.txt' |
455 | |
456 | - |
457 | If called with ``create=True``, the parent directory is created with |
458 | `FileStore.create_parent()`. |
459 | + |
460 | + :param chash: base32-encoded content-hash |
461 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
462 | + :param create: if ``True``, create parent directory if it does not |
463 | + already exist; default is ``False`` |
464 | """ |
465 | filename = self.join(*self.reltmp(chash, ext)) |
466 | if create: |
467 | @@ -628,6 +797,36 @@ |
468 | return filename |
469 | |
470 | def allocate_for_transfer(self, size, chash, ext=None): |
471 | + """ |
472 | + Open the canonical temporary file for a transfer (download or upload). |
473 | + |
474 | + When transferring files from other dmedia peers, the content-hash is |
475 | + already known. As we must be able to easily resume a download or |
476 | + upload, transfers use a stable, canonical temporary filename derived |
477 | + from the content-hash and file extension. |
478 | + |
479 | + The file *size* is also known, so an attempt is made to efficiently |
480 | + pre-allocate the temporary file using `fallocate()`. |
481 | + |
482 | + If the temporary file already exists, it means we're resuming a |
483 | + transfer. The file is opened in ``'r+b'`` mode, leaving data in the |
484 | + temporary file intact. It is the responsibility of higher-level code |
485 | + to verify the file leaf by leaf in order to determine what portions of |
486 | + the file have been transfered, what portions of the file still need to |
487 | + be transferred. |
488 | + |
489 | + Note that as the temporary file will likely be pre-allocated, higher- |
490 | + level code cannot use the size of the temporary file as a means of |
491 | + determining how much of the file has been transfered. |
492 | + |
493 | + If the temporary does not exist, and cannot be pre-allocated, a new |
494 | + empty file is opened in ``'wb'`` mode. Higher-level code must check |
495 | + the mode of the ``file`` instance and act accordingly. |
496 | + |
497 | + :param size: file size in bytes (an ``int``) |
498 | + :param chash: base32-encoded content-hash |
499 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
500 | + """ |
501 | filename = self.tmp(chash, ext, create=True) |
502 | fallocate(size, filename) |
503 | try: |
504 | @@ -639,6 +838,23 @@ |
505 | return open(filename, 'wb') |
506 | |
507 | def allocate_for_import(self, size, ext=None): |
508 | + """ |
509 | + Open a random temporary file for an import operation. |
510 | + |
511 | + When importing a file, the content-hash is computed as the file is |
512 | + copied into the `FileStore`. As the content-hash isn't known when |
513 | + allocating the temporary file, a randomly named temporary file is used. |
514 | + |
515 | + However, the file *size* is known, so an attempt is made to efficiently |
516 | + pre-allocate the temporary file using `fallocate()`. |
517 | + |
518 | + The file extension *ext* is optional and serves no other purpose than to |
519 | + aid in debugging. The value of *ext* used here has no effect on the |
520 | + ultimate canonical file name. |
521 | + |
522 | + :param size: file size in bytes (an ``int``) |
523 | + :param ext: normalized lowercase file extension, eg ``'mov'`` |
524 | + """ |
525 | imports = self.join(IMPORTS_DIR) |
526 | if not path.exists(imports): |
527 | os.makedirs(imports) |
528 | @@ -662,7 +878,6 @@ |
529 | >>> fs.tmp_move(tmp_fp, chash, 'mov') #doctest: +ELLIPSIS |
530 | '/tmp/store.../ZR/765XWSF6S7JQHLUI4GCG5BHGPE252O.mov' |
531 | |
532 | - |
533 | Note, however, that this method does *not* verify the content hash of |
534 | the temporary file! This is by design as many operations will compute |
535 | the content hash as they write to the temporary file. Other operations |
536 | @@ -756,7 +971,6 @@ |
537 | >>> tmp #doctest: +ELLIPSIS |
538 | '/tmp/store.../transfers/ZR765XWSF6S7JQHLUI4GCG5BHGPE252O.mov' |
539 | |
540 | - |
541 | Then the downloader will write to the temporary file as it's being |
542 | downloaded: |
543 | |
544 | @@ -771,7 +985,6 @@ |
545 | ... |
546 | >>> tmp_fp.close() |
547 | |
548 | - |
549 | Finally, the downloader will move the temporary file into its canonical |
550 | location: |
551 | |
552 | @@ -779,7 +992,6 @@ |
553 | >>> dst #doctest: +ELLIPSIS |
554 | '/tmp/store.../ZR/765XWSF6S7JQHLUI4GCG5BHGPE252O.mov' |
555 | |
556 | - |
557 | The return value is the absolute path of the canonical file. |
558 | |
559 | :param chash: base32-encoded content-hash |
560 | |
561 | === modified file 'dmedia/schema.py' |
562 | --- dmedia/schema.py 2011-02-08 07:15:02 +0000 |
563 | +++ dmedia/schema.py 2011-02-15 03:58:48 +0000 |
564 | @@ -51,8 +51,8 @@ |
565 | |
566 | |
567 | These test functions are used in the dmedia test suite, and 3rd-party apps would |
568 | -be well served by doing the same. Please read on for the rationale for some of |
569 | -the key dmedia schema design decisions... |
570 | +be well served by doing the same. Please read on for the rationale of some key |
571 | +dmedia schema design decisions... |
572 | |
573 | |
574 | |
575 | |
576 | === modified file 'dmedia/tests/test_filestore.py' |
577 | --- dmedia/tests/test_filestore.py 2011-02-12 10:56:53 +0000 |
578 | +++ dmedia/tests/test_filestore.py 2011-02-15 03:58:48 +0000 |
579 | @@ -499,70 +499,16 @@ |
580 | self.assertTrue(inst.base.startswith('/tmp/store.')) |
581 | self.assertEqual(inst.record, path.join(inst.base, 'store.json')) |
582 | |
583 | - def test_relpath(self): |
584 | - self.assertEqual( |
585 | - self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'), |
586 | - ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
587 | - ) |
588 | - self.assertEqual( |
589 | - self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'), |
590 | - ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv') |
591 | - ) |
592 | - |
593 | - # Test to make sure hashes are getting checked with safe_b32(): |
594 | - bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW' |
595 | - e = raises(ValueError, self.klass.relpath, bad) |
596 | - self.assertEqual( |
597 | - str(e), |
598 | - 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
599 | - ) |
600 | - e = raises(ValueError, self.klass.relpath, bad, ext='ogv') |
601 | - self.assertEqual( |
602 | - str(e), |
603 | - 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
604 | - ) |
605 | - |
606 | - # Test to make sure ext is getting checked with safe_ext(): |
607 | - chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW' |
608 | - bad = '/../../../.ssh/id_pub' |
609 | - e = raises(ValueError, self.klass.relpath, chash, bad) |
610 | - self.assertEqual( |
611 | - str(e), |
612 | - 'ext %r does not match pattern %r' % (bad, EXT_PAT) |
613 | - ) |
614 | - |
615 | - def test_reltmp(self): |
616 | - self.assertEqual( |
617 | - self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'), |
618 | - ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
619 | - ) |
620 | - self.assertEqual( |
621 | - self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'), |
622 | - ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv') |
623 | - ) |
624 | - |
625 | - # Test to make sure hashes are getting checked with safe_b32(): |
626 | - bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW' |
627 | - e = raises(ValueError, self.klass.reltmp, bad) |
628 | - self.assertEqual( |
629 | - str(e), |
630 | - 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
631 | - ) |
632 | - e = raises(ValueError, self.klass.reltmp, bad, ext='ogv') |
633 | - self.assertEqual( |
634 | - str(e), |
635 | - 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
636 | - ) |
637 | - |
638 | - # Test to make sure ext is getting checked with safe_ext(): |
639 | - chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW' |
640 | - bad = '/../../../.ssh/id_pub' |
641 | - e = raises(ValueError, self.klass.reltmp, chash, bad) |
642 | - self.assertEqual( |
643 | - str(e), |
644 | - 'ext %r does not match pattern %r' % (bad, EXT_PAT) |
645 | - ) |
646 | - |
647 | + def test_repr(self): |
648 | + tmp = TempDir() |
649 | + inst = self.klass(tmp.path) |
650 | + self.assertEqual( |
651 | + repr(inst), |
652 | + 'FileStore(%r)' % tmp.path |
653 | + ) |
654 | + |
655 | + ############################################ |
656 | + # Methods to prevent path traversals attacks |
657 | def test_check_path(self): |
658 | tmp = TempDir() |
659 | base = tmp.join('foo', 'bar') |
660 | @@ -694,6 +640,41 @@ |
661 | self.assertFalse(path.exists(bad)) |
662 | self.assertFalse(path.exists(baddir)) |
663 | |
664 | + |
665 | + ################################################# |
666 | + # Methods for working with files in the FileStore |
667 | + def test_relpath(self): |
668 | + self.assertEqual( |
669 | + self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'), |
670 | + ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
671 | + ) |
672 | + self.assertEqual( |
673 | + self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'), |
674 | + ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv') |
675 | + ) |
676 | + |
677 | + # Test to make sure hashes are getting checked with safe_b32(): |
678 | + bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW' |
679 | + e = raises(ValueError, self.klass.relpath, bad) |
680 | + self.assertEqual( |
681 | + str(e), |
682 | + 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
683 | + ) |
684 | + e = raises(ValueError, self.klass.relpath, bad, ext='ogv') |
685 | + self.assertEqual( |
686 | + str(e), |
687 | + 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
688 | + ) |
689 | + |
690 | + # Test to make sure ext is getting checked with safe_ext(): |
691 | + chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW' |
692 | + bad = '/../../../.ssh/id_pub' |
693 | + e = raises(ValueError, self.klass.relpath, chash, bad) |
694 | + self.assertEqual( |
695 | + str(e), |
696 | + 'ext %r does not match pattern %r' % (bad, EXT_PAT) |
697 | + ) |
698 | + |
699 | def test_path(self): |
700 | tmp = TempDir() |
701 | base = tmp.join('foo', 'bar') |
702 | @@ -751,6 +732,108 @@ |
703 | self.assertFalse(path.exists(f)) |
704 | self.assertTrue(path.isdir(d)) |
705 | |
706 | + def test_exists(self): |
707 | + tmp = TempDir() |
708 | + inst = self.klass(tmp.path) |
709 | + |
710 | + # File doesn't exist |
711 | + self.assertFalse(inst.exists(mov_hash, 'mov')) |
712 | + |
713 | + # File exists |
714 | + tmp.touch(mov_hash[:2], mov_hash[2:] + '.mov') |
715 | + self.assertTrue(inst.exists(mov_hash, 'mov')) |
716 | + |
717 | + def test_open(self): |
718 | + tmp = TempDir() |
719 | + inst = self.klass(tmp.path) |
720 | + |
721 | + # File doesn't exist |
722 | + e = raises(IOError, inst.open, mov_hash, 'mov') |
723 | + self.assertEqual(e.errno, 2) |
724 | + |
725 | + # File exists |
726 | + canonical = inst.path(mov_hash, 'mov', create=True) |
727 | + shutil.copy2(sample_mov, canonical) |
728 | + fp = inst.open(mov_hash, 'mov') |
729 | + self.assertTrue(isinstance(fp, file)) |
730 | + self.assertEqual(fp.mode, 'rb') |
731 | + self.assertEqual(fp.tell(), 0) |
732 | + self.assertEqual(fp.name, canonical) |
733 | + |
734 | + def test_verify(self): |
735 | + tmp = TempDir() |
736 | + inst = self.klass(tmp.path) |
737 | + |
738 | + # File doesn't exist |
739 | + e = raises(IOError, inst.verify, mov_hash, 'mov') |
740 | + self.assertEqual(e.errno, 2) |
741 | + |
742 | + # File exists |
743 | + canonical = inst.path(mov_hash, 'mov', create=True) |
744 | + shutil.copy2(sample_mov, canonical) |
745 | + fp = inst.verify(mov_hash, 'mov') |
746 | + self.assertTrue(isinstance(fp, file)) |
747 | + self.assertEqual(fp.mode, 'rb') |
748 | + self.assertEqual(fp.tell(), 0) |
749 | + self.assertEqual(fp.name, canonical) |
750 | + |
751 | + # File has wrong content-hash |
752 | + os.remove(canonical) |
753 | + shutil.copy2(sample_thm, canonical) |
754 | + e = raises(IntegrityError, inst.verify, mov_hash, 'mov') |
755 | + self.assertEqual(e.got, thm_hash) |
756 | + self.assertEqual(e.expected, mov_hash) |
757 | + self.assertEqual(e.filename, canonical) |
758 | + |
759 | + def test_remove(self): |
760 | + tmp = TempDir() |
761 | + inst = self.klass(tmp.path) |
762 | + |
763 | + # File doesn't exist |
764 | + e = raises(OSError, inst.remove, mov_hash, 'mov') |
765 | + self.assertEqual(e.errno, 2) |
766 | + |
767 | + # File exists |
768 | + f = tmp.touch(mov_hash[:2], mov_hash[2:] + '.mov') |
769 | + self.assertTrue(path.isfile(f)) |
770 | + inst.remove(mov_hash, 'mov') |
771 | + self.assertFalse(path.exists(f)) |
772 | + |
773 | + |
774 | + ########################################################### |
775 | + # Methods for working with temporary files in the FileStore |
776 | + def test_reltmp(self): |
777 | + self.assertEqual( |
778 | + self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'), |
779 | + ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') |
780 | + ) |
781 | + self.assertEqual( |
782 | + self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'), |
783 | + ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv') |
784 | + ) |
785 | + |
786 | + # Test to make sure hashes are getting checked with safe_b32(): |
787 | + bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW' |
788 | + e = raises(ValueError, self.klass.reltmp, bad) |
789 | + self.assertEqual( |
790 | + str(e), |
791 | + 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
792 | + ) |
793 | + e = raises(ValueError, self.klass.reltmp, bad, ext='ogv') |
794 | + self.assertEqual( |
795 | + str(e), |
796 | + 'b32: cannot b32decode %r: Non-base32 digit found' % bad |
797 | + ) |
798 | + |
799 | + # Test to make sure ext is getting checked with safe_ext(): |
800 | + chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW' |
801 | + bad = '/../../../.ssh/id_pub' |
802 | + e = raises(ValueError, self.klass.reltmp, chash, bad) |
803 | + self.assertEqual( |
804 | + str(e), |
805 | + 'ext %r does not match pattern %r' % (bad, EXT_PAT) |
806 | + ) |
807 | + |
808 | def test_tmp(self): |
809 | tmp = TempDir() |
810 | inst = self.klass(tmp.path) |
Nice Documentation, don't notice any blatant issues...approved.