Merge lp:~jderose/dmedia/document-filestore-design into lp:dmedia

Proposed by Jason Gerard DeRose
Status: Merged
Approved by: Jason Gerard DeRose
Approved revision: 172
Merged at revision: 160
Proposed branch: lp:~jderose/dmedia/document-filestore-design
Merge into: lp:dmedia
Diff against target: 810 lines (+438/-143)
3 files modified
dmedia/filestore.py (+289/-77)
dmedia/schema.py (+2/-2)
dmedia/tests/test_filestore.py (+147/-64)
To merge this branch: bzr merge lp:~jderose/dmedia/document-filestore-design
Reviewer Review Type Date Requested Status
dmedia Dev Pending
Review via email: mp+49763@code.launchpad.net

Description of the change

Mostly additional docstrings and cosmetic changes.

 * Documented key FileStore design decisions in filestore.py module docstring

 * Reordered FileStore methods into a bit more logical order, save with FileStore method tests

 * Added docstrings for FileStore methods that lacked them

 * Added tests for FileStore.exists(), open(), and remove() methods

 * Added new FileStore.verify() method and test

 * Other misc cleanup

To post a comment you must log in.
Revision history for this message
David Jordan (dmj726) wrote :

Nice Documentation, don't notice any blatant issues...approved.

Revision history for this message
Jason Gerard DeRose (jderose) wrote :

Thanks for the review!

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'dmedia/filestore.py'
2--- dmedia/filestore.py 2011-02-13 06:49:38 +0000
3+++ dmedia/filestore.py 2011-02-15 03:58:48 +0000
4@@ -21,20 +21,134 @@
5 # with `dmedia`. If not, see <http://www.gnu.org/licenses/>.
6
7 """
8-Store media files in a special layout according to their content hash.
9-
10-Security note: this module must be carefully designed to prevent path traversal!
11-Two lines of defense are used:
12-
13- * `safe_b32()` and `safe_ext()` validate the (assumed) untrusted *chash*,
14- *quickid*, and *ext* values
15-
16- * `FileStore.join()` and `FileStore.create_parent()` check the paths they
17- create to insure that the path did not traverse outside of the file store
18- base directory
19-
20-Either should fully prevent path traversal but are used together for extra
21-safety.
22+Store files in a special layout according to their content-hash.
23+
24+The `FileStore` is the heart of dmedia. Files are assigned a canonical name
25+based on the file's content-hash, and are placed in a special layout within the
26+`FileStore` base directory.
27+
28+The files in a `FileStore` are read-only... they must be as modifying a file
29+will change its content-hash. The only way to modify a file is to copy the
30+original to a temporary file, modify it, and then place the new file into the
31+`FileStore`. This might seem like an unreasonable restriction, but it perfectly
32+captures the use case dmedia is concerned with... a distributed library of media
33+files.
34+
35+On the content-creation side, non-destructive editing is certainly the best
36+practice, especially in professional use cases. On the content consumption
37+side, modifying a file is rather rare. And the somewhat common use case --
38+modifying a file for the sake of updating metadata (say, EXIF) -- can instead be
39+accomplished by updating metadata in the corresponding CouchDB document.
40+
41+Importantly, without the read-only restriction, it would be impossible to make a
42+distributed file system whose file operations remain robust and atomic in the
43+face of arbitrary and prolonged network outages. True to its CouchDB
44+foundations, dmedia is designing with the assumption that network connectivity
45+is the exception rather than the rule.
46+
47+Please read on for the rationale of some key `FileStore` design decisions...
48+
49+
50+Design Decision: base32-encoded content-hash
51+============================================
52+
53+The `FileStore` layout was designed to allow the canonical filename to be
54+constructed from the content-hash in the simplest way possible, without
55+requiring any special decoding or encoding. For this reason, the content-hash
56+(as stored in CouchDB) is base32-encoded.
57+
58+Base32-encoding was chosen because:
59+
60+ 1. It's more compact than base16/hex
61+
62+ 2. It can be used to name files on case *insensitive* filesystems (whereas
63+ base64-encoding cannot)
64+
65+Inside the `FileStore`, the first 2 characters of the content-hash are used for
66+the subdirectory name, and the remaining characters for the filename within that
67+subdirectory. For example:
68+
69+>>> from os import path
70+>>> chash = 'ZR765XWSF6S7JQHLUI4GCG5BHGPE252O'
71+>>> path.join('/foo', chash[:2], chash[2:])
72+'/foo/ZR/765XWSF6S7JQHLUI4GCG5BHGPE252O'
73+
74+
75+Design Decision: canonical filenames have file extensions
76+=========================================================
77+
78+Strictly speaking, there is no technical reason to include a file extension on
79+the canonical filenames. However, there are some practical reasons that make
80+including the file extension worthwhile, despite additional complexity it adds
81+to the `FileStore` API.
82+
83+Most importantly, it allows files in a `FileStore` layout to be served with the
84+correct Content-Type by a vanilla web-server. A key design goal was to be able
85+to point, say, Apache at a dmedia `FileStore` directory have a useful dmedia
86+file server without requiring special Apache plugins for dmedia integration.
87+
88+It also provides broader software compatibility as many applications and
89+libraries do rely on the file extension for type determination. And the file
90+extension is helpful for developers, as a bit of intelligible information in
91+canonical filename will make the layout easier to explore, aid debugging.
92+
93+The current `FileStore` always includes the file extension on the canonical name
94+when the extension is provided by the calling code. However, the API is
95+designed to accommodate `FileStore` implementations that do not include the
96+file extension. The API is also designed so that the calling code isn't
97+required to provide the file extension... say, if the extension was ever removed
98+from the CouchDB schema.
99+
100+To accomplish this, files are identified by the content-hash and extension
101+together, and the extension is optional, defaulting to ``None``. This is the
102+typical calling signature:
103+
104+>>> def canonical(chash, ext=None):
105+... pass
106+
107+For example:
108+
109+>>> FileStore.relpath('ZR765XWSF6S7JQHLUI4GCG5BHGPE252O')
110+('ZR', '765XWSF6S7JQHLUI4GCG5BHGPE252O')
111+>>> FileStore.relpath('ZR765XWSF6S7JQHLUI4GCG5BHGPE252O', 'mov')
112+('ZR', '765XWSF6S7JQHLUI4GCG5BHGPE252O.mov')
113+
114+
115+Design Decision: security good, path traversals bad
116+===================================================
117+
118+The `FileStore` is probably the most security sensitive part of dmedia in that
119+untrusted data (content-hash, file extension) is used to construct paths on the
120+filesystem. This means that the `FileStore` must be carefully designed to
121+prevent path traversal attacks (aka directory traversal attacks).
122+
123+Two lines of defense are used. First, the content-hash and file extension are
124+validated with the following functions:
125+
126+ * `safe_b32()` - validates the content-hash
127+
128+ * `safe_ext()` - validates the file extension
129+
130+Second, there are methods that ensure that paths constructed relative to the
131+`FileStore` base directory cannot be outside of the base directory:
132+
133+ * `FileStore.check_path()` - ensures that a path is inside the base
134+ directory
135+
136+ * `FileStore.join()` - creates a path relative to the base directory,
137+ ensures resulting path is inside the base directory
138+
139+ * `FileStore.create_parent()` - creates a file's parent directory only if
140+ that parent directory is inside the base directory
141+
142+Each line of defense is designed to fully prevent path traversals, assumes the
143+other defense doesn't exist or will fail. Together, they should provide a
144+strong defense against path traversal attacks.
145+
146+If you discover any security vulnerability in dmedia, please immediately file a
147+bug:
148+
149+ https://bugs.launchpad.net/dmedia/+filebug
150 """
151
152 import os
153@@ -77,13 +191,11 @@
154 ...
155 AmbiguousPath: '/foo/../root' resolves to '/root'
156
157-
158 Otherwise *pathname* is returned unchanged:
159
160 >>> safe_path('/foo/bar')
161 '/foo/bar'
162
163-
164 Also see `safe_open()`.
165 """
166 if path.abspath(pathname) != pathname:
167@@ -106,7 +218,6 @@
168 ...
169 AmbiguousPath: '/foo/../root' resolves to '/root'
170
171-
172 Otherwise returns a ``file`` instance created with ``open()``.
173 """
174 return open(safe_path(filename), mode)
175@@ -285,6 +396,13 @@
176
177
178 def pack_leaves(leaves, digest_bytes=20):
179+ """
180+ Pack leaves together into a ``bytes`` instance for CouchDB attachment.
181+
182+ :param leaves: a ``list`` containing content-hash of each leaf in the file
183+ (content-hash is binary digest, not base32-encoded)
184+ :param digest_bytes: digest size in bytes; default is 20 (160 bits)
185+ """
186 for (i, leaf) in enumerate(leaves):
187 if len(leaf) != digest_bytes:
188 raise ValueError('digest_bytes=%d, but len(leaves[%d]) is %d' % (
189@@ -295,6 +413,13 @@
190
191
192 def unpack_leaves(data, digest_bytes=20):
193+ """
194+ Unpack binary *data* into a list of leaf digests.
195+
196+ :param data: a ``bytes`` instance containing the packed leaf digests
197+ :param digest_bytes: digest size in bytes; default is 20 (160 bits)
198+ """
199+ assert isinstance(data, bytes)
200 if len(data) % digest_bytes != 0:
201 raise ValueError(
202 'len(data)=%d, not multiple of digest_bytes=%d' % (
203@@ -349,7 +474,6 @@
204 return False
205
206
207-
208 class FileStore(object):
209 """
210 Arranges files in a special layout according to their content-hash.
211@@ -429,52 +553,8 @@
212 def __repr__(self):
213 return '%s(%r)' % (self.__class__.__name__, self.base)
214
215- @staticmethod
216- def relpath(chash, ext=None):
217- """
218- Relative path of file with *chash*, ending with *ext*.
219-
220- For example:
221-
222- >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
223- ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
224-
225- Or with the file extension *ext*:
226-
227- >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov')
228- ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov')
229-
230- Also see `FileStore.reltmp()`.
231- """
232- chash = safe_b32(chash)
233- dname = chash[:2]
234- fname = chash[2:]
235- if ext:
236- return (dname, '.'.join((fname, safe_ext(ext))))
237- return (dname, fname)
238-
239- @staticmethod
240- def reltmp(chash, ext=None):
241- """
242- Relative path of temporary file with *chash*, ending with *ext*.
243-
244- For example:
245-
246- >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
247- ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
248-
249- Or with the file extension *ext*:
250-
251- >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov')
252- ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov')
253-
254- Also see `FileStore.relpath()`.
255- """
256- chash = safe_b32(chash)
257- if ext:
258- return (TRANSFERS_DIR, '.'.join([chash, safe_ext(ext)]))
259- return (TRANSFERS_DIR, chash)
260-
261+ ############################################
262+ # Methods to prevent path traversals attacks
263 def check_path(self, pathname):
264 """
265 Verify that *pathname* in inside this filestore base directory.
266@@ -500,7 +580,6 @@
267 >>> fs.join('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW') #doctest: +ELLIPSIS
268 '/tmp/store.../NW/BNVXVK5DQGIOW7MYR4K3KA5K22W7NW'
269
270-
271 However, a `FileStoreTraversal` is raised if *parts* cause a path
272 traversal outside of the `FileStore` base directory:
273
274@@ -509,7 +588,6 @@
275 ...
276 FileStoreTraversal: '/tmp/ssh' outside base '/tmp/store...'
277
278-
279 Or Likewise if an absolute path is included in *parts*:
280
281 >>> fs.join('NW', '/etc', 'ssh') #doctest: +ELLIPSIS
282@@ -517,7 +595,6 @@
283 ...
284 FileStoreTraversal: '/etc/ssh' outside base '/tmp/store...'
285
286-
287 Also see `FileStore.create_parent()`.
288 """
289 fullpath = path.join(self.base, *parts)
290@@ -536,7 +613,6 @@
291 ...
292 FileStoreTraversal: '/foo/my.ogv' outside base '/tmp/store...'
293
294-
295 It also protects against malicious filenames like this:
296
297 >>> fs.create_parent('/foo/../bar/my.ogv') #doctest: +ELLIPSIS
298@@ -544,7 +620,6 @@
299 ...
300 FileStoreTraversal: '/bar/my.ogv' outside base '/tmp/store...'
301
302-
303 If doesn't already exists, the directory containing *filename* is
304 created. Returns the directory containing *filename*.
305
306@@ -556,6 +631,36 @@
307 os.makedirs(containing)
308 return containing
309
310+
311+ #################################################
312+ # Methods for working with files in the FileStore
313+ @staticmethod
314+ def relpath(chash, ext=None):
315+ """
316+ Relative path of file with *chash*, ending with *ext*.
317+
318+ For example:
319+
320+ >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
321+ ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
322+
323+ Or with the file extension *ext*:
324+
325+ >>> FileStore.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov')
326+ ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov')
327+
328+ Also see `FileStore.reltmp()`.
329+
330+ :param chash: base32-encoded content-hash
331+ :param ext: normalized lowercase file extension, eg ``'mov'``
332+ """
333+ chash = safe_b32(chash)
334+ dname = chash[:2]
335+ fname = chash[2:]
336+ if ext:
337+ return (dname, '.'.join((fname, safe_ext(ext))))
338+ return (dname, fname)
339+
340 def path(self, chash, ext=None, create=False):
341 """
342 Returns path of file with content-hash *chash* and extension *ext*.
343@@ -566,15 +671,18 @@
344 >>> fs.path('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') #doctest: +ELLIPSIS
345 '/tmp/store.../NW/BNVXVK5DQGIOW7MYR4K3KA5K22W7NW'
346
347-
348 Or with a file extension:
349
350 >>> fs.path('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', 'txt') #doctest: +ELLIPSIS
351 '/tmp/store.../NW/BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.txt'
352
353-
354 If called with ``create=True``, the parent directory is created with
355 `FileStore.create_parent()`.
356+
357+ :param chash: base32-encoded content-hash
358+ :param ext: normalized lowercase file extension, eg ``'mov'``
359+ :param create: if ``True``, create parent directory if it does not
360+ already exist; default is ``False``
361 """
362 filename = self.join(*self.relpath(chash, ext))
363 if create:
364@@ -584,23 +692,81 @@
365 def exists(self, chash, ext=None):
366 """
367 Return ``True`` if a file with *chash* and *ext* exists.
368+
369+ :param chash: base32-encoded content-hash
370+ :param ext: normalized lowercase file extension, eg ``'mov'``
371 """
372 return path.isfile(self.path(chash, ext))
373
374 def open(self, chash, ext=None):
375 """
376 Open the file with *chash* and *ext* in ``'rb'`` mode.
377+
378+ :param chash: base32-encoded content-hash
379+ :param ext: normalized lowercase file extension, eg ``'mov'``
380 """
381 return open(self.path(chash, ext), 'rb')
382
383+ def verify(self, chash, ext=None):
384+ """
385+ Verify integrity of file with *chash* and *ext*.
386+
387+ If the file's content-hash does not equal *chash*, an `IntegrityError`
388+ is raised.
389+
390+ Otherwise, the open ``file`` is returned after calling ``file.seek(0)``
391+ to put read position back at the start of the file.
392+
393+ :param chash: base32-encoded content-hash
394+ :param ext: normalized lowercase file extension, eg ``'mov'``
395+ """
396+ src_fp = self.open(chash, ext)
397+ h = HashList(src_fp)
398+ got = h.run()
399+ if got != chash:
400+ raise IntegrityError(got=got, expected=chash, filename=src_fp.name)
401+ src_fp.seek(0)
402+ return src_fp
403+
404 def remove(self, chash, ext=None):
405 """
406 Delete file with *chash* and *ext* from underlying filesystem.
407+
408+ :param chash: base32-encoded content-hash
409+ :param ext: normalized lowercase file extension, eg ``'mov'``
410 """
411 filename = self.path(chash, ext)
412 log.info('Deleting file %r from %r', filename, self)
413 os.remove(filename)
414
415+
416+ ###########################################################
417+ # Methods for working with temporary files in the FileStore
418+ @staticmethod
419+ def reltmp(chash, ext=None):
420+ """
421+ Relative path of temporary file with *chash*, ending with *ext*.
422+
423+ For example:
424+
425+ >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
426+ ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
427+
428+ Or with the file extension *ext*:
429+
430+ >>> FileStore.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='mov')
431+ ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.mov')
432+
433+ Also see `FileStore.relpath()`.
434+
435+ :param chash: base32-encoded content-hash
436+ :param ext: normalized lowercase file extension, eg ``'mov'``
437+ """
438+ chash = safe_b32(chash)
439+ if ext:
440+ return (TRANSFERS_DIR, '.'.join([chash, safe_ext(ext)]))
441+ return (TRANSFERS_DIR, chash)
442+
443 def tmp(self, chash, ext=None, create=False):
444 """
445 Returns path of temporary file with *chash*, ending with *ext*.
446@@ -612,15 +778,18 @@
447 >>> fs.tmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW') #doctest: +ELLIPSIS
448 '/tmp/store.../transfers/NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'
449
450-
451 Or with a file extension:
452
453 >>> fs.tmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', 'txt') #doctest: +ELLIPSIS
454 '/tmp/store.../transfers/NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.txt'
455
456-
457 If called with ``create=True``, the parent directory is created with
458 `FileStore.create_parent()`.
459+
460+ :param chash: base32-encoded content-hash
461+ :param ext: normalized lowercase file extension, eg ``'mov'``
462+ :param create: if ``True``, create parent directory if it does not
463+ already exist; default is ``False``
464 """
465 filename = self.join(*self.reltmp(chash, ext))
466 if create:
467@@ -628,6 +797,36 @@
468 return filename
469
470 def allocate_for_transfer(self, size, chash, ext=None):
471+ """
472+ Open the canonical temporary file for a transfer (download or upload).
473+
474+ When transferring files from other dmedia peers, the content-hash is
475+ already known. As we must be able to easily resume a download or
476+ upload, transfers use a stable, canonical temporary filename derived
477+ from the content-hash and file extension.
478+
479+ The file *size* is also known, so an attempt is made to efficiently
480+ pre-allocate the temporary file using `fallocate()`.
481+
482+ If the temporary file already exists, it means we're resuming a
483+ transfer. The file is opened in ``'r+b'`` mode, leaving data in the
484+ temporary file intact. It is the responsibility of higher-level code
485+ to verify the file leaf by leaf in order to determine what portions of
486+ the file have been transfered, what portions of the file still need to
487+ be transferred.
488+
489+ Note that as the temporary file will likely be pre-allocated, higher-
490+ level code cannot use the size of the temporary file as a means of
491+ determining how much of the file has been transfered.
492+
493+ If the temporary does not exist, and cannot be pre-allocated, a new
494+ empty file is opened in ``'wb'`` mode. Higher-level code must check
495+ the mode of the ``file`` instance and act accordingly.
496+
497+ :param size: file size in bytes (an ``int``)
498+ :param chash: base32-encoded content-hash
499+ :param ext: normalized lowercase file extension, eg ``'mov'``
500+ """
501 filename = self.tmp(chash, ext, create=True)
502 fallocate(size, filename)
503 try:
504@@ -639,6 +838,23 @@
505 return open(filename, 'wb')
506
507 def allocate_for_import(self, size, ext=None):
508+ """
509+ Open a random temporary file for an import operation.
510+
511+ When importing a file, the content-hash is computed as the file is
512+ copied into the `FileStore`. As the content-hash isn't known when
513+ allocating the temporary file, a randomly named temporary file is used.
514+
515+ However, the file *size* is known, so an attempt is made to efficiently
516+ pre-allocate the temporary file using `fallocate()`.
517+
518+ The file extension *ext* is optional and serves no other purpose than to
519+ aid in debugging. The value of *ext* used here has no effect on the
520+ ultimate canonical file name.
521+
522+ :param size: file size in bytes (an ``int``)
523+ :param ext: normalized lowercase file extension, eg ``'mov'``
524+ """
525 imports = self.join(IMPORTS_DIR)
526 if not path.exists(imports):
527 os.makedirs(imports)
528@@ -662,7 +878,6 @@
529 >>> fs.tmp_move(tmp_fp, chash, 'mov') #doctest: +ELLIPSIS
530 '/tmp/store.../ZR/765XWSF6S7JQHLUI4GCG5BHGPE252O.mov'
531
532-
533 Note, however, that this method does *not* verify the content hash of
534 the temporary file! This is by design as many operations will compute
535 the content hash as they write to the temporary file. Other operations
536@@ -756,7 +971,6 @@
537 >>> tmp #doctest: +ELLIPSIS
538 '/tmp/store.../transfers/ZR765XWSF6S7JQHLUI4GCG5BHGPE252O.mov'
539
540-
541 Then the downloader will write to the temporary file as it's being
542 downloaded:
543
544@@ -771,7 +985,6 @@
545 ...
546 >>> tmp_fp.close()
547
548-
549 Finally, the downloader will move the temporary file into its canonical
550 location:
551
552@@ -779,7 +992,6 @@
553 >>> dst #doctest: +ELLIPSIS
554 '/tmp/store.../ZR/765XWSF6S7JQHLUI4GCG5BHGPE252O.mov'
555
556-
557 The return value is the absolute path of the canonical file.
558
559 :param chash: base32-encoded content-hash
560
561=== modified file 'dmedia/schema.py'
562--- dmedia/schema.py 2011-02-08 07:15:02 +0000
563+++ dmedia/schema.py 2011-02-15 03:58:48 +0000
564@@ -51,8 +51,8 @@
565
566
567 These test functions are used in the dmedia test suite, and 3rd-party apps would
568-be well served by doing the same. Please read on for the rationale for some of
569-the key dmedia schema design decisions...
570+be well served by doing the same. Please read on for the rationale of some key
571+dmedia schema design decisions...
572
573
574
575
576=== modified file 'dmedia/tests/test_filestore.py'
577--- dmedia/tests/test_filestore.py 2011-02-12 10:56:53 +0000
578+++ dmedia/tests/test_filestore.py 2011-02-15 03:58:48 +0000
579@@ -499,70 +499,16 @@
580 self.assertTrue(inst.base.startswith('/tmp/store.'))
581 self.assertEqual(inst.record, path.join(inst.base, 'store.json'))
582
583- def test_relpath(self):
584- self.assertEqual(
585- self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'),
586- ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
587- )
588- self.assertEqual(
589- self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'),
590- ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv')
591- )
592-
593- # Test to make sure hashes are getting checked with safe_b32():
594- bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW'
595- e = raises(ValueError, self.klass.relpath, bad)
596- self.assertEqual(
597- str(e),
598- 'b32: cannot b32decode %r: Non-base32 digit found' % bad
599- )
600- e = raises(ValueError, self.klass.relpath, bad, ext='ogv')
601- self.assertEqual(
602- str(e),
603- 'b32: cannot b32decode %r: Non-base32 digit found' % bad
604- )
605-
606- # Test to make sure ext is getting checked with safe_ext():
607- chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'
608- bad = '/../../../.ssh/id_pub'
609- e = raises(ValueError, self.klass.relpath, chash, bad)
610- self.assertEqual(
611- str(e),
612- 'ext %r does not match pattern %r' % (bad, EXT_PAT)
613- )
614-
615- def test_reltmp(self):
616- self.assertEqual(
617- self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'),
618- ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
619- )
620- self.assertEqual(
621- self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'),
622- ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv')
623- )
624-
625- # Test to make sure hashes are getting checked with safe_b32():
626- bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW'
627- e = raises(ValueError, self.klass.reltmp, bad)
628- self.assertEqual(
629- str(e),
630- 'b32: cannot b32decode %r: Non-base32 digit found' % bad
631- )
632- e = raises(ValueError, self.klass.reltmp, bad, ext='ogv')
633- self.assertEqual(
634- str(e),
635- 'b32: cannot b32decode %r: Non-base32 digit found' % bad
636- )
637-
638- # Test to make sure ext is getting checked with safe_ext():
639- chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'
640- bad = '/../../../.ssh/id_pub'
641- e = raises(ValueError, self.klass.reltmp, chash, bad)
642- self.assertEqual(
643- str(e),
644- 'ext %r does not match pattern %r' % (bad, EXT_PAT)
645- )
646-
647+ def test_repr(self):
648+ tmp = TempDir()
649+ inst = self.klass(tmp.path)
650+ self.assertEqual(
651+ repr(inst),
652+ 'FileStore(%r)' % tmp.path
653+ )
654+
655+ ############################################
656+ # Methods to prevent path traversals attacks
657 def test_check_path(self):
658 tmp = TempDir()
659 base = tmp.join('foo', 'bar')
660@@ -694,6 +640,41 @@
661 self.assertFalse(path.exists(bad))
662 self.assertFalse(path.exists(baddir))
663
664+
665+ #################################################
666+ # Methods for working with files in the FileStore
667+ def test_relpath(self):
668+ self.assertEqual(
669+ self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'),
670+ ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
671+ )
672+ self.assertEqual(
673+ self.klass.relpath('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'),
674+ ('NW', 'BNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv')
675+ )
676+
677+ # Test to make sure hashes are getting checked with safe_b32():
678+ bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW'
679+ e = raises(ValueError, self.klass.relpath, bad)
680+ self.assertEqual(
681+ str(e),
682+ 'b32: cannot b32decode %r: Non-base32 digit found' % bad
683+ )
684+ e = raises(ValueError, self.klass.relpath, bad, ext='ogv')
685+ self.assertEqual(
686+ str(e),
687+ 'b32: cannot b32decode %r: Non-base32 digit found' % bad
688+ )
689+
690+ # Test to make sure ext is getting checked with safe_ext():
691+ chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'
692+ bad = '/../../../.ssh/id_pub'
693+ e = raises(ValueError, self.klass.relpath, chash, bad)
694+ self.assertEqual(
695+ str(e),
696+ 'ext %r does not match pattern %r' % (bad, EXT_PAT)
697+ )
698+
699 def test_path(self):
700 tmp = TempDir()
701 base = tmp.join('foo', 'bar')
702@@ -751,6 +732,108 @@
703 self.assertFalse(path.exists(f))
704 self.assertTrue(path.isdir(d))
705
706+ def test_exists(self):
707+ tmp = TempDir()
708+ inst = self.klass(tmp.path)
709+
710+ # File doesn't exist
711+ self.assertFalse(inst.exists(mov_hash, 'mov'))
712+
713+ # File exists
714+ tmp.touch(mov_hash[:2], mov_hash[2:] + '.mov')
715+ self.assertTrue(inst.exists(mov_hash, 'mov'))
716+
717+ def test_open(self):
718+ tmp = TempDir()
719+ inst = self.klass(tmp.path)
720+
721+ # File doesn't exist
722+ e = raises(IOError, inst.open, mov_hash, 'mov')
723+ self.assertEqual(e.errno, 2)
724+
725+ # File exists
726+ canonical = inst.path(mov_hash, 'mov', create=True)
727+ shutil.copy2(sample_mov, canonical)
728+ fp = inst.open(mov_hash, 'mov')
729+ self.assertTrue(isinstance(fp, file))
730+ self.assertEqual(fp.mode, 'rb')
731+ self.assertEqual(fp.tell(), 0)
732+ self.assertEqual(fp.name, canonical)
733+
734+ def test_verify(self):
735+ tmp = TempDir()
736+ inst = self.klass(tmp.path)
737+
738+ # File doesn't exist
739+ e = raises(IOError, inst.verify, mov_hash, 'mov')
740+ self.assertEqual(e.errno, 2)
741+
742+ # File exists
743+ canonical = inst.path(mov_hash, 'mov', create=True)
744+ shutil.copy2(sample_mov, canonical)
745+ fp = inst.verify(mov_hash, 'mov')
746+ self.assertTrue(isinstance(fp, file))
747+ self.assertEqual(fp.mode, 'rb')
748+ self.assertEqual(fp.tell(), 0)
749+ self.assertEqual(fp.name, canonical)
750+
751+ # File has wrong content-hash
752+ os.remove(canonical)
753+ shutil.copy2(sample_thm, canonical)
754+ e = raises(IntegrityError, inst.verify, mov_hash, 'mov')
755+ self.assertEqual(e.got, thm_hash)
756+ self.assertEqual(e.expected, mov_hash)
757+ self.assertEqual(e.filename, canonical)
758+
759+ def test_remove(self):
760+ tmp = TempDir()
761+ inst = self.klass(tmp.path)
762+
763+ # File doesn't exist
764+ e = raises(OSError, inst.remove, mov_hash, 'mov')
765+ self.assertEqual(e.errno, 2)
766+
767+ # File exists
768+ f = tmp.touch(mov_hash[:2], mov_hash[2:] + '.mov')
769+ self.assertTrue(path.isfile(f))
770+ inst.remove(mov_hash, 'mov')
771+ self.assertFalse(path.exists(f))
772+
773+
774+ ###########################################################
775+ # Methods for working with temporary files in the FileStore
776+ def test_reltmp(self):
777+ self.assertEqual(
778+ self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'),
779+ ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW')
780+ )
781+ self.assertEqual(
782+ self.klass.reltmp('NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW', ext='ogv'),
783+ ('transfers', 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW.ogv')
784+ )
785+
786+ # Test to make sure hashes are getting checked with safe_b32():
787+ bad = 'NWBNVXVK5..GIOW7MYR4K3KA5K22W7NW'
788+ e = raises(ValueError, self.klass.reltmp, bad)
789+ self.assertEqual(
790+ str(e),
791+ 'b32: cannot b32decode %r: Non-base32 digit found' % bad
792+ )
793+ e = raises(ValueError, self.klass.reltmp, bad, ext='ogv')
794+ self.assertEqual(
795+ str(e),
796+ 'b32: cannot b32decode %r: Non-base32 digit found' % bad
797+ )
798+
799+ # Test to make sure ext is getting checked with safe_ext():
800+ chash = 'NWBNVXVK5DQGIOW7MYR4K3KA5K22W7NW'
801+ bad = '/../../../.ssh/id_pub'
802+ e = raises(ValueError, self.klass.reltmp, chash, bad)
803+ self.assertEqual(
804+ str(e),
805+ 'ext %r does not match pattern %r' % (bad, EXT_PAT)
806+ )
807+
808 def test_tmp(self):
809 tmp = TempDir()
810 inst = self.klass(tmp.path)

Subscribers

People subscribed via source and target branches