Merge lp:~adeuring/charmworld/charm-revision-files into lp:~juju-jitsu/charmworld/trunk

Proposed by Abel Deuring
Status: Merged
Approved by: Abel Deuring
Approved revision: 327
Merged at revision: 326
Proposed branch: lp:~adeuring/charmworld/charm-revision-files
Merge into: lp:~juju-jitsu/charmworld/trunk
Diff against target: 427 lines (+231/-67)
3 files modified
charmworld/jobs/ingest.py (+1/-15)
charmworld/models.py (+109/-52)
charmworld/tests/test_models.py (+121/-0)
To merge this branch: bzr merge lp:~adeuring/charmworld/charm-revision-files
Reviewer Review Type Date Requested Status
Curtis Hovey (community) code Approve
Review via email: mp+177160@code.launchpad.net

Commit message

new method CharmFileSet.save_revision_files()

Description of the change

his branch adds a method save_revision_files() to class CharmFileSet.

Ir is similar to the existing method save_files(), but it stores only
a dictionary file_path -> GridFS_id as charm['files'].

The method uses the function slurp_files(), which was moved from
jobs.ingest to models.

slurp_files() stored all files from a branch in GridFS, while we want
to store only a subset of these files for charms. On the other hand,
slurp_files() ignored symlinks. I fixed this by adding an optional
parameter entry_source(), which should return a sequence of
(path_name, bzr-file-id) items. A new method CharmFileSet.find_files()
generates such a sequence for save_revision_files(); it is also uyed
by the already existing method save_files().

The charm sample data has a symlink hooks/linkstart, pointing to
hooks/start, which means that slurp_files() tries to save two files
with identical content, when called by save_revision_files().
GridFS.put() raises an exception when it is called with an already
existing _id.

To post a comment you must log in.
Revision history for this message
Curtis Hovey (sinzui) wrote :

Hi Abel.

This branch looks good. I think It fixes bug 1189992. I have some questions.

I would like a doc comment added to find_files(). Maybe?
   """Yield file_path, file_id of valid charm files.

    Unrecognised files and directories are ignored. Symlink outside of the charm
    are ignored.
    """

I know we have tests for the conditions described by my proposed docstring, but the are probably still names ...save_files....

save_revision_files docstring needs fixing:
    """Store relevant files from a charm revision."""

slurp_files() needs a docstring now that there is a non-obvious contract proved by entry_source. Maybe
    """Return a dict of file paths and file content hashes.

    :param fs: A file system, such as gridFS.
    :param tree: A bazaar tree.
    :param entry_source: A generator of (file_path, file_id) tuples to be slurped.
        The default entry_source only finds files.
    """

I think test_save_revision_files_from_old_revision() and test_save_revision_files_from_new_revision() are functionally identical. I think the two extra lines in the "new" test will pass when added to the "old" test. Is the setup wrong or is the revision under test wrong?

I expected to see a test where the content of a file is changed in a new revision. A comparison of charm_data['files'] for the old and new revisions will confirm the changed file has a new hash.

I think test_slurp_files_saves_files_with_identical_content() needs a different name. My reading of the code is that the method does not save duplicate files. It guarantees that it returns the hash/id of the file content, and may save the file content if it does not already exist.

review: Needs Information (code)
327. By Abel Deuring

several doc strings added; minor test refactoring; test that file changes are properly tracked by slurp_files()

Revision history for this message
Abel Deuring (adeuring) wrote :
Download full text (3.4 KiB)

On 26.07.2013 18:30, Curtis Hovey wrote:
> Review: Needs Information code
>
> Hi Abel.
>
> This branch looks good. I think It fixes bug 1189992. I have some questions.

well, not immediately -- we'll need to use the new data and the new
method that creates it.

>
> I would like a doc comment added to find_files(). Maybe?
> """Yield file_path, file_id of valid charm files.
>
> Unrecognised files and directories are ignored. Symlink outside of the charm
> are ignored.
> """

Added (with minor changes).

>
> I know we have tests for the conditions described by my proposed docstring, but the are probably still names ...save_files....
>
> save_revision_files docstring needs fixing:
> """Store relevant files from a charm revision."""
>
> slurp_files() needs a docstring now that there is a non-obvious contract proved by entry_source. Maybe
> """Return a dict of file paths and file content hashes.
>
> :param fs: A file system, such as gridFS.
> :param tree: A bazaar tree.
> :param entry_source: A generator of (file_path, file_id) tuples to be slurped.
> The default entry_source only finds files.
> """
>
> I think test_save_revision_files_from_old_revision() and test_save_revision_files_from_new_revision() are functionally identical. I think the two extra lines in the "new" test will pass when added to the "old" test. Is the setup wrong or is the revision under test wrong?

They are of course similar, but not fully equivalent. The idea of the
two tests is to show that it is possible to successfully select a
revision of a branch and that all files from the specified revision are
stored, but not files that were added later.

The two lines

        expected = self.expected_files()
        expected['hooks/new_hook'] = 'A new hook'

include the file added in the second commit to the expected data. The
method check_revision_files() ensures that the set of expected and
actually added files are identical:

        expected_names = set(expected_files)
        found_names = set(charm_revision['files'])
        self.assertEqual(expected_names, found_names)

So, adding hooks/now_hook to expected_files in
test_save_revision_files_from_old_revision() would cause a test failure.

(There is a subtle difference in the two helper functions
add_revision_two() defined in
test_save_revision_files_from_old_revision() and
test_save_revision_files_from_new_revision(): The one in
...old_revision() returns None, while the other returns the new revision
ID. This has influence one the value returned by save_revision_files().
This somewhat convoluted behaviour is the result of a review discussion
of older work related to saving files for s specific revision... I've
refactored the two helper functions which makes the difference hopefully
a bit more obvious.)

>
> I expected to see a test where the content of a file is changed in a new revision. A comparison of charm_data['files'] for the old and new revisions will confirm the changed file has a new hash.

The changes between the two version of the test branch now include a
file change.

>
> I think test_slurp_files_saves_files_with_identical_content() needs a different name...

Read more...

Revision history for this message
Curtis Hovey (sinzui) wrote :

Thank you.

I missed the subtle meaning of `return tree.commit('revision 2')` and `return`. Your helper method make it clear how the test is setup.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'charmworld/jobs/ingest.py'
2--- charmworld/jobs/ingest.py 2013-07-25 19:48:14 +0000
3+++ charmworld/jobs/ingest.py 2013-07-29 11:31:27 +0000
4@@ -8,7 +8,6 @@
5 datetime,
6 timedelta,
7 )
8-import hashlib
9 import logging
10 import os
11 import shutil
12@@ -21,7 +20,6 @@
13 from bzrlib.branch import Branch
14 from bzrlib.revision import NULL_REVISION
15 from bzrlib.transport import get_transport
16-import magic
17 import requests
18 import yaml
19
20@@ -41,6 +39,7 @@
21 getdb,
22 getfs,
23 options_to_storage,
24+ slurp_files,
25 store_bundles,
26 )
27 from charmworld.search import (
28@@ -303,19 +302,6 @@
29 store_bundles(collection, deployer_config, basket_data['_id'])
30
31
32-def slurp_files(fs, tree):
33- hashes = {}
34- for path, entry in tree.iter_entries_by_dir():
35- if entry.kind != 'file':
36- continue
37- bytes = tree.get_file_text(entry.file_id)
38- content_type = magic.from_buffer(bytes, mime=True)
39- _id = hashlib.sha256(bytes).hexdigest()
40- fs.put(bytes, contentType=content_type, _id=_id)
41- hashes[path] = _id
42- return hashes
43-
44-
45 def _rev_info(r, branch):
46 d = {
47 'authors': r.get_apparent_authors(),
48
49=== modified file 'charmworld/models.py'
50--- charmworld/models.py 2013-07-22 20:49:00 +0000
51+++ charmworld/models.py 2013-07-29 11:31:27 +0000
52@@ -5,6 +5,7 @@
53 from bzrlib.branch import Branch
54 from bzrlib.errors import NoSuchRevision
55 from calendar import timegm
56+import hashlib
57 import logging
58 from mimetypes import guess_type
59 import os
60@@ -732,7 +733,7 @@
61 )
62
63 @classmethod
64- def save_file(cls, fs, logger, charm_data, file_info, tree):
65+ def save_file(cls, fs, logger, charm_data, file_path, file_id, tree):
66 """Do the actual storage of the file contents.
67
68 :param fs: The mongo fs connection to use.
69@@ -741,29 +742,6 @@
70 :param file_info: The file's data as returned by tree.list_files().
71 :tree: The working tree of the branch.
72 """
73- file_path, ignore, file_type, file_id = file_info[:4]
74- if file_type == 'symlink':
75- target = tree.get_symlink_target(file_id)
76- if target.startswith('/'):
77- logger.warn(
78- "Invalid file path: symlink %s points to the absolute "
79- "path %s." % (
80- file_path, target))
81- return
82- directory = dirname(file_path)
83- resolved = normpath(join(directory, target))
84- if resolved.startswith('../'):
85- logger.warn(
86- "Invalid file path: Path %s points to %s, which is not "
87- "inside the charm's root directory" % (
88- file_path, resolved))
89- return
90- file_id = tree.path2id(resolved)
91- if file_id is None:
92- logger.warn(
93- "File does not exist: %s (resolving to %s)." % (
94- file_path, resolved))
95- return
96 plain_file_name = split(file_path)[1]
97 charm_file = cls.make_charm_file(
98 fs, charm_data, file_path, plain_file_name)
99@@ -812,43 +790,95 @@
100 return cf
101
102 @classmethod
103- def save_files(cls, fs, charm_data, charm_bzr_path, logger):
104- """Given a bzr branch, load the files we care about."""
105+ def get_tree(cls, charm_data, charm_bzr_path, logger):
106 bzr_revision = charm_data['store_data']['digest']
107 branch = Branch.open(charm_bzr_path)
108 try:
109- tree = branch.repository.revision_tree(bzr_revision)
110+ return branch.repository.revision_tree(bzr_revision)
111 except NoSuchRevision:
112 logger.warn('Revision %s not found' % bzr_revision)
113+ return None
114+
115+ @classmethod
116+ def find_files(cls, tree, logger):
117+ """Yield file_path, file_id of relevant charm files.
118+
119+ Irrelevant files and directories are ignored. Symlinks outside
120+ of the charm are ignored.
121+ """
122+ for file_path, entry in tree.iter_entries_by_dir():
123+ if entry.kind == 'directory':
124+ continue
125+ found = False
126+ for pattern in cls.store_files:
127+ if re.match(pattern, file_path, flags=re.I):
128+ found = True
129+ break
130+ if not found:
131+ for directory in cls.store_directories:
132+ if file_path.startswith(directory):
133+ found = True
134+ break
135+ if not found:
136+ continue
137+
138+ if entry.kind == 'symlink':
139+ target = tree.get_symlink_target(entry.file_id)
140+ if target.startswith('/'):
141+ logger.warn(
142+ "Invalid file path: symlink %s points to the absolute "
143+ "path %s." % (
144+ file_path, target))
145+ continue
146+ directory = dirname(file_path)
147+ resolved = normpath(join(directory, target))
148+ if resolved.startswith('../'):
149+ logger.warn(
150+ "Invalid file path: Path %s points to %s, which is "
151+ "not inside the charm's root directory" % (
152+ file_path, resolved))
153+ continue
154+ file_id = tree.path2id(resolved)
155+ if file_id is None:
156+ logger.warn(
157+ "File does not exist: %s (resolving to %s)." % (
158+ file_path, resolved))
159+ continue
160+ else:
161+ file_id = entry.file_id
162+ yield file_path, file_id
163+
164+ @classmethod
165+ def save_files(cls, fs, charm_data, charm_bzr_path, logger):
166+ """Given a bzr branch, load the files we care about."""
167+ tree = cls.get_tree(charm_data, charm_bzr_path, logger)
168+ if tree is None:
169 return []
170- with read_locked(branch):
171+ with read_locked(tree):
172 stored = []
173- for file_info in tree.list_files():
174- file_path = file_info[0]
175- file_type = file_info[2]
176- if file_type == 'directory':
177- continue
178- found = False
179- for pattern in cls.store_files:
180- if re.match(pattern, file_path, flags=re.I):
181- found = True
182- break
183- if not found:
184- for directory in cls.store_directories:
185- if file_path.startswith(directory):
186- found = True
187- break
188- if found:
189- charm_file = cls.save_file(
190- fs,
191- logger,
192- charm_data,
193- file_info,
194- tree)
195- if charm_file is not None:
196- stored.append(charm_file)
197+ for file_path, file_id in cls.find_files(tree, logger):
198+ charm_file = cls.save_file(
199+ fs,
200+ logger,
201+ charm_data,
202+ file_path,
203+ file_id,
204+ tree)
205+ if charm_file is not None:
206+ stored.append(charm_file)
207 return stored
208
209+ @classmethod
210+ def save_revision_files(cls, fs, charm_data, charm_bzr_path, logger):
211+ """Store relevant files from a charm revision"""
212+ tree = cls.get_tree(charm_data, charm_bzr_path, logger)
213+ if tree is None:
214+ charm_data['files'] = {}
215+ return
216+ with read_locked(tree):
217+ charm_data['files'] = slurp_files(
218+ fs, tree, cls.find_files(tree, logger))
219+
220
221 class CharmFile(object):
222 """Representation of the GridFS stored document.
223@@ -1190,3 +1220,30 @@
224 'data': get_flattened_deployment(deployer_data, key),
225 }
226 collection.save(bundle_doc)
227+
228+
229+def slurp_files(fs, tree, entry_source=None):
230+ """Return a dict of file paths and file content hashes.
231+
232+ :param fs: A file system, such as gridFS.
233+ :param tree: A bazaar tree.
234+ :param entry_source: A generator of (file_path, file_id) tuples to
235+ be slurped. The default entry_source only finds files.
236+ """
237+ if entry_source is None:
238+ def entry_source():
239+ for path, entry in tree.iter_entries_by_dir():
240+ if entry.kind != 'file':
241+ continue
242+ yield path, entry.file_id
243+ entry_source = entry_source()
244+
245+ hashes = {}
246+ for path, file_id in entry_source:
247+ bytes = tree.get_file_text(file_id)
248+ content_type = magic.from_buffer(bytes, mime=True)
249+ _id = hashlib.sha256(bytes).hexdigest()
250+ if not fs.exists(_id):
251+ fs.put(bytes, contentType=content_type, _id=_id)
252+ hashes[path] = _id
253+ return hashes
254
255=== modified file 'charmworld/tests/test_models.py'
256--- charmworld/tests/test_models.py 2013-07-22 17:57:14 +0000
257+++ charmworld/tests/test_models.py 2013-07-29 11:31:27 +0000
258@@ -3,16 +3,19 @@
259 # the file LICENSE).
260
261 """Unit tests for our application models."""
262+from bzrlib.branch import Branch
263 from datetime import (
264 date,
265 datetime,
266 )
267+import hashlib
268 import logging
269 import md5
270 import os
271 from os.path import dirname
272 from os.path import join
273 from os.path import split
274+import stat
275 from textwrap import dedent
276
277 from mock import patch
278@@ -32,6 +35,7 @@
279 find_charms,
280 options_to_storage,
281 QAData,
282+ slurp_files,
283 storage_to_options,
284 store_bundles,
285 sync_index,
286@@ -48,6 +52,7 @@
287 from charmworld.testing.factory import charm_branch
288 from charmworld.utils import (
289 quote_key,
290+ read_locked,
291 )
292
293
294@@ -740,6 +745,16 @@
295 branch_dir,
296 self.logger)
297
298+ def save_revision_files(self, charm_path=None, extra_revisions=[]):
299+ with charm_branch(charm_path, extra_revisions) as (
300+ branch_dir, revision):
301+ charm_data = self.charm_data.copy()
302+ charm_data['branch_dir'] = branch_dir
303+ charm_data['store_data'] = {'digest': revision}
304+ CharmFileSet.save_revision_files(
305+ self.fs, charm_data, branch_dir, self.logger)
306+ return charm_data
307+
308 def test_fileid_generation(self):
309 """A charm must generate an unique fileid."""
310 self.assertEqual(
311@@ -966,6 +981,60 @@
312 saved = [charm_file._filename for charm_file in saved]
313 self.assertEqual(['a_hook'], saved)
314
315+ def check_revision_files(self, expected_files, charm_revision):
316+ expected_names = set(expected_files)
317+ found_names = set(charm_revision['files'])
318+ self.assertEqual(expected_names, found_names)
319+ for name in expected_names:
320+ expected_fs_id = hashlib.sha256(expected_files[name]).hexdigest()
321+ self.assertEqual(expected_fs_id, charm_revision['files'][name])
322+
323+ def test_save_revision_files(self):
324+ charm_revision = self.save_revision_files()
325+ self.check_revision_files(self.expected_files(), charm_revision)
326+
327+ def add_revision(self, cwd, tree, return_new_revision):
328+ path = join(cwd, 'branch/trunk/hooks/new_hook')
329+ with open(path, 'w') as f:
330+ f.write('A new hook')
331+ tree.add(['hooks/new_hook'])
332+ path = join(cwd, 'branch/trunk/README.md')
333+ with open(path, 'a') as f:
334+ f.write('appended text')
335+ if return_new_revision:
336+ return tree.commit('revision 2')
337+ else:
338+ return None
339+
340+ def test_save_revision_files_from_old_revision(self):
341+ def add_revision_two(cwd, tree):
342+ return self.add_revision(cwd, tree, False)
343+ revision = self.save_revision_files(extra_revisions=[add_revision_two])
344+ self.check_revision_files(self.expected_files(), revision)
345+
346+ def test_save_revision_files_from_new_revision(self):
347+ def add_revision_two(cwd, tree):
348+ return self.add_revision(cwd, tree, True)
349+ revision = self.save_revision_files(extra_revisions=[add_revision_two])
350+ expected = self.expected_files()
351+ expected['hooks/new_hook'] = 'A new hook'
352+ expected['README.md'] = expected['README.md'] + 'appended text'
353+ self.check_revision_files(expected, revision)
354+
355+ def test_save_revision_files_with_non_existent_bzr_revision(self):
356+ with charm_branch() as (branch_dir, revision):
357+ charm_data = self.charm_data.copy()
358+ charm_data['branch_dir'] = branch_dir
359+ charm_data['store_data'] = {'digest': 'bad-revision-id'}
360+ CharmFileSet.save_revision_files(
361+ self.fs,
362+ charm_data,
363+ branch_dir,
364+ self.logger)
365+ self.assertEqual({}, charm_data['files'])
366+ messages = [rec.msg for rec in self.log_handler.buffer]
367+ self.assertEqual(['Revision bad-revision-id not found'], messages)
368+
369
370 class TestCharmFile(MongoTestBase):
371
372@@ -1327,3 +1396,55 @@
373 })
374 self.assertEqual(
375 "jc:~bac/apache-6/tiny", bundle.permanent_url)
376+
377+
378+class TestSlurpFiles(MongoTestBase):
379+
380+ def test_slurp_files_saves_all_files_by_default(self):
381+ with charm_branch() as (branch_dir, revision):
382+ branch = Branch.open(branch_dir)
383+ tree = branch.repository.revision_tree(revision)
384+ with read_locked(tree):
385+ result = slurp_files(self.fs, tree)
386+ test_data_dir = join(
387+ dirname(dirname(__file__)), 'testing/data/sample_charm')
388+ expected_file_paths = set()
389+ for dirpath, dirnames, filenames in os.walk(test_data_dir):
390+ for filename in filenames:
391+ filepath = os.path.join(dirpath, filename)
392+ if stat.S_ISLNK(os.lstat(filepath).st_mode):
393+ continue
394+ relpath = filepath[len(test_data_dir) + 1:]
395+ expected_file_paths.add(relpath)
396+ content = open(filepath).read()
397+ expected_hash = hashlib.sha256(content).hexdigest()
398+ self.assertEqual(expected_hash, result[relpath])
399+ self.assertEqual(expected_file_paths, set(result))
400+
401+ def test_slurp_files_with_identical_content_saved_only_once(self):
402+
403+ def add_revision_two(cwd, tree):
404+ src_path = join(cwd, 'branch/trunk/common')
405+ dst_path = join(cwd, 'branch/trunk/also-common')
406+ with open(dst_path, 'w') as f:
407+ f.write(open(src_path).read())
408+ tree.add(['also-common'])
409+ return tree.commit('revision 2')
410+
411+ with charm_branch(extra=[add_revision_two]) as (branch_dir, revision):
412+ branch = Branch.open(branch_dir)
413+ tree = branch.repository.revision_tree(revision)
414+ with read_locked(tree):
415+ result = slurp_files(self.fs, tree)
416+ test_data_dir = join(
417+ dirname(dirname(__file__)), 'testing/data/sample_charm')
418+ expected_file_paths = set()
419+ for dirpath, dirnames, filenames in os.walk(test_data_dir):
420+ for filename in filenames:
421+ filepath = os.path.join(dirpath, filename)
422+ if stat.S_ISLNK(os.lstat(filepath).st_mode):
423+ continue
424+ relpath = filepath[len(test_data_dir) + 1:]
425+ expected_file_paths.add(relpath)
426+ expected_file_paths.add('also-common')
427+ self.assertEqual(result['common'], result['also-common'])

Subscribers

People subscribed via source and target branches