Merge lp:~adeuring/charmworld/charm-revision-files into lp:~juju-jitsu/charmworld/trunk
- charm-revision-files
- Merge into trunk
Status: | Merged |
---|---|
Approved by: | Abel Deuring |
Approved revision: | 327 |
Merged at revision: | 326 |
Proposed branch: | lp:~adeuring/charmworld/charm-revision-files |
Merge into: | lp:~juju-jitsu/charmworld/trunk |
Diff against target: |
427 lines (+231/-67) 3 files modified
charmworld/jobs/ingest.py (+1/-15) charmworld/models.py (+109/-52) charmworld/tests/test_models.py (+121/-0) |
To merge this branch: | bzr merge lp:~adeuring/charmworld/charm-revision-files |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Curtis Hovey (community) | code | Approve | |
Review via email: mp+177160@code.launchpad.net |
Commit message
new method CharmFileSet.
Description of the change
his branch adds a method save_revision_
Ir is similar to the existing method save_files(), but it stores only
a dictionary file_path -> GridFS_id as charm['files'].
The method uses the function slurp_files(), which was moved from
jobs.ingest to models.
slurp_files() stored all files from a branch in GridFS, while we want
to store only a subset of these files for charms. On the other hand,
slurp_files() ignored symlinks. I fixed this by adding an optional
parameter entry_source(), which should return a sequence of
(path_name, bzr-file-id) items. A new method CharmFileSet.
generates such a sequence for save_revision_
by the already existing method save_files().
The charm sample data has a symlink hooks/linkstart, pointing to
hooks/start, which means that slurp_files() tries to save two files
with identical content, when called by save_revision_
GridFS.put() raises an exception when it is called with an already
existing _id.
- 327. By Abel Deuring
-
several doc strings added; minor test refactoring; test that file changes are properly tracked by slurp_files()
Abel Deuring (adeuring) wrote : | # |
On 26.07.2013 18:30, Curtis Hovey wrote:
> Review: Needs Information code
>
> Hi Abel.
>
> This branch looks good. I think It fixes bug 1189992. I have some questions.
well, not immediately -- we'll need to use the new data and the new
method that creates it.
>
> I would like a doc comment added to find_files(). Maybe?
> """Yield file_path, file_id of valid charm files.
>
> Unrecognised files and directories are ignored. Symlink outside of the charm
> are ignored.
> """
Added (with minor changes).
>
> I know we have tests for the conditions described by my proposed docstring, but the are probably still names ...save_files....
>
> save_revision_files docstring needs fixing:
> """Store relevant files from a charm revision."""
>
> slurp_files() needs a docstring now that there is a non-obvious contract proved by entry_source. Maybe
> """Return a dict of file paths and file content hashes.
>
> :param fs: A file system, such as gridFS.
> :param tree: A bazaar tree.
> :param entry_source: A generator of (file_path, file_id) tuples to be slurped.
> The default entry_source only finds files.
> """
>
> I think test_save_
They are of course similar, but not fully equivalent. The idea of the
two tests is to show that it is possible to successfully select a
revision of a branch and that all files from the specified revision are
stored, but not files that were added later.
The two lines
expected = self.expected_
include the file added in the second commit to the expected data. The
method check_revision_
actually added files are identical:
found_names = set(charm_
So, adding hooks/now_hook to expected_files in
test_save_
(There is a subtle difference in the two helper functions
add_revision_two() defined in
test_save_
test_save_
...old_revision() returns None, while the other returns the new revision
ID. This has influence one the value returned by save_revision_
This somewhat convoluted behaviour is the result of a review discussion
of older work related to saving files for s specific revision... I've
refactored the two helper functions which makes the difference hopefully
a bit more obvious.)
>
> I expected to see a test where the content of a file is changed in a new revision. A comparison of charm_data['files'] for the old and new revisions will confirm the changed file has a new hash.
The changes between the two version of the test branch now include a
file change.
>
> I think test_slurp_
Curtis Hovey (sinzui) wrote : | # |
Thank you.
I missed the subtle meaning of `return tree.commit(
Preview Diff
1 | === modified file 'charmworld/jobs/ingest.py' |
2 | --- charmworld/jobs/ingest.py 2013-07-25 19:48:14 +0000 |
3 | +++ charmworld/jobs/ingest.py 2013-07-29 11:31:27 +0000 |
4 | @@ -8,7 +8,6 @@ |
5 | datetime, |
6 | timedelta, |
7 | ) |
8 | -import hashlib |
9 | import logging |
10 | import os |
11 | import shutil |
12 | @@ -21,7 +20,6 @@ |
13 | from bzrlib.branch import Branch |
14 | from bzrlib.revision import NULL_REVISION |
15 | from bzrlib.transport import get_transport |
16 | -import magic |
17 | import requests |
18 | import yaml |
19 | |
20 | @@ -41,6 +39,7 @@ |
21 | getdb, |
22 | getfs, |
23 | options_to_storage, |
24 | + slurp_files, |
25 | store_bundles, |
26 | ) |
27 | from charmworld.search import ( |
28 | @@ -303,19 +302,6 @@ |
29 | store_bundles(collection, deployer_config, basket_data['_id']) |
30 | |
31 | |
32 | -def slurp_files(fs, tree): |
33 | - hashes = {} |
34 | - for path, entry in tree.iter_entries_by_dir(): |
35 | - if entry.kind != 'file': |
36 | - continue |
37 | - bytes = tree.get_file_text(entry.file_id) |
38 | - content_type = magic.from_buffer(bytes, mime=True) |
39 | - _id = hashlib.sha256(bytes).hexdigest() |
40 | - fs.put(bytes, contentType=content_type, _id=_id) |
41 | - hashes[path] = _id |
42 | - return hashes |
43 | - |
44 | - |
45 | def _rev_info(r, branch): |
46 | d = { |
47 | 'authors': r.get_apparent_authors(), |
48 | |
49 | === modified file 'charmworld/models.py' |
50 | --- charmworld/models.py 2013-07-22 20:49:00 +0000 |
51 | +++ charmworld/models.py 2013-07-29 11:31:27 +0000 |
52 | @@ -5,6 +5,7 @@ |
53 | from bzrlib.branch import Branch |
54 | from bzrlib.errors import NoSuchRevision |
55 | from calendar import timegm |
56 | +import hashlib |
57 | import logging |
58 | from mimetypes import guess_type |
59 | import os |
60 | @@ -732,7 +733,7 @@ |
61 | ) |
62 | |
63 | @classmethod |
64 | - def save_file(cls, fs, logger, charm_data, file_info, tree): |
65 | + def save_file(cls, fs, logger, charm_data, file_path, file_id, tree): |
66 | """Do the actual storage of the file contents. |
67 | |
68 | :param fs: The mongo fs connection to use. |
69 | @@ -741,29 +742,6 @@ |
70 | :param file_info: The file's data as returned by tree.list_files(). |
71 | :tree: The working tree of the branch. |
72 | """ |
73 | - file_path, ignore, file_type, file_id = file_info[:4] |
74 | - if file_type == 'symlink': |
75 | - target = tree.get_symlink_target(file_id) |
76 | - if target.startswith('/'): |
77 | - logger.warn( |
78 | - "Invalid file path: symlink %s points to the absolute " |
79 | - "path %s." % ( |
80 | - file_path, target)) |
81 | - return |
82 | - directory = dirname(file_path) |
83 | - resolved = normpath(join(directory, target)) |
84 | - if resolved.startswith('../'): |
85 | - logger.warn( |
86 | - "Invalid file path: Path %s points to %s, which is not " |
87 | - "inside the charm's root directory" % ( |
88 | - file_path, resolved)) |
89 | - return |
90 | - file_id = tree.path2id(resolved) |
91 | - if file_id is None: |
92 | - logger.warn( |
93 | - "File does not exist: %s (resolving to %s)." % ( |
94 | - file_path, resolved)) |
95 | - return |
96 | plain_file_name = split(file_path)[1] |
97 | charm_file = cls.make_charm_file( |
98 | fs, charm_data, file_path, plain_file_name) |
99 | @@ -812,43 +790,95 @@ |
100 | return cf |
101 | |
102 | @classmethod |
103 | - def save_files(cls, fs, charm_data, charm_bzr_path, logger): |
104 | - """Given a bzr branch, load the files we care about.""" |
105 | + def get_tree(cls, charm_data, charm_bzr_path, logger): |
106 | bzr_revision = charm_data['store_data']['digest'] |
107 | branch = Branch.open(charm_bzr_path) |
108 | try: |
109 | - tree = branch.repository.revision_tree(bzr_revision) |
110 | + return branch.repository.revision_tree(bzr_revision) |
111 | except NoSuchRevision: |
112 | logger.warn('Revision %s not found' % bzr_revision) |
113 | + return None |
114 | + |
115 | + @classmethod |
116 | + def find_files(cls, tree, logger): |
117 | + """Yield file_path, file_id of relevant charm files. |
118 | + |
119 | + Irrelevant files and directories are ignored. Symlinks outside |
120 | + of the charm are ignored. |
121 | + """ |
122 | + for file_path, entry in tree.iter_entries_by_dir(): |
123 | + if entry.kind == 'directory': |
124 | + continue |
125 | + found = False |
126 | + for pattern in cls.store_files: |
127 | + if re.match(pattern, file_path, flags=re.I): |
128 | + found = True |
129 | + break |
130 | + if not found: |
131 | + for directory in cls.store_directories: |
132 | + if file_path.startswith(directory): |
133 | + found = True |
134 | + break |
135 | + if not found: |
136 | + continue |
137 | + |
138 | + if entry.kind == 'symlink': |
139 | + target = tree.get_symlink_target(entry.file_id) |
140 | + if target.startswith('/'): |
141 | + logger.warn( |
142 | + "Invalid file path: symlink %s points to the absolute " |
143 | + "path %s." % ( |
144 | + file_path, target)) |
145 | + continue |
146 | + directory = dirname(file_path) |
147 | + resolved = normpath(join(directory, target)) |
148 | + if resolved.startswith('../'): |
149 | + logger.warn( |
150 | + "Invalid file path: Path %s points to %s, which is " |
151 | + "not inside the charm's root directory" % ( |
152 | + file_path, resolved)) |
153 | + continue |
154 | + file_id = tree.path2id(resolved) |
155 | + if file_id is None: |
156 | + logger.warn( |
157 | + "File does not exist: %s (resolving to %s)." % ( |
158 | + file_path, resolved)) |
159 | + continue |
160 | + else: |
161 | + file_id = entry.file_id |
162 | + yield file_path, file_id |
163 | + |
164 | + @classmethod |
165 | + def save_files(cls, fs, charm_data, charm_bzr_path, logger): |
166 | + """Given a bzr branch, load the files we care about.""" |
167 | + tree = cls.get_tree(charm_data, charm_bzr_path, logger) |
168 | + if tree is None: |
169 | return [] |
170 | - with read_locked(branch): |
171 | + with read_locked(tree): |
172 | stored = [] |
173 | - for file_info in tree.list_files(): |
174 | - file_path = file_info[0] |
175 | - file_type = file_info[2] |
176 | - if file_type == 'directory': |
177 | - continue |
178 | - found = False |
179 | - for pattern in cls.store_files: |
180 | - if re.match(pattern, file_path, flags=re.I): |
181 | - found = True |
182 | - break |
183 | - if not found: |
184 | - for directory in cls.store_directories: |
185 | - if file_path.startswith(directory): |
186 | - found = True |
187 | - break |
188 | - if found: |
189 | - charm_file = cls.save_file( |
190 | - fs, |
191 | - logger, |
192 | - charm_data, |
193 | - file_info, |
194 | - tree) |
195 | - if charm_file is not None: |
196 | - stored.append(charm_file) |
197 | + for file_path, file_id in cls.find_files(tree, logger): |
198 | + charm_file = cls.save_file( |
199 | + fs, |
200 | + logger, |
201 | + charm_data, |
202 | + file_path, |
203 | + file_id, |
204 | + tree) |
205 | + if charm_file is not None: |
206 | + stored.append(charm_file) |
207 | return stored |
208 | |
209 | + @classmethod |
210 | + def save_revision_files(cls, fs, charm_data, charm_bzr_path, logger): |
211 | + """Store relevant files from a charm revision""" |
212 | + tree = cls.get_tree(charm_data, charm_bzr_path, logger) |
213 | + if tree is None: |
214 | + charm_data['files'] = {} |
215 | + return |
216 | + with read_locked(tree): |
217 | + charm_data['files'] = slurp_files( |
218 | + fs, tree, cls.find_files(tree, logger)) |
219 | + |
220 | |
221 | class CharmFile(object): |
222 | """Representation of the GridFS stored document. |
223 | @@ -1190,3 +1220,30 @@ |
224 | 'data': get_flattened_deployment(deployer_data, key), |
225 | } |
226 | collection.save(bundle_doc) |
227 | + |
228 | + |
229 | +def slurp_files(fs, tree, entry_source=None): |
230 | + """Return a dict of file paths and file content hashes. |
231 | + |
232 | + :param fs: A file system, such as gridFS. |
233 | + :param tree: A bazaar tree. |
234 | + :param entry_source: A generator of (file_path, file_id) tuples to |
235 | + be slurped. The default entry_source only finds files. |
236 | + """ |
237 | + if entry_source is None: |
238 | + def entry_source(): |
239 | + for path, entry in tree.iter_entries_by_dir(): |
240 | + if entry.kind != 'file': |
241 | + continue |
242 | + yield path, entry.file_id |
243 | + entry_source = entry_source() |
244 | + |
245 | + hashes = {} |
246 | + for path, file_id in entry_source: |
247 | + bytes = tree.get_file_text(file_id) |
248 | + content_type = magic.from_buffer(bytes, mime=True) |
249 | + _id = hashlib.sha256(bytes).hexdigest() |
250 | + if not fs.exists(_id): |
251 | + fs.put(bytes, contentType=content_type, _id=_id) |
252 | + hashes[path] = _id |
253 | + return hashes |
254 | |
255 | === modified file 'charmworld/tests/test_models.py' |
256 | --- charmworld/tests/test_models.py 2013-07-22 17:57:14 +0000 |
257 | +++ charmworld/tests/test_models.py 2013-07-29 11:31:27 +0000 |
258 | @@ -3,16 +3,19 @@ |
259 | # the file LICENSE). |
260 | |
261 | """Unit tests for our application models.""" |
262 | +from bzrlib.branch import Branch |
263 | from datetime import ( |
264 | date, |
265 | datetime, |
266 | ) |
267 | +import hashlib |
268 | import logging |
269 | import md5 |
270 | import os |
271 | from os.path import dirname |
272 | from os.path import join |
273 | from os.path import split |
274 | +import stat |
275 | from textwrap import dedent |
276 | |
277 | from mock import patch |
278 | @@ -32,6 +35,7 @@ |
279 | find_charms, |
280 | options_to_storage, |
281 | QAData, |
282 | + slurp_files, |
283 | storage_to_options, |
284 | store_bundles, |
285 | sync_index, |
286 | @@ -48,6 +52,7 @@ |
287 | from charmworld.testing.factory import charm_branch |
288 | from charmworld.utils import ( |
289 | quote_key, |
290 | + read_locked, |
291 | ) |
292 | |
293 | |
294 | @@ -740,6 +745,16 @@ |
295 | branch_dir, |
296 | self.logger) |
297 | |
298 | + def save_revision_files(self, charm_path=None, extra_revisions=[]): |
299 | + with charm_branch(charm_path, extra_revisions) as ( |
300 | + branch_dir, revision): |
301 | + charm_data = self.charm_data.copy() |
302 | + charm_data['branch_dir'] = branch_dir |
303 | + charm_data['store_data'] = {'digest': revision} |
304 | + CharmFileSet.save_revision_files( |
305 | + self.fs, charm_data, branch_dir, self.logger) |
306 | + return charm_data |
307 | + |
308 | def test_fileid_generation(self): |
309 | """A charm must generate an unique fileid.""" |
310 | self.assertEqual( |
311 | @@ -966,6 +981,60 @@ |
312 | saved = [charm_file._filename for charm_file in saved] |
313 | self.assertEqual(['a_hook'], saved) |
314 | |
315 | + def check_revision_files(self, expected_files, charm_revision): |
316 | + expected_names = set(expected_files) |
317 | + found_names = set(charm_revision['files']) |
318 | + self.assertEqual(expected_names, found_names) |
319 | + for name in expected_names: |
320 | + expected_fs_id = hashlib.sha256(expected_files[name]).hexdigest() |
321 | + self.assertEqual(expected_fs_id, charm_revision['files'][name]) |
322 | + |
323 | + def test_save_revision_files(self): |
324 | + charm_revision = self.save_revision_files() |
325 | + self.check_revision_files(self.expected_files(), charm_revision) |
326 | + |
327 | + def add_revision(self, cwd, tree, return_new_revision): |
328 | + path = join(cwd, 'branch/trunk/hooks/new_hook') |
329 | + with open(path, 'w') as f: |
330 | + f.write('A new hook') |
331 | + tree.add(['hooks/new_hook']) |
332 | + path = join(cwd, 'branch/trunk/README.md') |
333 | + with open(path, 'a') as f: |
334 | + f.write('appended text') |
335 | + if return_new_revision: |
336 | + return tree.commit('revision 2') |
337 | + else: |
338 | + return None |
339 | + |
340 | + def test_save_revision_files_from_old_revision(self): |
341 | + def add_revision_two(cwd, tree): |
342 | + return self.add_revision(cwd, tree, False) |
343 | + revision = self.save_revision_files(extra_revisions=[add_revision_two]) |
344 | + self.check_revision_files(self.expected_files(), revision) |
345 | + |
346 | + def test_save_revision_files_from_new_revision(self): |
347 | + def add_revision_two(cwd, tree): |
348 | + return self.add_revision(cwd, tree, True) |
349 | + revision = self.save_revision_files(extra_revisions=[add_revision_two]) |
350 | + expected = self.expected_files() |
351 | + expected['hooks/new_hook'] = 'A new hook' |
352 | + expected['README.md'] = expected['README.md'] + 'appended text' |
353 | + self.check_revision_files(expected, revision) |
354 | + |
355 | + def test_save_revision_files_with_non_existent_bzr_revision(self): |
356 | + with charm_branch() as (branch_dir, revision): |
357 | + charm_data = self.charm_data.copy() |
358 | + charm_data['branch_dir'] = branch_dir |
359 | + charm_data['store_data'] = {'digest': 'bad-revision-id'} |
360 | + CharmFileSet.save_revision_files( |
361 | + self.fs, |
362 | + charm_data, |
363 | + branch_dir, |
364 | + self.logger) |
365 | + self.assertEqual({}, charm_data['files']) |
366 | + messages = [rec.msg for rec in self.log_handler.buffer] |
367 | + self.assertEqual(['Revision bad-revision-id not found'], messages) |
368 | + |
369 | |
370 | class TestCharmFile(MongoTestBase): |
371 | |
372 | @@ -1327,3 +1396,55 @@ |
373 | }) |
374 | self.assertEqual( |
375 | "jc:~bac/apache-6/tiny", bundle.permanent_url) |
376 | + |
377 | + |
378 | +class TestSlurpFiles(MongoTestBase): |
379 | + |
380 | + def test_slurp_files_saves_all_files_by_default(self): |
381 | + with charm_branch() as (branch_dir, revision): |
382 | + branch = Branch.open(branch_dir) |
383 | + tree = branch.repository.revision_tree(revision) |
384 | + with read_locked(tree): |
385 | + result = slurp_files(self.fs, tree) |
386 | + test_data_dir = join( |
387 | + dirname(dirname(__file__)), 'testing/data/sample_charm') |
388 | + expected_file_paths = set() |
389 | + for dirpath, dirnames, filenames in os.walk(test_data_dir): |
390 | + for filename in filenames: |
391 | + filepath = os.path.join(dirpath, filename) |
392 | + if stat.S_ISLNK(os.lstat(filepath).st_mode): |
393 | + continue |
394 | + relpath = filepath[len(test_data_dir) + 1:] |
395 | + expected_file_paths.add(relpath) |
396 | + content = open(filepath).read() |
397 | + expected_hash = hashlib.sha256(content).hexdigest() |
398 | + self.assertEqual(expected_hash, result[relpath]) |
399 | + self.assertEqual(expected_file_paths, set(result)) |
400 | + |
401 | + def test_slurp_files_with_identical_content_saved_only_once(self): |
402 | + |
403 | + def add_revision_two(cwd, tree): |
404 | + src_path = join(cwd, 'branch/trunk/common') |
405 | + dst_path = join(cwd, 'branch/trunk/also-common') |
406 | + with open(dst_path, 'w') as f: |
407 | + f.write(open(src_path).read()) |
408 | + tree.add(['also-common']) |
409 | + return tree.commit('revision 2') |
410 | + |
411 | + with charm_branch(extra=[add_revision_two]) as (branch_dir, revision): |
412 | + branch = Branch.open(branch_dir) |
413 | + tree = branch.repository.revision_tree(revision) |
414 | + with read_locked(tree): |
415 | + result = slurp_files(self.fs, tree) |
416 | + test_data_dir = join( |
417 | + dirname(dirname(__file__)), 'testing/data/sample_charm') |
418 | + expected_file_paths = set() |
419 | + for dirpath, dirnames, filenames in os.walk(test_data_dir): |
420 | + for filename in filenames: |
421 | + filepath = os.path.join(dirpath, filename) |
422 | + if stat.S_ISLNK(os.lstat(filepath).st_mode): |
423 | + continue |
424 | + relpath = filepath[len(test_data_dir) + 1:] |
425 | + expected_file_paths.add(relpath) |
426 | + expected_file_paths.add('also-common') |
427 | + self.assertEqual(result['common'], result['also-common']) |
Hi Abel.
This branch looks good. I think It fixes bug 1189992. I have some questions.
I would like a doc comment added to find_files(). Maybe?
"""Yield file_path, file_id of valid charm files.
Unrecognised files and directories are ignored. Symlink outside of the charm
are ignored.
"""
I know we have tests for the conditions described by my proposed docstring, but the are probably still names ...save_files....
save_revision_files docstring needs fixing:
"""Store relevant files from a charm revision."""
slurp_files() needs a docstring now that there is a non-obvious contract proved by entry_source. Maybe
"""Return a dict of file paths and file content hashes.
:param fs: A file system, such as gridFS.
:param tree: A bazaar tree.
:param entry_source: A generator of (file_path, file_id) tuples to be slurped.
The default entry_source only finds files.
"""
I think test_save_ revision_ files_from_ old_revision( ) and test_save_ revision_ files_from_ new_revision( ) are functionally identical. I think the two extra lines in the "new" test will pass when added to the "old" test. Is the setup wrong or is the revision under test wrong?
I expected to see a test where the content of a file is changed in a new revision. A comparison of charm_data['files'] for the old and new revisions will confirm the changed file has a new hash.
I think test_slurp_ files_saves_ files_with_ identical_ content( ) needs a different name. My reading of the code is that the method does not save duplicate files. It guarantees that it returns the hash/id of the file content, and may save the file content if it does not already exist.