Merge lp:~mwilck/duplicity/duplicity into lp:duplicity

Proposed by Martin Wilck on 2016-07-27
Status: Merged
Merged at revision: 1116
Proposed branch: lp:~mwilck/duplicity/duplicity
Merge into: lp:duplicity
Diff against target: 74 lines (+19/-18)
2 files modified
duplicity/globmatch.py (+16/-12)
duplicity/selection.py (+3/-6)
To merge this branch: bzr merge lp:~mwilck/duplicity/duplicity
Reviewer Review Type Date Requested Status
Kenneth Loafman 2016-07-27 Approve on 2016-07-28
Review via email: mp+301268@code.launchpad.net

Description of the change

Glob matching in current duplicity uses a selector function that calls path_matches_glob(). This means that whenever a filename is matched, path_matches_glob() goes through the process of transforming a glob expression into regular expressions for filename and directory matching.

My proposed patches create a closure function instead that uses precalculated regular expressions; the regular expressions are thus constructed only once at initialization time.

This change speeds up duplicity a *lot* when complex include/exclude lists are in use: for my use case (dry-run, backup of an SSD filesystem), the speedup is a factor of 25 (runtime: 4s rather than 90s).

To post a comment you must log in.
Martin Wilck (mwilck) wrote :

The same change can be trivially merged into the 0.7 series, providing a similar speed-up. I did that in lp:~mwilck/duplicity/0.7-series. I'll postpone officially requesting this merge until someone has commented here.

Martin Wilck (mwilck) wrote :

I added a merge request for the 0.7-series now, https://code.launchpad.net/~mwilck/duplicity/0.7-series/+merge/301332.

Kenneth Loafman (kenneth-loafman) wrote :

'$ tox' yields the following:

======================================================================
FAIL: test_dirx (testing.functional.test_rdiffdir.RdiffdirTest)
Test cycle on testfiles/dirx
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 88, in test_dirx
    'testfiles/empty_dir'])
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 50, in run_cycle
    self.run_rdiffdir("sig %s %s" % (seq_path.name, sig_path.name))
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 37, in run_rdiffdir
    self.run_cmd("rdiffdir " + argstring)
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 33, in run_cmd
    assert not os.system(command)
AssertionError

review: Needs Fixing
Kenneth Loafman (kenneth-loafman) wrote :

Never mind... I reran with 'tox -r' and all is well.

review: Approve
lp:~mwilck/duplicity/duplicity updated on 2016-07-28
1116. By ken on 2016-07-28

* Merged in lp:~mwilck/duplicity/duplicity
  - Speedup of path_matches_glob() by about 8x. See
    https://code.launchpad.net/~mwilck/duplicity/duplicity/+merge/301268
    for more details.

Aaron Whitehouse (aaron-whitehouse) wrote :

For reference, while this talks about a x25 (or x8) speedup, for me there is no material speed difference for the test suite between r 1115 and r 1116, so this speedup must be in a scenario not exhibited in the tests.

edso (ed.so) wrote :

On 10.02.2017 00:25, Aaron Whitehouse wrote:
> For reference, while this talks about a x25 (or x8) speedup, for me there is no material speed difference for the test suite between r 1115 and r 1116, so this speedup must be in a scenario not exhibited in the tests.
>

Aaron,

the slowdown only becomes apparent when dealing with many files. you may need to add a test with thousands of files to make it visible.

..ede/duply.net

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'duplicity/globmatch.py'
2--- duplicity/globmatch.py 2016-06-27 21:12:18 +0000
3+++ duplicity/globmatch.py 2016-07-27 13:08:36 +0000
4@@ -49,8 +49,9 @@
5 return list(map(glob_to_regex, prefixes))
6
7
8-def path_matches_glob(path, glob_str, include, ignore_case=False):
9- """Tests whether path matches glob, as per the Unix shell rules, taking as
10+def path_matches_glob_fn(glob_str, include, ignore_case=False):
11+ """Return a function test_fn(path) which
12+ tests whether path matches glob, as per the Unix shell rules, taking as
13 arguments a path, a glob string and include (0 indicating that the glob
14 string is an exclude glob and 1 indicating that it is an include glob,
15 returning:
16@@ -83,16 +84,19 @@
17 scan_comp_re = re_comp("^(%s)$" %
18 "|".join(_glob_get_prefix_regexs(glob_str)))
19
20- if match_only_dirs and not path.isdir():
21- # If the glob ended with a /, only match directories
22- return None
23- elif glob_comp_re.match(path.name):
24- return include
25- elif include == 1 and scan_comp_re.match(path.name):
26- return 2
27- else:
28- return None
29-
30+ def test_fn(path):
31+
32+ if match_only_dirs and not path.isdir():
33+ # If the glob ended with a /, only match directories
34+ return None
35+ elif glob_comp_re.match(path.name):
36+ return include
37+ elif include == 1 and scan_comp_re.match(path.name):
38+ return 2
39+ else:
40+ return None
41+
42+ return test_fn
43
44 def glob_to_regex(pat):
45 """Returned regular expression equivalent to shell glob pat
46
47=== modified file 'duplicity/selection.py'
48--- duplicity/selection.py 2016-07-24 12:30:45 +0000
49+++ duplicity/selection.py 2016-07-27 13:08:36 +0000
50@@ -33,7 +33,7 @@
51 from duplicity import diffdir
52 from duplicity import util # @Reimport
53 from duplicity.globmatch import GlobbingError, FilePrefixError, \
54- path_matches_glob
55+ path_matches_glob_fn
56
57 """Iterate exactly the requested files in a directory
58
59@@ -544,13 +544,10 @@
60 ignore_case = True
61
62 # Check to make sure prefix is ok
63- if not path_matches_glob(self.rootpath, glob_str, include=1):
64+ if not path_matches_glob_fn(glob_str, include=1)(self.rootpath):
65 raise FilePrefixError(glob_str)
66
67- def sel_func(path):
68- return path_matches_glob(path, glob_str, include, ignore_case)
69-
70- return sel_func
71+ return path_matches_glob_fn(glob_str, include, ignore_case)
72
73 def exclude_older_get_sf(self, date):
74 """Return selection function based on files older than modification date """

Subscribers

People subscribed via source and target branches