Merge lp:~mwilck/duplicity/duplicity into lp:duplicity

Proposed by Martin Wilck on 2016-07-27
Status: Merged
Merged at revision: 1116
Proposed branch: lp:~mwilck/duplicity/duplicity
Merge into: lp:duplicity
Diff against target: 74 lines (+19/-18)
2 files modified
duplicity/globmatch.py (+16/-12)
duplicity/selection.py (+3/-6)
To merge this branch: bzr merge lp:~mwilck/duplicity/duplicity
Reviewer Review Type Date Requested Status
Kenneth Loafman 2016-07-27 Approve on 2016-07-28
Review via email: mp+301268@code.launchpad.net

Description of the change

Glob matching in current duplicity uses a selector function that calls path_matches_glob(). This means that whenever a filename is matched, path_matches_glob() goes through the process of transforming a glob expression into regular expressions for filename and directory matching.

My proposed patches create a closure function instead that uses precalculated regular expressions; the regular expressions are thus constructed only once at initialization time.

This change speeds up duplicity a *lot* when complex include/exclude lists are in use: for my use case (dry-run, backup of an SSD filesystem), the speedup is a factor of 25 (runtime: 4s rather than 90s).

To post a comment you must log in.
Martin Wilck (mwilck) wrote :

The same change can be trivially merged into the 0.7 series, providing a similar speed-up. I did that in lp:~mwilck/duplicity/0.7-series. I'll postpone officially requesting this merge until someone has commented here.

Martin Wilck (mwilck) wrote :

I added a merge request for the 0.7-series now, https://code.launchpad.net/~mwilck/duplicity/0.7-series/+merge/301332.

Kenneth Loafman (kenneth-loafman) wrote :

'$ tox' yields the following:

======================================================================
FAIL: test_dirx (testing.functional.test_rdiffdir.RdiffdirTest)
Test cycle on testfiles/dirx
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 88, in test_dirx
    'testfiles/empty_dir'])
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 50, in run_cycle
    self.run_rdiffdir("sig %s %s" % (seq_path.name, sig_path.name))
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 37, in run_rdiffdir
    self.run_cmd("rdiffdir " + argstring)
  File "/Users/ken/workspace/duplicity-src8/testing/functional/test_rdiffdir.py", line 33, in run_cmd
    assert not os.system(command)
AssertionError

review: Needs Fixing
Kenneth Loafman (kenneth-loafman) wrote :

Never mind... I reran with 'tox -r' and all is well.

review: Approve
lp:~mwilck/duplicity/duplicity updated on 2016-07-28
1116. By ken on 2016-07-28

* Merged in lp:~mwilck/duplicity/duplicity
  - Speedup of path_matches_glob() by about 8x. See
    https://code.launchpad.net/~mwilck/duplicity/duplicity/+merge/301268
    for more details.

Aaron Whitehouse (aaron-whitehouse) wrote :

For reference, while this talks about a x25 (or x8) speedup, for me there is no material speed difference for the test suite between r 1115 and r 1116, so this speedup must be in a scenario not exhibited in the tests.

edso (ed.so) wrote :

On 10.02.2017 00:25, Aaron Whitehouse wrote:
> For reference, while this talks about a x25 (or x8) speedup, for me there is no material speed difference for the test suite between r 1115 and r 1116, so this speedup must be in a scenario not exhibited in the tests.
>

Aaron,

the slowdown only becomes apparent when dealing with many files. you may need to add a test with thousands of files to make it visible.

..ede/duply.net

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'duplicity/globmatch.py'
--- duplicity/globmatch.py 2016-06-27 21:12:18 +0000
+++ duplicity/globmatch.py 2016-07-27 13:08:36 +0000
@@ -49,8 +49,9 @@
49 return list(map(glob_to_regex, prefixes))49 return list(map(glob_to_regex, prefixes))
5050
5151
52def path_matches_glob(path, glob_str, include, ignore_case=False):52def path_matches_glob_fn(glob_str, include, ignore_case=False):
53 """Tests whether path matches glob, as per the Unix shell rules, taking as53 """Return a function test_fn(path) which
54 tests whether path matches glob, as per the Unix shell rules, taking as
54 arguments a path, a glob string and include (0 indicating that the glob55 arguments a path, a glob string and include (0 indicating that the glob
55 string is an exclude glob and 1 indicating that it is an include glob,56 string is an exclude glob and 1 indicating that it is an include glob,
56 returning:57 returning:
@@ -83,16 +84,19 @@
83 scan_comp_re = re_comp("^(%s)$" %84 scan_comp_re = re_comp("^(%s)$" %
84 "|".join(_glob_get_prefix_regexs(glob_str)))85 "|".join(_glob_get_prefix_regexs(glob_str)))
8586
86 if match_only_dirs and not path.isdir():87 def test_fn(path):
87 # If the glob ended with a /, only match directories88
88 return None89 if match_only_dirs and not path.isdir():
89 elif glob_comp_re.match(path.name):90 # If the glob ended with a /, only match directories
90 return include91 return None
91 elif include == 1 and scan_comp_re.match(path.name):92 elif glob_comp_re.match(path.name):
92 return 293 return include
93 else:94 elif include == 1 and scan_comp_re.match(path.name):
94 return None95 return 2
9596 else:
97 return None
98
99 return test_fn
96100
97def glob_to_regex(pat):101def glob_to_regex(pat):
98 """Returned regular expression equivalent to shell glob pat102 """Returned regular expression equivalent to shell glob pat
99103
=== modified file 'duplicity/selection.py'
--- duplicity/selection.py 2016-07-24 12:30:45 +0000
+++ duplicity/selection.py 2016-07-27 13:08:36 +0000
@@ -33,7 +33,7 @@
33from duplicity import diffdir33from duplicity import diffdir
34from duplicity import util # @Reimport34from duplicity import util # @Reimport
35from duplicity.globmatch import GlobbingError, FilePrefixError, \35from duplicity.globmatch import GlobbingError, FilePrefixError, \
36 path_matches_glob36 path_matches_glob_fn
3737
38"""Iterate exactly the requested files in a directory38"""Iterate exactly the requested files in a directory
3939
@@ -544,13 +544,10 @@
544 ignore_case = True544 ignore_case = True
545545
546 # Check to make sure prefix is ok546 # Check to make sure prefix is ok
547 if not path_matches_glob(self.rootpath, glob_str, include=1):547 if not path_matches_glob_fn(glob_str, include=1)(self.rootpath):
548 raise FilePrefixError(glob_str)548 raise FilePrefixError(glob_str)
549549
550 def sel_func(path):550 return path_matches_glob_fn(glob_str, include, ignore_case)
551 return path_matches_glob(path, glob_str, include, ignore_case)
552
553 return sel_func
554551
555 def exclude_older_get_sf(self, date):552 def exclude_older_get_sf(self, date):
556 """Return selection function based on files older than modification date """553 """Return selection function based on files older than modification date """

Subscribers

People subscribed via source and target branches