Merge lp:~aaron-whitehouse/duplicity/08-unadorned-strings into lp:~duplicity-team/duplicity/0.8-series

Proposed by Aaron Whitehouse
Status: Merged
Merged at revision: 1306
Proposed branch: lp:~aaron-whitehouse/duplicity/08-unadorned-strings
Merge into: lp:~duplicity-team/duplicity/0.8-series
Prerequisite: lp:~aaron-whitehouse/duplicity/08-pycodestyle
Diff against target: 340 lines (+253/-39)
2 files modified
testing/find_unadorned_strings.py (+73/-0)
testing/test_code.py (+180/-39)
To merge this branch: bzr merge lp:~aaron-whitehouse/duplicity/08-unadorned-strings
Reviewer Review Type Date Requested Status
duplicity-team Pending
Review via email: mp+347721@code.launchpad.net

Commit message

* Added new script to find unadorned strings (testing/find_unadorned_strings.py python_file) which prints all unadorned strings in a .py file.
* Added a new test to test_code.py that checks across all files for unadorned strings and gives an error if any are found (most files are in an ignore list at this stage, but this will allow us to incrementally remove the exceptions as we adorn the strings in each file).
* Adorn string literals in test_code.py with u/b

Description of the change

As set out in the Python 3 blueprint: https://blueprints.launchpad.net/duplicity/+spec/python3
one of the most time consuming, and least easy to automate, parts of supporting both Python 2 and 3 is string literals. This is because simple strings (e.g. a = "Hello") will be treated as bytes (e.g. encoded ASCII) in Python 2 and Unicode in Python 3. As we are trying to support both Python 2 and Python 3 for at least a transition period, we may end up with odd behaviour wherever we have an unadorned string.

The versions of Python 2 and 3 we are targeting means that we can "adorn" strings with letters to indicate what type of string (u for Unicode, b for Bytes and r for Raw/regexes).

An important preliminary step to Python 2/3 support is therefore for us to add these adornments to each and every string literal in the code base.

To ensure that we can find these and do not accidentally introduce more unadorned strings, this merge request adds a function to our test_code that automatically checks all .py files for unadorned strings and gives an error if any are found.

The actual work to adorn all of these strings will be substantial, so that is not all done in this merge request. Instead, this takes the approach we have for many of our other code style checks, where it currently contains a very long list of excluded files (which are not checked) and we can remove these exceptions as we adorn the strings in each file.

To assist people in finding and correcting all of the unadorned strings in a particular file, the new file testing/find_unadorned_strings.py can be executed directly with a python file as an argument:
./find_unadorned_strings python_file.py
and it will return a nicely-formatted list of all unadorned strings in the file that need to be corrected.

As the codebase is currently Python 2 only, marking strings as Bytes (b" ") essentially preserves current behaviour, but it is highly desirable to convert as many of these as possible to Unicode strings (u" "), as these will be much easier to work with as we transition to Python 3 and it will improve non-ASCII support. This will likely require changes to other parts of the code that interact with the string. The broad recommended approach for text is to decode at the boundaries (e.g. when reading from or writing to files) and use Unicode throughout internally. Many built-ins and libraries natively support Unicode, so in many cases very little needs to change to the code.

Many helper variables/functions have already been created in duplicity so that you can use Unicode wherever possible. For paths, for example, you can use Path.uname instead of Path.name.

To post a comment you must log in.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== added file 'testing/find_unadorned_strings.py'
--- testing/find_unadorned_strings.py 1970-01-01 00:00:00 +0000
+++ testing/find_unadorned_strings.py 2018-06-08 21:22:09 +0000
@@ -0,0 +1,73 @@
1#!/usr/bin/env python3
2# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
3#
4# Copyright 2018 Aaron Whitehouse <aaron@whitehouse.kiwi.nz>
5#
6# This file is part of duplicity.
7#
8# Duplicity is free software; you can redistribute it and/or modify it
9# under the terms of the GNU General Public License as published by the
10# Free Software Foundation; either version 2 of the License, or (at your
11# option) any later version.
12#
13# Duplicity is distributed in the hope that it will be useful, but
14# WITHOUT ANY WARRANTY; without even the implied warranty of
15# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
16# General Public License for more details.
17#
18# You should have received a copy of the GNU General Public License
19# along with duplicity; if not, write to the Free Software Foundation,
20# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
21
22# For predictable results in python2/3 all string literals need to be marked as unicode, bytes or raw
23# This code finds all unadorned string literals (strings that are not marked with a u, b or r)
24
25import sys
26import tokenize
27import token
28
29# Unfortunately Python2 does not have the useful named tuple result from tokenize.tokenize,
30# so we have to recreate the effect using namedtuple and tokenize.generate_tokens
31from collections import namedtuple
32Python2_token = namedtuple(u'Python2_token', u'type string start end line')
33
34
35def return_unadorned_string_tokens(f):
36 if sys.version_info[0] < 3:
37 unnamed_tokens = tokenize.generate_tokens(f.readline)
38 for t in unnamed_tokens:
39 named_token = Python2_token(token.tok_name[t[0]], *t[1:])
40 if named_token.type == u"STRING" and named_token.string[0] in [u'"', u"'"]:
41 yield named_token
42
43 else:
44 named_tokens = tokenize.tokenize(f.readline)
45 for t in named_tokens:
46 if t.type == token.STRING and t.string[0] in [u'"', u"'"]:
47 yield t
48
49
50def check_file_for_unadorned(python_file):
51 unadorned_string_list = []
52 with open(python_file, u'rb') as f:
53 for s in return_unadorned_string_tokens(f):
54 unadorned_string_list.append((python_file, s.start, s.end, s.string))
55 return unadorned_string_list
56
57
58if __name__ == u"__main__":
59 import argparse
60
61 parser = argparse.ArgumentParser(description=u'Find any unadorned string literals in a Python file')
62 parser.add_argument(u'file', help=u'The file to search')
63 args = parser.parse_args()
64
65 unadorned_string_list = check_file_for_unadorned(args.file)
66 if len(unadorned_string_list) == 0:
67 print(u"There are no unadorned strings in", args.file)
68 else:
69 print(u"There are unadorned strings in", args.file, u"\n")
70 for unadorned_string in unadorned_string_list:
71 print(unadorned_string)
72 python_file, string_start, string_end, string = unadorned_string
73 print(string_start, string)
074
=== modified file 'testing/test_code.py'
--- testing/test_code.py 2017-12-13 21:03:13 +0000
+++ testing/test_code.py 2018-06-08 21:22:09 +0000
@@ -21,16 +21,19 @@
21import os21import os
22import subprocess22import subprocess
23import pytest23import pytest
24import fnmatch
25import os
2426
25if os.getenv('RUN_CODE_TESTS', None) == '1':27if os.getenv(u'RUN_CODE_TESTS', None) == u'1':
26 # Make conditional so that we do not have to import in environments that28 # Make conditional so that we do not have to import in environments that
27 # do not run the tests (e.g. the build servers)29 # do not run the tests (e.g. the build servers)
28 import pycodestyle30 import pycodestyle
2931
30from . import _top_dir, DuplicityTestCase # @IgnorePep832from . import _top_dir, DuplicityTestCase # @IgnorePep8
33from . import find_unadorned_strings
3134
32skipCodeTest = pytest.mark.skipif(not os.getenv('RUN_CODE_TESTS', None) == '1',35skipCodeTest = pytest.mark.skipif(not os.getenv(u'RUN_CODE_TESTS', None) == u'1',
33 reason='Must set environment var RUN_CODE_TESTS=1')36 reason=u'Must set environment var RUN_CODE_TESTS=1')
3437
3538
36class CodeTest(DuplicityTestCase):39class CodeTest(DuplicityTestCase):
@@ -41,61 +44,199 @@
41 stderr=subprocess.PIPE)44 stderr=subprocess.PIPE)
42 output = process.communicate()[0]45 output = process.communicate()[0]
43 self.assertTrue(process.returncode in returncodes, output)46 self.assertTrue(process.returncode in returncodes, output)
44 self.assertEqual("", output, output)47 self.assertEqual(u"", output, output)
4548
46 @skipCodeTest49 @skipCodeTest
47 def test_2to3(self):50 def test_2to3(self):
48 # As we modernize the source code, we can remove more and more nofixes51 # As we modernize the source code, we can remove more and more nofixes
49 self.run_checker([52 self.run_checker([
50 "2to3",53 u"2to3",
51 "--nofix=next",54 u"--nofix=next",
52 "--nofix=types",55 u"--nofix=types",
53 "--nofix=unicode",56 u"--nofix=unicode",
54 # The following fixes we don't want to remove, since they are false57 # The following fixes we don't want to remove, since they are false
55 # positives, things we don't care about, or real incompatibilities58 # positives, things we don't care about, or real incompatibilities
56 # but which 2to3 can fix for us better automatically.59 # but which 2to3 can fix for us better automatically.
57 "--nofix=callable",60 u"--nofix=callable",
58 "--nofix=dict",61 u"--nofix=dict",
59 "--nofix=future",62 u"--nofix=future",
60 "--nofix=imports",63 u"--nofix=imports",
61 "--nofix=print",64 u"--nofix=print",
62 "--nofix=raw_input",65 u"--nofix=raw_input",
63 "--nofix=urllib",66 u"--nofix=urllib",
64 "--nofix=xrange",67 u"--nofix=xrange",
65 _top_dir])68 _top_dir])
6669
67 @skipCodeTest70 @skipCodeTest
68 def test_pylint(self):71 def test_pylint(self):
69 """Pylint test (requires pylint to be installed to pass)"""72 u"""Pylint test (requires pylint to be installed to pass)"""
70 self.run_checker([73 self.run_checker([
71 "pylint",74 u"pylint",
72 "-E",75 u"-E",
73 "--msg-template={msg_id}: {line}: {msg}",76 u"--msg-template={msg_id}: {line}: {msg}",
74 "--disable=E0203", # Access to member before its definition line77 u"--disable=E0203", # Access to member before its definition line
75 "--disable=E0602", # Undefined variable78 u"--disable=E0602", # Undefined variable
76 "--disable=E0611", # No name in module79 u"--disable=E0611", # No name in module
77 "--disable=E1101", # Has no member80 u"--disable=E1101", # Has no member
78 "--disable=E1103", # Maybe has no member81 u"--disable=E1103", # Maybe has no member
79 "--disable=E0712", # Catching an exception which doesn't inherit from BaseException82 u"--disable=E0712", # Catching an exception which doesn't inherit from BaseException
80 "--ignore=_librsync.so",83 u"--ignore=_librsync.so",
81 "--msg-template='{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}'",84 u"--msg-template='{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}'",
82 os.path.join(_top_dir, 'duplicity'),85 os.path.join(_top_dir, u'duplicity'),
83 os.path.join(_top_dir, 'bin/duplicity'),86 os.path.join(_top_dir, u'bin/duplicity'),
84 os.path.join(_top_dir, 'bin/rdiffdir')],87 os.path.join(_top_dir, u'bin/rdiffdir')],
85 # Allow usage errors, older versions don't have88 # Allow usage errors, older versions don't have
86 # --msg-template89 # --msg-template
87 [0, 32])90 [0, 32])
8891
89 @skipCodeTest92 @skipCodeTest
90 def test_pep8(self):93 def test_pep8(self):
91 """Test that we conform to PEP-8 using pycodestyle."""94 u"""Test that we conform to PEP-8 using pycodestyle."""
92 # Note that the settings, ignores etc for pycodestyle are set in tox.ini, not here95 # Note that the settings, ignores etc for pycodestyle are set in tox.ini, not here
93 style = pycodestyle.StyleGuide(config_file=os.path.join(_top_dir, 'tox.ini'))96 style = pycodestyle.StyleGuide(config_file=os.path.join(_top_dir, u'tox.ini'))
94 result = style.check_files([os.path.join(_top_dir, 'duplicity'),97 result = style.check_files([os.path.join(_top_dir, u'duplicity'),
95 os.path.join(_top_dir, 'bin/duplicity'),98 os.path.join(_top_dir, u'bin/duplicity'),
96 os.path.join(_top_dir, 'bin/rdiffdir')])99 os.path.join(_top_dir, u'bin/rdiffdir')])
97 self.assertEqual(result.total_errors, 0,100 self.assertEqual(result.total_errors, 0,
98 "Found %s code style errors (and warnings)." % result.total_errors)101 u"Found %s code style errors (and warnings)." % result.total_errors)
99102
100if __name__ == "__main__":103 @skipCodeTest
104 def test_unadorned_string_literals(self):
105 u"""For predictable results in python2/3 all string literals need to be marked as unicode, bytes or raw"""
106
107 ignored_files = [os.path.join(_top_dir, u'.tox', u'*'), # These are not source files we want to check
108 os.path.join(_top_dir, u'.eggs', u'*'),
109 # TODO Every file from here down needs to be fixed and the exclusion removed
110 os.path.join(_top_dir, u'setup.py'),
111 os.path.join(_top_dir, u'docs', u'conf.py'),
112 os.path.join(_top_dir, u'duplicity', u'__init__.py'),
113 os.path.join(_top_dir, u'duplicity', u'asyncscheduler.py'),
114 os.path.join(_top_dir, u'duplicity', u'backend.py'),
115 os.path.join(_top_dir, u'duplicity', u'backends', u'__init__.py'),
116 os.path.join(_top_dir, u'duplicity', u'backends', u'_boto_multi.py'),
117 os.path.join(_top_dir, u'duplicity', u'backends', u'_boto_single.py'),
118 os.path.join(_top_dir, u'duplicity', u'backends', u'_cf_cloudfiles.py'),
119 os.path.join(_top_dir, u'duplicity', u'backends', u'_cf_pyrax.py'),
120 os.path.join(_top_dir, u'duplicity', u'backends', u'acdclibackend.py'),
121 os.path.join(_top_dir, u'duplicity', u'backends', u'adbackend.py'),
122 os.path.join(_top_dir, u'duplicity', u'backends', u'azurebackend.py'),
123 os.path.join(_top_dir, u'duplicity', u'backends', u'b2backend.py'),
124 os.path.join(_top_dir, u'duplicity', u'backends', u'botobackend.py'),
125 os.path.join(_top_dir, u'duplicity', u'backends', u'cfbackend.py'),
126 os.path.join(_top_dir, u'duplicity', u'backends', u'dpbxbackend.py'),
127 os.path.join(_top_dir, u'duplicity', u'backends', u'gdocsbackend.py'),
128 os.path.join(_top_dir, u'duplicity', u'backends', u'giobackend.py'),
129 os.path.join(_top_dir, u'duplicity', u'backends', u'hsibackend.py'),
130 os.path.join(_top_dir, u'duplicity', u'backends', u'hubicbackend.py'),
131 os.path.join(_top_dir, u'duplicity', u'backends', u'imapbackend.py'),
132 os.path.join(_top_dir, u'duplicity', u'backends', u'jottacloudbackend.py'),
133 os.path.join(_top_dir, u'duplicity', u'backends', u'lftpbackend.py'),
134 os.path.join(_top_dir, u'duplicity', u'backends', u'localbackend.py'),
135 os.path.join(_top_dir, u'duplicity', u'backends', u'mediafirebackend.py'),
136 os.path.join(_top_dir, u'duplicity', u'backends', u'megabackend.py'),
137 os.path.join(_top_dir, u'duplicity', u'backends', u'multibackend.py'),
138 os.path.join(_top_dir, u'duplicity', u'backends', u'ncftpbackend.py'),
139 os.path.join(_top_dir, u'duplicity', u'backends', u'onedrivebackend.py'),
140 os.path.join(_top_dir, u'duplicity', u'backends', u'par2backend.py'),
141 os.path.join(_top_dir, u'duplicity', u'backends', u'pcabackend.py'),
142 os.path.join(_top_dir, u'duplicity', u'backends', u'pydrivebackend.py'),
143 os.path.join(_top_dir, u'duplicity', u'backends', u'pyrax_identity', u'__init__.py'),
144 os.path.join(_top_dir, u'duplicity', u'backends', u'pyrax_identity', u'hubic.py'),
145 os.path.join(_top_dir, u'duplicity', u'backends', u'rsyncbackend.py'),
146 os.path.join(_top_dir, u'duplicity', u'backends', u'ssh_paramiko_backend.py'),
147 os.path.join(_top_dir, u'duplicity', u'backends', u'ssh_pexpect_backend.py'),
148 os.path.join(_top_dir, u'duplicity', u'backends', u'swiftbackend.py'),
149 os.path.join(_top_dir, u'duplicity', u'backends', u'sxbackend.py'),
150 os.path.join(_top_dir, u'duplicity', u'backends', u'tahoebackend.py'),
151 os.path.join(_top_dir, u'duplicity', u'backends', u'webdavbackend.py'),
152 os.path.join(_top_dir, u'duplicity', u'cached_ops.py'),
153 os.path.join(_top_dir, u'duplicity', u'collections.py'),
154 os.path.join(_top_dir, u'duplicity', u'commandline.py'),
155 os.path.join(_top_dir, u'duplicity', u'compilec.py'),
156 os.path.join(_top_dir, u'duplicity', u'diffdir.py'),
157 os.path.join(_top_dir, u'duplicity', u'dup_temp.py'),
158 os.path.join(_top_dir, u'duplicity', u'dup_threading.py'),
159 os.path.join(_top_dir, u'duplicity', u'dup_time.py'),
160 os.path.join(_top_dir, u'duplicity', u'errors.py'),
161 os.path.join(_top_dir, u'duplicity', u'file_naming.py'),
162 os.path.join(_top_dir, u'duplicity', u'filechunkio.py'),
163 os.path.join(_top_dir, u'duplicity', u'globals.py'),
164 os.path.join(_top_dir, u'duplicity', u'globmatch.py'),
165 os.path.join(_top_dir, u'duplicity', u'gpg.py'),
166 os.path.join(_top_dir, u'duplicity', u'gpginterface.py'),
167 os.path.join(_top_dir, u'duplicity', u'lazy.py'),
168 os.path.join(_top_dir, u'duplicity', u'librsync.py'),
169 os.path.join(_top_dir, u'duplicity', u'log.py'),
170 os.path.join(_top_dir, u'duplicity', u'manifest.py'),
171 os.path.join(_top_dir, u'duplicity', u'patchdir.py'),
172 os.path.join(_top_dir, u'duplicity', u'path.py'),
173 os.path.join(_top_dir, u'duplicity', u'progress.py'),
174 os.path.join(_top_dir, u'duplicity', u'robust.py'),
175 os.path.join(_top_dir, u'duplicity', u'selection.py'),
176 os.path.join(_top_dir, u'duplicity', u'statistics.py'),
177 os.path.join(_top_dir, u'duplicity', u'tarfile.py'),
178 os.path.join(_top_dir, u'duplicity', u'tempdir.py'),
179 os.path.join(_top_dir, u'duplicity', u'util.py'),
180 os.path.join(_top_dir, u'testing', u'__init__.py'),
181 os.path.join(_top_dir, u'testing', u'find_unadorned_strings.py'),
182 os.path.join(_top_dir, u'testing', u'functional', u'__init__.py'),
183 os.path.join(_top_dir, u'testing', u'functional', u'test_badupload.py'),
184 os.path.join(_top_dir, u'testing', u'functional', u'test_cleanup.py'),
185 os.path.join(_top_dir, u'testing', u'functional', u'test_final.py'),
186 os.path.join(_top_dir, u'testing', u'functional', u'test_log.py'),
187 os.path.join(_top_dir, u'testing', u'functional', u'test_rdiffdir.py'),
188 os.path.join(_top_dir, u'testing', u'functional', u'test_replicate.py'),
189 os.path.join(_top_dir, u'testing', u'functional', u'test_restart.py'),
190 os.path.join(_top_dir, u'testing', u'functional', u'test_selection.py'),
191 os.path.join(_top_dir, u'testing', u'functional', u'test_verify.py'),
192 os.path.join(_top_dir, u'testing', u'manual', u'__init__.py'),
193 os.path.join(_top_dir, u'testing', u'overrides', u'__init__.py'),
194 os.path.join(_top_dir, u'testing', u'overrides', u'gettext.py'),
195 os.path.join(_top_dir, u'testing', u'test_unadorned.py'),
196 os.path.join(_top_dir, u'testing', u'unit', u'__init__.py'),
197 os.path.join(_top_dir, u'testing', u'unit', u'test_backend_instance.py'),
198 os.path.join(_top_dir, u'testing', u'unit', u'test_backend.py'),
199 os.path.join(_top_dir, u'testing', u'unit', u'test_collections.py'),
200 os.path.join(_top_dir, u'testing', u'unit', u'test_diffdir.py'),
201 os.path.join(_top_dir, u'testing', u'unit', u'test_dup_temp.py'),
202 os.path.join(_top_dir, u'testing', u'unit', u'test_dup_time.py'),
203 os.path.join(_top_dir, u'testing', u'unit', u'test_file_naming.py'),
204 os.path.join(_top_dir, u'testing', u'unit', u'test_globmatch.py'),
205 os.path.join(_top_dir, u'testing', u'unit', u'test_gpg.py'),
206 os.path.join(_top_dir, u'testing', u'unit', u'test_gpginterface.py'),
207 os.path.join(_top_dir, u'testing', u'unit', u'test_lazy.py'),
208 os.path.join(_top_dir, u'testing', u'unit', u'test_manifest.py'),
209 os.path.join(_top_dir, u'testing', u'unit', u'test_patchdir.py'),
210 os.path.join(_top_dir, u'testing', u'unit', u'test_path.py'),
211 os.path.join(_top_dir, u'testing', u'unit', u'test_selection.py'),
212 os.path.join(_top_dir, u'testing', u'unit', u'test_statistics.py'),
213 os.path.join(_top_dir, u'testing', u'unit', u'test_tarfile.py'),
214 os.path.join(_top_dir, u'testing', u'unit', u'test_tempdir.py')]
215
216
217 # Find all the .py files in the duplicity tree
218 # We cannot use glob.glob recursive until we drop support for Python < 3.5
219 matches = []
220
221 def multi_filter(names, patterns):
222 u"""Generator function which yields the names that match one or more of the patterns."""
223 for name in names:
224 if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
225 yield name
226
227 for root, dirnames, filenames in os.walk(_top_dir):
228 for filename in fnmatch.filter(filenames, u'*.py'):
229 matches.append(os.path.join(root, filename))
230
231 excluded = multi_filter(matches, ignored_files) if ignored_files else []
232 matches = list(set(matches) - set(excluded))
233
234 for python_source_file in matches:
235 # Check each of the relevant python sources for unadorned string literals
236 unadorned_string_list = find_unadorned_strings.check_file_for_unadorned(python_source_file)
237 self.assertEqual([], unadorned_string_list,
238 u"Found %s unadorned strings: \n %s" % (len(unadorned_string_list), unadorned_string_list))
239
240
241if __name__ == u"__main__":
101 unittest.main()242 unittest.main()

Subscribers

People subscribed via source and target branches