Duplicity

Merge lp:~aaron-whitehouse/duplicity/08-unadorned-strings into lp:~duplicity-team/duplicity/0.8-series

Proposed by Aaron Whitehouse on 2018-06-08

Status:	Merged
Merged at revision:	1306
Proposed branch:	lp:~aaron-whitehouse/duplicity/08-unadorned-strings
Merge into:	lp:~duplicity-team/duplicity/0.8-series
Prerequisite:	lp:~aaron-whitehouse/duplicity/08-pycodestyle
Diff against target:	340 lines (+253/-39), 2 files modified: testing/find_unadorned_strings.py (+73/-0), testing/test_code.py (+180/-39)
To merge this branch:	bzr merge lp:~aaron-whitehouse/duplicity/08-unadorned-strings
Related blueprints:	Python 3 Support (High); Adorn String Literals for Better Python 2/3 Support (Undefined)

Reviewer	Review Type	Date Requested	Status
duplicity-team		2018-06-08	Pending
Review via email: mp+347721@code.launchpad.net

Commit message

* Added new script to find unadorned strings (testing/find_unadorned_strings.py python_file) which prints all unadorned strings in a .py file.
* Added a new test to test_code.py that checks across all files for unadorned strings and gives an error if any are found (most files are in an ignore list at this stage, but this will allow us to incrementally remove the exceptions as we adorn the strings in each file).
* Adorn string literals in test_code.py with u/b

Description of the change

As set out in the Python 3 blueprint (https://blueprints.launchpad.net/duplicity/+spec/python3), one of the most time-consuming, and least easy to automate, parts of supporting both Python 2 and 3 is string literals. This is because an unadorned string literal (e.g. a = "Hello") is treated as bytes (e.g. encoded ASCII) in Python 2 but as Unicode in Python 3. As we are trying to support both Python 2 and Python 3 for at least a transition period, we may end up with inconsistent behaviour between the two wherever we have an unadorned string.

The versions of Python 2 and 3 we are targeting mean that we can "adorn" string literals with a prefix letter to indicate the type of string (u for Unicode, b for bytes and r for raw strings, e.g. regexes).
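
As a small illustration of the adornments described above (a sketch only; the variable names are made up for this example and do not come from duplicity), the following runs the same way on Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u"Hello"        # Unicode text on both Python 2 and Python 3
data = b"Hello"        # bytes on Python 3, plain (byte) str on Python 2
pattern = r"\d+\.py"   # raw string: backslashes are not treated as escapes

# The concrete type names differ between interpreters, so derive them
# from adorned literals instead of naming unicode/str/bytes directly.
text_type = type(u"")
bytes_type = type(b"")

print(isinstance(text, text_type), isinstance(data, bytes_type))
print(text.encode(u"ascii") == data)
```

An unadorned "Hello" would have matched bytes_type on Python 2 and text_type on Python 3, which is exactly the ambiguity the adornments remove.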

An important preliminary step to Python 2/3 support is therefore for us to add these adornments to each and every string literal in the code base.

To ensure that we can find these and do not accidentally introduce more unadorned strings, this merge request adds a test to testing/test_code.py that automatically checks all .py files for unadorned strings and fails if any are found.

The actual work to adorn all of these strings will be substantial, so it is not all done in this merge request. Instead, this takes the same approach as many of our other code style checks: the test currently contains a very long list of excluded files (which are not checked), and we can remove these exclusions as we adorn the strings in each file.
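
The exclusion-list approach can be sketched roughly as follows. This is a simplified illustration, not the code in the diff: the helper name files_to_check and the example paths are assumptions made for this sketch.

```python
import fnmatch


def files_to_check(all_files, ignored_patterns):
    u"""Return, sorted, the files that match none of the ignore patterns."""
    return sorted(
        name for name in all_files
        if not any(fnmatch.fnmatch(name, pattern) for pattern in ignored_patterns)
    )


# Hypothetical file list and ignore list, for illustration only.
all_files = [u"/src/setup.py",
             u"/src/.tox/py27/lib/site.py",
             u"/src/testing/test_code.py"]
ignored = [u"/src/.tox/*",    # not source files we want to check
           u"/src/setup.py"]  # still to be adorned, so excluded for now

remaining = files_to_check(all_files, ignored)
print(remaining)
```

Removing an entry such as u"/src/setup.py" from the ignore list is then all that is needed to bring that file under the check once its strings are adorned.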

To assist people in finding and correcting all of the unadorned strings in a particular file, the new file testing/find_unadorned_strings.py can be executed directly with a Python file as an argument:
./find_unadorned_strings.py python_file.py
and it will print a list of all unadorned strings in the file that need to be corrected, along with their positions.
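
The detection idea behind the script can be sketched on an in-memory source string. This is a Python 3 only simplification (it relies on the named-tuple tokens that Python 2's tokenize lacks), and the function name unadorned_strings is an assumption for this example rather than part of the script:

```python
import io
import token
import tokenize


def unadorned_strings(source):
    u"""Return (start, literal) for each string literal that has no prefix."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # An adorned literal starts with its prefix letter (u, b, r, ...);
        # an unadorned one starts directly with a quote character.
        if tok.type == token.STRING and tok.string[0] in (u'"', u"'"):
            found.append((tok.start, tok.string))
    return found


sample = u'a = "Hello"\nb = u"World"\nc = b"raw bytes"\n'
print(unadorned_strings(sample))
```

Only the literal on the first line of the sample is reported; the u and b literals are skipped because their first character is the prefix, not a quote.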

As the codebase is currently Python 2 only, marking strings as bytes (b" ") essentially preserves current behaviour, but it is highly desirable to convert as many of these as possible to Unicode strings (u" "), as these will be much easier to work with as we transition to Python 3 and will improve non-ASCII support. This will likely require changes to other parts of the code that interact with the string. The broad recommended approach for text is to convert at the boundaries (decode when reading from files, encode when writing to them) and use Unicode throughout internally. Many built-ins and libraries natively support Unicode, so in many cases very little of the code needs to change.
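
A minimal sketch of converting at the boundaries, assuming UTF-8 files; the helper names write_text and read_text are hypothetical and not part of duplicity:

```python
import io
import os
import shutil
import tempfile


def write_text(path, text, encoding=u"utf-8"):
    # Boundary: encode Unicode text to bytes just before it leaves the program.
    with io.open(path, u"wb") as f:
        f.write(text.encode(encoding))


def read_text(path, encoding=u"utf-8"):
    # Boundary: decode bytes to Unicode text as soon as they enter the program.
    with io.open(path, u"rb") as f:
        return f.read().decode(encoding)


tmp_dir = tempfile.mkdtemp()
try:
    path = os.path.join(tmp_dir, u"example.txt")
    write_text(path, u"caf\u00e9")
    with io.open(path, u"rb") as f:
        raw = f.read()          # what is actually stored on disk: bytes
    loaded = read_text(path)    # what the rest of the program sees: Unicode
finally:
    shutil.rmtree(tmp_dir)
```

Everything between the two helpers can then work with Unicode strings only, on either Python version.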

Many helper variables/functions have already been created in duplicity so that you can use Unicode wherever possible. For paths, for example, you can use Path.uname instead of Path.name.
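
The idea of keeping a Unicode counterpart alongside a bytes value can be illustrated with a hypothetical stand-in class. This is not duplicity's Path implementation; only the attribute names name and uname are taken from the text above, and the decoding strategy shown is an assumption:

```python
import sys


class ExamplePath(object):
    u"""Hypothetical stand-in, NOT duplicity's Path class."""

    def __init__(self, name):
        # Bytes form, as passed to low-level filesystem calls.
        self.name = name
        # Unicode form, for messages, logs and comparisons with u"" literals.
        encoding = sys.getfilesystemencoding() or u"utf-8"
        self.uname = name.decode(encoding, u"replace")


p = ExamplePath(b"backup.tar")
print(p.uname)
```

Code that builds messages or compares against u" " literals would then use p.uname, leaving p.name for calls that need the original bytes.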

Preview Diff

 === added file 'testing/find_unadorned_strings.py'
 --- testing/find_unadorned_strings.py	1970-01-01 00:00:00 +0000
 +++ testing/find_unadorned_strings.py	2018-06-08 21:22:09 +0000
@@ -0,0 +1,73 @@
++#!/usr/bin/env python3
++# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
++#
++# Copyright 2018 Aaron Whitehouse <aaron@whitehouse.kiwi.nz>
++#
++# This file is part of duplicity.
++#
++# Duplicity is free software; you can redistribute it and/or modify it
++# under the terms of the GNU General Public License as published by the
++# Free Software Foundation; either version 2 of the License, or (at your
++# option) any later version.
++#
++# Duplicity is distributed in the hope that it will be useful, but
++# WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
++# General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with duplicity; if not, write to the Free Software Foundation,
++# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
++
++# For predictable results in python2/3 all string literals need to be marked as unicode, bytes or raw
++# This code finds all unadorned string literals (strings that are not marked with a u, b or r)
++
++import sys
++import tokenize
++import token
++
++# Unfortunately Python2 does not have the useful named tuple result from tokenize.tokenize,
++# so we have to recreate the effect using namedtuple and tokenize.generate_tokens
++from collections import namedtuple
++Python2_token = namedtuple(u'Python2_token', u'type string start end line')
++
++
++def return_unadorned_string_tokens(f):
++    if sys.version_info[0] < 3:
++        unnamed_tokens = tokenize.generate_tokens(f.readline)
++        for t in unnamed_tokens:
++            named_token = Python2_token(token.tok_name[t[0]], *t[1:])
++            if named_token.type == u"STRING" and named_token.string[0] in [u'"', u"'"]:
++                yield named_token
++
++    else:
++        named_tokens = tokenize.tokenize(f.readline)
++        for t in named_tokens:
++            if t.type == token.STRING and t.string[0] in [u'"', u"'"]:
++                yield t
++
++
++def check_file_for_unadorned(python_file):
++    unadorned_string_list = []
++    with open(python_file, u'rb') as f:
++        for s in return_unadorned_string_tokens(f):
++            unadorned_string_list.append((python_file, s.start, s.end, s.string))
++    return unadorned_string_list
++
++
++if __name__ == u"__main__":
++    import argparse
++
++    parser = argparse.ArgumentParser(description=u'Find any unadorned string literals in a Python file')
++    parser.add_argument(u'file', help=u'The file to search')
++    args = parser.parse_args()
++
++    unadorned_string_list = check_file_for_unadorned(args.file)
++    if len(unadorned_string_list) == 0:
++        print(u"There are no unadorned strings in", args.file)
++    else:
++        print(u"There are unadorned strings in", args.file, u"\n")
++        for unadorned_string in unadorned_string_list:
++            print(unadorned_string)
++            python_file, string_start, string_end, string = unadorned_string
++            print(string_start, string)
 === modified file 'testing/test_code.py'
 --- testing/test_code.py	2017-12-13 21:03:13 +0000
 +++ testing/test_code.py	2018-06-08 21:22:09 +0000
@@ -21,16 +21,19 @@
  import os
  import subprocess
  import pytest
++import fnmatch
++import os
--if os.getenv('RUN_CODE_TESTS', None) == '1':
++if os.getenv(u'RUN_CODE_TESTS', None) == u'1':
      # Make conditional so that we do not have to import in environments that
      # do not run the tests (e.g. the build servers)
      import pycodestyle
  from . import _top_dir, DuplicityTestCase  # @IgnorePep8
++from . import find_unadorned_strings
--skipCodeTest = pytest.mark.skipif(not os.getenv('RUN_CODE_TESTS', None) == '1',
--                                  reason='Must set environment var RUN_CODE_TESTS=1')
++skipCodeTest = pytest.mark.skipif(not os.getenv(u'RUN_CODE_TESTS', None) == u'1',
++                                  reason=u'Must set environment var RUN_CODE_TESTS=1')
  class CodeTest(DuplicityTestCase):
@@ -41,61 +44,199 @@
                                     stderr=subprocess.PIPE)
          output = process.communicate()[0]
          self.assertTrue(process.returncode in returncodes, output)
--        self.assertEqual("", output, output)
++        self.assertEqual(u"", output, output)
      @skipCodeTest
      def test_2to3(self):
          # As we modernize the source code, we can remove more and more nofixes
          self.run_checker([
--            "2to3",
--            "--nofix=next",
--            "--nofix=types",
--            "--nofix=unicode",
++            u"2to3",
++            u"--nofix=next",
++            u"--nofix=types",
++            u"--nofix=unicode",
              # The following fixes we don't want to remove, since they are false
              # positives, things we don't care about, or real incompatibilities
              # but which 2to3 can fix for us better automatically.
--            "--nofix=callable",
--            "--nofix=dict",
--            "--nofix=future",
--            "--nofix=imports",
--            "--nofix=print",
--            "--nofix=raw_input",
--            "--nofix=urllib",
--            "--nofix=xrange",
++            u"--nofix=callable",
++            u"--nofix=dict",
++            u"--nofix=future",
++            u"--nofix=imports",
++            u"--nofix=print",
++            u"--nofix=raw_input",
++            u"--nofix=urllib",
++            u"--nofix=xrange",
              _top_dir])
      @skipCodeTest
      def test_pylint(self):
--        """Pylint test (requires pylint to be installed to pass)"""
++        u"""Pylint test (requires pylint to be installed to pass)"""
          self.run_checker([
--            "pylint",
--            "-E",
--            "--msg-template={msg_id}: {line}: {msg}",
--            "--disable=E0203",  # Access to member before its definition line
--            "--disable=E0602",  # Undefined variable
--            "--disable=E0611",  # No name in module
--            "--disable=E1101",  # Has no member
--            "--disable=E1103",  # Maybe has no member
--            "--disable=E0712",  # Catching an exception which doesn't inherit from BaseException
--            "--ignore=_librsync.so",
--            "--msg-template='{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}'",
--            os.path.join(_top_dir, 'duplicity'),
--            os.path.join(_top_dir, 'bin/duplicity'),
--            os.path.join(_top_dir, 'bin/rdiffdir')],
++            u"pylint",
++            u"-E",
++            u"--msg-template={msg_id}: {line}: {msg}",
++            u"--disable=E0203",  # Access to member before its definition line
++            u"--disable=E0602",  # Undefined variable
++            u"--disable=E0611",  # No name in module
++            u"--disable=E1101",  # Has no member
++            u"--disable=E1103",  # Maybe has no member
++            u"--disable=E0712",  # Catching an exception which doesn't inherit from BaseException
++            u"--ignore=_librsync.so",
++            u"--msg-template='{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}'",
++            os.path.join(_top_dir, u'duplicity'),
++            os.path.join(_top_dir, u'bin/duplicity'),
++            os.path.join(_top_dir, u'bin/rdiffdir')],
              # Allow usage errors, older versions don't have
              # --msg-template
              [0, 32])
      @skipCodeTest
      def test_pep8(self):
--        """Test that we conform to PEP-8 using pycodestyle."""
++        u"""Test that we conform to PEP-8 using pycodestyle."""
          # Note that the settings, ignores etc for pycodestyle are set in tox.ini, not here
--        style = pycodestyle.StyleGuide(config_file=os.path.join(_top_dir, 'tox.ini'))
--        result = style.check_files([os.path.join(_top_dir, 'duplicity'),
--                                    os.path.join(_top_dir, 'bin/duplicity'),
--                                    os.path.join(_top_dir, 'bin/rdiffdir')])
++        style = pycodestyle.StyleGuide(config_file=os.path.join(_top_dir, u'tox.ini'))
++        result = style.check_files([os.path.join(_top_dir, u'duplicity'),
++                                    os.path.join(_top_dir, u'bin/duplicity'),
++                                    os.path.join(_top_dir, u'bin/rdiffdir')])
          self.assertEqual(result.total_errors, 0,
--                         "Found %s code style errors (and warnings)." % result.total_errors)
--
--if __name__ == "__main__":
++                         u"Found %s code style errors (and warnings)." % result.total_errors)
++
++    @skipCodeTest
++    def test_unadorned_string_literals(self):
++        u"""For predictable results in python2/3 all string literals need to be marked as unicode, bytes or raw"""
++
++        ignored_files = [os.path.join(_top_dir, u'.tox', u'*'), # These are not source files we want to check
++                         os.path.join(_top_dir, u'.eggs', u'*'),
++                         # TODO Every file from here down needs to be fixed and the exclusion removed
++                         os.path.join(_top_dir, u'setup.py'),
++                         os.path.join(_top_dir, u'docs', u'conf.py'),
++                         os.path.join(_top_dir, u'duplicity', u'__init__.py'),
++                         os.path.join(_top_dir, u'duplicity', u'asyncscheduler.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'__init__.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'_boto_multi.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'_boto_single.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'_cf_cloudfiles.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'_cf_pyrax.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'acdclibackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'adbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'azurebackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'b2backend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'botobackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'cfbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'dpbxbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'gdocsbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'giobackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'hsibackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'hubicbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'imapbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'jottacloudbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'lftpbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'localbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'mediafirebackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'megabackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'multibackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'ncftpbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'onedrivebackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'par2backend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'pcabackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'pydrivebackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'pyrax_identity', u'__init__.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'pyrax_identity', u'hubic.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'rsyncbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'ssh_paramiko_backend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'ssh_pexpect_backend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'swiftbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'sxbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'tahoebackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'backends', u'webdavbackend.py'),
++                         os.path.join(_top_dir, u'duplicity', u'cached_ops.py'),
++                         os.path.join(_top_dir, u'duplicity', u'collections.py'),
++                         os.path.join(_top_dir, u'duplicity', u'commandline.py'),
++                         os.path.join(_top_dir, u'duplicity', u'compilec.py'),
++                         os.path.join(_top_dir, u'duplicity', u'diffdir.py'),
++                         os.path.join(_top_dir, u'duplicity', u'dup_temp.py'),
++                         os.path.join(_top_dir, u'duplicity', u'dup_threading.py'),
++                         os.path.join(_top_dir, u'duplicity', u'dup_time.py'),
++                         os.path.join(_top_dir, u'duplicity', u'errors.py'),
++                         os.path.join(_top_dir, u'duplicity', u'file_naming.py'),
++                         os.path.join(_top_dir, u'duplicity', u'filechunkio.py'),
++                         os.path.join(_top_dir, u'duplicity', u'globals.py'),
++                         os.path.join(_top_dir, u'duplicity', u'globmatch.py'),
++                         os.path.join(_top_dir, u'duplicity', u'gpg.py'),
++                         os.path.join(_top_dir, u'duplicity', u'gpginterface.py'),
++                         os.path.join(_top_dir, u'duplicity', u'lazy.py'),
++                         os.path.join(_top_dir, u'duplicity', u'librsync.py'),
++                         os.path.join(_top_dir, u'duplicity', u'log.py'),
++                         os.path.join(_top_dir, u'duplicity', u'manifest.py'),
++                         os.path.join(_top_dir, u'duplicity', u'patchdir.py'),
++                         os.path.join(_top_dir, u'duplicity', u'path.py'),
++                         os.path.join(_top_dir, u'duplicity', u'progress.py'),
++                         os.path.join(_top_dir, u'duplicity', u'robust.py'),
++                         os.path.join(_top_dir, u'duplicity', u'selection.py'),
++                         os.path.join(_top_dir, u'duplicity', u'statistics.py'),
++                         os.path.join(_top_dir, u'duplicity', u'tarfile.py'),
++                         os.path.join(_top_dir, u'duplicity', u'tempdir.py'),
++                         os.path.join(_top_dir, u'duplicity', u'util.py'),
++                         os.path.join(_top_dir, u'testing', u'__init__.py'),
++                         os.path.join(_top_dir, u'testing', u'find_unadorned_strings.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'__init__.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_badupload.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_cleanup.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_final.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_log.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_rdiffdir.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_replicate.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_restart.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_selection.py'),
++                         os.path.join(_top_dir, u'testing', u'functional', u'test_verify.py'),
++                         os.path.join(_top_dir, u'testing', u'manual', u'__init__.py'),
++                         os.path.join(_top_dir, u'testing', u'overrides', u'__init__.py'),
++                         os.path.join(_top_dir, u'testing', u'overrides', u'gettext.py'),
++                         os.path.join(_top_dir, u'testing', u'test_unadorned.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'__init__.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_backend_instance.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_backend.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_collections.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_diffdir.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_dup_temp.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_dup_time.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_file_naming.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_globmatch.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_gpg.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_gpginterface.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_lazy.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_manifest.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_patchdir.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_path.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_selection.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_statistics.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_tarfile.py'),
++                         os.path.join(_top_dir, u'testing', u'unit', u'test_tempdir.py')]
++
++
++        # Find all the .py files in the duplicity tree
++        # We cannot use glob.glob recursive until we drop support for Python < 3.5
++        matches = []
++
++        def multi_filter(names, patterns):
++            u"""Generator function which yields the names that match one or more of the patterns."""
++            for name in names:
++                if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
++                    yield name
++
++        for root, dirnames, filenames in os.walk(_top_dir):
++            for filename in fnmatch.filter(filenames, u'*.py'):
++                matches.append(os.path.join(root, filename))
++
++        excluded = multi_filter(matches, ignored_files) if ignored_files else []
++        matches = list(set(matches) - set(excluded))
++
++        for python_source_file in matches:
++            # Check each of the relevant python sources for unadorned string literals
++            unadorned_string_list = find_unadorned_strings.check_file_for_unadorned(python_source_file)
++            self.assertEqual([], unadorned_string_list,
++                             u"Found %s unadorned strings: \n %s" % (len(unadorned_string_list), unadorned_string_list))
++
++
++if __name__ == u"__main__":
      unittest.main()