Created by Aaron Whitehouse on 2018-06-08 and last modified on 2018-06-08

As set out in the Python 3 blueprint (https://blueprints.launchpad.net/duplicity/+spec/python3), one of the most time-consuming and hardest-to-automate parts of supporting both Python 2 and 3 is string literals. This is because a simple string (e.g. a = "Hello") is treated as bytes (e.g. encoded ASCII) in Python 2 and as Unicode in Python 3. As we are trying to support both Python 2 and Python 3 for at least a transition period, we may end up with odd behaviour wherever we have an unadorned string.

The versions of Python 2 and 3 we are targeting mean that we can "adorn" strings with a prefix letter to indicate the type of string (u for Unicode, b for bytes and r for raw strings, e.g. regexes).
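
As a brief illustration of the three adornments, the following sketch should behave the same under Python 2.7 and Python 3.3 or later. The variable names and sample values are purely illustrative and are not taken from the duplicity code base.

```python
# Illustrative values only; none of these names come from duplicity.
text = u"caf\u00e9"        # Unicode text: four code points
data = b"caf\xc3\xa9"      # bytes: the same text encoded as UTF-8, five bytes
pattern = r"\d+\.py"       # raw string: backslashes are kept literally

# The u and b prefixes fix the type of the literal on both major versions.
print(len(text))                     # number of code points
print(len(data))                     # number of bytes
print(data.decode("utf-8") == text)  # decoding the bytes gives the text back
print(len(pattern))                  # each backslash counts as one character
```

Note that the r prefix only changes how backslashes are handled; on its own it does not make the literal bytes or Unicode.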

An important preliminary step to Python 2/3 support is therefore for us to add these adornments to each and every string literal in the code base.

To ensure that we can find these and do not accidentally introduce more unadorned strings, this merge request adds a function to our test_code that automatically checks all .py files for unadorned strings and gives an error if any are found.
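
To make the idea of such a check concrete, here is a minimal sketch built on the standard library tokenize module. It is an assumption about how a check of this kind could work, not necessarily how the code in this merge request is written; the function name below is a hypothetical helper, and the sketch is written to run on Python 3.

```python
import io
import tokenize


def find_unadorned_strings(source):
    """Return (line, column, literal) for each string literal without a prefix.

    `source` is the text of a Python file. Docstrings are string literals
    too, so an unadorned docstring is reported like any other string.
    """
    found = []
    readline = io.StringIO(source).readline
    for tok_type, tok_string, start, _end, _line in tokenize.generate_tokens(readline):
        # An adorned literal starts with a prefix letter (u, b, r, ...);
        # an unadorned one starts directly with its opening quote.
        if tok_type == tokenize.STRING and tok_string[0] in "\"'":
            found.append((start[0], start[1], tok_string))
    return found


sample = 'a = "Hello"\nb = u"Hi"\nc = b"raw"\nd = r"\\d+"\n'
for line, column, literal in find_unadorned_strings(sample):
    print("line %d, column %d: %s" % (line, column, literal))
```

Only the first assignment in the sample is reported, because the other three literals carry a u, b or r prefix.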

The actual work to adorn all of these strings will be substantial, so not all of it is done in this merge request. Instead, this follows the approach we take for many of our other code style checks: the test currently carries a very long list of excluded files (which are not checked), and we can remove these exceptions as we adorn the strings in each file.
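
The exclusion-list pattern can be sketched as follows. This is only an illustration of the approach under stated assumptions: the function name and the file paths are hypothetical, and the real list lives in the test code.

```python
import os


def files_to_check(all_files, ignored):
    """Return the files from all_files that are not on the ignore list."""
    ignored_set = set(os.path.normpath(p) for p in ignored)
    return [f for f in all_files if os.path.normpath(f) not in ignored_set]


# Hypothetical paths: one file already adorned, one still on the ignore list.
all_files = ["pkg/adorned.py", "pkg/not_yet_adorned.py"]
ignored = ["pkg/not_yet_adorned.py"]
print(files_to_check(all_files, ignored))
```

Once the strings in a file have been adorned, deleting its entry from the ignore list is enough to bring it under the check.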

To assist people in finding and correcting all of the unadorned strings in a particular file, the new file testing/find_unadorned_strings.py can be executed directly with a Python file as an argument:
./find_unadorned_strings.py python_file.py
and it will print a nicely-formatted list of all unadorned strings in the file that need to be corrected.

As the codebase is currently Python 2 only, marking strings as bytes (b" ") essentially preserves current behaviour. It is nonetheless highly desirable to convert as many of these as possible to Unicode strings (u" "), as these will be much easier to work with as we transition to Python 3 and will improve non-ASCII support. This will likely require changes to other parts of the code that interact with the string. The broad recommended approach for text is to convert at the boundaries (decode when reading from files, encode when writing to them) and use Unicode throughout internally. Many built-ins and libraries natively support Unicode, so in many cases very little of the code needs to change.
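
The "convert at the boundaries" approach can be illustrated with a pair of small helpers. This is a minimal sketch, assuming UTF-8 as the file encoding; read_text and write_text are hypothetical names and are not existing duplicity functions.

```python
import io


def read_text(path, encoding="utf-8"):
    """Read raw bytes from disk and decode them once, at the boundary."""
    with io.open(path, "rb") as f:
        return f.read().decode(encoding)


def write_text(path, text, encoding="utf-8"):
    """Encode Unicode text once, at the boundary, and write the bytes."""
    with io.open(path, "wb") as f:
        f.write(text.encode(encoding))
```

Code between these two calls only ever sees Unicode text, so it does not need to know which encoding is used on disk.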

Many helper variables/functions have already been created in duplicity so that you can use Unicode wherever possible. For paths, for example, you can use Path.uname instead of Path.name.

Get this branch:
bzr branch lp:~aaron-whitehouse/duplicity/08-unadorned-strings

Branch information

Aaron Whitehouse

Recent revisions

1307. By Aaron Whitehouse on 2018-06-08

* Added new script to find unadorned strings (testing/find_unadorned_strings.py python_file) which prints all unadorned strings in a .py file.
* Added a new test to test_code.py that checks across all files for unadorned strings and gives an error if any are found (most files are in an ignore list at this stage, but this will allow us to incrementally remove the exceptions as we adorn the strings in each file).

1306. By Aaron Whitehouse on 2018-06-08

* Adorn string literals in test_code.py with u/b
* Add test for unadorned string literals (currently only single file)

1305. By Aaron Whitehouse on 2018-06-08

* Tox changes to accommodate new pycodestyle version warnings. Ignored W504 for now and marked as a TODO. Marked W503 as a permanent ignore, as it is preferred to the (mutually exclusive) W504 under PEP8.
* Marked various regex strings as raw strings to avoid the new W605 "invalid escape sequence".

1304. By Kenneth Loafman on 2018-05-07

* Fixed bug #1717935 with suggestion from strainu
  - Use urllib.quote_plus() to properly quote pathnames passed via URL

1303. By Kenneth Loafman on 2018-05-07

* Fixed bug #1768954 with patch from Max Hallden
  - Add AZURE_ENDPOINT_SUFFIX environ variable to allow setting to non-U.S. servers

1302. By Kenneth Loafman on 2018-05-01

* Merged in lp:~dawgfoto/duplicity/fixup1252
  * only check decryptable remote manifests
    - fixup of revision 1252 which introduces a non-fatal error message (see #1729796)
    - for backups the GPG private key and/or its password are typically not available
    - also avoid interactive password queries through e.g. gpg agent

1301. By Launchpad Translations on behalf of duplicity-team on 2018-02-25

Launchpad automatic translations update.

1300. By Launchpad Translations on behalf of duplicity-team on 2018-02-24

Launchpad automatic translations update.

1299. By Kenneth Loafman on 2018-02-23

* Mass update of po files from launchpad translations.

1298. By Kenneth Loafman on 2018-01-21

* Merged in lp:~dawgfoto/duplicity/fixup1251
  - Avoid redundant replication of already present backup sets.
  - Fixed by adding back BackupSet.__eq__ which was accidentally(?) removed in 1251.

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)