Merge lp:~thelinuxguy/openlp/fix-newline-bug into lp:openlp

Proposed by Simon Hanna
Status: Merged
Merged at revision: 2817
Proposed branch: lp:~thelinuxguy/openlp/fix-newline-bug
Merge into: lp:openlp
Diff against target: 80 lines (+33/-9)
2 files modified
openlp/core/common/__init__.py (+7/-7)
tests/functional/openlp_core/common/test_common.py (+26/-2)
To merge this branch: bzr merge lp:~thelinuxguy/openlp/fix-newline-bug
Reviewer Review Type Date Requested Status
Phill Approve
Tim Bentley Approve
Review via email: mp+343465@code.launchpad.net

This proposal supersedes a proposal from 2018-04-17.

Description of the change

Fix the fix for 1727517

To post a comment you must log in.
Revision history for this message
Tim Bentley (trb143) wrote : Posted in a previous version of this proposal

One minor fix as I do not like the rename of the field

review: Needs Fixing
Revision history for this message
Phill (phill-ridout) wrote : Posted in a previous version of this proposal

See in line. One question and a few (more) minor fixes please.

review: Needs Fixing
Revision history for this message
Simon Hanna (thelinuxguy) wrote : Posted in a previous version of this proposal

will update shortly

Revision history for this message
Phill (phill-ridout) wrote : Posted in a previous version of this proposal

I've done some research in to this. Officially the only code points allowed are:

"
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
" ( https://www.w3.org/TR/REC-xml/#charsets)

So going on this our regex should be:

re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')

we probably should filter out the other code points too.

Its worth noting that REPLACMENT_CHARS_MAP replaces the vertical tab and form feed chars with "\n\n" before the CONTROL_CHARS regex filters them out!

So in summary please replace the regex with the one in the inline comment. (Note: its not tested)

review: Needs Fixing
Revision history for this message
Simon Hanna (thelinuxguy) wrote : Posted in a previous version of this proposal

I wonder what the reason behind replacing vertial tabs with new lines is...

Looking at where the regex is being used I'm inclined to say the whole thing needs to be reworked.

It's also used in the filename cleaning where linefeeds are strange, but then again I don't know what the input always is.

I'm changing it to what you request, but you should really take a serious look at it. From what I see, you are the last one that touched the invalid file regex, I highly doubt this is what you actually want to happen. If it is, it is a mystery to me...

Revision history for this message
Phill (phill-ridout) wrote : Posted in a previous version of this proposal

One small typo, and a couple comments that need changing.

review: Needs Fixing
Revision history for this message
Phill (phill-ridout) wrote : Posted in a previous version of this proposal

> One small typo, and a couple comments that need changing.

Sorry!

Revision history for this message
Tim Bentley (trb143) wrote : Posted in a previous version of this proposal
review: Needs Fixing
Revision history for this message
Tim Bentley (trb143) wrote :

Looks good to me.

review: Approve
Revision history for this message
Phill (phill-ridout) wrote :

Good, thanks for you patience

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'openlp/core/common/__init__.py'
2--- openlp/core/common/__init__.py 2018-02-24 16:10:02 +0000
3+++ openlp/core/common/__init__.py 2018-04-17 19:37:10 +0000
4@@ -44,7 +44,7 @@
5
6 FIRST_CAMEL_REGEX = re.compile('(.)([A-Z][a-z]+)')
7 SECOND_CAMEL_REGEX = re.compile('([a-z0-9])([A-Z])')
8-CONTROL_CHARS = re.compile(r'[\x00-\x1F\x7F-\x9F]')
9+CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')
10 INVALID_FILE_CHARS = re.compile(r'[\\/:\*\?"<>\|\+\[\]%]')
11 IMAGES_FILTER = None
12 REPLACMENT_CHARS_MAP = str.maketrans({'\u2018': '\'', '\u2019': '\'', '\u201c': '"', '\u201d': '"', '\u2026': '...',
13@@ -471,15 +471,15 @@
14 log.exception('Error detecting file encoding')
15
16
17-def normalize_str(irreg_str):
18+def normalize_str(irregular_string):
19 """
20 Normalize the supplied string. Remove unicode control chars and tidy up white space.
21
22- :param str irreg_str: The string to normalize.
23+ :param str irregular_string: The string to normalize.
24 :return: The normalized string
25 :rtype: str
26 """
27- irreg_str = irreg_str.translate(REPLACMENT_CHARS_MAP)
28- irreg_str = CONTROL_CHARS.sub('', irreg_str)
29- irreg_str = NEW_LINE_REGEX.sub('\n', irreg_str)
30- return WHITESPACE_REGEX.sub(' ', irreg_str)
31+ irregular_string = irregular_string.translate(REPLACMENT_CHARS_MAP)
32+ irregular_string = CONTROL_CHARS.sub('', irregular_string)
33+ irregular_string = NEW_LINE_REGEX.sub('\n', irregular_string)
34+ return WHITESPACE_REGEX.sub(' ', irregular_string)
35
36=== modified file 'tests/functional/openlp_core/common/test_common.py'
37--- tests/functional/openlp_core/common/test_common.py 2017-12-29 09:15:48 +0000
38+++ tests/functional/openlp_core/common/test_common.py 2018-04-17 19:37:10 +0000
39@@ -25,8 +25,8 @@
40 from unittest import TestCase
41 from unittest.mock import MagicMock, call, patch
42
43-from openlp.core.common import clean_button_text, de_hump, extension_loader, is_macosx, is_linux, is_win, \
44- path_to_module, trace_error_handler
45+from openlp.core.common import clean_button_text, de_hump, extension_loader, is_macosx, is_linux, \
46+ is_win, normalize_str, path_to_module, trace_error_handler
47 from openlp.core.common.path import Path
48
49
50@@ -211,6 +211,30 @@
51 assert is_win() is False, 'is_win() should return False'
52 assert is_macosx() is False, 'is_macosx() should return False'
53
54+ def test_normalize_str_leaves_newlines(self):
55+ # GIVEN: a string containing newlines
56+ string = 'something\nelse'
57+ # WHEN: normalize is called
58+ normalized_string = normalize_str(string)
59+ # THEN: string is unchanged
60+ assert normalized_string == string
61+
62+ def test_normalize_str_removes_null_byte(self):
63+ # GIVEN: a string containing a null byte
64+ string = 'somet\x00hing'
65+ # WHEN: normalize is called
66+ normalized_string = normalize_str(string)
67+ # THEN: nullbyte is removed
68+ assert normalized_string == 'something'
69+
70+ def test_normalize_str_replaces_crlf_with_lf(self):
71+ # GIVEN: a string containing crlf
72+ string = 'something\r\nelse'
73+ # WHEN: normalize is called
74+ normalized_string = normalize_str(string)
75+ # THEN: crlf is replaced with lf
76+ assert normalized_string == 'something\nelse'
77+
78 def test_clean_button_text(self):
79 """
80 Test the clean_button_text() function.