Merge lp:~thelinuxguy/openlp/fix-newline-bug into lp:openlp

Proposed by Simon Hanna
Status: Superseded
Proposed branch: lp:~thelinuxguy/openlp/fix-newline-bug
Merge into: lp:openlp
Diff against target: 80 lines (+33/-9)
2 files modified
openlp/core/common/__init__.py (+7/-7)
tests/functional/openlp_core/common/test_common.py (+26/-2)
To merge this branch: bzr merge lp:~thelinuxguy/openlp/fix-newline-bug
Reviewer Review Type Date Requested Status
Phill Needs Fixing
Tim Bentley Pending
Review via email: mp+343306@code.launchpad.net

This proposal supersedes a proposal from 2018-04-15.

This proposal has been superseded by a proposal from 2018-04-17.

Description of the change

Fix the fix for 1727517

To post a comment you must log in.
Revision history for this message
Tim Bentley (trb143) wrote : Posted in a previous version of this proposal

One minor fix as I do not like the rename of the field

review: Needs Fixing
Revision history for this message
Phill (phill-ridout) wrote : Posted in a previous version of this proposal

See in line. One question and a few (more) minor fixes please.

review: Needs Fixing
Revision history for this message
Simon Hanna (thelinuxguy) wrote : Posted in a previous version of this proposal

will update shortly

Revision history for this message
Phill (phill-ridout) wrote : Posted in a previous version of this proposal

I've done some research in to this. Officially the only code points allowed are:

"
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
" ( https://www.w3.org/TR/REC-xml/#charsets)

So going on this our regex should be:

re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')

we probably should filter out the other code points too.

Its worth noting that REPLACMENT_CHARS_MAP replaces the vertical tab and form feed chars with "\n\n" before the CONTROL_CHARS regex filters them out!

So in summary please replace the regex with the one in the inline comment. (Note: its not tested)

review: Needs Fixing
Revision history for this message
Simon Hanna (thelinuxguy) wrote : Posted in a previous version of this proposal

I wonder what the reason behind replacing vertial tabs with new lines is...

Looking at where the regex is being used I'm inclined to say the whole thing needs to be reworked.

It's also used in the filename cleaning where linefeeds are strange, but then again I don't know what the input always is.

I'm changing it to what you request, but you should really take a serious look at it. From what I see, you are the last one that touched the invalid file regex, I highly doubt this is what you actually want to happen. If it is, it is a mystery to me...

Revision history for this message
Phill (phill-ridout) wrote :

One small typo, and a couple comments that need changing.

review: Needs Fixing
Revision history for this message
Phill (phill-ridout) wrote :

> One small typo, and a couple comments that need changing.

Sorry!

2821. By Simon Hanna

Fix typo

2822. By Simon Hanna

Fix comment

2823. By Simon Hanna

Change parameter name

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'openlp/core/common/__init__.py'
--- openlp/core/common/__init__.py 2018-02-24 16:10:02 +0000
+++ openlp/core/common/__init__.py 2018-04-17 10:27:58 +0000
@@ -44,7 +44,7 @@
4444
45FIRST_CAMEL_REGEX = re.compile('(.)([A-Z][a-z]+)')45FIRST_CAMEL_REGEX = re.compile('(.)([A-Z][a-z]+)')
46SECOND_CAMEL_REGEX = re.compile('([a-z0-9])([A-Z])')46SECOND_CAMEL_REGEX = re.compile('([a-z0-9])([A-Z])')
47CONTROL_CHARS = re.compile(r'[\x00-\x1F\x7F-\x9F]')47CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')
48INVALID_FILE_CHARS = re.compile(r'[\\/:\*\?"<>\|\+\[\]%]')48INVALID_FILE_CHARS = re.compile(r'[\\/:\*\?"<>\|\+\[\]%]')
49IMAGES_FILTER = None49IMAGES_FILTER = None
50REPLACMENT_CHARS_MAP = str.maketrans({'\u2018': '\'', '\u2019': '\'', '\u201c': '"', '\u201d': '"', '\u2026': '...',50REPLACMENT_CHARS_MAP = str.maketrans({'\u2018': '\'', '\u2019': '\'', '\u201c': '"', '\u201d': '"', '\u2026': '...',
@@ -471,15 +471,15 @@
471 log.exception('Error detecting file encoding')471 log.exception('Error detecting file encoding')
472472
473473
474def normalize_str(irreg_str):474def normalize_str(string):
475 """475 """
476 Normalize the supplied string. Remove unicode control chars and tidy up white space.476 Normalize the supplied string. Remove unicode control chars and tidy up white space.
477477
478 :param str irreg_str: The string to normalize.478 :param str string: The string to normalize.
479 :return: The normalized string479 :return: The normalized string
480 :rtype: str480 :rtype: str
481 """481 """
482 irreg_str = irreg_str.translate(REPLACMENT_CHARS_MAP)482 string = string.translate(REPLACMENT_CHARS_MAP)
483 irreg_str = CONTROL_CHARS.sub('', irreg_str)483 string = CONTROL_CHARS.sub('', string)
484 irreg_str = NEW_LINE_REGEX.sub('\n', irreg_str)484 string = NEW_LINE_REGEX.sub('\n', string)
485 return WHITESPACE_REGEX.sub(' ', irreg_str)485 return WHITESPACE_REGEX.sub(' ', string)
486486
=== modified file 'tests/functional/openlp_core/common/test_common.py'
--- tests/functional/openlp_core/common/test_common.py 2017-12-29 09:15:48 +0000
+++ tests/functional/openlp_core/common/test_common.py 2018-04-17 10:27:58 +0000
@@ -25,8 +25,8 @@
25from unittest import TestCase25from unittest import TestCase
26from unittest.mock import MagicMock, call, patch26from unittest.mock import MagicMock, call, patch
2727
28from openlp.core.common import clean_button_text, de_hump, extension_loader, is_macosx, is_linux, is_win, \28from openlp.core.common import clean_button_text, de_hump, extension_loader, is_macosx, is_linux, \
29 path_to_module, trace_error_handler29 is_win, normalize_str, path_to_module, trace_error_handler
30from openlp.core.common.path import Path30from openlp.core.common.path import Path
3131
3232
@@ -211,6 +211,30 @@
211 assert is_win() is False, 'is_win() should return False'211 assert is_win() is False, 'is_win() should return False'
212 assert is_macosx() is False, 'is_macosx() should return False'212 assert is_macosx() is False, 'is_macosx() should return False'
213213
214 def test_normalize_str_leaves_newlines(self):
215 # GIVEN: a string containing newlines
216 string = 'something\nelse'
217 # WHEN: normalize is called
218 normalized_string = normalize_str(string)
219 # THEN: string is unchanged
220 assert normalized_string == string
221
222 def test_normalize_str_removes_null_byte(self):
223 # GIVEN: a string containing a null byte
224 string = 'somet\x00hing'
225 # WHEN: normalize is called
226 normalized_string = normalize_str(string)
227 # THEN: nullbyte is removed
228 assert normalized_string == 'something'
229
230 def test_normalize_str_replaces_crlf_with_lf(self):
231 # GIVEN: a string containing crlf
232 string = 'something\r\nelse'
233 # WHEN: normalize is called
234 normalized_string = normalize_str(string)
235 # THEN: crlf is replaced with lf
236 assert normalized_string == 'something\nelse'
237
214 def test_clean_button_text(self):238 def test_clean_button_text(self):
215 """239 """
216 Test the clean_button_text() function.240 Test the clean_button_text() function.