OpenLP

Merge lp:~thelinuxguy/openlp/fix-newline-bug into lp:openlp

fix-newline-bug
Merge into trunk

Proposed by Simon Hanna on 2018-04-17

Status:

Merged

Merged at revision:

2817

Proposed branch:

lp:~thelinuxguy/openlp/fix-newline-bug

Merge into:

lp:openlp

Diff against target:

80 lines (+33/-9)

2 files modified

openlp/core/common/__init__.py (+7/-7)
tests/functional/openlp_core/common/test_common.py (+26/-2)

To merge this branch:

bzr merge lp:~thelinuxguy/openlp/fix-newline-bug

Low

Fix Committed

Reviewer	Date Requested	Status
Phill	2018-04-17	Approve on 2018-04-18
Tim Bentley	2018-04-17	Approve on 2018-04-17
Review via email: mp+343465@code.launchpad.net

This proposal supersedes a proposal from 2018-04-17.

Description of the change

Fix the fix for 1727517

Tim Bentley (trb143) wrote on 2018-04-14: Posted in a previous version of this proposal

One minor fix as I do not like the rename of the field

review: Needs Fixing

Phill (phill-ridout) wrote on 2018-04-15: Posted in a previous version of this proposal

See inline. One question and a few (more) minor fixes please.

review: Needs Fixing

Simon Hanna (thelinuxguy) wrote on 2018-04-15: Posted in a previous version of this proposal

will update shortly

Phill (phill-ridout) wrote on 2018-04-16: Posted in a previous version of this proposal

I've done some research into this. Officially, the only code points allowed are:

"
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
" ( https://www.w3.org/TR/REC-xml/#charsets)

So going on this our regex should be:

re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')

We should probably filter out the other code points too.

It's worth noting that REPLACMENT_CHARS_MAP replaces the vertical tab and form feed chars with "\n\n" before the CONTROL_CHARS regex filters them out!

So in summary, please replace the regex with the one in the inline comment. (Note: it's not tested.)

review: Needs Fixing
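
As a rough illustration of the suggestion above (a minimal sketch, not part of the proposal), the two character classes can be compared on a sample string. The names OLD_CONTROL_CHARS, NEW_CONTROL_CHARS and sample are made up for this example; only the two patterns come from the thread and the diff.

```python
import re

# Pattern before this proposal: strips every C0 and C1 control character.
OLD_CONTROL_CHARS = re.compile(r'[\x00-\x1F\x7F-\x9F]')
# Pattern suggested in the review: leaves \x09 (tab), \x0A (LF) and \x0D (CR) alone.
NEW_CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')

# Illustrative input mixing allowed whitespace with disallowed control characters.
sample = 'a\tb\nc\rd\x00e\x0bf\x7fg'

old_result = OLD_CONTROL_CHARS.sub('', sample)
new_result = NEW_CONTROL_CHARS.sub('', sample)

print(repr(old_result))  # 'abcdefg'
print(repr(new_result))  # 'a\tb\nc\rdefg'
```

With the old pattern the newline is removed along with everything else; with the suggested pattern tab, line feed and carriage return survive, while the null byte, vertical tab and DEL are still removed.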

Simon Hanna (thelinuxguy) wrote on 2018-04-16: Posted in a previous version of this proposal

I wonder what the reason behind replacing vertical tabs with new lines is...

Looking at where the regex is being used I'm inclined to say the whole thing needs to be reworked.

It's also used in the filename cleaning, where line feeds would be strange, but then again I don't always know what the input is.

I'm changing it to what you request, but you should really take a serious look at it. From what I see, you are the last one who touched the invalid file regex; I highly doubt this is what you actually want to happen. If it is, it is a mystery to me...
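
The ordering discussed in this thread (translation first, control-character filter second) can be sketched roughly as follows. SKETCH_CHARS_MAP is a hypothetical, reduced stand-in for REPLACMENT_CHARS_MAP that contains only the two entries mentioned in the review; the real map in openlp/core/common/__init__.py has more entries, and the two helper functions exist only for this illustration.

```python
import re

# Hypothetical, reduced stand-in for REPLACMENT_CHARS_MAP: only the vertical tab
# and form feed entries described in the review, both mapped to '\n\n'.
SKETCH_CHARS_MAP = str.maketrans({'\x0b': '\n\n', '\x0c': '\n\n'})
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')


def translate_then_strip(text):
    """Order used by normalize_str in the diff: translate, then filter."""
    text = text.translate(SKETCH_CHARS_MAP)
    return CONTROL_CHARS.sub('', text)


def strip_then_translate(text):
    """Reversed order, shown only for comparison."""
    text = CONTROL_CHARS.sub('', text)
    return text.translate(SKETCH_CHARS_MAP)


sample = 'verse one\x0bverse two'

print(repr(translate_then_strip(sample)))  # 'verse one\n\nverse two'
print(repr(strip_then_translate(sample)))  # 'verse oneverse two'
```

Because the translation runs first, the vertical tab has already become two newlines by the time the regex runs, so listing \x0B and \x0C in CONTROL_CHARS only matters for callers that use the regex without translating first.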

Phill (phill-ridout) wrote on 2018-04-16: Posted in a previous version of this proposal

One small typo, and a couple comments that need changing.

review: Needs Fixing

Phill (phill-ridout) wrote on 2018-04-16: Posted in a previous version of this proposal

> One small typo, and a couple comments that need changing.

Sorry!

Tim Bentley (trb143) wrote on 2018-04-17: Posted in a previous version of this proposal

See comments inline

lp:~thelinuxguy/openlp/fix-newline-bug (revision 2822)
https://ci.openlp.io/job/Branch-01-Pull/2502/ [SUCCESS]
https://ci.openlp.io/job/Branch-02a-Linux-Tests/2403/ [SUCCESS]
https://ci.openlp.io/job/Branch-02b-macOS-Tests/189/ [FAILURE]
https://ci.openlp.io/job/Branch-03a-Build-Source/101/ [SUCCESS]
https://ci.openlp.io/job/Branch-03b-Build-macOS/94/ [SUCCESS]
https://ci.openlp.io/job/Branch-04a-Code-Analysis/1563/ [SUCCESS]
https://ci.openlp.io/job/Branch-04b-Test-Coverage/1376/ [SUCCESS]

review: Needs Fixing

Tim Bentley (trb143) wrote on 2018-04-17:

Looks good to me.

review: Approve

Phill (phill-ridout) wrote on 2018-04-18:

Good, thanks for your patience.

review: Approve

Preview Diff

Subscribers

People subscribed via source and target branches

to status/vote changes:

Daniel Borges

Samuel Sjöbergsson

=== modified file 'openlp/core/common/__init__.py'
--- openlp/core/common/__init__.py	2018-02-24 16:10:02 +0000
+++ openlp/core/common/__init__.py	2018-04-17 19:37:10 +0000
@@ -44,7 +44,7 @@
 FIRST_CAMEL_REGEX = re.compile('(.)([A-Z][a-z]+)')
 SECOND_CAMEL_REGEX = re.compile('([a-z0-9])([A-Z])')
-CONTROL_CHARS = re.compile(r'[\x00-\x1F\x7F-\x9F]')
+CONTROL_CHARS = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')
 INVALID_FILE_CHARS = re.compile(r'[\\/:\*\?"<>\|\+\[\]%]')
 IMAGES_FILTER = None
 REPLACMENT_CHARS_MAP = str.maketrans({'\u2018': '\'', '\u2019': '\'', '\u201c': '"', '\u201d': '"', '\u2026': '...',
@@ -471,15 +471,15 @@
         log.exception('Error detecting file encoding')
-def normalize_str(irreg_str):
+def normalize_str(irregular_string):
     """
     Normalize the supplied string. Remove unicode control chars and tidy up white space.
-    :param str irreg_str: The string to normalize.
+    :param str irregular_string: The string to normalize.
     :return: The normalized string
     :rtype: str
     """
-    irreg_str = irreg_str.translate(REPLACMENT_CHARS_MAP)
-    irreg_str = CONTROL_CHARS.sub('', irreg_str)
-    irreg_str = NEW_LINE_REGEX.sub('\n', irreg_str)
-    return WHITESPACE_REGEX.sub(' ', irreg_str)
+    irregular_string = irregular_string.translate(REPLACMENT_CHARS_MAP)
+    irregular_string = CONTROL_CHARS.sub('', irregular_string)
+    irregular_string = NEW_LINE_REGEX.sub('\n', irregular_string)
+    return WHITESPACE_REGEX.sub(' ', irregular_string)
=== modified file 'tests/functional/openlp_core/common/test_common.py'
--- tests/functional/openlp_core/common/test_common.py	2017-12-29 09:15:48 +0000
+++ tests/functional/openlp_core/common/test_common.py	2018-04-17 19:37:10 +0000
@@ -25,8 +25,8 @@
 from unittest import TestCase
 from unittest.mock import MagicMock, call, patch
-from openlp.core.common import clean_button_text, de_hump, extension_loader, is_macosx, is_linux, is_win, \
-    path_to_module, trace_error_handler
+from openlp.core.common import clean_button_text, de_hump, extension_loader, is_macosx, is_linux, \
+    is_win, normalize_str, path_to_module, trace_error_handler
 from openlp.core.common.path import Path
@@ -211,6 +211,30 @@
             assert is_win() is False, 'is_win() should return False'
             assert is_macosx() is False, 'is_macosx() should return False'
+    def test_normalize_str_leaves_newlines(self):
+        # GIVEN: a string containing newlines
+        string = 'something\nelse'
+        # WHEN: normalize is called
+        normalized_string = normalize_str(string)
+        # THEN: string is unchanged
+        assert normalized_string == string
+
+    def test_normalize_str_removes_null_byte(self):
+        # GIVEN: a string containing a null byte
+        string = 'somet\x00hing'
+        # WHEN: normalize is called
+        normalized_string = normalize_str(string)
+        # THEN: nullbyte is removed
+        assert normalized_string == 'something'
+
+    def test_normalize_str_replaces_crlf_with_lf(self):
+        # GIVEN: a string containing crlf
+        string = 'something\r\nelse'
+        # WHEN: normalize is called
+        normalized_string = normalize_str(string)
+        # THEN: crlf is replaced with lf
+        assert normalized_string == 'something\nelse'
+
     def test_clean_button_text(self):
         """
         Test the clean_button_text() function.
