Merge ~pkopylov/qa-regression-testing:testlib_assert_file_type into qa-regression-testing:master

Proposed by Pavel Kopylov
Status: Merged
Merged at revision: 503c71149c028e840958d702aad5271b2f643018
Proposed branch: ~pkopylov/qa-regression-testing:testlib_assert_file_type
Merge into: qa-regression-testing:master
Diff against target: 39 lines (+13/-5)
1 file modified
scripts/testlib.py (+13/-5)
Reviewer Review Type Date Requested Status
Steve Beattie Approve
Alex Murray Needs Fixing
Review via email: mp+434935@code.launchpad.net

Commit message

scripts/testlib.py: the function assertFileType() was improved in order to handle files that have two or more recognized file-types correctly

Description of the change

When I was running test-bzip2.py on ubuntu 16.04 I met the situation when the /usr/bin/file utility detected an unexpected file type for some file during the test. Actually, that compressed file had two types, but the mentioned utility output only the first one. However, the expected type was the second known type, so it was not printed at all.

Fortunately, the /usr/bin/file utility can output all recognized file's types, not only the first.

This commit changes the utility's command line adding two options: -k and -r, as a result, all recognized types will be output and if even one of them matches to expected then an assertion will be true.

To post a comment you must log in.
Revision history for this message
Alex Murray (alexmurray) wrote :

Since the output of file will now include multiple lines, would it be better to explicitly iterate over each line and test them separately to find the match? That would make the code a bit easier to understand IMO rather than trying to include a newline as part of the regex to match.

Revision history for this message
Pavel Kopylov (pkopylov) wrote :

I refactored the regexp scanning to do it over splitted lines.

Please, pay attention that the second and all the next types of some files are out by the file utility starting with '- ' combination, that is why the regexp has become more complicated.

Revision history for this message
Alex Murray (alexmurray) wrote :

Thanks - this is looking better - just a couple minor cosmetic changes and I think it should be good to go.

review: Needs Fixing
Revision history for this message
Steve Beattie (sbeattie) wrote :

Hey Pavel, thanks for this. One issue I hit is that our team still supports Ubuntu 14.04 LTS packages as part of the Ubuntu ESM product, and on 14.04, the file command does not support the -r option. I added the following commit to address the issue:

  https://git.launchpad.net/qa-regression-testing/commit/?id=9938b4d6cf7c4be8e585e6296ff90d9d4370367f

as part of the merged branch.

For qa-r-t maintainers, the test-librsvg.py test checks against svg (aka XML) files, and are a good validation case for this.

Ultimately, the more robust solution for this might be to convert the file invocation to pass --mime, though that is not without variances in output across versions of file and requires changing all of the tests expected output. (Python 3's magic module is also a possibility, but s only available as the python-magic package in Python 2, and the 14.04/16.04 era version of that module has a drastically different api.)

review: Approve
Revision history for this message
Marc Deslauriers (mdeslaur) wrote :

FYI, this caused regressions in a bunch of tests as when file is run with -b, it will add extra line breaks in the output. For example:

Netpbm image data, size = 3684 x 2760, rawbits, pixmap

turns into:

Netpbm image data, size = 3684 x 2760
- , rawbits, pixmap
- data

And:

TIFF image data, little-endian, direntries=21, height=1380, bps=16, compression=none, PhotometricIntepretation=BlackIsZero, description= , manufacturer=Canon, model=PowerShot S95, width=1842

turns into:

TIFF image data, little-endian, direntries=21, height=1380, bps=16, compression=none, PhotometricInterpretation=BlackIsZero, description= , manufacturer=Canon, model=PowerShot S95
- , width=1842
- data

I have hidden the multiple file types functionality behind an option for now until a better solution can be found.

Revision history for this message
Pavel Kopylov (pkopylov) wrote (last edit ):

Dear Marc, it is bad news about regression.

However, I couldn't reproduce the problem. So could you please write an OS version you met the problem?

I guess that the new changes to the test 'test-libraw.py' revealed the problem on Ubuntu 23.04. Unfortunately, I don't have this Ubuntu version available to install on the VM. Could you please send here an output of the file utility for both 'file -b' and 'file -bkr' variants with suspicious files?

PS: Sorry for the long pause, I was on vacation last month.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/scripts/testlib.py b/scripts/testlib.py
2index b18a74c..f30427e 100644
3--- a/scripts/testlib.py
4+++ b/scripts/testlib.py
5@@ -1121,7 +1121,10 @@ class TestlibCase(unittest.TestCase):
6 def assertFileType(self, filename, filetype, strict=True):
7 '''Checks the file type of the file specified'''
8
9- (rc, report, out) = self._testlib_shell_cmd(["/usr/bin/file", "-b", filename])
10+ # Try to see all recognized types of file (thanks to -k option), not only the first.
11+ # File can have several detected types, it will be OK if one of those types
12+ # equals to an expected filetype
13+ (rc, report, out) = self._testlib_shell_cmd(["/usr/bin/file", "-bkr", filename])
14 out = out.strip()
15 expected = 0
16 # Absolutely no idea why this happens on Hardy
17@@ -1130,13 +1133,18 @@ class TestlibCase(unittest.TestCase):
18 result = 'Got exit code %d, expected %d:\n%s\n' % (rc, expected, report)
19 self.assertEqual(expected, rc, result)
20
21+ found = False
22 if strict:
23- filetype = '^%s$' % (filetype)
24+ # some line extracted from the out should match to the filetype exactly
25+ filetype = '(^|- )%s$' % (filetype)
26 else:
27- # accept if the beginning of the line matches
28- filetype = '^%s' % (filetype)
29+ # just a start portion of some line extracted from the out matches to the filetype
30+ filetype = '(^|- )%s' % (filetype)
31+ for some_line in out.splitlines(True):
32+ if re.search(filetype, some_line) is not None:
33+ found = True
34 result = 'File type reported by file: [%s], expected regex: [%s]\n' % (out, filetype)
35- self.assertNotEqual(None, re.search(filetype, out), result)
36+ self.assertNotEqual(False, found, result)
37
38 def yank_commonname_from_cert(self, certfile):
39 '''Extract the commonName from a given PEM'''

Subscribers

People subscribed via source and target branches