Merge lp:~inspirated/launchpad/implement-Bug-findAttachments into lp:launchpad
Status: | Rejected |
---|---|
Rejected by: | Robert Collins |
Proposed branch: | lp:~inspirated/launchpad/implement-Bug-findAttachments |
Merge into: | lp:launchpad |
Diff against target: | 234 lines (+175/-0), 7 files modified: lib/lp/bugs/configure.zcml (+1/-0), lib/lp/bugs/interfaces/bug.py (+10/-0), lib/lp/bugs/model/bug.py (+61/-0), lib/lp/bugs/tests/test_bug_find_attachment.py (+87/-0), lib/lp/bugs/tests/testfiles/sample-attachment-lorem.txt (+4/-0), lib/lp/bugs/tests/testfiles/sample-attachment-repeatchars.txt (+7/-0), lib/lp/bugs/tests/testfiles/sample-attachment-unicode.txt (+5/-0) |
To merge this branch: | bzr merge lp:~inspirated/launchpad/implement-Bug-findAttachments |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Robert Collins (community) | | | Abstain |
Graham Binns (community) | code | | Disapprove |
Björn Tillenius | | | Pending |
Review via email: mp+27786@code.launchpad.net |
This proposal supersedes a proposal from 2010-06-15.
Description of the change
Summary:
As part of my Google Summer of Code project, I had to implement attachment-searching functionality in Arsenal. The end product would allow a user to specify a search string, which would then be searched for in all the bug attachments of a project.
Doing this efficiently required two modifications in Launchpad:
* Exposing a bug collection for a particular product
* Implementing an attachment search method for a particular bug (this branch)
Proposed fix:
Export a read operation findAttachments in IBug which returns a collection of IBugAttachment.
Pre-implementation Notes:
After a lengthy discussion on #launchpad-reviews with gmb, stub and BjornT, it was concluded that a naive string search was not suitable for inclusion into Launchpad. Improvements suggested were to:
* Add an attachment size limit
* Read the attachments in chunks
These improvements have been taken into account by adding a file size limit and using Boyer-Moore-
Implementation Details:
* Attachments larger than 8 MB cannot be searched
* Boyer–Moore–
* Zope configuration had to be updated to export the method
* Multi-line searches are supported by an "is_encoded" parameter. If the client wants to include newline characters in <text>, it can encode them using text.encode(
* Attachments are searched according to default character encoding. The use of locale.
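The chunked search described above can be sketched roughly as follows. This is an illustration only, not the code from this branch: it is written for Python 3, the function names `horspool_find` and `stream_contains` and the 64 KB chunk size are assumptions, and only the 8 MB limit and the Boyer-Moore-Horspool algorithm (named in the revision log) come from the proposal.

```python
import io

MAX_SEARCHABLE_SIZE = 8 * 1024 * 1024  # 8 MB limit stated in the proposal
CHUNK_SIZE = 64 * 1024                 # assumed value, not taken from the branch


def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first match of needle in haystack, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: for every byte of the pattern except the last,
    # the distance from its rightmost occurrence to the pattern's end.
    shift = {byte: m - 1 - i for i, byte in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Skip ahead based on the byte aligned with the pattern's last position.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


def stream_contains(stream, needle: bytes, size: int,
                    chunk_size: int = CHUNK_SIZE) -> bool:
    """Search a file-like object chunk by chunk; refuse oversized files."""
    if size > MAX_SEARCHABLE_SIZE:
        return False
    if not needle:
        return True
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        window = tail + chunk
        if horspool_find(window, needle) != -1:
            return True
        # Keep the last len(needle) - 1 bytes so that a match spanning
        # two chunks is still found on the next iteration.
        tail = window[-(len(needle) - 1):] if len(needle) > 1 else b""


if __name__ == "__main__":
    data = b"x" * 100000 + b"char buf" + b"y" * 10
    print(stream_contains(io.BytesIO(data), b"char buf", len(data)))
```

Carrying over the last `len(needle) - 1` bytes between reads is what keeps memory use bounded by the chunk size while still catching matches that straddle a chunk boundary.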
Tests:
Tests are included for plain text as well as multi-line and unicode searches:
bin/test -vvc -m lp.bugs.
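For the multi-line case, the escaping round trip behind the "is_encoded" parameter can be illustrated with a small sketch. It assumes the 'unicode_escape' codec mentioned in the revision log; the helper names `encode_query` and `decode_query` are hypothetical, and the code is Python 3 rather than the Python 2 used by the branch.

```python
def encode_query(text: str) -> str:
    """Client side: turn newlines and non-ASCII characters into escapes."""
    return text.encode("unicode_escape").decode("ascii")


def decode_query(text: str, is_encoded: bool = False) -> str:
    """Server side: undo the escaping only when the client asked for it."""
    if not is_encoded:
        return text
    return text.encode("ascii").decode("unicode_escape")


if __name__ == "__main__":
    query = "char buf[8];\nint i;"
    wire = encode_query(query)
    print(wire)  # a single line with a literal backslash-n in it
    print(decode_query(wire, is_encoded=True) == query)
```

The escaped form fits on a single line, so it can travel as an ordinary string parameter; a plain search that leaves is_encoded off is passed through unchanged.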
Demo and Q/A:
Open any Launchpad bug in a browser:
https:/
Create an attachment and upload any text file containing the string ‘char buf’.
Create a Launchpad instance:
>>> from launchpadlib.
>>> launchpad = Launchpad.
The authorization page:
(https:/
should be opening in your browser. After you have authorized
this program to access Launchpad on your behalf you should come
back here and press <Enter> to finish the authentication process.
Load the bug:
>>> bug = launchpad.bugs[15]
Search for the attachment containing the string ‘char buf’:
>>> results = bug.findAttachm
>>> for attachment in results:
...     print attachment.title
...
Buffer Overflow Intro.txt
lint:
The changes are lint clean.
Links:
[1] http://
Unmerged revisions
- 11021. By Kamran Riaz Khan
  Minor changes to improve unittest readability.
- 11020. By Kamran Riaz Khan
  Removed urllib2 usage in Bug.findAttachments() by opening attachments directly.
- 11019. By Kamran Riaz Khan
  Minor changes to improve code readability.
- 11018. By Kamran Riaz Khan
  Use encode('unicode_escape') instead of urllib.quote() for multiline searches
- 11017. By Kamran Riaz Khan
  Changes in Bug.findAttachments():
  * Use Boyer-Moore-Horspool algorithm to search attachments in chunks
  * Fixed unicode and multiline unittests
- 11016. By Kamran Riaz Khan
  Added support for multi-line searches in Bug.findAttachments() using is_encoded parameter
- 11015. By Kamran Riaz Khan
  Fixed Bug.findAttachments() to use preferred encoding instead of UTF-8
- 11014. By Kamran Riaz Khan
  Changes in Bug.findAttachments():
  * Limited the size of attachments that can be searched to 8 MB
  * Modified searching to go-through attachments line-wise
- 11013. By Kamran Riaz Khan
  Fixed unicode searches in Bug.findAttachments()
- 11012. By Kamran Riaz Khan
  Added unit tests for Bug.findAttachments() method
After a lengthy discussion in #launchpad-reviews we've agreed that there should be a different solution to this problem.