Merge lp:~adamreichold/qpdfview/extended-text-selection into lp:qpdfview
Status: | Rejected |
---|---|
Rejected by: | Adam Reichold |
Proposed branch: | lp:~adamreichold/qpdfview/extended-text-selection |
Merge into: | lp:qpdfview |
Diff against target: | 331 lines (+138/-46), 6 files modified: sources/documentview.cpp (+2/-2), sources/model.h (+10/-1), sources/pageitem.cpp (+39/-2), sources/pageitem.h (+3/-0), sources/pdfmodel.cpp (+83/-40), sources/pdfmodel.h (+1/-1) |
To merge this branch: | bzr merge lp:~adamreichold/qpdfview/extended-text-selection |
Related bugs: | |

Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Benjamin Eltzner | | | Pending |
Razi Alavizadeh | | | Pending |

Review via email: mp+264322@code.launchpad.net
Description of the change
This branch extends the model to allow for extended text selections, i.e. an arbitrary boundary and the text contained within it, and implements this within the PDF model. It also modifies the PageItem so that it makes use of this selection if available, preferring it over the previous method of text extraction and also highlighting the boundary during the selection process.
The extended text extraction itself uses the same Poppler API as the cached text extraction and relies on the fact that Poppler already provides the text boxes in "approximately reading order". Hence this might be much more complicated to implement in the other backends, e.g. DjVuLibre, which is why it should stay optional IMHO.
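To make the idea concrete, here is a minimal sketch of a selection over text boxes that arrive in reading order. It is not the code from this branch: `Rect`, `TextBox`, `Selection` and `selectBetween` are hypothetical stand-ins that avoid any Qt or Poppler dependency, and joining words with a single space is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not qpdfview or Poppler types.
struct Rect { double left, top, right, bottom; };
struct TextBox { Rect rect; std::string text; };
struct Selection { std::vector<Rect> boundary; std::string text; };

static bool contains(const Rect& r, double x, double y)
{
    return x >= r.left && x <= r.right && y >= r.top && y <= r.bottom;
}

// Selects every box between the box under (x1, y1) and the box under (x2, y2),
// assuming that `boxes` is already sorted in reading order.
Selection selectBetween(const std::vector<TextBox>& boxes,
                        double x1, double y1, double x2, double y2)
{
    const std::size_t none = boxes.size();
    std::size_t first = none;
    std::size_t last = none;

    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (first == none && contains(boxes[i].rect, x1, y1)) first = i;
        if (last == none && contains(boxes[i].rect, x2, y2)) last = i;
    }

    Selection selection;
    if (first == none || last == none) return selection; // nothing hit

    if (first > last) std::swap(first, last); // selection dragged backwards
    assert(first <= last);

    for (std::size_t i = first; i <= last; ++i) {
        selection.boundary.push_back(boxes[i].rect);
        if (!selection.text.empty()) selection.text += ' ';
        selection.text += boxes[i].text;
    }
    return selection;
}
```

The sketch only works because the input order already matches the reading order; a backend that cannot guarantee this would have to sort or cluster the boxes first, which is where the extra complexity for other plugins would come from.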
Some of the points that need to be discussed IMHO are:
* Do we want to always use it if made available by the model, or maybe extend the copy-to-clipboard pop-up menu with additional options to explicitly select the method of text extraction instead?
* Is the performance sufficient or do we need to devise a more incremental computation? It feels alright on my machine, but I have come to wrong conclusions based on that in the past. This also affects whether this should be put behind a configuration setting or into a separate rubber-band mode.
* This currently works on a per-text-box, i.e. approximately per-word, level. It could be extended to work on a per-character level, but I suspect that would add significant overhead, either for tracking character runs or for aggregating runs up front into a boundary and its contained text.
* Should the boundary be processed further to provide more connected highlighting and if so how, e.g. computing (an approximation of) the convex hull?
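For the last point, a convex hull over the corners of the selected rectangles could be computed along the following lines. This is only a sketch using Andrew's monotone chain algorithm, not something taken from this branch; `Point`, `Rect` and `convexHull` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, not qpdfview or Qt types.
struct Point { double x, y; };
struct Rect { double left, top, right, bottom; };

// Z component of the cross product of (a - o) and (b - o).
static double cross(const Point& o, const Point& a, const Point& b)
{
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Convex hull of all rectangle corners (Andrew's monotone chain).
// Collinear points are dropped; vertices come back in a consistent winding order.
std::vector<Point> convexHull(const std::vector<Rect>& rects)
{
    std::vector<Point> pts;
    for (const Rect& r : rects) {
        pts.push_back({r.left, r.top});
        pts.push_back({r.right, r.top});
        pts.push_back({r.right, r.bottom});
        pts.push_back({r.left, r.bottom});
    }

    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    pts.erase(std::unique(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x == b.x && a.y == b.y;
    }), pts.end());

    const std::size_t n = pts.size();
    if (n < 3) return pts; // degenerate input, nothing to wrap

    std::vector<Point> hull(2 * n);
    std::size_t k = 0;

    // First chain: sweep from left to right.
    for (std::size_t i = 0; i < n; ++i) {
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) --k;
        hull[k++] = pts[i];
    }
    // Second chain: sweep back from right to left.
    for (std::size_t i = n - 1, t = k + 1; i > 0; --i) {
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i - 1]) <= 0) --k;
        hull[k++] = pts[i - 1];
    }

    assert(k >= 1);
    hull.resize(k - 1); // the last point repeats the first one
    return hull;
}
```

One caveat: for a selection that spans several lines of different width, the hull will generally also cover words that are not part of the selection, so an approximation that stays closer to the individual lines may be preferable for highlighting.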
Unmerged revisions
- 1936. By Adam Reichold: Some minor cosmetic clean-ups to PDF text extraction and selection.
- 1935. By Adam Reichold: Since PdfPage::text is now a fallback interface we can remove the separate Page::cachedText entry point which all other plugins forwarded to Page::text anyway.
- 1934. By Adam Reichold: At least simplify the boundary and text of extended PDF selections.
- 1933. By Adam Reichold: Use proper text selection if made available by the model.
- 1932. By Adam Reichold: Extend model to allow for proper text selections
Also note that this does not make any use of tagged PDF features and hence its reliability w.r.t. complicated layouts is probably questionable...