Merge lp:~pmarchwiak/synapse-project/recoll-plugin into lp:synapse-project

Proposed by Patrick Marchwiak
Status: Needs review
Proposed branch: lp:~pmarchwiak/synapse-project/recoll-plugin
Merge into: lp:synapse-project
Diff against target: 217 lines (+191/-0)
3 files modified
src/plugins/Makefile.am (+1/-0)
src/plugins/recoll-plugin.vala (+189/-0)
src/ui/synapse-main.vala (+1/-0)
To merge this branch: bzr merge lp:~pmarchwiak/synapse-project/recoll-plugin
Reviewer        Review Type   Date Requested   Status
Michal Hruby                                   Pending
Review via email: mp+133784@code.launchpad.net

Description of the change

This plugin runs queries against a Recoll index, providing search results based on file contents (document types supported by Recoll include MS Word and PDF).

Feature summary for the current implementation:
* returns up to 20 UriMatches, using the file name as the title and the file path plus an abstract (a sample of the document's contents) as the description
* results are scored lower than results from the locate plugin to ensure file name matches show up first in the results
* plugin is only enabled when the recoll command is available
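In the diff below, the title/description logic summarized above happens inline in `search()`. As a minimal sketch, the two steps can be pulled out into standalone helpers. `uri_from_header` and `describe` are hypothetical names, and the header format (tab-separated fields with the URI in brackets in the second field) is assumed from the plugin's own `split("\t")` and `substring` calls, not taken from Recoll's documentation.

```vala
// Hypothetical helpers mirroring what the plugin does with one hit.

// Extracts the URI from a header line such as
// "text/plain\t[file:///tmp/a.txt]\t[a.txt]\t8806 bytes"
// (format assumed from the plugin code). Returns null if the
// line does not have that shape, e.g. the "N results" line.
string? uri_from_header (string line)
{
  string[] fields = line.split ("\t");
  if (fields.length < 2)
    return null;
  string f = fields[1].strip ();
  if (f.length < 2 || !f.has_prefix ("[") || !f.has_suffix ("]"))
    return null;
  return f.substring (1, f.length - 2);
}

// Builds "<path>: <abstract>" like the plugin does, dropping the
// abstract when it is blank and keeping the whole URI if it has
// no "scheme://" prefix.
string describe (string uri, string abstract_line)
{
  string[] parts = uri.split ("://", 2);
  string path = parts.length == 2 ? parts[1] : uri;
  string text = abstract_line.strip ();
  return text == "" ? path : path + ": " + text;
}
```

Returning null from `uri_from_header` lets a caller skip lines it does not recognise instead of indexing past the end of `fields`, which the inline version in `search()` would do on malformed output.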

Michal Hruby (mhr3) wrote :

Thanks for the contribution; it's looking pretty good, although I do have one gripe:

I'm not a fan of spawning processes in the search() method of an ItemProvider plugin: it is called on every keystroke and is pretty heavy on the system. That's why "Locate" is hidden behind an action, and why the query is first checked against a regex before anything is calculated with `bc`. Could you please do something like that?

Otherwise, the code is clean, nice job ;)
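A cheap pre-check of the kind Michal describes could look like the minimal sketch below. `worth_spawning_recoll` and the three-character threshold are assumptions made for illustration; nothing in the thread specifies them. The pattern test uses GLib's `Regex.match_simple`.

```vala
// Hypothetical pre-check: skip spawning recoll for queries that are
// too short or contain no word characters. The threshold is an
// illustrative assumption, not a Synapse setting.
const int MIN_WORD_CHARS = 3;

bool worth_spawning_recoll (string query_string)
{
  string q = query_string.strip ();
  // count characters, not bytes, so non-ASCII input is measured fairly
  if (q.char_count () < MIN_WORD_CHARS)
    return false;
  // require at least one run of MIN_WORD_CHARS letters, digits or underscores
  return Regex.match_simple ("\\w{%d,}".printf (MIN_WORD_CHARS), q);
}
```

In the plugin, such a check would naturally sit in `handles_query`, e.g. returning `(QueryFlags.FILES in query.query_type) && worth_spawning_recoll (query.query_string)`. As Patrick points out in his reply, almost any text is a valid full-text search, so a guard like this only filters out the shortest or punctuation-only queries; it reduces the per-keystroke spawning rather than removing it.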

Patrick Marchwiak (pmarchwiak) wrote :

I understand your concern.

I feel there's more motivation for the action approach with "Locate", since it is quite common for users to have enough files that it would take too long to run. I don't see how checking a regex would help in the search index situation, since anything a user types is a valid search. In my (limited) testing on a recent Core i5 laptop as well as an older Core Duo laptop, results have been quick to return with this plugin. However, I do appreciate that spawning a process could be taxing on other systems. The other drawback of hiding it behind an action is that it would require additional keystrokes to get to the results, which makes it less useful to me.

I'm not sure when I'll find the free time, but I will consider hiding it behind an action.

Ari (ari-lp) wrote :

Hi Patrick,

First of all, thank you very much for this plugin. I have been searching for a way to integrate Recoll into Synapse for a while now and I am very glad I found your code. I am not a coder myself, but I just wanted to say that I agree with Michal's remarks. If you're working with a large index (say, thousands of documents), searching can be quite resource-intensive, both in terms of CPU cycles and hard disk activity. Starting a search while the user is still typing is counter-productive in these use cases. That's why I would love to see this plugin implemented as an action. It would be fantastic if you could modify it in that regard.

Thank you for taking the time to read this.

Cheers
--Ari

Unmerged revisions

511. By Patrick Marchwiak

Set the description to file name + abstract in recoll plugin

510. By Patrick Marchwiak

Turn down match score and use Utils.FileInfo to build better results in recoll plugin

509. By Patrick Marchwiak

Remove python script for accessing recoll and invoke cmd line tool directly

508. By Patrick Marchwiak

Add initial version of recoll plugin

Preview Diff

=== modified file 'src/plugins/Makefile.am'
--- src/plugins/Makefile.am 2012-03-18 19:11:23 +0000
+++ src/plugins/Makefile.am 2012-11-10 07:41:20 +0000
@@ -48,6 +48,7 @@
 	opensearch.vala \
 	pastebin-plugin.vala \
 	pidgin-plugin.vala \
+	recoll-plugin.vala \
 	rhythmbox-plugin.vala \
 	selection-plugin.vala \
 	test-slow-plugin.vala \
=== added file 'src/plugins/recoll-plugin.vala'
--- src/plugins/recoll-plugin.vala 1970-01-01 00:00:00 +0000
+++ src/plugins/recoll-plugin.vala 2012-11-10 07:41:20 +0000
@@ -0,0 +1,189 @@
+/*
+ *
+ * Authored by Patrick Marchwiak <pd@marchwiak.com>
+ *
+ */
+
+namespace Synapse
+{
+
+  // Sends query to Recoll command line tool.
+  public class RecollPlugin : Object, Activatable, ItemProvider
+  {
+    // a mandatory property
+    public bool enabled { get; set; default = true; }
+
+    // this method is called when a plugin is enabled
+    // use it to initialize your plugin
+    public void activate ()
+    {
+    }
+
+    // this method is called when a plugin is disabled
+    // use it to free the resources you're using
+    public void deactivate ()
+    {
+    }
+
+    // register your plugin in the UI
+    static void register_plugin ()
+    {
+      DataSink.PluginRegistry.get_default ().register_plugin (
+        typeof (RecollPlugin),
+        _ ("Recoll"), // plugin title
+        _ ("Returns results of full text search against an existing Recoll index."), // description
+        "recoll", // icon name
+        register_plugin, // reference to this function
+        Environment.find_program_in_path ("recoll") != null, // true if user's system has all required components which the plugin needs
+        _ ("recoll is not installed") // error message
+      );
+    }
+
+    static construct
+    {
+      // register the plugin when the class is constructed
+      register_plugin ();
+    }
+
+    // an optional method to improve the speed of searches,
+    // if you return false here, the search method won't be called
+    // for this query
+    public bool handles_query (Query query)
+    {
+      return (QueryFlags.FILES in query.query_type);
+    }
+
+    enum LineType {
+      FIELDS,
+      ABSTRACT_START,
+      ABSTRACT,
+      ABSTRACT_END;
+    }
+
+    public async ResultSet? search (Query query) throws SearchError
+    {
+      Pid pid;
+      int read_fd; // recoll's stdout; its stdin is not piped
+      string[] argv = {"recoll",
+                       "-t", // command line mode
+                       "-n", // indices of results
+                       "0-20", // return first 20 results
+                       "-a", // ALL TERMS mode
+                       "-A", // output abstracts
+                       query.query_string};
+
+      try
+      {
+        Process.spawn_async_with_pipes (null, argv, null,
+                                        SpawnFlags.SEARCH_PATH,
+                                        null, out pid, null, out read_fd); // null stdin: no leaked pipe per search
+        UnixInputStream read_stream = new UnixInputStream (read_fd, true);
+        DataInputStream recoll_output = new DataInputStream (read_stream);
+
+        // Sample output from `recoll -t` :
+        // ===============================
+        // Recoll query: ((kernel:(wqf=11) OR kernels OR kernelize OR kernelized))
+        // 4725 results (printing 1 max):
+        // text/plain [file:///home/patrick/code/sample-results.txt] [sample-results.txt] 8806 bytes
+        // ABSTRACT
+        // some text summarizing the document usually has the keyword (kernel)
+        // /ABSTRACT
+
+        string? line = null;
+        var next_line_type = LineType.FIELDS;
+        ResultSet results = new ResultSet ();
+        Utils.FileInfo result = null;
+        int line_idx = 0;
+        string description = null;
+        while ((line = yield recoll_output.read_line_async (Priority.DEFAULT)) != null)
+        {
+          if (line_idx >= 2) // skip first two lines
+          {
+            if (next_line_type == LineType.FIELDS)
+            {
+              string[] fields = line.split ("\t");
+
+              //string mimetype = fields[0];
+
+              string uri = fields[1];
+              uri = uri.substring (1, uri.length - 2);
+
+              description = uri.split ("://")[1];
+
+              // FIXME: recoll already gives us the mimetype so FileInfo
+              // is doing extra work obtaining it from the file
+              result = new Utils.FileInfo (uri, typeof (MatchObject));
+
+              yield result.initialize ();
+
+              next_line_type = LineType.ABSTRACT_START;
+            }
+            else if (next_line_type == LineType.ABSTRACT_START)
+            {
+              // TODO check for the start of the abstract
+              next_line_type = LineType.ABSTRACT;
+            }
+            else if (next_line_type == LineType.ABSTRACT)
+            {
+              line = line.strip ();
+              if (line != "")
+              {
+                description = description + ": " + line;
+              }
+              next_line_type = LineType.ABSTRACT_END;
+            }
+            else if (next_line_type == LineType.ABSTRACT_END)
+            {
+              // TODO check for the end of the abstract
+
+              // TODO use relevancy rating to set match score
+
+              // score defaults to just under that of results from the locate plugin
+              result.match_obj.description = description;
+              results.add (result.match_obj, Match.Score.INCREMENT_MINOR * 2);
+              next_line_type = LineType.FIELDS;
+            }
+            else
+            {
+              // TODO handle unexpected output ?
+            }
+          }
+          line_idx++;
+        }
+        query.check_cancellable (); // a cancelled query must not return results
+        return results;
+      }
+      catch (Error err)
+      {
+        if (!query.is_cancelled ()) warning ("%s", err.message);
+      }
+
+      // make sure this method is called before returning any results
+      query.check_cancellable ();
+      return null;
+    }
+
+    private class MatchObject: Object, Match, UriMatch
+    {
+      // from Match interface
+      public string title { get; construct set; }
+      public string description { get; set; }
+      public string icon_name { get; construct set; }
+      public bool has_thumbnail { get; construct set; }
+      public string thumbnail_path { get; construct set; }
+      public MatchType match_type { get; construct set; }
+
+      // from UriMatch interface
+      public string uri { get; set; }
+      public QueryFlags file_type { get; set; }
+      public string mime_type { get; set; }
+
+      public int default_relevancy { get; set; default = 0; }
+
+      public MatchObject ()
+      {
+        Object (match_type: MatchType.GENERIC_URI);
+      }
+    }
+  }
+}
=== modified file 'src/ui/synapse-main.vala'
--- src/ui/synapse-main.vala 2012-03-18 16:03:54 +0000
+++ src/ui/synapse-main.vala 2012-11-10 07:41:20 +0000
@@ -186,6 +186,7 @@
 #if HAVE_LIBREST
         typeof (ImgUrPlugin),
 #endif
+        typeof (RecollPlugin),
         // action-only plugins
         typeof (DevhelpPlugin),
         typeof (OpenSearchPlugin),
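The parser in `search()` assumes every abstract is exactly one line between the `ABSTRACT` and `/ABSTRACT` markers; the TODO comments flag that the markers are never actually checked, so a multi-line abstract would knock the state machine out of step. A minimal sketch of a marker-driven alternative follows, assuming the output shape shown in the plugin's sample comment (marker lines spelled exactly `ABSTRACT` and `/ABSTRACT`, tab-separated header lines with a bracketed URI in the second field). `RecollHit` and `parse_recoll_output` are hypothetical names, not part of Synapse.

```vala
// Hypothetical sketch: collect (uri, abstract) pairs from `recoll -t -A`
// output by reacting to the ABSTRACT and /ABSTRACT marker lines instead
// of assuming each abstract occupies exactly one line.
public class RecollHit : Object
{
  public string uri { get; set; }
  public string abstract_text { get; set; default = ""; }
}

public GenericArray<RecollHit> parse_recoll_output (string[] lines)
{
  var hits = new GenericArray<RecollHit> ();
  RecollHit? current = null;
  bool in_abstract = false;
  var abstract_buf = new StringBuilder ();

  foreach (unowned string raw in lines)
  {
    string line = raw.strip ();
    if (in_abstract)
    {
      if (line == "/ABSTRACT")
      {
        // closing marker: store the joined abstract on the current hit
        current.abstract_text = abstract_buf.str;
        in_abstract = false;
      }
      else if (line != "")
      {
        // join multi-line abstracts with single spaces
        if (abstract_buf.len > 0)
          abstract_buf.append_c (' ');
        abstract_buf.append (line);
      }
      continue;
    }
    if (line == "ABSTRACT" && current != null)
    {
      in_abstract = true;
      abstract_buf.truncate (0);
      continue;
    }
    // header lines are assumed to look like: mime \t [uri] \t [title] \t size
    string[] fields = raw.split ("\t");
    if (fields.length >= 2 && fields[1].has_prefix ("[") && fields[1].has_suffix ("]"))
    {
      current = new RecollHit ();
      current.uri = fields[1].substring (1, fields[1].length - 2);
      hits.add (current);
    }
    // anything else (the "Recoll query:" and result-count lines) is ignored
  }
  return hits;
}
```

Because header lines are recognised by their shape rather than by position, this version also does not need the "skip first two lines" counter; hits come back in the order recoll printed them.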