Merge lp:~tristan-rivoallan/vanilla-miner/vm-594599 into lp:vanilla-miner
Proposed by Tristan Rivoallan
Status: | Merged |
---|---|
Merged at revision: | 49 |
Proposed branch: | lp:~tristan-rivoallan/vanilla-miner/vm-594599 |
Merge into: | lp:vanilla-miner |
Diff against target: | 521 lines (+279/-110), 12 files modified |
To merge this branch: | bzr merge lp:~tristan-rivoallan/vanilla-miner/vm-594599 |
Related bugs: | |

Files modified:
- apps/frontend/modules/resource/templates/_documentation/link/schema.php (+33/-25)
- config/doctrine/schema.yml (+14/-2)
- config/search.yml (+3/-1)
- config/solr/IndexA_fr/conf/schema.xml (+2/-0)
- data/utils/proxy.php (+13/-0)
- lib/filter/doctrine/ResourceTypeFormFilter.class.php (+0/-16)
- lib/form/doctrine/ResourceTypeForm.class.php (+0/-16)
- lib/model/doctrine/ResourceType.class.php (+0/-15)
- lib/model/doctrine/ResourceTypeTable.class.php (+0/-11)
- lib/task/minerExpandLinksTask.class.php (+178/-0)
- lib/task/minerExtractlinksTask.class.php (+30/-23)
- lib/vendor/CI/Search/Link/Segment.php (+6/-1)

Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Tristan Rivoallan | Approve | ||
Review via email: mp+28352@code.launchpad.net |
Commit message
Description of the change
Tristan Rivoallan (tristan-rivoallan): review: Approve
Preview Diff
1 | === modified file 'apps/frontend/modules/resource/templates/_documentation/link/schema.php' |
2 | --- apps/frontend/modules/resource/templates/_documentation/link/schema.php 2010-06-03 16:39:07 +0000 |
3 | +++ apps/frontend/modules/resource/templates/_documentation/link/schema.php 2010-06-23 20:53:23 +0000 |
4 | @@ -1,40 +1,32 @@ |
5 | <p>Cette collection expose les attributs suivants :</p> |
6 | |
7 | -<h4 id="schema-url">url</h4> |
8 | -<p>C'est l'URL vers la ressource. Par exemple : </p> |
9 | -<pre>http://www.glafouk.com/dlz/radioclash_astrotease.mp3</pre> |
10 | - |
11 | -<h4 id="schema-domain_fqdn">domain_fqdn</h4> |
12 | -<p>C'est le nom de domaine complet de l'URL vers la ressource. Par exemple :</p> |
13 | -<pre>data.musiques-incongrues.net</pre> |
14 | - |
15 | -<h4 id="schema-domain_parent">domain_parent</h4> |
16 | -<p>C'est le domaine parent de l'URL vers la ressource. Par exemple :</p> |
17 | -<pre>musiques-incongrues.net</pre> |
18 | -<p>Les deux URLs http://www.musiques-incongrues.net et http://data.musiques-incongrues.net ont un domaine parent identique.</p> |
19 | - |
20 | -<h4 id="schema-mime_type">mime_type</h4> |
21 | -<p>C'est le type MIME de la ressource. Cet attribut n'est pas toujours définit. Il l'est toujours pour les fichiers binaires (mp3, image, etc). Par exemple :</p> |
22 | -<pre>audio/mpeg</pre> |
23 | +<h4 id="schema-availability">availability</h4> |
24 | +<p>Ce paramètre correspond à la disponibilité du lien.</p> |
25 | +<dl> |
26 | + <dt>Valeurs possibles</dt> |
27 | + <dd><code>unknown</code> : On ne sait pas si l'URL est accessible ou non</dd> |
28 | + <dd><code>available</code> : L'URL est accessible</dd> |
29 | + <dd><code>unavailable</code> : L'URL n'est pas accessible</dd> |
30 | +</dl> |
31 | +<p>Par défaut, les liens avec une URL non accessible ne sont pas retournés.</p> |
32 | + |
33 | +<h4 id="schema-comment_id">comment_id</h4> |
34 | +<p>C'est l'identifiant du commentaire sur le forum dans lequel a été contribué le lien. Par exemple :</p> |
35 | +<pre>15336</pre> |
36 | |
37 | <h4 id="schema-contributed_at">contributed_at</h4> |
38 | <p>C'est la date à laquelle a été contribué le lien. Par exemple :</p> |
39 | <pre>2007-05-09T21:22:05Z</pre> |
40 | |
41 | +<h4 id="schema-contributor_name">contributor_name</h4> |
42 | +<p>C'est le nom sur le forum de l'utilisateur ayant contribué le lien. Par exemple :</p> |
43 | +<pre>mbertier</pre> |
44 | |
45 | <h4 id="schema-contributor_id">contributor_id</h4> |
46 | <p>C'est l'identifiant sur le forum de l'utilisateur ayant contribué le lien. Par exemple :</p> |
47 | <pre>34</pre> |
48 | <p>Les URL pour accéder au profil d'un utilisateur sur Musiques Incongrues ont la forme http://www.musiques-incongrues.net/forum/account/<strong>contributor_id</strong>/</p> |
49 | |
50 | -<h4 id="schema-contributor_name">contributor_name</h4> |
51 | -<p>C'est le nom sur le forum de l'utilisateur ayant contribué le lien. Par exemple :</p> |
52 | -<pre>mbertier</pre> |
53 | - |
54 | -<h4 id="schema-comment_id">comment_id</h4> |
55 | -<p>C'est l'identifiant du commentaire sur le forum dans lequel à été contribué le lien. Par exemple :</p> |
56 | -<pre>15336</pre> |
57 | - |
58 | <h4 id="schema-discussion_id">discussion_id</h4> |
59 | <p>C'est l'identifiant de la discussion dans laquelle a été contribué le lien. Par exemple :</p> |
60 | <pre>5455</pre> |
61 | @@ -42,4 +34,20 @@ |
62 | |
63 | <h4 id="schema-discussion_name">discussion_name</h4> |
64 | <p>C'est le titre de la discussion dans laquelle a été contribué le lien. Par exemple :</p> |
65 | -<pre>Des clips, des clips, rien que des clips</pre> |
66 | \ No newline at end of file |
67 | +<pre>Des clips, des clips, rien que des clips</pre> |
68 | +<h4 id="schema-domain_fqdn">domain_fqdn</h4> |
69 | +<p>C'est le nom de domaine complet de l'URL vers la ressource. Par exemple :</p> |
70 | +<pre>data.musiques-incongrues.net</pre> |
71 | + |
72 | +<h4 id="schema-domain_parent">domain_parent</h4> |
73 | +<p>C'est le domaine parent de l'URL vers la ressource. Par exemple :</p> |
74 | +<pre>musiques-incongrues.net</pre> |
75 | +<p>Les deux URLs http://www.musiques-incongrues.net et http://data.musiques-incongrues.net ont un domaine parent identique.</p> |
76 | + |
77 | +<h4 id="schema-mime_type">mime_type</h4> |
78 | +<p>C'est le type MIME de la ressource. Cet attribut n'est pas toujours défini. Il l'est toujours pour les fichiers binaires (mp3, image, etc). Par exemple :</p> |
79 | +<pre>audio/mpeg</pre> |
80 | + |
81 | +<h4 id="schema-url">url</h4> |
82 | +<p>C'est l'URL vers la ressource. Par exemple : </p> |
83 | +<pre>http://www.glafouk.com/dlz/radioclash_astrotease.mp3</pre> |
84 | \ No newline at end of file |
85 | |
86 | === modified file 'config/doctrine/schema.yml' |
87 | --- config/doctrine/schema.yml 2010-06-15 13:22:18 +0000 |
88 | +++ config/doctrine/schema.yml 2010-06-23 20:53:23 +0000 |
89 | @@ -31,9 +31,21 @@ |
90 | type: integer |
91 | discussion_name: |
92 | type: string |
93 | + # available, unavailable, unknown |
94 | + availability: |
95 | + type: string |
96 | + default: 'unknown' |
97 | + expanded_at: |
98 | + type: timestamp |
99 | indexes: |
100 | - url_index: |
101 | + idx_url: |
102 | fields: |
103 | url: |
104 | length: 512 |
105 | - type: unique |
106 | \ No newline at end of file |
107 | + type: unique |
108 | + idx_expanded_at: |
109 | + fields: [expanded_at] |
110 | + idx_availability: |
111 | + fields: |
112 | + availability: |
113 | + length: 11 |
114 | \ No newline at end of file |
115 | |
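The three availability values named in the schema comment, and the way the expand-links task later in this diff assigns them, can be summarised in a small sketch. The constant names and the function `availabilityFromStatus()` are hypothetical and do not appear in the branch. The rule that only a 200 response counts as available is taken from the task code.

```php
<?php
// Hypothetical constants: the schema itself only stores plain strings.
const AVAILABILITY_UNKNOWN     = 'unknown';     // schema default, link not crawled yet
const AVAILABILITY_AVAILABLE   = 'available';
const AVAILABILITY_UNAVAILABLE = 'unavailable';

/**
 * Maps the outcome of a HEAD request to an availability value.
 *
 * @param int|null $status HTTP status code, or null when the request threw an exception
 */
function availabilityFromStatus(?int $status): string
{
    // Mirrors the task: only a 200 response marks the link as available.
    return $status === 200 ? AVAILABILITY_AVAILABLE : AVAILABILITY_UNAVAILABLE;
}
```

The longest value, `unavailable`, is 11 characters, which matches the `length: 11` declared for `idx_availability`.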
116 | === modified file 'config/search.yml' |
117 | --- config/search.yml 2010-06-02 16:53:29 +0000 |
118 | +++ config/search.yml 2010-06-23 20:53:23 +0000 |
119 | @@ -26,7 +26,9 @@ |
120 | type: int |
121 | discussion_name: |
122 | stored: true |
123 | - |
124 | + availability: |
125 | + stored: true |
126 | + |
127 | index: |
128 | encoding: UTF-8 |
129 | cultures: [fr] |
130 | |
131 | === modified file 'config/solr/IndexA_fr/conf/schema.xml' |
132 | --- config/solr/IndexA_fr/conf/schema.xml 2010-06-03 13:14:51 +0000 |
133 | +++ config/solr/IndexA_fr/conf/schema.xml 2010-06-23 20:53:23 +0000 |
134 | @@ -258,6 +258,7 @@ |
135 | <field name='comment_id' type='int' stored='true' multiValued='false' required='false' /> |
136 | <field name='discussion_id' type='int' stored='true' multiValued='false' required='false' /> |
137 | <field name='discussion_name' type='text' stored='true' multiValued='false' required='false' /> |
138 | + <field name='availability' type='text' stored='true' multiValued='false' required='false' /> |
139 | </fields> |
140 | |
141 | <!-- field to use to determine and enforce document uniqueness. --> |
142 | @@ -283,5 +284,6 @@ |
143 | <copyField source='comment_id' dest='sfl_all' /> |
144 | <copyField source='discussion_id' dest='sfl_all' /> |
145 | <copyField source='discussion_name' dest='sfl_all' /> |
146 | + <copyField source='availability' dest='sfl_all' /> |
147 | |
148 | </schema> |
149 | |
150 | === added directory 'data/utils' |
151 | === added file 'data/utils/proxy.php' |
152 | --- data/utils/proxy.php 1970-01-01 00:00:00 +0000 |
153 | +++ data/utils/proxy.php 2010-06-23 20:53:23 +0000 |
154 | @@ -0,0 +1,13 @@ |
155 | +<?php |
156 | +// Sanity checks |
157 | +if (!isset($_SERVER['PATH_INFO'])) |
158 | +{ |
159 | + throw new InvalidArgumentException('Please specify path info.'); |
160 | +} |
161 | + |
162 | +// Call service |
163 | +$curl = curl_init(sprintf('http://data.musiques-incongrues.net/%s?%s', $_SERVER['PATH_INFO'], $_SERVER['QUERY_STRING'])); |
164 | +curl_exec($curl); |
165 | + |
166 | +// Clean up |
167 | +curl_close($curl); |
168 | \ No newline at end of file |
169 | |
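As a possible follow-up to proxy.php, the upstream URL could be built by a small helper that can be tested without a web server. This is only a sketch: `buildProxyUrl()` and its `$base` parameter are hypothetical names, not part of the branch. It assumes `PATH_INFO` may or may not start with a slash and that the query string is optional.

```php
<?php
/**
 * Hypothetical helper: builds the URL that the proxy forwards to,
 * from a $_SERVER-like array.
 */
function buildProxyUrl(array $server, string $base = 'http://data.musiques-incongrues.net'): string
{
    // isset() is false for a missing key as well as a null value, so no notice is raised.
    if (!isset($server['PATH_INFO']) || $server['PATH_INFO'] === '') {
        throw new InvalidArgumentException('Please specify path info.');
    }

    // Join base and path with exactly one slash.
    $url = rtrim($base, '/') . '/' . ltrim($server['PATH_INFO'], '/');

    // Append the query string only when there is one, avoiding a dangling "?".
    if (isset($server['QUERY_STRING']) && $server['QUERY_STRING'] !== '') {
        $url .= '?' . $server['QUERY_STRING'];
    }

    return $url;
}
```

The result could then be passed to `curl_init()` as in the script above.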
170 | === removed file 'lib/filter/doctrine/ResourceTypeFormFilter.class.php' |
171 | --- lib/filter/doctrine/ResourceTypeFormFilter.class.php 2010-06-02 10:18:41 +0000 |
172 | +++ lib/filter/doctrine/ResourceTypeFormFilter.class.php 1970-01-01 00:00:00 +0000 |
173 | @@ -1,16 +0,0 @@ |
174 | -<?php |
175 | - |
176 | -/** |
177 | - * ResourceType filter form. |
178 | - * |
179 | - * @package vanilla-miner |
180 | - * @subpackage filter |
181 | - * @author Your name here |
182 | - * @version SVN: $Id: sfDoctrineFormFilterTemplate.php 23810 2009-11-12 11:07:44Z Kris.Wallsmith $ |
183 | - */ |
184 | -class ResourceTypeFormFilter extends BaseResourceTypeFormFilter |
185 | -{ |
186 | - public function configure() |
187 | - { |
188 | - } |
189 | -} |
190 | |
191 | === removed file 'lib/form/doctrine/ResourceTypeForm.class.php' |
192 | --- lib/form/doctrine/ResourceTypeForm.class.php 2010-06-02 10:18:41 +0000 |
193 | +++ lib/form/doctrine/ResourceTypeForm.class.php 1970-01-01 00:00:00 +0000 |
194 | @@ -1,16 +0,0 @@ |
195 | -<?php |
196 | - |
197 | -/** |
198 | - * ResourceType form. |
199 | - * |
200 | - * @package vanilla-miner |
201 | - * @subpackage form |
202 | - * @author Your name here |
203 | - * @version SVN: $Id: sfDoctrineFormTemplate.php 23810 2009-11-12 11:07:44Z Kris.Wallsmith $ |
204 | - */ |
205 | -class ResourceTypeForm extends BaseResourceTypeForm |
206 | -{ |
207 | - public function configure() |
208 | - { |
209 | - } |
210 | -} |
211 | |
212 | === removed file 'lib/model/doctrine/ResourceType.class.php' |
213 | --- lib/model/doctrine/ResourceType.class.php 2010-06-02 10:18:41 +0000 |
214 | +++ lib/model/doctrine/ResourceType.class.php 1970-01-01 00:00:00 +0000 |
215 | @@ -1,15 +0,0 @@ |
216 | -<?php |
217 | - |
218 | -/** |
219 | - * ResourceType |
220 | - * |
221 | - * This class has been auto-generated by the Doctrine ORM Framework |
222 | - * |
223 | - * @package vanilla-miner |
224 | - * @subpackage model |
225 | - * @author Your name here |
226 | - * @version SVN: $Id: Builder.php 7490 2010-03-29 19:53:27Z jwage $ |
227 | - */ |
228 | -class ResourceType extends BaseResourceType |
229 | -{ |
230 | -} |
231 | |
232 | === removed file 'lib/model/doctrine/ResourceTypeTable.class.php' |
233 | --- lib/model/doctrine/ResourceTypeTable.class.php 2010-06-02 10:18:41 +0000 |
234 | +++ lib/model/doctrine/ResourceTypeTable.class.php 1970-01-01 00:00:00 +0000 |
235 | @@ -1,11 +0,0 @@ |
236 | -<?php |
237 | - |
238 | - |
239 | -class ResourceTypeTable extends Doctrine_Table |
240 | -{ |
241 | - |
242 | - public static function getInstance() |
243 | - { |
244 | - return Doctrine_Core::getTable('ResourceType'); |
245 | - } |
246 | -} |
247 | \ No newline at end of file |
248 | |
249 | === added file 'lib/task/minerExpandLinksTask.class.php' |
250 | --- lib/task/minerExpandLinksTask.class.php 1970-01-01 00:00:00 +0000 |
251 | +++ lib/task/minerExpandLinksTask.class.php 2010-06-23 20:53:23 +0000 |
252 | @@ -0,0 +1,178 @@ |
253 | +<?php |
254 | + |
255 | +class minerExpandLinksTask extends sfBaseTask |
256 | +{ |
257 | + protected function configure() |
258 | + { |
259 | + $this->addOptions(array( |
260 | + new sfCommandOption('env', null, sfCommandOption::PARAMETER_REQUIRED, 'The environment', 'dev'), |
261 | + new sfCommandOption('connection', null, sfCommandOption::PARAMETER_REQUIRED, 'The connection name', 'doctrine'), |
262 | + new sfCommandOption('progress', null, sfCommandOption::PARAMETER_NONE, 'Display a progress bar'), |
263 | + new sfCommandOption('verbose', null, sfCommandOption::PARAMETER_NONE, 'Display more information about extraction process'), |
264 | + new sfCommandOption('all', null, sfCommandOption::PARAMETER_NONE, 'Expand all links in database. By default, only new links are expanded'), |
265 | + new sfCommandOption('with-unavailable', null, sfCommandOption::PARAMETER_NONE, 'When expanding all links (--all), also include links previously marked as unavailable'), |
266 | + // TODO : add --older-than option |
267 | + )); |
268 | + |
269 | + $this->namespace = 'miner'; |
270 | + $this->name = 'expand-links'; |
271 | + // TODO : write descriptions |
272 | + $this->briefDescription = 'Expands information about links by crawling their URLs'; |
273 | + $this->detailedDescription = <<<EOF |
274 | + |
275 | +Use cases : |
276 | + * Expand new urls : [php symfony miner:expand-links|INFO] |
277 | + * Expand all urls (a word about --with-unavailable) : [php symfony miner:expand-links --all|INFO] |
278 | + * Expand all urls, including those previously marked as unavailable : [php symfony miner:expand-links --all --with-unavailable|INFO] |
279 | +EOF; |
280 | + } |
281 | + |
282 | + protected function execute($arguments = array(), $options = array()) |
283 | + { |
284 | + // Open database connection |
285 | + $databaseManager = new sfDatabaseManager($this->configuration); |
286 | + $connection = $databaseManager->getDatabase($options['connection'])->getConnection(); |
287 | + |
288 | + // Build query for fetching links from database |
289 | + $q = Doctrine_Query::create() |
290 | + ->select('l.url') |
291 | + ->from('Link l'); |
292 | + if (!$options['all']) |
293 | + { |
294 | + $q->where('l.expanded_at is null'); |
295 | + } |
296 | + if (!$options['with-unavailable']) |
297 | + { |
298 | + $q->andWhere('l.availability != "unavailable"'); |
299 | + } |
300 | + |
301 | + // Fetch links from database |
302 | + $links_count = $q->count(); |
303 | + $links = $q->execute(null, Doctrine_Core::HYDRATE_ON_DEMAND); |
304 | + $q->free(); |
305 | + $this->logSection('info', sprintf('Expanding %s links', $links_count)); |
306 | + |
307 | + // Instantiate progress bar, if the user requested one |
308 | + $links_expanded = 0; |
309 | + if ($options['progress']) |
310 | + { |
311 | + include 'Console/ProgressBar.php'; |
312 | + $progress_bar = new Console_ProgressBar( |
313 | + '** Links %fraction% comments [%bar%] %percent% | ', |
314 | + '=>', '-', 80, $links_count, array('ansi_terminal' => true) |
315 | + ); |
316 | + $progress_bar->update($links_expanded); |
317 | + } |
318 | + |
319 | + // Launch a HEAD request on each link, and use data in response headers to update information about the link in database |
320 | + // TODO : move crawling code to dedicated class. and then create miner:crawl-url task |
321 | + require 'HTTP/Request2.php'; |
322 | + $request = new HTTP_Request2(null, HTTP_Request2::METHOD_HEAD, array('follow_redirects' => true)); |
323 | + $request->setHeader('user-agent', 'vanilla-miner/1.1 (https://launchpad.net/vanilla-miner)'); |
324 | + |
325 | + foreach ($links as $link) |
326 | + { |
327 | + $link->expanded_at = time(); |
328 | + try |
329 | + { |
330 | + $request->setUrl($link->url); |
331 | + $response = $request->send(); |
332 | + if (200 == $response->getStatus()) |
333 | + { |
334 | + if ($options['progress']) |
335 | + { |
336 | + $this->log(sprintf('[%d] %s', $response->getStatus(), $link->url)); |
337 | + } |
338 | + else |
339 | + { |
340 | + $this->logSection('info', sprintf('[%d] %s - Updating metadata, marking as available', $response->getStatus(), $link->url)); |
341 | + } |
342 | + |
343 | + // Extract meaningful information from server response |
344 | + $header = $response->getHeader(); |
345 | + $header = $this->normalizeHeader($header); |
346 | + $link->mime_type = $this->getMimeType($header); |
347 | + |
348 | + // Mark link as available |
349 | + $link->availability = 'available'; |
350 | + |
351 | + // Save link to database |
352 | + $link->replace(); |
353 | + } |
354 | + else |
355 | + { |
356 | + if ($options['progress']) |
357 | + { |
358 | + $this->log(sprintf('[%d] %s', $response->getStatus(), $link->url)); |
359 | + } |
360 | + else |
361 | + { |
362 | + $this->logSection('notice', sprintf( |
363 | + '[%d] %s (%d %s) - Marking as unavailable', |
364 | + $response->getStatus(), |
365 | + $link->url, |
366 | + $response->getStatus(), |
367 | + $response->getReasonPhrase() |
368 | + ) |
369 | + ); |
370 | + } |
371 | + $link->availability = 'unavailable'; |
372 | + $link->replace(); |
373 | + } |
374 | + } |
375 | + catch (HTTP_Request2_Exception $e) |
376 | + { |
377 | + if ($options['progress']) |
378 | + { |
379 | + $this->log(sprintf('[ERR] %s', $link->url)); |
380 | + } |
381 | + else |
382 | + { |
383 | + $this->logSection('error', sprintf('[ERR] Received exception with message "%s" for link "%s" - Marking as unavailable.', $e->getMessage(), $link->url)); |
384 | + } |
385 | + $link->availability = 'unavailable'; |
386 | + $link->replace(); |
387 | + } |
388 | + |
389 | + // Update progress bar |
390 | + if ($options['progress']) |
391 | + { |
392 | + $progress_bar->update(++$links_expanded); |
393 | + } |
394 | + |
395 | + } |
396 | + } |
397 | + |
398 | + private function normalizeHeader(array $header) |
399 | + { |
400 | + // Make all header names lower case |
401 | + $header_rev = array_flip($header); |
402 | + array_walk($header_rev, create_function('&$item, $key', '$item = strtolower($item);')); |
403 | + $header = array_flip($header_rev); |
404 | + |
405 | + return $header; |
406 | + } |
407 | + |
408 | + private function getMimeType(array $header) |
409 | + { |
410 | + $mime_type = null; |
411 | + |
412 | + if (isset($header['content-type'])) |
413 | + { |
414 | + $mime_type = $header['content-type']; |
415 | + |
416 | + // Extract mime type from content-type header |
417 | + // TODO : use a regular expression instead of this crappy flow |
418 | + $matches = array(); |
419 | + if (strpos($header['content-type'], 'charset') !== false) |
420 | + { |
421 | + if (preg_match('/(.+); ?charset=.+/i', $header['content-type'], $matches)) |
422 | + { |
423 | + $mime_type = $matches[1]; |
424 | + } |
425 | + } |
426 | + } |
427 | + |
428 | + return $mime_type; |
429 | + } |
430 | +} |
431 | \ No newline at end of file |
432 | |
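The two private helpers at the end of the task could also be written without the flip, walk, flip round trip and without `create_function()`, which was removed in PHP 8.0. The sketch below is one possible shape, not the branch's code. The function names are hypothetical, and lower-casing the extracted media type is an added assumption (media types are case-insensitive).

```php
<?php
/**
 * Hypothetical stand-in for normalizeHeader():
 * lower-cases header names and leaves the values untouched.
 */
function normalizeHeaderNames(array $header): array
{
    return array_change_key_case($header, CASE_LOWER);
}

/**
 * Hypothetical stand-in for getMimeType():
 * returns the media type without parameters such as charset, or null.
 */
function extractMimeType(array $header): ?string
{
    if (!isset($header['content-type'])) {
        return null;
    }

    // The media type is everything before the first ";".
    $parts = explode(';', $header['content-type'], 2);
    $mime  = strtolower(trim($parts[0]));

    return $mime === '' ? null : $mime;
}
```

For a header array `['Content-Type' => 'text/html; charset=UTF-8']`, normalising and then extracting gives `text/html`.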
433 | === modified file 'lib/task/minerExtractlinksTask.class.php' |
434 | --- lib/task/minerExtractlinksTask.class.php 2010-06-23 13:13:30 +0000 |
435 | +++ lib/task/minerExtractlinksTask.class.php 2010-06-23 20:53:23 +0000 |
436 | @@ -63,31 +63,38 @@ |
437 | $resources_parsed = 0; |
438 | $resources_total = $extractor->countResources($arguments['dsn']); |
439 | |
440 | - // Instanciate an configure progress bar |
441 | - if ($options['progress']) |
442 | - { |
443 | - include 'Console/ProgressBar.php'; |
444 | - $progress_bar = new Console_ProgressBar( |
445 | - '** '.$arguments['dsn'].' %fraction% resources [%bar%] %percent% | ', |
446 | - '=>', '-', 80, $resources_total, array('ansi_terminal' => true) |
447 | - ); |
448 | - } |
449 | - |
450 | - // Extract resources from source and insert them in Links database |
451 | - while ($resource_extraction_info = $extractor->extract($arguments['dsn'], $options['connection'])) |
452 | - { |
453 | - // Update extraction statistics |
454 | - $urls_found_count += $resource_extraction_info['urls_found_count']; |
455 | - |
456 | - // Update progress bar |
457 | + if ($resources_total > 0) |
458 | + { |
459 | + // Instantiate and configure progress bar |
460 | if ($options['progress']) |
461 | { |
462 | - $progress_bar->update($resource_extraction_info['resources_parsed_count']); |
463 | - } |
464 | - } |
465 | - |
466 | - // Log |
467 | - $this->logSection('extract', sprintf('%d URLs where extracted from %d resources', $urls_found_count, $resources_total)); |
468 | + include 'Console/ProgressBar.php'; |
469 | + $progress_bar = new Console_ProgressBar( |
470 | + '** '.$arguments['dsn'].' %fraction% resources [%bar%] %percent% | ', |
471 | + '=>', '-', 80, $resources_total, array('ansi_terminal' => true) |
472 | + ); |
473 | + } |
474 | + |
475 | + // Extract resources from source and insert them in Links database |
476 | + while ($resource_extraction_info = $extractor->extract($arguments['dsn'], $options['connection'])) |
477 | + { |
478 | + // Update extraction statistics |
479 | + $urls_found_count += $resource_extraction_info['urls_found_count']; |
480 | + |
481 | + // Update progress bar |
482 | + if ($options['progress']) |
483 | + { |
484 | + $progress_bar->update($resource_extraction_info['resources_parsed_count']); |
485 | + } |
486 | + } |
487 | + |
488 | + // Log |
489 | + $this->logSection('extract', sprintf('%d URLs were extracted from %d resources', $urls_found_count, $resources_total)); |
490 | + } |
491 | + else |
492 | + { |
493 | + $this->logSection('extract', 'No resources to extract. Exiting.'); |
494 | + } |
495 | } |
496 | |
497 | /** |
498 | |
499 | === modified file 'lib/vendor/CI/Search/Link/Segment.php' |
500 | --- lib/vendor/CI/Search/Link/Segment.php 2010-06-15 13:21:38 +0000 |
501 | +++ lib/vendor/CI/Search/Link/Segment.php 2010-06-23 20:53:23 +0000 |
502 | @@ -59,6 +59,12 @@ |
503 | } |
504 | $c->setLimit($limit); |
505 | |
506 | + // default : return links with availability being marked as "available" or "unknown" |
507 | + if ($parameters->get('availability', null) === null) |
508 | + { |
509 | + $c->addField('-availability', 'unavailable'); |
510 | + } |
511 | + |
512 | // Define sorting |
513 | $sorting_direction = $parameters->get('sort_direction', 'asc'); |
514 | if ($sorting_direction == 'desc') |
515 | @@ -108,5 +114,4 @@ |
516 | |
517 | return array_keys($schema_fields); |
518 | } |
519 | - |
520 | } |
521 | \ No newline at end of file |
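
The default added to Segment.php can be read as a small pure function. The sketch below is framework-free and hypothetical: `defaultAvailabilityFilter()` is not in the branch, and the real code calls `$c->addField()` on a criteria object whose API is not shown in this diff. How an explicit `availability` parameter is applied is also outside the diff, so the sketch adds nothing in that case.

```php
<?php
/**
 * Hypothetical sketch: returns the extra filter fields to add
 * for a given set of request parameters.
 */
function defaultAvailabilityFilter(array $parameters): array
{
    // No explicit "availability" parameter: exclude links marked unavailable,
    // which keeps both "available" and "unknown" links in the result.
    if (!isset($parameters['availability'])) {
        return array('-availability' => 'unavailable');
    }

    // An explicit value is handled elsewhere; no default is added.
    return array();
}
```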