Merge lp:~igraph/igraph/0.5-sf into lp:igraph/0.5-main

Proposed by Gábor Csárdi
Status: Merged
Approved by: Gábor Csárdi
Approved revision: 1415
Merge reported by: Gábor Csárdi
Merged at revision: not available
Proposed branch: lp:~igraph/igraph/0.5-sf
Merge into: lp:igraph/0.5-main
Diff against target: not available
To merge this branch: bzr merge lp:~igraph/igraph/0.5-sf
Reviewer: Gábor Csárdi (review status: Approve)
Review via email: mp+4286@code.launchpad.net
Revision history for this message
Gábor Csárdi (gabor.csardi) :
review: Approve

Preview Diff

1=== added file 'doc/homepage/bugs.html.in'
2--- doc/homepage/bugs.html.in 1970-01-01 00:00:00 +0000
3+++ doc/homepage/bugs.html.in 2009-03-07 23:41:52 +0000
4@@ -0,0 +1,18 @@
5+<h2>Bug reports, questions, comments</h2>
6+
7+<h3>Reporting bugs at Launchpad</h3>
8+
9+<p>We suggest reporting igraph bugs on the Launchpad website by
10+ clicking the first button in the menu at the bottom of the page. This
11+ allows you to follow up on your bug report, e.g. get an email
12+ notification when it is fixed. Please check the
13+ <a href="https://bugs.launchpad.net/igraph/+bugs">list of bugs</a>
14+ before clicking on the button below to report a new bug.
15+</p>
16+
17+<h3>Sending bug reports in email</h3>
18+
19+<p>If you prefer not to use Launchpad, you can report bugs on
20+the <code>igraph-help</code> mailing list. You can find more
21+information on <code>igraph-help</code>
22+<a href="support.html">here</a>.</p>
23
24=== modified file 'doc/homepage/documentation.html.in'
25--- doc/homepage/documentation.html.in 2008-02-14 10:57:48 +0000
26+++ doc/homepage/documentation.html.in 2009-03-01 19:20:34 +0000
27@@ -84,9 +84,9 @@
28 <div>The original XML DocBook format of the
29 manual can be obtained by downloading the latest igraph
30 development version from
31- <a href="http://cneurocvs.rmki.kfki.hu/arch/csardi@rmki.kfki.hu--2004-public"
32- onClick="javascript:urchinTracker ('/archrepo'); ">
33- our repository</a>.</div>
34+ <a href="https://launchpad.net/igraph"
35+ onClick="javascript:urchinTracker ('/launchpad'); ">
36+ Launchpad</a>.</div>
37 </div>
38 </li>
39
40@@ -101,4 +101,16 @@
41
42 </li>
43
44+<li class="download download-wiki">
45+ <span class="name">igraph wiki at wikidot (external site)</span>
46+ <span class="comment">&mdash; a collaborative effort of the community</span>
47+ <ul class="download-links">
48+ <li class="download-external">
49+ <a href="http://igraph.wikidot.com">Browse online</a>
50+ </li>
51+ </ul>
52+
53+</li>
54+
55+
56 </ul>
57
58=== modified file 'doc/homepage/download.html.in'
59--- doc/homepage/download.html.in 2008-06-26 12:49:44 +0000
60+++ doc/homepage/download.html.in 2009-03-01 19:20:34 +0000
61@@ -13,19 +13,19 @@
62 please try this before downloading from here.</span><br/>
63 </li>
64 <li class="download-windows">
65- <a href="download/igraph_${VERSION}.zip"
66+ <a href="http://switch.dl.sourceforge.net/sourceforge/igraph/igraph_${VERSION}.zip"
67 onClick="javascript:urchinTracker ('/downloads/rwindowsbinary'); ">
68 Windows binary</a>
69 </li>
70 <li class="download-osx">
71- <a href="download/igraph_${VERSION}.tgz"
72+ <a href="http://switch.dl.sourceforge.net/sourceforge/igraph/igraph_${VERSION}.tgz"
73 onClick="javascript:urchinTracker ('/downloads/rosxbinary'); ">
74 Mac OSX universal binary</a>
75 </li>
76 <li class="download-source">
77- <a href="download/igraph_${VERSION}.tar.gz"
78+ <a href="http://switch.dl.sourceforge.net/sourceforge/igraph/igraph_${VERSION}.tar.gz"
79 onClick="javascript:urchinTracker ('/downloads/rsource'); ">
80- Source package (for Linux and similar)</a><br/>
81+ Source package (for Linux and similar)</a>
82 </li>
83 </ul>
84 </li>
85@@ -58,7 +58,7 @@
86 <li class="download-external">
87 <a href="http://python.org/pypi/python-igraph"
88 onClick="javascript:urchinTracker ('/outgoing/pythoncheeseshop'); ">
89- Python Package Index page</a>
90+ Python Package Index page</a>
91 </li>
92 </ul>
93 </li>
94@@ -74,7 +74,7 @@
95 <li class="download-external">
96 <a href="http://rubyforge.org/projects/igraph"
97 onClick="javascript:urchinTracker ('/outgoing/rubyforge'); ">
98- External homepage</a>
99+ External homepage</a>
100 </li>
101 </ul>
102 </li>
103@@ -86,10 +86,24 @@
104 <ul class="download-links">
105 <li class="download-source">
106 <a onClick="javascript:urchinTracker ('/downloads/library'); "
107- href="download/igraph-${VERSION}.tar.gz">Source code</a>
108- </li>
109- </ul>
110-</li>
111+ href="http://switch.dl.sourceforge.net/sourceforge/igraph/igraph-${VERSION}.tar.gz">Source code</a>
112+ </li>
113+ </ul>
114+</li>
115+
116+<li class="download download-sf">
117+ <span class="name">Browse all igraph releases</span>
118+ <span class="comment">&mdash; All file releases at
119+ SourceForge</span>
120+ <ul class="download-links">
121+ <li class="download-external">
122+ <a onClick="javascript:urchinTracker ('/downloads/sourceforge'); "
123+ href="https://sourceforge.net/project/showfiles.php?group_id=160289">Go to SourceForge</a>
124+ </li>
125+ </ul>
126+</li>
127+
128+
129
130 </ul>
131
132@@ -121,10 +135,6 @@
133 <a name="dev"></a><h3>Obtaining the development version</h3>
134
135 <p>You can get the latest development version of igraph (including the
136-C core and the wrappers for R and Python) with the <code>tla</code> arch
137-revision control system from the following URL:
138- <a href="http://cneurocvs.rmki.kfki.hu/arch/csardi@rmki.kfki.hu--2004-public"
139- onClick="javascript:urchinTracker ('/archrepo'); ">
140-<a href="http://cneurocvs.rmki.kfki.hu/arch/csardi@rmki.kfki.hu--2004-public">
141-<code>http://cneurocvs.rmki.kfki.hu/arch/csardi@rmki.kfki.hu--2004-public</code>.</a>
142+C core and the wrappers for R and Python)
143+ from <a href="https://launchpad.net/igraph">Launchpad</a>.
144 </p>
145
146=== added file 'doc/homepage/features.html.in'
147--- doc/homepage/features.html.in 1970-01-01 00:00:00 +0000
148+++ doc/homepage/features.html.in 2009-03-08 09:52:02 +0000
149@@ -0,0 +1,36 @@
150+<h2>Features</h2>
151+
152+<p>Here are some features of <b>igraph</b>, a subset of its functionality;
153+see <i>The igraph reference manual</i> under <a
154+href="documentation.html">Documentation</a> for more details.</p>
155+
156+<ul>
157+<li><b>igraph</b> contains functions for generating regular and random graphs
158+according to many algorithms and models from the network theory
159+literature.</li>
160+<li><b>igraph</b> provides routines for manipulating graphs, adding and removing
161+ edges and vertices.</li>
162+<li>You can assign numeric or textual attributes to the vertices or edges
163+ of the graph, like edge weights or textual vertex ids.</li>
164+<li>A rich set of functions for calculating various structural properties,
165+ e.g. betweenness, PageRank, k-cores, network motifs,
166+ etc., is also included.</li>
167+<li>Force-based layout generators for small and large graphs.</li>
168+<li>The R package and the Python module can visualize graphs in many ways,
169+ in 2D and 3D, interactively or non-interactively.</li>
170+<li><b>igraph</b> provides data types for implementing your own algorithms in
171+ C, R, Python or Ruby.</li>
172+<li>Community structure detection algorithms using many recently developed
173+ heuristics.</li>
174+<li><b>igraph</b> can read and write many file formats, e.g., GraphML, GML or Pajek.
175+ </li>
176+<li><b>igraph</b> contains efficient functions for deciding graph isomorphism and
177+ subgraph isomorphism.</li>
178+<li>It also contains an implementation of the push/relabel algorithm
179+ for calculating maximum network flow, and thus also minimum cuts and
180+ vertex and edge connectivity.</li>
181+<li><b>igraph</b> is well documented both for users and developers.</li>
182+<li><b>igraph</b> is open source and distributed under the GNU GPL.</li>
183+</ul>
184+
185+<div class="back"><a href="index.html">&laquo; Back to main page</a></div>
186
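The maximum-flow feature listed above can be illustrated with a short, self-contained sketch. Note this uses a plain Edmonds-Karp (BFS augmenting path) formulation for clarity rather than igraph's push/relabel implementation, and the tiny example graph is made up:

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Maximum s-t flow via shortest augmenting paths (Edmonds-Karp).

    Didactic stand-in for igraph's push/relabel routine; cap is an
    n x n capacity matrix (mutated in place as the residual network).
    By max-flow/min-cut duality the returned value also equals the
    capacity of a minimum s-t cut."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual network
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left
        # Find the bottleneck capacity along the path, then push flow
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck

# A made-up 4-vertex example: two edge-disjoint paths from 0 to 3
capacity = [
    [0, 3, 2, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(max_flow(4, capacity, 0, 3))  # 5
```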
187=== modified file 'doc/homepage/generate.py'
188--- doc/homepage/generate.py 2007-11-01 15:20:16 +0000
189+++ doc/homepage/generate.py 2009-03-08 03:10:40 +0000
190@@ -73,6 +73,7 @@
191 f = open(file[:-3], "w")
192 content = open(file).read()
193 tokens["CONTENT"] = string.Template(content).safe_substitute(tokens)
194+ tokens["PAGENAME"] = file[:-8]
195 for tmpl in template:
196 print >>f, tmpl.safe_substitute(tokens),
197
198
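The generator above fills page templates with Python's `string.Template`; a minimal self-contained sketch of the substitution and of the new `PAGENAME` token (the file name and content used here are hypothetical stand-ins):

```python
from string import Template

# Hypothetical page source, mirroring how generate.py builds its tokens
file = "download.html.in"
tokens = {"VERSION": "0.5.2"}

content = "Get igraph_${VERSION}.tar.gz here."
# safe_substitute leaves unknown ${...} placeholders intact instead of raising
tokens["CONTENT"] = Template(content).safe_substitute(tokens)

# file[:-8] strips the trailing ".html.in" (8 characters) to get the page name
tokens["PAGENAME"] = file[:-8]

print(tokens["CONTENT"])   # Get igraph_0.5.2.tar.gz here.
print(tokens["PAGENAME"])  # download
```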
199=== modified file 'doc/homepage/images/header_blue.png'
200Binary files doc/homepage/images/header_blue.png 2007-10-25 15:09:54 +0000 and doc/homepage/images/header_blue.png 2009-03-02 09:02:13 +0000 differ
201=== added file 'doc/homepage/images/icon_bug.png'
202Binary files doc/homepage/images/icon_bug.png 1970-01-01 00:00:00 +0000 and doc/homepage/images/icon_bug.png 2009-03-01 18:56:34 +0000 differ
203=== added file 'doc/homepage/images/icon_sf.png'
204Binary files doc/homepage/images/icon_sf.png 1970-01-01 00:00:00 +0000 and doc/homepage/images/icon_sf.png 2009-03-01 17:58:12 +0000 differ
205=== added file 'doc/homepage/images/icon_wiki.png'
206Binary files doc/homepage/images/icon_wiki.png 1970-01-01 00:00:00 +0000 and doc/homepage/images/icon_wiki.png 2009-03-01 16:31:05 +0000 differ
207=== added file 'doc/homepage/images/igraph2.png'
208Binary files doc/homepage/images/igraph2.png 1970-01-01 00:00:00 +0000 and doc/homepage/images/igraph2.png 2009-03-01 18:56:34 +0000 differ
209=== added file 'doc/homepage/images/igraph3.png'
210Binary files doc/homepage/images/igraph3.png 1970-01-01 00:00:00 +0000 and doc/homepage/images/igraph3.png 2009-03-01 14:40:26 +0000 differ
211=== modified file 'doc/homepage/index.html.in'
212--- doc/homepage/index.html.in 2008-02-06 15:59:48 +0000
213+++ doc/homepage/index.html.in 2009-03-08 09:52:02 +0000
214@@ -1,174 +1,93 @@
215-<h2>Introduction</h2>
216-
217-<img align="right" src="images/screenshots/fastgreedy.png" alt=""/>
218+<table class="intro"><tr><td>
219+<h2 class="th">Introduction</h2>
220+
221+<hr/>
222+
223 <p><b>igraph</b> is a free software package for creating and manipulating undirected and
224 directed graphs. It includes implementations for classic graph theory
225 problems like minimum spanning trees and network flow, and also
226 implements algorithms for some recent network analysis methods, like
227 community structure search.</p>
228-<div class="more"><a href="#introduction2">Read more &raquo;</a></div>
229-
230-<h2>Features</h2>
231+<div class="more"><a href="introduction.html">Read more &raquo;</a></div>
232+
233+</td><td>
234+<h2 class="th">Features</h2>
235+
236+<hr/>
237
238 <p><b>igraph</b> contains functions for generating regular and random graphs,
239 manipulating graphs, assigning attributes to vertices and edges. It can
240-calculate various structural properties, includes heuristics for
241-community structure detection, supports many file formats.</p>
242-<div class="more"><a href="#features">Read more &raquo;</a></div>
243-
244-<h2>Requirements</h2>
245-
246-<p>The software you need for installing <b>igraph</b> depends on
247+calculate various structural properties and decide graph isomorphism,
248+includes heuristics for community structure detection, and supports many
249+ file formats. The R and Python interfaces support visualization.</p>
250+<div class="more"><a href="features.html">Read more &raquo;</a></div>
251+
252+</td><td>
253+<h2 class="th">Requirements</h2>
254+
255+<hr/>
256+
257+<p>
258+igraph runs on most modern machines and operating systems, and it is
259+tested on MS Windows, Mac OSX and various Linux versions.
260+</p>
261+
262+<p>
263+The software you need for installing <b>igraph</b> depends on
264 whether you want to use the C library, the R package or the Python
265-extension.</p>
266-<div class="more"><a href="#requirements">Read more &raquo;</a></div>
267-
268-<hr/>
269-
270-<a name="introduction"></a>
271-<h2>Introduction</h2>
272-
273-<p><b>igraph</b> is a free software package for creating and manipulating undirected and
274-directed graphs. It includes implementations for classic graph theory
275-problems like minimum spanning trees and network flow, and also
276-implements algorithms for some recent network analysis methods, like
277-community structure search.</p>
278-
279-<a name="introduction2"></a>
280-<p>The efficient implementation of <b>igraph</b> allows it to handle graphs
281-with millions of vertices and edges. The rule of thumb is that if your graph
282-fits into the physical memory then <b>igraph</b> can handle it.</p>
283-
284-<p><b>igraph</b> can be installed in several forms:</p>
285-
286-<ul>
287-<li><b>igraph</b> as a <i>C library</i> is useful if you want to use
288- it in your C/C++ projects, or want to implement your own network analysis
289- or model in C/C++ using the data structures and functions <b>igraph</b> provides.
290- </li>
291-<li><b>igraph</b> as an <i>R package</i>. You can use <b>igraph</b> as an extension
292- package to <a href="http://www.r-project.org">The GNU R project for
293- Statistical Computing.</a> The flexibility of the R language and its
294- richness in statistical methods add a great deal of productivity to
295- <b>igraph</b>, with a very small speed penalty.</li>
296-<li><b>igraph</b> as a <a href="http://www.python.org">Python</a> extension module.
297- This way you can combine <b>igraph</b> with the huge set of Python functions
298- and modules available, and the ease of the Python language,
299- with a small speed penalty.</li>
300-<li><b>igraph</b> as a <a href="http://www.ruby-lang.org">Ruby</a> extension.
301- If you like the Ruby language, then this might be the right choice for you.
302- </li>
303-</ul>
304-
305-<p>Every form of <b>igraph</b> contain the same code at the very heart, written in ANSI C.</p>
306-
307-<p>Please note that the ways of installing <b>igraph</b> depends on which its
308-forms you actually want to use. Eg. for using <b>igraph</b> as an R package,
309-you don't need to download the C library at all. See the <a
310-href="download.html">download page</a> for details.</p>
311-
312-<hr/>
313-
314-<a name="features"></a>
315-<h2>Features</h2>
316-
317-<p>Here are some features of <b>igraph</b>, a subset of the functionality,
318-see <i>The igraph reference manual</i> under <a
319-href="documentation.html">Documentation</a> for more details.</p>
320-
321-<ul>
322-<li><b>igraph</b> contains functions for generating regular and random graphs
323-according to many algorithms and models from the network theory
324-literature.</li>
325-<li><b>igraph</b> provides routines for manipulating graphs, adding and removing
326- edges and vertices.</li>
327-<li>You can assign numeric or textual attribute to the vertices or edges
328- of the graph, like edge weights or textual vertex ids.</li>
329-<li>A rich set of functions calculating various structural properties,
330- eg. betweenness, PageRank, k-cores, network motifs,
331- etc. are also included.</li>
332-<li>Force based layout generators for small and large graphs</li>
333-<li>The R package and the Python module can visualize graphs many ways,
334- in 2D and 3D, interactively or non-interactively.</li>
335-<li><b>igraph</b> provides data types for implementing your own algorithm in
336- C, R, Python or Ruby.</li>
337-<li>Community structure detection algorithms using many recently developed
338- heuristics.</li>
339-<li><b>igraph</b> can read and write many file formats, e.g., GraphML, GML or Pajek.
340- </li>
341-<li><b>igraph</b> contains efficient functions for deciding graph isomorphism and
342- subgraph isomorphism</li>
343-<li>It also contains an implementation of the push/relabel algorithm
344- for calculating maximum network flow, and this way minimum cuts,
345- vertex and edge connectivity.</li>
346-<li><b>igraph</b> is well documented both for users and developers.</li>
347-<li><b>igraph</b> is open source and distributed under GNU GPL.</li>
348-</ul>
349-
350-<hr/>
351-
352-<a name="requirements"></a>
353-<h2>Requirements</h2>
354-
355-<h3>The C library</h3>
356- <p>For using the <i><b>igraph</b> C library</i> only a fairly recent C
357- and C++ library is needed. For supporting the GraphML format
358- you'll need the <a href="http://xmlsoft.org"><code>libxml2</code>
359- library</a>.</p>
360-
361- <p>For compiling the <i><b>igraph</b> C library</i> from source you'll
362- need a fairly modern C and C++ compiler and some standard UNIX
363- tools like <code>make</code>, <code>chmod</code>, <code>touch</code>, etc.</p>
364-
365- <p>Most often we use the <a href="http://gcc.gnu.org">GNU Compiler
366- Collection</a> for compiling. Theoretically it is possible to compile
367- <b>igraph</b> using the Microsoft Visual C Compiler as well, but we
368- recommend the <a href="http://www.cygwin.com">Cygwin</a> or
369- <a href="http://www.mingw.org">MinGW</a> environments for compiling
370- <b>igraph</b> under Windows.</p>
371-
372-<h3>The GNU R package</h3>
373- <p>For using the <i><b>igraph</b> R package</i> you obviously need
374- GNU R, a recent version like 2.6.0 is recommended, since earlier
375- versions contain bugs which affect <b>igraph</b>. If you would like to
376- compile the R package from source (this is the usual way for Linux
377- systems), you'll need a C and C++ compiler and optionally the
378- <a href="http://xmlsoft.org"><code>libxml2</code> library</a>.</p>
379-
380- <p>For installing the <i><b>igraph</b> R package</i> on Windows, you
381- don't need anything else, just GNU R and the binary <b>igraph</b> R
382- package of course</p>
383-
384-<h3>The Python extension module</h3>
385-
386- <p>For installing the <i>Python extension module</i>, you'll
387- need Python version 2.4 or later. Binary versions of the module
388- are available for Python 2.4 and Python 2.5 on the Windows platform,
389- so you don't have to compile anything in this case. OS X Leopard
390- users can download an installer as well. If you are using
391- Linux or Mac OS X Tiger, you'll have to compile the extension by yourself.
392- Compilation requires a recent C compiler (e.g. <code>gcc</code>, which
393- is usually available as a separate package in Linux and as part of
394- <a href="http://developer.apple.com/tools/xcode/">XCode</a> on Mac
395- OS X. You'll also need the compiled form of the <b>igraph</b> C
396- core, as the Python interface will link to that during compilation.</p>
397- <p>If you have the <a href="http://peak.telecommunity.com/DevCenter/EasyInstall">EasyInstall</a>
398- Python script installed, you can simply type <code>easy_install igraph</code>
399- to download, compile and install <b>igraph</b> as a Python Egg. This
400- method assumes that you installed the C core somewhere in the
401- default include and library path of your system.</p>
402-
403-<h3>The Ruby extension</h3>
404- <p>For installing the <i>Ruby extension</i> please consult
405- <a href="http://rubyforge.org/projects/igraph/">the homepage of
406- this extension</a>.</p>
407-
408-<h3>Compiling the development version</h3>
409- <p>For compiling the development version of <b>igraph</b>, you'll need
410- quite a large set of tools: <code>autoconf</code>, <code>automake</code>,
411- GNU <code>bison</code> version 1.35 or newer, GNU <code>make</code>,
412- maybe more. For compiling the documentation you'll need even more tools:
413- Python, Docbook schemas and tools, <code>xmlto</code>, <code>makeinfo</code>,
414- <code>patch</code>, <code>docbook2x</code>, maybe more. We're sure you
415- know how to get these if you want to compile the development version :)</p>
416+extension, and may vary depending on your platform.</p>
417+
418+<div class="more"><a href="requirements.html">Read more &raquo;</a></div>
419+</td></tr></table>
420+
421+<hr/>
422+
423+<div class="feed">
424+<script src="http://feeds2.feedburner.com/IgraphAnnouncements?format=sigpro"
425+ type="text/javascript" ></script><noscript><p>Subscribe to RSS
426+ headline updates
427+ from: <a href="http://feeds2.feedburner.com/IgraphAnnouncements">IgraphAnnouncements</a><br/>Powered
428+ by FeedBurner</p> </noscript>
429+<p class="comment">View <a href="https://launchpad.net/igraph/+announcements">more
430+ announcements.</a></p>
431+</div>
432+
433+<div class="feed">
434+<script src="http://feeds2.feedburner.com/BugsInIgraph?format=sigpro&nItems=5"
435+ type="text/javascript" ></script><noscript><p>Subscribe to RSS
436+ headline updates
437+ from: <a href="http://feeds2.feedburner.com/BugsInIgraph">BugsInIgraph</a><br/>Powered
438+ by FeedBurner</p> </noscript>
439+<p class="comment">View <a href="https://launchpad.net/igraph/+bugs">more
440+ bugs.</a></p>
441+</div>
442+
443+<div class="feed">
444+<script src="http://feeds2.feedburner.com/LatestRevisionsForBranchigraph/igraph/06-main?format=sigpro&nItems=5"
445+ type="text/javascript" ></script><noscript><p>Subscribe to RSS
446+ headline updates
447+ from: <a href="http://feeds2.feedburner.com/LatestRevisionsForBranchigraph/igraph/06-main">LatestRevisionsForBranch 0.6-main</a><br/>Powered
448+ by FeedBurner</p> </noscript>
449+<p class="comment"><a href="https://code.launchpad.net/~igraph/igraph/0.6-main">Browse
450+ commits.</a></p>
451+</div>
452+
453+<div class="feed">
454+<script src="http://feeds2.feedburner.com/LatestRevisionsForBranchigraph/igraph/05-main?format=sigpro&nItems=5"
455+ type="text/javascript" ></script><noscript><p>Subscribe to RSS
456+ headline updates
457+ from: <a href="http://feeds2.feedburner.com/LatestRevisionsForBranchigraph/igraph/05-main">LatestRevisionsForBranch 0.5-main</a><br/>Powered
458+ by FeedBurner</p> </noscript>
459+<p class="comment"><a href="https://code.launchpad.net/~igraph/igraph/0.5-main">Browse
460+ commits.</a></p>
461+</div>
462+
463+<div class="feed">
464+<script src="http://feeds2.feedburner.com/RecentPageChangesFromSiteigraphaWikidotSite?format=sigpro&nItems=5"
465+ type="text/javascript" ></script><noscript><p>Subscribe to RSS
466+ headline updates
467+ from: <a href="http://feeds2.feedburner.com/RecentPageChangesFromSiteigraphaWikidotSite">RecentPageChanges</a><br/>Powered
468+ by FeedBurner</p> </noscript>
469+<p class="comment">See <a href="http://igraph.wikidot.com/system:recent-changes">more changes.</a></p>
470+</div>
471
472
473=== added file 'doc/homepage/introduction.html.in'
474--- doc/homepage/introduction.html.in 1970-01-01 00:00:00 +0000
475+++ doc/homepage/introduction.html.in 2009-03-08 09:52:02 +0000
476@@ -0,0 +1,42 @@
477+<h2>Introduction</h2>
478+
479+<p><b>igraph</b> is a free software package for creating and manipulating undirected and
480+directed graphs. It includes implementations for classic graph theory
481+problems like minimum spanning trees and network flow, and also
482+implements algorithms for some recent network analysis methods, like
483+community structure search.</p>
484+
485+<a name="introduction2"></a>
486+<p>The efficient implementation of <b>igraph</b> allows it to handle graphs
487+with millions of vertices and edges. The rule of thumb is that if your graph
488+fits into the physical memory then <b>igraph</b> can handle it.</p>
489+
490+<p><b>igraph</b> can be installed in several forms:</p>
491+
492+<ul>
493+<li><b>igraph</b> as a <i>C library</i> is useful if you want to use
494+ it in your C/C++ projects, or want to implement your own network analysis
495+ or model in C/C++ using the data structures and functions <b>igraph</b> provides.
496+ </li>
497+<li><b>igraph</b> as an <i>R package</i>. You can use <b>igraph</b> as an extension
498+ package to <a href="http://www.r-project.org">The GNU R project for
499+ Statistical Computing.</a> The flexibility of the R language and its
500+ richness in statistical methods add a great deal of productivity to
501+ <b>igraph</b>, with a very small speed penalty.</li>
502+<li><b>igraph</b> as a <a href="http://www.python.org">Python</a> extension module.
503+ This way you can combine <b>igraph</b> with the huge set of Python functions
504+ and modules available, and the ease of the Python language,
505+ with a small speed penalty.</li>
506+<li><b>igraph</b> as a <a href="http://www.ruby-lang.org">Ruby</a> extension.
507+ If you like the Ruby language, then this might be the right choice for you.
508+ </li>
509+</ul>
510+
511+<p>Every form of <b>igraph</b> contains the same code at the very heart, written in ANSI C.</p>
512+
513+<p>Please note that the way of installing <b>igraph</b> depends on which of its
514+forms you actually want to use. E.g., to use <b>igraph</b> as an R package,
515+you don't need to download the C library at all. See the <a
516+href="download.html">download page</a> for details.</p>
517+
518+<div class="back"><a href="index.html">&laquo; Back to main page</a></div>
519
520=== modified file 'doc/homepage/license.html.in'
521--- doc/homepage/license.html.in 2008-02-14 17:38:13 +0000
522+++ doc/homepage/license.html.in 2009-03-01 18:56:34 +0000
523@@ -1,8 +1,9 @@
524 <h2>License</h2>
525
526-<p>igraph library. Copyright (C) 2003-2008 Gábor Csárdi and Tamás Nepusz
527-&lt;csardi@rmki.kfki.hu&gt; MTA RMKI, Konkoly-Thege Miklós st. 29-33.,
528-Budapest 1121, Hungary.</p>
529+<p>igraph library. Copyright (C) 2003-2009 Gábor Csárdi
530+ &lt;csardi@rmki.kfki.hu&gt; and Tamás Nepusz
531+ &lt;ntamas@rmki.kfki.hu&gt; MTA RMKI, Konkoly-Thege Miklós
532+ st. 29-33., Budapest 1121, Hungary.</p>
533
534 <p> This program is free software; you can redistribute it and/or modify it
535 under the terms of the GNU General Public License as published by the Free
536
537=== added file 'doc/homepage/pngfix.js'
538--- doc/homepage/pngfix.js 1970-01-01 00:00:00 +0000
539+++ doc/homepage/pngfix.js 2009-03-01 23:45:06 +0000
540@@ -0,0 +1,39 @@
541+/*
542+
543+Correctly handle PNG transparency in Win IE 5.5 & 6.
544+http://homepage.ntlworld.com/bobosola. Updated 18-Jan-2006.
545+
546+Use in <HEAD> with DEFER keyword wrapped in conditional comments:
547+<!--[if lt IE 7]>
548+<script defer type="text/javascript" src="pngfix.js"></script>
549+<![endif]-->
550+
551+*/
552+
553+var arVersion = navigator.appVersion.split("MSIE")
554+var version = parseFloat(arVersion[1])
555+
556+if ((version >= 5.5) && (document.body.filters))
557+{
558+ for(var i=0; i<document.images.length; i++)
559+ {
560+ var img = document.images[i]
561+ var imgName = img.src.toUpperCase()
562+ if (imgName.substring(imgName.length-3, imgName.length) == "PNG")
563+ {
564+ var imgID = (img.id) ? "id='" + img.id + "' " : ""
565+ var imgClass = (img.className) ? "class='" + img.className + "' " : ""
566+ var imgTitle = (img.title) ? "title='" + img.title + "' " : "title='" + img.alt + "' "
567+ var imgStyle = "display:inline-block;" + img.style.cssText
568+ if (img.align == "left") imgStyle = "float:left;" + imgStyle
569+ if (img.align == "right") imgStyle = "float:right;" + imgStyle
570+ if (img.parentElement.href) imgStyle = "cursor:hand;" + imgStyle
571+ var strNewHTML = "<span " + imgID + imgClass + imgTitle
572+ + " style=\"" + "width:" + img.width + "px; height:" + img.height + "px;" + imgStyle + ";"
573+ + "filter:progid:DXImageTransform.Microsoft.AlphaImageLoader"
574+ + "(src=\'" + img.src + "\', sizingMethod='scale');\"></span>"
575+ img.outerHTML = strNewHTML
576+ i = i-1
577+ }
578+ }
579+}
580
581=== added file 'doc/homepage/requirements.html.in'
582--- doc/homepage/requirements.html.in 1970-01-01 00:00:00 +0000
583+++ doc/homepage/requirements.html.in 2009-03-08 09:52:02 +0000
584@@ -0,0 +1,65 @@
585+<h2>Requirements</h2>
586+
587+<h3>The C library</h3>
588+ <p>For using the <i><b>igraph</b> C library</i> only a fairly recent C
589+ and C++ library is needed. For supporting the GraphML format
590+ you'll need the <a href="http://xmlsoft.org"><code>libxml2</code>
591+ library</a>.</p>
592+
593+ <p>For compiling the <i><b>igraph</b> C library</i> from source you'll
594+ need a fairly modern C and C++ compiler and some standard UNIX
595+ tools like <code>make</code>, <code>chmod</code>, <code>touch</code>, etc.</p>
596+
597+ <p>Most often we use the <a href="http://gcc.gnu.org">GNU Compiler
598+ Collection</a> for compiling. Theoretically it is possible to compile
599+ <b>igraph</b> using the Microsoft Visual C Compiler as well, but we
600+ recommend the <a href="http://www.cygwin.com">Cygwin</a> or
601+ <a href="http://www.mingw.org">MinGW</a> environments for compiling
602+ <b>igraph</b> under Windows.</p>
603+
604+<h3>The GNU R package</h3>
605+ <p>For using the <i><b>igraph</b> R package</i> you obviously need
606+ GNU R; a recent version like 2.6.0 is recommended, since earlier
607+ versions contain bugs which affect <b>igraph</b>. If you would like to
608+ compile the R package from source (this is the usual way for Linux
609+ systems), you'll need a C and C++ compiler and optionally the
610+ <a href="http://xmlsoft.org"><code>libxml2</code> library</a>.</p>
611+
612+ <p>For installing the <i><b>igraph</b> R package</i> on Windows, you
613+ don't need anything else, just GNU R and the binary <b>igraph</b> R
614+ package, of course.</p>
615+
616+<h3>The Python extension module</h3>
617+
618+ <p>For installing the <i>Python extension module</i>, you'll
619+ need Python version 2.4 or later. Binary versions of the module
620+ are available for Python 2.4 and Python 2.5 on the Windows platform,
621+ so you don't have to compile anything in this case. OS X Leopard
622+ users can download an installer as well. If you are using
623+ Linux or Mac OS X Tiger, you'll have to compile the extension by yourself.
624+ Compilation requires a recent C compiler (e.g. <code>gcc</code>, which
625+ is usually available as a separate package in Linux and as part of
626+ <a href="http://developer.apple.com/tools/xcode/">XCode</a> on Mac
627+ OS X). You'll also need the compiled form of the <b>igraph</b> C
628+ core, as the Python interface will link to that during compilation.</p>
629+ <p>If you have the <a href="http://peak.telecommunity.com/DevCenter/EasyInstall">EasyInstall</a>
630+ Python script installed, you can simply type <code>easy_install igraph</code>
631+ to download, compile and install <b>igraph</b> as a Python Egg. This
632+ method assumes that you installed the C core somewhere in the
633+ default include and library path of your system.</p>
634+
635+<h3>The Ruby extension</h3>
636+ <p>For installing the <i>Ruby extension</i> please consult
637+ <a href="http://rubyforge.org/projects/igraph/">the homepage of
638+ this extension</a>.</p>
639+
640+<h3>Compiling the development version</h3>
641+ <p>For compiling the development version of <b>igraph</b>, you'll need
642+ quite a large set of tools: <code>autoconf</code>, <code>automake</code>,
643+ GNU <code>bison</code> version 1.35 or newer, GNU <code>make</code>,
644+ maybe more. For compiling the documentation you'll need even more tools:
645+ Python, Docbook schemas and tools, <code>xmlto</code>, <code>makeinfo</code>,
646+ <code>patch</code>, <code>docbook2x</code>, maybe more. We're sure you
647+ know how to get these if you want to compile the development version :)</p>
648+
649+<div class="back"><a href="index.html">&laquo; Back to main page</a></div>
650
651=== modified file 'doc/homepage/style.css'
652--- doc/homepage/style.css 2008-02-12 17:49:44 +0000
653+++ doc/homepage/style.css 2009-03-08 11:22:36 +0000
654@@ -8,17 +8,26 @@
655 h1 {
656 color: #fff;
657 margin: 0;
658- padding: 7px 0 7px 15px;
659+ height: 40px;
660+ line-height: 40px;
661 text-shadow: 0px 1px 2px #000;
662- background: #005fd7 url(images/header_blue.png) repeat-x;
663+ background: #1872ce url(images/header_blue.png) repeat-x;
664+ border-top: 1px solid #1872ce;
665 border-bottom: 1px solid #1c477f;
666 font-size: large;
667+ padding-left: 10px;
668 }
669 h2 {
670- font-size: 1.2em;
671+ font-size: 1.5em;
672+ text-indent: -40px;
673+}
674+h2.th {
675+ font-size: 1.5em;
676+ text-indent: 0px;
677 }
678 h3 {
679 font-size: 1em;
680+ text-indent: -20px;
681 }
682 h4 {
683 font-size: 0.8em;
684@@ -37,13 +46,26 @@
685 img.float_left { float: left }
686 pre.condensed { font-size: 0.8em; line-height: 1.5em; }
687
688-.main {
689- padding: 7px 15px;
690+.igraphlogo {
691+ float: right;
692+ padding-left: 40px;
693+ padding-right: 40px;
694+ padding-top:30px;
695+}
696+
697+div.main {
698+ max-width:900px;
699+ padding-left: 50px;
700+ padding-bottom: 50px;
701+ margin-right: 0;
702 }
703 .more {
704 text-align: right;
705 margin-top: -1em;
706 }
707+.back {
708+ text-align: left;
709+}
710
711 ul.no-bullet {
712 list-style-type: none;
713@@ -80,6 +102,9 @@
714 li.download-c {
715 background: url(images/icon_c.png) no-repeat 0px 0px;
716 }
717+li.download-sf {
718+ background: url(images/icon_sf.png) no-repeat 0px 0px;
719+}
720 li.download-r {
721 background: url(images/icon_r.png) no-repeat 0px 0px;
722 }
723@@ -92,6 +117,9 @@
724 li.download-doc {
725 background: url(images/icon_documentation.png) no-repeat 0px 0px;
726 }
727+li.download-wiki {
728+ background: url(images/icon_wiki.png) no-repeat 0px 0px;
729+}
730 ul.download-links {
731 list-style-type: none;
732 padding: 2px 0 0 20px; margin: 0;
733@@ -99,13 +127,16 @@
734 }
735 ul.download-links li {
736 padding: 0px 10px 0px 0px; margin: 0;
737- display: inline;
738 padding-bottom: 5px !important;
739 }
740 ul.download-links li.download-source {
741 background: url(images/icon_source.png) no-repeat 0px 0px;
742 padding-left: 18px;
743 }
744+ul.download-links li.download-sf {
745+ background: url(images/icon_sf.png) no-repeat 0px 0px;
746+ padding-left: 18px;
747+}
748 ul.download-links li.download-windows {
749 background: url(images/icon_windows.png) no-repeat 0px 0px;
750 padding-left: 18px;
751@@ -126,6 +157,10 @@
752 background: url(images/icon_links.png) no-repeat 0px 0px;
753 padding-left: 18px;
754 }
755+ul.download-links li.download-wiki {
756+ background: url(images/icon_wiki.png) no-repeat 0px 0px;
757+ padding-left: 18px;
758+}
759 ul.download-links li.download-pdf {
760 background: url(images/icon_pdf.png) no-repeat 0px 0px;
761 padding-left: 18px;
762@@ -146,11 +181,8 @@
763 h2 a, h2 a:visited, h2 a:hover { color: #000; text-decoration: none }
764 h3 a, h3 a:visited, h3 a:hover { color: #000; text-decoration: none }
765
766-/* Version info */
767-#version_info {
768+#sourceforge_logo {
769 float: right;
770- font-size: 0.8em;
771- color: #888;
772 padding: 3px 15px 0px 0px;
773 }
774
775@@ -162,41 +194,89 @@
776 ul.menu-upper {
777 list-style-type: none;
778 padding: 0px 0px 0px 15px; margin: 0;
779+ margin-top: 10px;
780+ width: 95%;
781+ border-bottom: 1px solid;
782+ text-align: left;
783 }
784-
785 ul.menu li {
786- padding: 0px 0px 10px 20px;
787+ padding: 0px 10px 10px 20px;
788 margin: 0px;
789 }
790 ul.menu-upper li {
791- padding: 3px 10px 10px 20px;
792+ padding: 8px 10px 4px 25px;
793 font-size: 0.8em;
794- margin: 0px;
795+ border: solid;
796+ border-width: 1px 1px 1px 1px;
797+ margin: 0px 0px 0px 0px;
798 display: inline;
799 }
800+ul.menu-upper li:hover {
801+ border-top: solid 2px #0000ff;
802+ border-left: solid 2px #0000ff;
803+ border-right: solid 2px #0000ff;
804+}
805 ul li.item-introduction {
806- background: url(images/icon_info.png) no-repeat 0px 3px;
807+ background: #dadaff url(images/icon_info.png) no-repeat 5px 6px;
808 }
809 ul li.item-download {
810- background: url(images/icon_download.png) no-repeat 0px 3px;
811+ background: #dadaff url(images/icon_download.png) no-repeat 5px 6px;
812 }
813 ul li.item-news {
814- background: url(images/icon_news.png) no-repeat 0px 3px;
815+ background: #dadaff url(images/icon_news.png) no-repeat 5px 6px;
816 }
817 ul li.item-documentation {
818- background: url(images/icon_documentation.png) no-repeat 0px 3px;
819+ background: #dadaff url(images/icon_documentation.png) no-repeat 5px 6px;
820+}
821+ul li.item-wiki {
822+ background: #dadaff url(images/icon_wiki.png) no-repeat 5px 6px;
823 }
824 ul li.item-screenshots {
825- background: url(images/icon_screenshots.png) no-repeat 0px 3px;
826+ background: #dadaff url(images/icon_screenshots.png) no-repeat 5px 6px;
827 }
828 ul li.item-community {
829- background: url(images/icon_community.png) no-repeat 0px 3px;
830+ background: #dadaff url(images/icon_community.png) no-repeat 5px 6px;
831+}
832+ul li.item-bug {
833+ background: #dadaff url(images/icon_bug.png) no-repeat 5px 6px;
834 }
835 ul li.item-links {
836- background: url(images/icon_links.png) no-repeat 0px 3px;
837+ background: #dadaff url(images/icon_links.png) no-repeat 5px 6px;
838 }
839 ul li.item-license {
840- background: url(images/icon_license.png) no-repeat 0px 3px;
841+ background: #dadaff url(images/icon_license.png) no-repeat 5px 6px;
842+}
843+body#index li#n-index {
844+ background: #ffffff url(images/icon_info.png) no-repeat 5px 6px;
845+ border-bottom: 1px solid #ffffff;
846+}
847+body#news li#n-news {
848+ background: #ffffff url(images/icon_news.png) no-repeat 5px 6px;
849+ border-bottom: 1px solid #ffffff;
850+}
851+body#download li#n-download{
852+ background: #ffffff url(images/icon_download.png) no-repeat 5px 6px;
853+ border-bottom: 1px solid #ffffff;
854+}
855+body#documentation li#n-documentation{
856+ background: #ffffff url(images/icon_documentation.png) no-repeat 5px 6px;
857+ border-bottom: 1px solid #ffffff;
858+}
859+body#screenshots li#n-screenshots{
860+ background: #ffffff url(images/icon_screenshots.png) no-repeat 5px 6px;
861+ border-bottom: 1px solid #ffffff;
862+}
863+body#support li#n-support{
864+ background: #ffffff url(images/icon_community.png) no-repeat 5px 6px;
865+ border-bottom: 1px solid #ffffff;
866+}
867+body#bugs li#n-bugs{
868+ background: #ffffff url(images/icon_bug.png) no-repeat 5px 6px;
869+ border-bottom: 1px solid #ffffff;
870+}
871+body#license li#n-license {
872+ background: #ffffff url(images/icon_license.png) no-repeat 5px 6px;
873+ border-bottom: 1px solid #ffffff;
874 }
875
876 /* Forms */
877@@ -232,3 +312,80 @@
878 {
879 padding: 0px 60px 0px;
880 }
881+table.intro td {
882+ vertical-align: top;
883+ width: 33.3%;
884+ padding: 20px;
885+}
886+div.feed {
887+ width: 100%;
888+ padding: 20px;
889+}
890+div.feedburnerFeedBlock p.feedTitle {
891+ font-size: 1em;
892+}
893+div.feedburnerFeedBlock a {
894+ color: black;
895+}
896+div.feedburnerFeedBlock ul a {
897+ color: #00f; text-decoration: none
898+}
899+div.feedburnerFeedBlock {
900+ font-size: 1em;
901+}
902+div.feedburnerFeedBlock ul {
903+ list-style-type: none;
904+}
905+div.feedburnerFeedBlock ul li {
906+ margin-top: 10px;
907+}
908+div.feedburnerFeedBlock p.date {
909+ font-style: italic;
910+ font-size: 0.8em;
911+ margin: 5px;
912+}
913+div.feedburnerFeedBlock p.date:before {
914+ content: "(";
915+}
916+div.feedburnerFeedBlock p.date:after {
917+ content: ")";
918+}
919+div.feedburnerFeedBlock span, span+p {display:inline}
920+div.feed span.headline {
921+ font-size: 1.25em;
922+}
923+#creditfooter {
924+ float: right;
925+}
926+div.feedburnerFeedBlock table {
927+ table-layout: fixed;
928+ width: 100%;
929+}
930+div.quick-menu {
931+ position:fixed;
932+ bottom:0;
933+ width: 100%;
934+}
935+div.quick-menu ul.menu-bottom {
936+ padding: 5px;
937+ border-top: 1px solid #0000a0;
938+ text-align: center;
939+ background: #dadaff;
940+}
941+ul.menu-bottom {
942+ list-style-type: none;
943+ padding: 0px 0px 0px 15px; margin: 0;
944+}
945+ul.menu-bottom li {
946+ padding: 8px 10px 10px 25px;
947+ font-size: 0.8em;
948+ margin: 0px;
949+ display: inline;
950+}
951+div.copyright {
952+ border-top: 1px solid #8e8e8e;
953+ text-align: center;
954+ color: #8e8e8e;
955+ padding-top: 10px;
956+ padding-bottom: 50px;
957+}
958\ No newline at end of file
959
960=== modified file 'doc/homepage/support.html.in'
961--- doc/homepage/support.html.in 2008-02-14 10:57:48 +0000
962+++ doc/homepage/support.html.in 2009-03-01 18:56:34 +0000
963@@ -1,29 +1,35 @@
964-<h2>Getting help</h2>
965-
966-<h3>Mailing lists</h3>
967+<h2>Mailing lists</h2>
968
969 <p>There are two <b>igraph</b> mailing lists: <code>igraph-help</code>
970-and <code>igraph-announce</code>.</p>
971+ and <code>igraph-announce</code>.
972+</p>
973+
974+<h3><code>igraph-help</code></h3>
975
976 <p><code>igraph-help</code> is for general discussion and questions,
977-help, suggestions, etc. See
978-<a href="http://lists.nongnu.org/mailman/listinfo/igraph-help"
979-onClick="javascript:urchinTracker ('/outgoing/mailinglist-help'); ">
980-its page at Savannah</a> for subscription and archives.</p>
981-
982-<p><code>igraph-announce</code> is a very low-traffic mailing (about one email per month) list for
983-important announcements only, such as a new <b>igraph</b> release for
984-example. <code>igraph-announce</code> is a read-only list, you can not
985-post any messages to it. See
986-<a href="http://lists.nongnu.org/mailman/listinfo/igraph-announce"
987-onClick="javascript:urchinTracker ('/outgoing/mailinglist-announce'); ">
988-its page at Savannah</a> for subscription and archives.</p>
989-
990-<h3>Bug reports, questions, comments</h3>
991-
992-<p>Please send questions, comments and bug reports to the
993-<a href="mailto:igraph-help@nongnu.org"><code>igraph-help</code>
994-mailing list</a> or to <a href="mailto:csardi@rmki.kfki.hu">Gábor
995-Csárdi</a>. The mailing list is preferred. In order to receive
996-responses to your comments, please consider subscribing to the
997-mailing list.</p>
998+ help, suggestions, etc. See
999+ <a href="http://lists.nongnu.org/mailman/listinfo/igraph-help"
1000+ onClick="javascript:urchinTracker ('/outgoing/mailinglist-help'); ">
1001+ its page at Savannah</a> for subscription and archives.
1002+</p>
1003+
1004+<h3><code>igraph-announce</code></h3>
1005+
1006+<p><code>igraph-announce</code> is a very low-traffic mailing list
1007+ (about two emails per year) for important announcements only, such
1008+ as a new <b>igraph</b> release. These announcements are also sent
1009+ to <code>igraph-help</code>, so you don't need to sign up for both
1010+ mailing lists.
1011+</p>
1012+
1013+<p><code>igraph-announce</code> is a read-only list; you
1014+ cannot post any messages to it.
1015+</p>
1016+
1017+<p>See <a href="http://lists.nongnu.org/mailman/listinfo/igraph-announce"
1018+ onClick="javascript:urchinTracker
1019+ ('/outgoing/mailinglist-announce'); ">
1020+ <code>igraph-announce</code> at Savannah</a> for subscription and
1021+ archives.
1022+</p>
1023+
1024
1025=== modified file 'doc/homepage/template.html'
1026--- doc/homepage/template.html 2008-02-14 10:57:48 +0000
1027+++ doc/homepage/template.html 2009-03-08 11:22:36 +0000
1028@@ -6,9 +6,13 @@
1029 <head>
1030 <title>The igraph library for complex network research</title>
1031 <link rel="stylesheet" type="text/css" href="style.css"></link>
1032+ <link rel="shortcut icon" href="images/igraph3.png" type="image/x-icon" />
1033+ <!--[if lt IE 7.]>
1034+ <script defer type="text/javascript" src="pngfix.js"></script>
1035+ <![endif]-->
1036 </head>
1037
1038-<body>
1039+<body id="$PAGENAME">
1040
1041 <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
1042 </script>
1043@@ -17,26 +21,68 @@
1044 urchinTracker();
1045 </script>
1046
1047-<h1><a href="index.html">The <b>igraph</b> library</a></h1>
1048-
1049-<div id="version_info">Latest version: <b>$VERSION</b><br/>
1050+<div id="sourceforge_logo">
1051+<a href="http://sourceforge.net/projects/igraph"><img src="http://sflogo.sourceforge.net/sflogo.php?group_id=160289&type=12" width="120" height="30" border="0" alt="Get The igraph library at SourceForge.net. Fast, secure and Free Open Source software downloads" /></a>
1052+</div>
1053+
1054+<h1><a href="index.html">The <b>igraph</b> library</a>
1055+</h1>
1056+
1057+<ul class="menu-upper">
1058+ <li class="item-introduction" id="n-index"><a href="index.html">Home</a></li>
1059+ <li class="item-news" id="n-news"><a href="news.html">News</a></li>
1060+ <li class="item-download" id="n-download"><a href="download.html">Download</a></li>
1061+ <li class="item-documentation" id="n-documentation"><a href="documentation.html">Documentation</a></li>
1062+ <li class="item-wiki"><a href="http://igraph.wikidot.com">Wiki</a></li>
1063+ <li class="item-screenshots" id="n-screenshots"><a href="screenshots.html">Screenshots</a></li>
1064+ <li class="item-community" id="n-support"><a href="support.html">Mailing lists</a></li>
1065+ <li class="item-bug" id="n-bugs"><a href="bugs.html">Bugs</a></li>
1066+<!-- <li class="item-links" id="n-links"><a href="links.html">Links</a></li> -->
1067+ <li class="item-license" id="n-license"><a href="license.html">License</a></li>
1068+</ul>
1069+
1070+<div class="igraphlogo">
1071+<img src="images/igraph2.png" alt="" /><br/>
1072+Latest version: <b>$VERSION</b><br/>
1073 <a href="relnotes-$VERSION.html">Release notes</a>
1074 </div>
1075
1076-<ul class="menu-upper">
1077- <li class="item-introduction"><a href="index.html">Introduction</a></li>
1078- <li class="item-news"><a href="news.html">News</a></li>
1079- <li class="item-download"><a href="download.html">Download</a></li>
1080- <li class="item-documentation"><a href="documentation.html">Documentation</a></li>
1081- <li class="item-screenshots"><a href="screenshots.html">Screenshots</a></li>
1082- <li class="item-community"><a href="support.html">Community</a></li>
1083- <li class="item-links"><a href="links.html">Links</a></li>
1084- <li class="item-license"><a href="license.html">License</a></li>
1085-</ul>
1086-
1087 <div class="main">
1088 $CONTENT
1089 </div>
1090
1091+<div class="quick-menu">
1092+<ul class="menu-bottom">
1093+ <li class="item-bug"><a id="qm-reportbug"
1094+ href="https://bugs.launchpad.net/igraph/+filebug">
1095+ <span>Report a bug</span></a></li>
1096+<!-- <li><a href="https://bugs.launchpad.net/igraph/+filebug"> -->
1097+<!-- <span>Request a feature</span></a></li> -->
1098+ <li class="item-bug"><a id="qm-listbugs"
1099+ href="https://bugs.launchpad.net/igraph/+bugs">
1100+ <span>List open bugs</span></a></li>
1101+ <li class="item-community"><a id="qm-searcharchive"
1102+ href="http://lists.gnu.org/archive/html/igraph-help/">
1103+ <span>Search mailing list</span></a></li>
1104+ <li class="item-community"><a id="qm-subscribe"
1105+ href="http://lists.nongnu.org/mailman/listinfo/igraph-help">
1106+ <span>Subscribe to mailing list</span></a></li>
1107+ <li class="item-license"><a id="qm-browsesource"
1108+ href="http://bazaar.launchpad.net/~igraph/igraph/0.5-main/files/1139.1.143">
1109+ <span>Browse source code</span></a></li>
1110+ <li class="item-documentation"><a id="qm-searchdocs"
1111+ href="http://www.google.com/coop/cse?cx=006731217555358536053:yeszjk2wr6q">
1112+ <span>Search documentation</span></a></li>
1113+</ul>
1114+</div>
1115+
1116+<div class="copyright">
1117+ Copyright &copy;
1118+ 2005-2009 <a href="http://igraph.sourceforge.net/">The igraph
1119+ Project</a>.<br><a href="http://validator.w3.org/check/referer">Optimised</a>
1120+ for <a href="http://www.w3.org/">standards</a>. Hosted
1121+ by <a href="http://www.sourceforge.net/">SourceForge</a>.
1122+</div>
1123+
1124 </body>
1125 </html>
1126
1127=== added file 'doc/sitemap_gen.py'
1128--- doc/sitemap_gen.py 1970-01-01 00:00:00 +0000
1129+++ doc/sitemap_gen.py 2009-03-03 11:01:42 +0000
1130@@ -0,0 +1,2094 @@
1131+#!/usr/bin/env python
1132+#
1133+# Copyright (c) 2004, 2005 Google Inc.
1134+# All rights reserved.
1135+#
1136+# Redistribution and use in source and binary forms, with or without
1137+# modification, are permitted provided that the following conditions
1138+# are met:
1139+#
1140+# * Redistributions of source code must retain the above copyright
1141+# notice, this list of conditions and the following disclaimer.
1142+#
1143+# * Redistributions in binary form must reproduce the above copyright
1144+# notice, this list of conditions and the following disclaimer in
1145+# the documentation and/or other materials provided with the
1146+# distribution.
1147+#
1148+# * Neither the name of Google nor the names of its contributors may
1149+# be used to endorse or promote products derived from this software
1150+# without specific prior written permission.
1151+#
1152+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
1153+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
1154+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
1155+# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
1156+# COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
1157+# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
1158+# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
1159+# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
1160+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
1161+# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
1162+# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
1163+# POSSIBILITY OF SUCH DAMAGE.
1164+#
1165+#
1166+# The sitemap_gen.py script is written in Python 2.2 and released to
1167+# the open source community for continuous improvements under the BSD
1168+# 2.0 new license, which can be found at:
1169+#
1170+# http://www.opensource.org/licenses/bsd-license.php
1171+#
1172+
1173+__usage__ = \
1174+"""A simple script to automatically produce sitemaps for a webserver,
1175+in the Google Sitemap Protocol (GSP).
1176+
1177+Usage: python sitemap_gen.py --config=config.xml [--help] [--testing]
1178+ --config=config.xml, specifies config file location
1179+ --help, displays usage message
1180+ --testing, specified when user is experimenting
1181+"""
1182+
1183+# Please be careful that all syntax used in this file can be parsed on
1184+# Python 1.5 -- this version check is not evaluated until after the
1185+# entire file has been parsed.
1186+import sys
1187+if sys.hexversion < 0x02020000:
1188+ print 'This script requires Python 2.2 or later.'
1189+ print 'Currently run with version: %s' % sys.version
1190+ sys.exit(1)
1191+
1192+import fnmatch
1193+import glob
1194+import gzip
1195+import md5
1196+import os
1197+import re
1198+import stat
1199+import time
1200+import types
1201+import urllib
1202+import urlparse
1203+import xml.sax
1204+
1205+# True and False were introduced in Python2.2.2
1206+try:
1207+ testTrue=True
1208+ del testTrue
1209+except NameError:
1210+ True=1
1211+ False=0
1212+
1213+# Text encodings
1214+ENC_ASCII = 'ASCII'
1215+ENC_UTF8 = 'UTF-8'
1216+ENC_IDNA = 'IDNA'
1217+ENC_ASCII_LIST = ['ASCII', 'US-ASCII', 'US', 'IBM367', 'CP367', 'ISO646-US',
1218+ 'ISO_646.IRV:1991', 'ISO-IR-6', 'ANSI_X3.4-1968',
1219+ 'ANSI_X3.4-1986', 'CPASCII' ]
1220+ENC_DEFAULT_LIST = ['ISO-8859-1', 'ISO-8859-2', 'ISO-8859-5']
1221+
1222+# Available Sitemap types
1223+SITEMAP_TYPES = ['web', 'mobile', 'news']
1224+
1225+# General Sitemap tags
1226+GENERAL_SITEMAP_TAGS = ['loc', 'changefreq', 'priority', 'lastmod']
1227+
1228+# News specific tags
1229+NEWS_SPECIFIC_TAGS = ['keywords', 'publication_date', 'stock_tickers']
1230+
1231+# News Sitemap tags
1232+NEWS_SITEMAP_TAGS = GENERAL_SITEMAP_TAGS + NEWS_SPECIFIC_TAGS
1233+
1234+# Maximum number of urls in each sitemap, before next Sitemap is created
1235+MAXURLS_PER_SITEMAP = 50000
1236+
1237+# Suffix on a Sitemap index file
1238+SITEINDEX_SUFFIX = '_index.xml'
1239+
1240+# Regular expressions tried for extracting URLs from access logs.
1241+ACCESSLOG_CLF_PATTERN = re.compile(
1242+ r'.+\s+"([^\s]+)\s+([^\s]+)\s+HTTP/\d+\.\d+"\s+200\s+.*'
1243+ )
1244+
1245+# Match patterns for lastmod attributes
1246+DATE_PATTERNS = map(re.compile, [
1247+ r'^\d\d\d\d$',
1248+ r'^\d\d\d\d-\d\d$',
1249+ r'^\d\d\d\d-\d\d-\d\d$',
1250+ r'^\d\d\d\d-\d\d-\d\dT\d\d:\d\dZ$',
1251+ r'^\d\d\d\d-\d\d-\d\dT\d\d:\d\d[+-]\d\d:\d\d$',
1252+ r'^\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d(\.\d+)?Z$',
1253+ r'^\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d(\.\d+)?[+-]\d\d:\d\d$',
1254+ ])
1255+
1256+# Match patterns for changefreq attributes
1257+CHANGEFREQ_PATTERNS = [
1258+ 'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'
1259+ ]
1260+
1261+# XML formats
1262+GENERAL_SITEINDEX_HEADER = \
1263+ '<?xml version="1.0" encoding="UTF-8"?>\n' \
1264+ '<sitemapindex\n' \
1265+ ' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n' \
1266+ ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n' \
1267+ ' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9\n' \
1268+ ' http://www.sitemaps.org/schemas/sitemap/0.9/' \
1269+ 'siteindex.xsd">\n'
1270+
1271+NEWS_SITEINDEX_HEADER = \
1272+ '<?xml version="1.0" encoding="UTF-8"?>\n' \
1273+ '<sitemapindex\n' \
1274+ ' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n' \
1275+ ' xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"\n' \
1276+ ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n' \
1277+ ' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9\n' \
1278+ ' http://www.sitemaps.org/schemas/sitemap/0.9/' \
1279+ 'siteindex.xsd">\n'
1280+
1281+SITEINDEX_FOOTER = '</sitemapindex>\n'
1282+SITEINDEX_ENTRY = \
1283+ ' <sitemap>\n' \
1284+ ' <loc>%(loc)s</loc>\n' \
1285+ ' <lastmod>%(lastmod)s</lastmod>\n' \
1286+ ' </sitemap>\n'
1287+GENERAL_SITEMAP_HEADER = \
1288+ '<?xml version="1.0" encoding="UTF-8"?>\n' \
1289+ '<urlset\n' \
1290+ ' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n' \
1291+ ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n' \
1292+ ' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9\n' \
1293+ ' http://www.sitemaps.org/schemas/sitemap/0.9/' \
1294+ 'sitemap.xsd">\n'
1295+
1296+NEWS_SITEMAP_HEADER = \
1297+ '<?xml version="1.0" encoding="UTF-8"?>\n' \
1298+ '<urlset\n' \
1299+ ' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"\n' \
1300+ ' xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"\n' \
1301+ ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n' \
1302+ ' xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9\n' \
1303+ ' http://www.sitemaps.org/schemas/sitemap/0.9/' \
1304+ 'sitemap.xsd">\n'
1305+
1306+SITEMAP_FOOTER = '</urlset>\n'
1307+SITEURL_XML_PREFIX = ' <url>\n'
1308+SITEURL_XML_SUFFIX = ' </url>\n'
1309+
1310+NEWS_TAG_XML_PREFIX = ' <news:news>\n'
1311+NEWS_TAG_XML_SUFFIX = ' </news:news>\n'
1312+
1313+# Search engines to notify with the updated sitemaps
1314+#
1315+# This list is very non-obvious in what's going on. Here's the gist:
1316+# Each item in the list is a 6-tuple of items. The first 5 are "almost"
1317+# the same as the input arguments to urlparse.urlunsplit():
1318+# 0 - schema
1319+# 1 - netloc
1320+# 2 - path
1321+# 3 - query <-- EXCEPTION: specify a query map rather than a string
1322+# 4 - fragment
1323+# Additionally, add item 5:
1324+# 5 - query attribute that should be set to the new Sitemap URL
1325+# Clear as mud, I know.
1326+NOTIFICATION_SITES = [
1327+ ('http', 'www.google.com', 'webmasters/sitemaps/ping', {}, '', 'sitemap'),
1328+ ]
1329+
1330+
1331+class Error(Exception):
1332+ """
1333+ Base exception class. In this module we tend not to use our own exception
1334+ types for very much, but they come in very handy on XML parsing with SAX.
1335+ """
1336+ pass
1337+#end class Error
1338+
1339+
1340+class SchemaError(Error):
1341+ """Failure to process an XML file according to the schema we know."""
1342+ pass
1343+#end class SchemeError
1344+
1345+
1346+class Encoder:
1347+ """
1348+ Manages wide-character/narrow-character conversions for just about all
1349+ text that flows into or out of the script.
1350+
1351+ You should always use this class for string coercion, as opposed to
1352+ letting Python handle coercions automatically. Reason: Python
1353+ usually assumes ASCII (7-bit) as a default narrow character encoding,
1354+ which is not the kind of data we generally deal with.
1355+
1356+ General high-level methodologies used in sitemap_gen:
1357+
1358+ [PATHS]
1359+ File system paths may be wide or narrow, depending on platform.
1360+ This works fine, just be aware of it and be very careful to not
1361+ mix them. That is, if you have to pass several file path arguments
1362+ into a library call, make sure they are all narrow or all wide.
1363+ This class has MaybeNarrowPath() which should be called on every
1364+ file system path you deal with.
1365+
1366+ [URLS]
1367+ URL locations are stored in Narrow form, already escaped. This has the
1368+ benefit of keeping escaping and encoding as close as possible to the format
1369+ we read them in. The downside is we may end up with URLs that have
1370+ intermingled encodings -- the root path may be encoded in one way
1371+ while the filename is encoded in another. This is obviously wrong, but
1372+ it should hopefully be an issue hit by very few users. The workaround
1373+ from the user level (assuming they notice) is to specify a default_encoding
1374+ parameter in their config file.
1375+
1376+ [OTHER]
1377+ Other text, such as attributes of the URL class, configuration options,
1378+ etc, are generally stored in Unicode for simplicity.
1379+ """
1380+
1381+ def __init__(self):
1382+ self._user = None # User-specified default encoding
1383+ self._learned = [] # Learned default encodings
1384+ self._widefiles = False # File system can be wide
1385+
1386+ # Can the file system be Unicode?
1387+ try:
1388+ self._widefiles = os.path.supports_unicode_filenames
1389+ except AttributeError:
1390+ try:
1391+ self._widefiles = sys.getwindowsversion() == os.VER_PLATFORM_WIN32_NT
1392+ except AttributeError:
1393+ pass
1394+
1395+ # Try to guess a working default
1396+ try:
1397+ encoding = sys.getfilesystemencoding()
1398+ if encoding and not (encoding.upper() in ENC_ASCII_LIST):
1399+ self._learned = [ encoding ]
1400+ except AttributeError:
1401+ pass
1402+
1403+ if not self._learned:
1404+ encoding = sys.getdefaultencoding()
1405+ if encoding and not (encoding.upper() in ENC_ASCII_LIST):
1406+ self._learned = [ encoding ]
1407+
1408+ # If we had no guesses, start with some European defaults
1409+ if not self._learned:
1410+ self._learned = ENC_DEFAULT_LIST
1411+ #end def __init__
1412+
1413+ def SetUserEncoding(self, encoding):
1414+ self._user = encoding
1415+ #end def SetUserEncoding
1416+
1417+ def NarrowText(self, text, encoding):
1418+ """ Narrow a piece of arbitrary text """
1419+ if type(text) != types.UnicodeType:
1420+ return text
1421+
1422+ # Try the passed in preference
1423+ if encoding:
1424+ try:
1425+ result = text.encode(encoding)
1426+ if not encoding in self._learned:
1427+ self._learned.append(encoding)
1428+ return result
1429+ except UnicodeError:
1430+ pass
1431+ except LookupError:
1432+ output.Warn('Unknown encoding: %s' % encoding)
1433+
1434+ # Try the user preference
1435+ if self._user:
1436+ try:
1437+ return text.encode(self._user)
1438+ except UnicodeError:
1439+ pass
1440+ except LookupError:
1441+ temp = self._user
1442+ self._user = None
1443+ output.Warn('Unknown default_encoding: %s' % temp)
1444+
1445+ # Look through learned defaults, knock any failing ones out of the list
1446+ while self._learned:
1447+ try:
1448+ return text.encode(self._learned[0])
1449+ except:
1450+ del self._learned[0]
1451+
1452+ # When all other defaults are exhausted, use UTF-8
1453+ try:
1454+ return text.encode(ENC_UTF8)
1455+ except UnicodeError:
1456+ pass
1457+
1458+ # Something is seriously wrong if we get to here
1459+ return text.encode(ENC_ASCII, 'ignore')
1460+ #end def NarrowText
1461+
1462+ def MaybeNarrowPath(self, text):
1463+ """ Paths may be allowed to stay wide """
1464+ if self._widefiles:
1465+ return text
1466+ return self.NarrowText(text, None)
1467+ #end def MaybeNarrowPath
1468+
1469+ def WidenText(self, text, encoding):
1470+ """ Widen a piece of arbitrary text """
1471+ if type(text) != types.StringType:
1472+ return text
1473+
1474+ # Try the passed in preference
1475+ if encoding:
1476+ try:
1477+ result = unicode(text, encoding)
1478+ if not encoding in self._learned:
1479+ self._learned.append(encoding)
1480+ return result
1481+ except UnicodeError:
1482+ pass
1483+ except LookupError:
1484+ output.Warn('Unknown encoding: %s' % encoding)
1485+
1486+ # Try the user preference
1487+ if self._user:
1488+ try:
1489+ return unicode(text, self._user)
1490+ except UnicodeError:
1491+ pass
1492+ except LookupError:
1493+ temp = self._user
1494+ self._user = None
1495+ output.Warn('Unknown default_encoding: %s' % temp)
1496+
1497+ # Look through learned defaults, knock any failing ones out of the list
1498+ while self._learned:
1499+ try:
1500+ return unicode(text, self._learned[0])
1501+ except:
1502+ del self._learned[0]
1503+
1504+ # When all other defaults are exhausted, use UTF-8
1505+ try:
1506+ return unicode(text, ENC_UTF8)
1507+ except UnicodeError:
1508+ pass
1509+
1510+ # Getting here means it wasn't UTF-8 and we had no working default.
1511+ # We really don't have anything "right" we can do anymore.
1512+ output.Warn('Unrecognized encoding in text: %s' % text)
1513+ if not self._user:
1514+ output.Warn('You may need to set a default_encoding in your '
1515+ 'configuration file.')
1516+ return text.decode(ENC_ASCII, 'ignore')
1517+ #end def WidenText
1518+#end class Encoder
1519+encoder = Encoder()
1520+
1521+
1522+class Output:
1523+ """
1524+ Exposes logging functionality, and tracks how many errors
1525+ we have thus output.
1526+
1527+ Logging levels should be used as thus:
1528+ Fatal -- extremely sparingly
1529+ Error -- config errors, entire blocks of user 'intention' lost
1530+ Warn -- individual URLs lost
1531+ Log(,0) -- Un-suppressable text that's not an error
1532+ Log(,1) -- touched files, major actions
1533+ Log(,2) -- parsing notes, filtered or duplicated URLs
1534+ Log(,3) -- each accepted URL
1535+ """
1536+
1537+ def __init__(self):
1538+ self.num_errors = 0 # Count of errors
1539+ self.num_warns = 0 # Count of warnings
1540+
1541+ self._errors_shown = {} # Shown errors
1542+ self._warns_shown = {} # Shown warnings
1543+ self._verbose = 0 # Level of verbosity
1544+ #end def __init__
1545+
1546+ def Log(self, text, level):
1547+ """ Output a blurb of diagnostic text, if the verbose level allows it """
1548+ if text:
1549+ text = encoder.NarrowText(text, None)
1550+ if self._verbose >= level:
1551+ print text
1552+ #end def Log
1553+
1554+ def Warn(self, text):
1555+ """ Output and count a warning. Suppress duplicate warnings. """
1556+ if text:
1557+ text = encoder.NarrowText(text, None)
1558+ hash = md5.new(text).digest()
1559+ if not self._warns_shown.has_key(hash):
1560+ self._warns_shown[hash] = 1
1561+ print '[WARNING] ' + text
1562+ else:
1563+ self.Log('(suppressed) [WARNING] ' + text, 3)
1564+ self.num_warns = self.num_warns + 1
1565+ #end def Warn
1566+
1567+ def Error(self, text):
1568+ """ Output and count an error. Suppress duplicate errors. """
1569+ if text:
1570+ text = encoder.NarrowText(text, None)
1571+ hash = md5.new(text).digest()
1572+ if not self._errors_shown.has_key(hash):
1573+ self._errors_shown[hash] = 1
1574+ print '[ERROR] ' + text
1575+ else:
1576+ self.Log('(suppressed) [ERROR] ' + text, 3)
1577+ self.num_errors = self.num_errors + 1
1578+ #end def Error
1579+
1580+ def Fatal(self, text):
1581+ """ Output an error and terminate the program. """
1582+ if text:
1583+ text = encoder.NarrowText(text, None)
1584+ print '[FATAL] ' + text
1585+ else:
1586+ print 'Fatal error.'
1587+ sys.exit(1)
1588+ #end def Fatal
1589+
1590+ def SetVerbose(self, level):
1591+ """ Sets the verbose level. """
1592+ try:
1593+ if type(level) != types.IntType:
1594+ level = int(level)
1595+ if (level >= 0) and (level <= 3):
1596+ self._verbose = level
1597+ return
1598+ except ValueError:
1599+ pass
1600+ self.Error('Verbose level (%s) must be between 0 and 3 inclusive.' % level)
1601+ #end def SetVerbose
1602+#end class Output
1603+output = Output()
1604+
1605+
1606+class URL(object):
1607+ """ URL is a smart structure grouping together the properties we
1608+ care about for a single web reference. """
1609+ __slots__ = 'loc', 'lastmod', 'changefreq', 'priority'
1610+
1611+ def __init__(self):
1612+ self.loc = None # URL -- in Narrow characters
1613+ self.lastmod = None # ISO8601 timestamp of last modify
1614+ self.changefreq = None # Text term for update frequency
1615+ self.priority = None # Float between 0 and 1 (inc)
1616+ #end def __init__
1617+
1618+ def __cmp__(self, other):
1619+ if self.loc < other.loc:
1620+ return -1
1621+ if self.loc > other.loc:
1622+ return 1
1623+ return 0
1624+ #end def __cmp__
1625+
1626+ def TrySetAttribute(self, attribute, value):
1627+ """ Attempt to set the attribute to the value, with a pretty try
1628+ block around it. """
1629+ if attribute == 'loc':
1630+ self.loc = self.Canonicalize(value)
1631+ else:
1632+ try:
1633+ setattr(self, attribute, value)
1634+ except AttributeError:
1635+ output.Warn('Unknown URL attribute: %s' % attribute)
1636+ #end def TrySetAttribute
1637+
1638+ def IsAbsolute(loc):
1639+ """ Decide if the URL is absolute or not """
1640+ if not loc:
1641+ return False
1642+ narrow = encoder.NarrowText(loc, None)
1643+ (scheme, netloc, path, query, frag) = urlparse.urlsplit(narrow)
1644+ if (not scheme) or (not netloc):
1645+ return False
1646+ return True
1647+ #end def IsAbsolute
1648+ IsAbsolute = staticmethod(IsAbsolute)
1649+
1650+ def Canonicalize(loc):
1651+ """ Do encoding and canonicalization on a URL string """
1652+ if not loc:
1653+ return loc
1654+
1655+ # Let the encoder try to narrow it
1656+ narrow = encoder.NarrowText(loc, None)
1657+
1658+ # Escape components individually
1659+ (scheme, netloc, path, query, frag) = urlparse.urlsplit(narrow)
1660+ unr = '-._~'
1661+ sub = '!$&\'()*+,;='
1662+ netloc = urllib.quote(netloc, unr + sub + '%:@/[]')
1663+ path = urllib.quote(path, unr + sub + '%:@/')
1664+ query = urllib.quote(query, unr + sub + '%:@/?')
1665+ frag = urllib.quote(frag, unr + sub + '%:@/?')
1666+
1667+ # Try built-in IDNA encoding on the netloc
1668+ try:
1669+ (ignore, widenetloc, ignore, ignore, ignore) = urlparse.urlsplit(loc)
1670+ for c in widenetloc:
1671+ if c >= unichr(128):
1672+ netloc = widenetloc.encode(ENC_IDNA)
1673+ netloc = urllib.quote(netloc, unr + sub + '%:@/[]')
1674+ break
1675+ except UnicodeError:
1676+ # urlsplit must have failed, based on implementation differences in the
1677+ # library. There is not much we can do here, except ignore it.
1678+ pass
1679+ except LookupError:
1680+ output.Warn('An International Domain Name (IDN) is being used, but this '
1681+ 'version of Python does not have support for IDNA encoding. '
1682+ ' (IDNA support was introduced in Python 2.3) The encoding '
1683+ 'we have used instead is wrong and will probably not yield '
1684+ 'valid URLs.')
1685+ bad_netloc = False
1686+ if '%' in netloc:
1687+ bad_netloc = True
1688+
1689+ # Put it all back together
1690+ narrow = urlparse.urlunsplit((scheme, netloc, path, query, frag))
1691+
1692+ # I let '%' through. Fix any that aren't pre-existing escapes.
1693+ HEXDIG = '0123456789abcdefABCDEF'
1694+ list = narrow.split('%')
1695+ narrow = list[0]
1696+ del list[0]
1697+ for item in list:
1698+ if (len(item) >= 2) and (item[0] in HEXDIG) and (item[1] in HEXDIG):
1699+ narrow = narrow + '%' + item
1700+ else:
1701+ narrow = narrow + '%25' + item
1702+
1703+ # Issue a warning if this is a bad URL
1704+ if bad_netloc:
1705+ output.Warn('Invalid characters in the host or domain portion of a URL: '
1706+ + narrow)
1707+
1708+ return narrow
1709+ #end def Canonicalize
1710+ Canonicalize = staticmethod(Canonicalize)
1711+
1712+ def VerifyDate(self, date, metatag):
1713+ """Verify the date format is valid"""
1714+ match = False
1715+ if date:
1716+ date = date.upper()
1717+ for pattern in DATE_PATTERNS:
1718+ match = pattern.match(date)
1719+ if match:
1720+ return True
1721+ if not match:
1722+ output.Warn('The value for %s does not appear to be in ISO8601 '
1723+ 'format on URL: %s' % (metatag, self.loc))
1724+ return False
1725+  #end def VerifyDate
1726+
1727+ def Validate(self, base_url, allow_fragment):
1728+ """ Verify the data in this URL is well-formed, and override if not. """
1729+ assert type(base_url) == types.StringType
1730+
1731+ # Test (and normalize) the ref
1732+ if not self.loc:
1733+ output.Warn('Empty URL')
1734+ return False
1735+ if allow_fragment:
1736+ self.loc = urlparse.urljoin(base_url, self.loc)
1737+ if not self.loc.startswith(base_url):
1738+ output.Warn('Discarded URL for not starting with the base_url: %s' %
1739+ self.loc)
1740+ self.loc = None
1741+ return False
1742+
1743+ # Test the lastmod
1744+ if self.lastmod:
1745+ if not self.VerifyDate(self.lastmod, "lastmod"):
1746+ self.lastmod = None
1747+
1748+ # Test the changefreq
1749+ if self.changefreq:
1750+ match = False
1751+ self.changefreq = self.changefreq.lower()
1752+ for pattern in CHANGEFREQ_PATTERNS:
1753+ if self.changefreq == pattern:
1754+ match = True
1755+ break
1756+ if not match:
1757+ output.Warn('Changefreq "%s" is not a valid change frequency on URL '
1758+ ': %s' % (self.changefreq, self.loc))
1759+ self.changefreq = None
1760+
1761+ # Test the priority
1762+ if self.priority:
1763+ priority = -1.0
1764+ try:
1765+ priority = float(self.priority)
1766+ except ValueError:
1767+ pass
1768+ if (priority < 0.0) or (priority > 1.0):
1769+ output.Warn('Priority "%s" is not a number between 0 and 1 inclusive '
1770+ 'on URL: %s' % (self.priority, self.loc))
1771+ self.priority = None
1772+
1773+ return True
1774+ #end def Validate
1775+
1776+ def MakeHash(self):
1777+ """ Provides a uniform way of hashing URLs """
1778+ if not self.loc:
1779+ return None
1780+ if self.loc.endswith('/'):
1781+ return md5.new(self.loc[:-1]).digest()
1782+ return md5.new(self.loc).digest()
1783+ #end def MakeHash
1784+
1785+ def Log(self, prefix='URL', level=3):
1786+ """ Dump the contents, empty or not, to the log. """
1787+ out = prefix + ':'
1788+
1789+ for attribute in self.__slots__:
1790+ value = getattr(self, attribute)
1791+ if not value:
1792+ value = ''
1793+ out = out + (' %s=[%s]' % (attribute, value))
1794+
1795+ output.Log('%s' % encoder.NarrowText(out, None), level)
1796+ #end def Log
1797+
1798+ def WriteXML(self, file):
1799+ """ Dump non-empty contents to the output file, in XML format. """
1800+ if not self.loc:
1801+ return
1802+ out = SITEURL_XML_PREFIX
1803+
1804+ for attribute in self.__slots__:
1805+ value = getattr(self, attribute)
1806+ if value:
1807+ if type(value) == types.UnicodeType:
1808+ value = encoder.NarrowText(value, None)
1809+ elif type(value) != types.StringType:
1810+ value = str(value)
1811+ value = xml.sax.saxutils.escape(value)
1812+ out = out + (' <%s>%s</%s>\n' % (attribute, value, attribute))
1813+
1814+ out = out + SITEURL_XML_SUFFIX
1815+ file.write(out)
1816+ #end def WriteXML
1817+#end class URL
1818+
1819+class NewsURL(URL):
1820+ """ NewsURL is a subclass of URL with News-Sitemap specific properties. """
1821+ __slots__ = 'loc', 'lastmod', 'changefreq', 'priority', 'publication_date', \
1822+ 'keywords', 'stock_tickers'
1823+
1824+ def __init__(self):
1825+ URL.__init__(self)
1826+ self.publication_date = None # ISO8601 timestamp of publication date
1827+ self.keywords = None # Text keywords
1828+ self.stock_tickers = None # Text stock
1829+ #end def __init__
1830+
1831+ def Validate(self, base_url, allow_fragment):
1832+ """ Verify the data in this News URL is well-formed, and override if not. """
1833+ assert type(base_url) == types.StringType
1834+
1835+ if not URL.Validate(self, base_url, allow_fragment):
1836+ return False
1837+
1838+ if not URL.VerifyDate(self, self.publication_date, "publication_date"):
1839+ self.publication_date = None
1840+
1841+ return True
1842+ #end def Validate
1843+
1844+ def WriteXML(self, file):
1845+ """ Dump non-empty contents to the output file, in XML format. """
1846+ if not self.loc:
1847+ return
1848+ out = SITEURL_XML_PREFIX
1849+
1850+ # printed_news_tag indicates if news-specific metatags are present
1851+ printed_news_tag = False
1852+ for attribute in self.__slots__:
1853+ value = getattr(self, attribute)
1854+ if value:
1855+ if type(value) == types.UnicodeType:
1856+ value = encoder.NarrowText(value, None)
1857+ elif type(value) != types.StringType:
1858+ value = str(value)
1859+ value = xml.sax.saxutils.escape(value)
1860+ if attribute in NEWS_SPECIFIC_TAGS:
1861+ if not printed_news_tag:
1862+ printed_news_tag = True
1863+ out = out + NEWS_TAG_XML_PREFIX
1864+ out = out + (' <news:%s>%s</news:%s>\n' % (attribute, value, attribute))
1865+ else:
1866+ out = out + (' <%s>%s</%s>\n' % (attribute, value, attribute))
1867+
1868+ if printed_news_tag:
1869+ out = out + NEWS_TAG_XML_SUFFIX
1870+ out = out + SITEURL_XML_SUFFIX
1871+ file.write(out)
1872+ #end def WriteXML
1873+#end class NewsURL
1874+
1875+
1876+class Filter:
1877+ """
1878+ A filter on the stream of URLs we find. A filter is, in essence,
1879+ a wildcard applied to the stream. You can think of this as an
1880+ operator that returns a tri-state when given a URL:
1881+
1882+ True -- this URL is to be included in the sitemap
1883+ None -- this URL is undecided
1884+ False -- this URL is to be dropped from the sitemap
1885+ """
1886+
1887+ def __init__(self, attributes):
1888+ self._wildcard = None # Pattern for wildcard match
1889+ self._regexp = None # Pattern for regexp match
1890+ self._pass = False # "Drop" filter vs. "Pass" filter
1891+
1892+ if not ValidateAttributes('FILTER', attributes,
1893+ ('pattern', 'type', 'action')):
1894+ return
1895+
1896+ # Check error count on the way in
1897+ num_errors = output.num_errors
1898+
1899+ # Fetch the attributes
1900+ pattern = attributes.get('pattern')
1901+ type = attributes.get('type', 'wildcard')
1902+ action = attributes.get('action', 'drop')
1903+ if type:
1904+ type = type.lower()
1905+ if action:
1906+ action = action.lower()
1907+
1908+ # Verify the attributes
1909+ if not pattern:
1910+ output.Error('On a filter you must specify a "pattern" to match')
1911+ elif (not type) or ((type != 'wildcard') and (type != 'regexp')):
1912+ output.Error('On a filter you must specify either \'type="wildcard"\' '
1913+ 'or \'type="regexp"\'')
1914+ elif (action != 'pass') and (action != 'drop'):
1915+ output.Error('If you specify a filter action, it must be either '
1916+ '\'action="pass"\' or \'action="drop"\'')
1917+
1918+ # Set the rule
1919+ if action == 'drop':
1920+ self._pass = False
1921+ elif action == 'pass':
1922+ self._pass = True
1923+
1924+ if type == 'wildcard':
1925+ self._wildcard = pattern
1926+ elif type == 'regexp':
1927+ try:
1928+ self._regexp = re.compile(pattern)
1929+ except re.error:
1930+ output.Error('Bad regular expression: %s' % pattern)
1931+
1932+ # Log the final results iff we didn't add any errors
1933+ if num_errors == output.num_errors:
1934+ output.Log('Filter: %s any URL that matches %s "%s"' %
1935+ (action, type, pattern), 2)
1936+ #end def __init__
1937+
1938+ def Apply(self, url):
1939+ """ Process the URL, as above. """
1940+ if (not url) or (not url.loc):
1941+ return None
1942+
1943+ if self._wildcard:
1944+ if fnmatch.fnmatchcase(url.loc, self._wildcard):
1945+ return self._pass
1946+ return None
1947+
1948+ if self._regexp:
1949+ if self._regexp.search(url.loc):
1950+ return self._pass
1951+ return None
1952+
1953+ assert False # unreachable
1954+ #end def Apply
1955+#end class Filter
1956+
1957+
1958+class InputURL:
1959+ """
1960+ Each Input class knows how to yield a set of URLs from a data source.
1961+
1962+ This one handles a single URL, manually specified in the config file.
1963+ """
1964+
1965+ def __init__(self, attributes):
1966+ self._url = None # The lonely URL
1967+
1968+ if not ValidateAttributes('URL', attributes,
1969+ ('href', 'lastmod', 'changefreq', 'priority')):
1970+ return
1971+
1972+ url = URL()
1973+ for attr in attributes.keys():
1974+ if attr == 'href':
1975+ url.TrySetAttribute('loc', attributes[attr])
1976+ else:
1977+ url.TrySetAttribute(attr, attributes[attr])
1978+
1979+ if not url.loc:
1980+ output.Error('Url entries must have an href attribute.')
1981+ return
1982+
1983+ self._url = url
1984+ output.Log('Input: From URL "%s"' % self._url.loc, 2)
1985+ #end def __init__
1986+
1987+ def ProduceURLs(self, consumer):
1988+ """ Produces URLs from our data source, hands them in to the consumer. """
1989+ if self._url:
1990+ consumer(self._url, True)
1991+ #end def ProduceURLs
1992+#end class InputURL
1993+
1994+
1995+class InputURLList:
1996+ """
1997+ Each Input class knows how to yield a set of URLs from a data source.
1998+
1999+ This one handles a text file with a list of URLs
2000+ """
2001+
2002+ def __init__(self, attributes):
2003+ self._path = None # The file path
2004+ self._encoding = None # Encoding of that file
2005+
2006+ if not ValidateAttributes('URLLIST', attributes, ('path', 'encoding')):
2007+ return
2008+
2009+ self._path = attributes.get('path')
2010+ self._encoding = attributes.get('encoding', ENC_UTF8)
2011+ if self._path:
2012+ self._path = encoder.MaybeNarrowPath(self._path)
2013+ if os.path.isfile(self._path):
2014+ output.Log('Input: From URLLIST "%s"' % self._path, 2)
2015+ else:
2016+ output.Error('Can not locate file: %s' % self._path)
2017+ self._path = None
2018+ else:
2019+ output.Error('Urllist entries must have a "path" attribute.')
2020+ #end def __init__
2021+
2022+ def ProduceURLs(self, consumer):
2023+ """ Produces URLs from our data source, hands them in to the consumer. """
2024+
2025+ # Open the file
2026+ (frame, file) = OpenFileForRead(self._path, 'URLLIST')
2027+ if not file:
2028+ return
2029+
2030+ # Iterate lines
2031+ linenum = 0
2032+ for line in file.readlines():
2033+ linenum = linenum + 1
2034+
2035+ # Strip comments and empty lines
2036+ if self._encoding:
2037+ line = encoder.WidenText(line, self._encoding)
2038+ line = line.strip()
2039+ if (not line) or line[0] == '#':
2040+ continue
2041+
2042+ # Split the line on space
2043+ url = URL()
2044+ cols = line.split(' ')
2045+ for i in range(0,len(cols)):
2046+ cols[i] = cols[i].strip()
2047+ url.TrySetAttribute('loc', cols[0])
2048+
2049+ # Extract attributes from the other columns
2050+ for i in range(1,len(cols)):
2051+ if cols[i]:
2052+ try:
2053+ (attr_name, attr_val) = cols[i].split('=', 1)
2054+ url.TrySetAttribute(attr_name, attr_val)
2055+ except ValueError:
2056+ output.Warn('Line %d: Unable to parse attribute: %s' %
2057+ (linenum, cols[i]))
2058+
2059+ # Pass it on
2060+ consumer(url, False)
2061+
2062+ file.close()
2063+ if frame:
2064+ frame.close()
2065+ #end def ProduceURLs
2066+#end class InputURLList
2067+
2068+
2069+class InputNewsURLList:
2070+ """
2071+ Each Input class knows how to yield a set of URLs from a data source.
2072+
2073+ This one handles a text file with a list of News URLs and their metadata
2074+ """
2075+
2076+ def __init__(self, attributes):
2077+ self._path = None # The file path
2078+ self._encoding = None # Encoding of that file
2079+ self._tag_order = [] # Order of URL metadata
2080+
2081+ if not ValidateAttributes('URLLIST', attributes, ('path', 'encoding', \
2082+ 'tag_order')):
2083+ return
2084+
2085+ self._path = attributes.get('path')
2086+ self._encoding = attributes.get('encoding', ENC_UTF8)
2087+ self._tag_order = attributes.get('tag_order')
2088+
2089+ if self._path:
2090+ self._path = encoder.MaybeNarrowPath(self._path)
2091+ if os.path.isfile(self._path):
2092+ output.Log('Input: From URLLIST "%s"' % self._path, 2)
2093+ else:
2094+ output.Error('Can not locate file: %s' % self._path)
2095+ self._path = None
2096+ else:
2097+ output.Error('Urllist entries must have a "path" attribute.')
2098+
2099+ # parse tag_order into an array
2100+ # tag_order_ascii created for more readable logging
2101+ tag_order_ascii = []
2102+ if self._tag_order:
2103+ self._tag_order = self._tag_order.split(",")
2104+ for i in range(0, len(self._tag_order)):
2105+ element = self._tag_order[i].strip().lower()
2106+        self._tag_order[i] = element
2107+ tag_order_ascii.append(element.encode('ascii'))
2108+ output.Log('Input: From URLLIST tag order is "%s"' % tag_order_ascii, 0)
2109+ else:
2110+ output.Error('News Urllist configuration file must contain tag_order '
2111+ 'to define Sitemap metatags.')
2112+
2113+ # verify all tag_order inputs are valid
2114+ tag_order_dict = {}
2115+ for tag in self._tag_order:
2116+ tag_order_dict[tag] = ""
2117+ if not ValidateAttributes('URLLIST', tag_order_dict, \
2118+ NEWS_SITEMAP_TAGS):
2119+ return
2120+
2121+ # loc tag must be present
2122+ loc_tag = False
2123+ for tag in self._tag_order:
2124+ if tag == 'loc':
2125+ loc_tag = True
2126+ break
2127+ if not loc_tag:
2128+ output.Error('News Urllist tag_order in configuration file '
2129+ 'does not contain "loc" value: %s' % tag_order_ascii)
2130+ #end def __init__
2131+
2132+ def ProduceURLs(self, consumer):
2133+ """ Produces URLs from our data source, hands them in to the consumer. """
2134+
2135+ # Open the file
2136+ (frame, file) = OpenFileForRead(self._path, 'URLLIST')
2137+ if not file:
2138+ return
2139+
2140+ # Iterate lines
2141+ linenum = 0
2142+ for line in file.readlines():
2143+ linenum = linenum + 1
2144+
2145+ # Strip comments and empty lines
2146+ if self._encoding:
2147+ line = encoder.WidenText(line, self._encoding)
2148+ line = line.strip()
2149+ if (not line) or line[0] == '#':
2150+ continue
2151+
2152+ # Split the line on tabs
2153+ url = NewsURL()
2154+ cols = line.split('\t')
2155+ for i in range(0,len(cols)):
2156+ cols[i] = cols[i].strip()
2157+
2158+ for i in range(0,len(cols)):
2159+ if cols[i]:
2160+ attr_value = cols[i]
2161+ if i < len(self._tag_order):
2162+ attr_name = self._tag_order[i]
2163+ try:
2164+ url.TrySetAttribute(attr_name, attr_value)
2165+ except ValueError:
2166+ output.Warn('Line %d: Unable to parse attribute: %s' %
2167+ (linenum, cols[i]))
2168+
2169+ # Pass it on
2170+ consumer(url, False)
2171+
2172+ file.close()
2173+ if frame:
2174+ frame.close()
2175+ #end def ProduceURLs
2176+#end class InputNewsURLList
2177+
2178+
2179+class InputDirectory:
2180+ """
2181+ Each Input class knows how to yield a set of URLs from a data source.
2182+
2183+ This one handles a directory that acts as base for walking the filesystem.
2184+ """
2185+
2186+ def __init__(self, attributes, base_url):
2187+ self._path = None # The directory
2188+ self._url = None # The URL equivalent
2189+ self._default_file = None
2190+ self._remove_empty_directories = False
2191+
2192+ if not ValidateAttributes('DIRECTORY', attributes, ('path', 'url',
2193+ 'default_file', 'remove_empty_directories')):
2194+ return
2195+
2196+ # Prep the path -- it MUST end in a sep
2197+ path = attributes.get('path')
2198+ if not path:
2199+ output.Error('Directory entries must have both "path" and "url" '
2200+ 'attributes')
2201+ return
2202+ path = encoder.MaybeNarrowPath(path)
2203+ if not path.endswith(os.sep):
2204+ path = path + os.sep
2205+ if not os.path.isdir(path):
2206+ output.Error('Can not locate directory: %s' % path)
2207+ return
2208+
2209+ # Prep the URL -- it MUST end in a sep
2210+ url = attributes.get('url')
2211+ if not url:
2212+ output.Error('Directory entries must have both "path" and "url" '
2213+ 'attributes')
2214+ return
2215+ url = URL.Canonicalize(url)
2216+ if not url.endswith('/'):
2217+ url = url + '/'
2218+ if not url.startswith(base_url):
2219+ url = urlparse.urljoin(base_url, url)
2220+ if not url.startswith(base_url):
2221+ output.Error('The directory URL "%s" is not relative to the '
2222+ 'base_url: %s' % (url, base_url))
2223+ return
2224+
2225+ # Prep the default file -- it MUST be just a filename
2226+ file = attributes.get('default_file')
2227+ if file:
2228+ file = encoder.MaybeNarrowPath(file)
2229+ if os.sep in file:
2230+ output.Error('The default_file "%s" can not include path information.'
2231+ % file)
2232+ file = None
2233+
2234+ # Prep the remove_empty_directories -- default is false
2235+ remove_empty_directories = attributes.get('remove_empty_directories')
2236+ if remove_empty_directories:
2237+ if (remove_empty_directories == '1') or \
2238+ (remove_empty_directories.lower() == 'true'):
2239+ remove_empty_directories = True
2240+ elif (remove_empty_directories == '0') or \
2241+ (remove_empty_directories.lower() == 'false'):
2242+ remove_empty_directories = False
2243+ # otherwise the user set a non-default value
2244+ else:
2245+ output.Error('Configuration file remove_empty_directories '
2246+ 'value is not recognized. Value must be true or false.')
2247+ return
2248+ else:
2249+ remove_empty_directories = False
2250+
2251+ self._path = path
2252+ self._url = url
2253+ self._default_file = file
2254+ self._remove_empty_directories = remove_empty_directories
2255+
2256+ if file:
2257+ output.Log('Input: From DIRECTORY "%s" (%s) with default file "%s"'
2258+ % (path, url, file), 2)
2259+ else:
2260+ output.Log('Input: From DIRECTORY "%s" (%s) with no default file'
2261+ % (path, url), 2)
2262+ #end def __init__
2263+
2264+
2265+ def ProduceURLs(self, consumer):
2266+ """ Produces URLs from our data source, hands them in to the consumer. """
2267+ if not self._path:
2268+ return
2269+
2270+ root_path = self._path
2271+ root_URL = self._url
2272+ root_file = self._default_file
2273+ remove_empty_directories = self._remove_empty_directories
2274+
2275+ def HasReadPermissions(path):
2276+ """ Verifies a given path has read permissions. """
2277+ stat_info = os.stat(path)
2278+ mode = stat_info[stat.ST_MODE]
2279+ if mode & stat.S_IREAD:
2280+ return True
2281+ else:
2282+ return None
2283+
2284+ def PerFile(dirpath, name):
2285+ """
2286+ Called once per file.
2287+ Note that 'name' will occasionally be None -- for a directory itself
2288+ """
2289+ # Pull a timestamp
2290+ url = URL()
2291+ isdir = False
2292+ try:
2293+ if name:
2294+ path = os.path.join(dirpath, name)
2295+ else:
2296+ path = dirpath
2297+ isdir = os.path.isdir(path)
2298+ time = None
2299+ if isdir and root_file:
2300+ file = os.path.join(path, root_file)
2301+ try:
2302+              time = os.stat(file)[stat.ST_MTIME]
2303+ except OSError:
2304+ pass
2305+ if not time:
2306+            time = os.stat(path)[stat.ST_MTIME]
2307+ url.lastmod = TimestampISO8601(time)
2308+ except OSError:
2309+ pass
2310+ except ValueError:
2311+ pass
2312+
2313+ # Build a URL
2314+ middle = dirpath[len(root_path):]
2315+ if os.sep != '/':
2316+ middle = middle.replace(os.sep, '/')
2317+ if middle:
2318+ middle = middle + '/'
2319+ if name:
2320+ middle = middle + name
2321+ if isdir:
2322+ middle = middle + '/'
2323+ url.TrySetAttribute('loc', root_URL + encoder.WidenText(middle, None))
2324+
2325+ # Suppress default files. (All the way down here so we can log it.)
2326+ if name and (root_file == name):
2327+ url.Log(prefix='IGNORED (default file)', level=2)
2328+ return
2329+
2330+ # Suppress directories when remove_empty_directories="true"
2331+ try:
2332+ if isdir:
2333+ if HasReadPermissions(path):
2334+            if remove_empty_directories and \
2335+ len(os.listdir(path)) == 0:
2336+ output.Log('IGNORED empty directory %s' % str(path), level=1)
2337+ return
2338+ elif path == self._path:
2339+ output.Error('IGNORED configuration file directory input %s due '
2340+ 'to file permissions' % self._path)
2341+ else:
2342+ output.Log('IGNORED files within directory %s due to file '
2343+ 'permissions' % str(path), level=0)
2344+ except OSError:
2345+ pass
2346+ except ValueError:
2347+ pass
2348+
2349+ consumer(url, False)
2350+ #end def PerFile
2351+
2352+ def PerDirectory(ignore, dirpath, namelist):
2353+ """
2354+ Called once per directory with a list of all the contained files/dirs.
2355+ """
2356+ ignore = ignore # Avoid warnings of an unused parameter
2357+
2358+ if not dirpath.startswith(root_path):
2359+ output.Warn('Unable to decide what the root path is for directory: '
2360+ '%s' % dirpath)
2361+ return
2362+
2363+ for name in namelist:
2364+ PerFile(dirpath, name)
2365+ #end def PerDirectory
2366+
2367+ output.Log('Walking DIRECTORY "%s"' % self._path, 1)
2368+ PerFile(self._path, None)
2369+ os.path.walk(self._path, PerDirectory, None)
2370+ #end def ProduceURLs
2371+#end class InputDirectory
2372+
2373+
2374+class InputAccessLog:
2375+ """
2376+ Each Input class knows how to yield a set of URLs from a data source.
2377+
2378+ This one handles access logs. It's non-trivial in that we want to
2379+ auto-detect log files in the Common Logfile Format (as used by Apache,
2380+ for instance) and the Extended Log File Format (as used by IIS, for
2381+ instance).
2382+ """
2383+
2384+ def __init__(self, attributes):
2385+ self._path = None # The file path
2386+ self._encoding = None # Encoding of that file
2387+ self._is_elf = False # Extended Log File Format?
2388+ self._is_clf = False # Common Logfile Format?
2389+ self._elf_status = -1 # ELF field: '200'
2390+ self._elf_method = -1 # ELF field: 'HEAD'
2391+ self._elf_uri = -1 # ELF field: '/foo?bar=1'
2392+ self._elf_urifrag1 = -1 # ELF field: '/foo'
2393+ self._elf_urifrag2 = -1 # ELF field: 'bar=1'
2394+
2395+ if not ValidateAttributes('ACCESSLOG', attributes, ('path', 'encoding')):
2396+ return
2397+
2398+ self._path = attributes.get('path')
2399+ self._encoding = attributes.get('encoding', ENC_UTF8)
2400+ if self._path:
2401+ self._path = encoder.MaybeNarrowPath(self._path)
2402+ if os.path.isfile(self._path):
2403+ output.Log('Input: From ACCESSLOG "%s"' % self._path, 2)
2404+ else:
2405+ output.Error('Can not locate file: %s' % self._path)
2406+ self._path = None
2407+ else:
2408+ output.Error('Accesslog entries must have a "path" attribute.')
2409+ #end def __init__
2410+
2411+ def RecognizeELFLine(self, line):
2412+ """ Recognize the Fields directive that heads an ELF file """
2413+ if not line.startswith('#Fields:'):
2414+ return False
2415+ fields = line.split(' ')
2416+ del fields[0]
2417+ for i in range(0, len(fields)):
2418+ field = fields[i].strip()
2419+ if field == 'sc-status':
2420+ self._elf_status = i
2421+ elif field == 'cs-method':
2422+ self._elf_method = i
2423+ elif field == 'cs-uri':
2424+ self._elf_uri = i
2425+ elif field == 'cs-uri-stem':
2426+ self._elf_urifrag1 = i
2427+ elif field == 'cs-uri-query':
2428+ self._elf_urifrag2 = i
2429+ output.Log('Recognized an Extended Log File Format file.', 2)
2430+ return True
2431+ #end def RecognizeELFLine
2432+
2433+ def GetELFLine(self, line):
2434+ """ Fetch the requested URL from an ELF line """
2435+ fields = line.split(' ')
2436+ count = len(fields)
2437+
2438+ # Verify status was Ok
2439+ if self._elf_status >= 0:
2440+ if self._elf_status >= count:
2441+ return None
2442+ if not fields[self._elf_status].strip() == '200':
2443+ return None
2444+
2445+ # Verify method was HEAD or GET
2446+ if self._elf_method >= 0:
2447+ if self._elf_method >= count:
2448+ return None
2449+ if not fields[self._elf_method].strip() in ('HEAD', 'GET'):
2450+ return None
2451+
2452+ # Pull the full URL if we can
2453+ if self._elf_uri >= 0:
2454+ if self._elf_uri >= count:
2455+ return None
2456+ url = fields[self._elf_uri].strip()
2457+ if url != '-':
2458+ return url
2459+
2460+ # Put together a fragmentary URL
2461+ if self._elf_urifrag1 >= 0:
2462+ if self._elf_urifrag1 >= count or self._elf_urifrag2 >= count:
2463+ return None
2464+ urlfrag1 = fields[self._elf_urifrag1].strip()
2465+ urlfrag2 = None
2466+ if self._elf_urifrag2 >= 0:
2467+ urlfrag2 = fields[self._elf_urifrag2]
2468+ if urlfrag1 and (urlfrag1 != '-'):
2469+ if urlfrag2 and (urlfrag2 != '-'):
2470+ urlfrag1 = urlfrag1 + '?' + urlfrag2
2471+ return urlfrag1
2472+
2473+ return None
2474+ #end def GetELFLine
2475+
2476+ def RecognizeCLFLine(self, line):
2477+ """ Try to tokenize a logfile line according to CLF pattern and see if
2478+ it works. """
2479+ match = ACCESSLOG_CLF_PATTERN.match(line)
2480+ recognize = match and (match.group(1) in ('HEAD', 'GET'))
2481+ if recognize:
2482+ output.Log('Recognized a Common Logfile Format file.', 2)
2483+ return recognize
2484+ #end def RecognizeCLFLine
2485+
2486+ def GetCLFLine(self, line):
2487+ """ Fetch the requested URL from a CLF line """
2488+ match = ACCESSLOG_CLF_PATTERN.match(line)
2489+ if match:
2490+ request = match.group(1)
2491+ if request in ('HEAD', 'GET'):
2492+ return match.group(2)
2493+ return None
2494+ #end def GetCLFLine
2495+
2496+ def ProduceURLs(self, consumer):
2497+ """ Produces URLs from our data source, hands them in to the consumer. """
2498+
2499+ # Open the file
2500+ (frame, file) = OpenFileForRead(self._path, 'ACCESSLOG')
2501+ if not file:
2502+ return
2503+
2504+ # Iterate lines
2505+ for line in file.readlines():
2506+ if self._encoding:
2507+ line = encoder.WidenText(line, self._encoding)
2508+ line = line.strip()
2509+
2510+ # If we don't know the format yet, try them both
2511+ if (not self._is_clf) and (not self._is_elf):
2512+ self._is_elf = self.RecognizeELFLine(line)
2513+ self._is_clf = self.RecognizeCLFLine(line)
2514+
2515+ # Digest the line
2516+ match = None
2517+ if self._is_elf:
2518+ match = self.GetELFLine(line)
2519+ elif self._is_clf:
2520+ match = self.GetCLFLine(line)
2521+ if not match:
2522+ continue
2523+
2524+ # Pass it on
2525+ url = URL()
2526+ url.TrySetAttribute('loc', match)
2527+ consumer(url, True)
2528+
2529+ file.close()
2530+ if frame:
2531+ frame.close()
2532+ #end def ProduceURLs
2533+#end class InputAccessLog
2534+
2535+
2536+class FilePathGenerator:
2537+ """
2538+ This class generates filenames in a series, upon request.
2539+ You can request any iteration number at any time, you don't
2540+ have to go in order.
2541+
2542+ Example of iterations for '/path/foo.xml.gz':
2543+ 0 --> /path/foo.xml.gz
2544+ 1 --> /path/foo1.xml.gz
2545+ 2 --> /path/foo2.xml.gz
2546+ _index.xml --> /path/foo_index.xml
2547+ """
2548+
2549+ def __init__(self):
2550+ self.is_gzip = False # Is this a GZIP file?
2551+
2552+ self._path = None # '/path/'
2553+ self._prefix = None # 'foo'
2554+ self._suffix = None # '.xml.gz'
2555+ #end def __init__
2556+
2557+ def Preload(self, path):
2558+ """ Splits up a path into forms ready for recombination. """
2559+ path = encoder.MaybeNarrowPath(path)
2560+
2561+ # Get down to a base name
2562+ path = os.path.normpath(path)
2563+ base = os.path.basename(path).lower()
2564+ if not base:
2565+ output.Error('Couldn\'t parse the file path: %s' % path)
2566+ return False
2567+ lenbase = len(base)
2568+
2569+ # Recognize extension
2570+ lensuffix = 0
2571+ compare_suffix = ['.xml', '.xml.gz', '.gz']
2572+ for suffix in compare_suffix:
2573+ if base.endswith(suffix):
2574+ lensuffix = len(suffix)
2575+ break
2576+ if not lensuffix:
2577+ output.Error('The path "%s" doesn\'t end in a supported file '
2578+ 'extension.' % path)
2579+ return False
2580+ self.is_gzip = suffix.endswith('.gz')
2581+
2582+ # Split the original path
2583+ lenpath = len(path)
2584+ self._path = path[:lenpath-lenbase]
2585+ self._prefix = path[lenpath-lenbase:lenpath-lensuffix]
2586+ self._suffix = path[lenpath-lensuffix:]
2587+
2588+ return True
2589+ #end def Preload
2590+
2591+ def GeneratePath(self, instance):
2592+ """ Generates the iterations, as described above. """
2593+ prefix = self._path + self._prefix
2594+ if type(instance) == types.IntType:
2595+ if instance:
2596+ return '%s%d%s' % (prefix, instance, self._suffix)
2597+ return prefix + self._suffix
2598+ return prefix + instance
2599+ #end def GeneratePath
2600+
2601+ def GenerateURL(self, instance, root_url):
2602+ """ Generates iterations, but as a URL instead of a path. """
2603+ prefix = root_url + self._prefix
2604+ retval = None
2605+ if type(instance) == types.IntType:
2606+ if instance:
2607+ retval = '%s%d%s' % (prefix, instance, self._suffix)
2608+ else:
2609+ retval = prefix + self._suffix
2610+ else:
2611+ retval = prefix + instance
2612+ return URL.Canonicalize(retval)
2613+ #end def GenerateURL
2614+
2615+ def GenerateWildURL(self, root_url):
2616+ """ Generates a wildcard that should match all our iterations """
2617+ prefix = URL.Canonicalize(root_url + self._prefix)
2618+ temp = URL.Canonicalize(prefix + self._suffix)
2619+ suffix = temp[len(prefix):]
2620+ return prefix + '*' + suffix
2621+  #end def GenerateWildURL
2622+#end class FilePathGenerator
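
The splitting and numbering scheme used by `FilePathGenerator.Preload` and `GeneratePath` above can be sketched in isolation (a modernized Python 3 sketch, not part of this patch; `split_sitemap_path` and `numbered_path` are illustrative names):

```python
import os

def split_sitemap_path(path, suffixes=('.xml', '.xml.gz', '.gz')):
    """Split an output path into (directory, prefix, suffix), mirroring
    FilePathGenerator.Preload above; returns None for unsupported names."""
    path = os.path.normpath(path)
    base = os.path.basename(path).lower()
    for suffix in suffixes:
        if base.endswith(suffix):
            break
    else:
        return None
    cut = len(path) - len(base)
    return (path[:cut],
            path[cut:len(path) - len(suffix)],
            path[len(path) - len(suffix):])

def numbered_path(parts, instance):
    """Mirror GeneratePath: instance 0 keeps the plain name; a positive
    instance number is spliced in between the prefix and the suffix."""
    directory, prefix, suffix = parts
    if instance:
        return '%s%s%d%s' % (directory, prefix, instance, suffix)
    return directory + prefix + suffix
```

So a `store_into` of `out/sitemap.xml.gz` yields `out/sitemap.xml.gz`, `out/sitemap1.xml.gz`, `out/sitemap2.xml.gz`, and so on, which is also the family of names the wildcard from `GenerateWildURL` has to cover.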
2623+
2624+
2625+class PerURLStatistics:
2626+ """ Keep track of some simple per-URL statistics, like file extension. """
2627+
2628+ def __init__(self):
2629+ self._extensions = {} # Count of extension instances
2630+ #end def __init__
2631+
2632+ def Consume(self, url):
2633+ """ Log some stats for the URL. At the moment, that means extension. """
2634+ if url and url.loc:
2635+ (scheme, netloc, path, query, frag) = urlparse.urlsplit(url.loc)
2636+ if not path:
2637+ return
2638+
2639+ # Recognize directories
2640+ if path.endswith('/'):
2641+ if self._extensions.has_key('/'):
2642+ self._extensions['/'] = self._extensions['/'] + 1
2643+ else:
2644+ self._extensions['/'] = 1
2645+ return
2646+
2647+ # Strip to a filename
2648+ i = path.rfind('/')
2649+ if i >= 0:
2650+ assert i < len(path)
2651+ path = path[i:]
2652+
2653+ # Find extension
2654+ i = path.rfind('.')
2655+ if i > 0:
2656+ assert i < len(path)
2657+ ext = path[i:].lower()
2658+ if self._extensions.has_key(ext):
2659+ self._extensions[ext] = self._extensions[ext] + 1
2660+ else:
2661+ self._extensions[ext] = 1
2662+ else:
2663+ if self._extensions.has_key('(no extension)'):
2664+ self._extensions['(no extension)'] = self._extensions[
2665+ '(no extension)'] + 1
2666+ else:
2667+ self._extensions['(no extension)'] = 1
2668+ #end def Consume
2669+
2670+ def Log(self):
2671+ """ Dump out stats to the output. """
2672+ if len(self._extensions):
2673+ output.Log('Count of file extensions on URLs:', 1)
2674+ set = self._extensions.keys()
2675+ set.sort()
2676+ for ext in set:
2677+ output.Log(' %7d %s' % (self._extensions[ext], ext), 1)
2678+ #end def Log
2679+
2680+class Sitemap(xml.sax.handler.ContentHandler):
2681+ """
2682+ This is the big workhorse class that processes your inputs and spits
2683+  out sitemap files. It is built as a SAX handler for setup purposes.
2684+ That is, it processes an XML stream to bring itself up.
2685+ """
2686+
2687+ def __init__(self, suppress_notify):
2688+ xml.sax.handler.ContentHandler.__init__(self)
2689+ self._filters = [] # Filter objects
2690+ self._inputs = [] # Input objects
2691+ self._urls = {} # Maps URLs to count of dups
2692+ self._set = [] # Current set of URLs
2693+ self._filegen = None # Path generator for output files
2694+ self._wildurl1 = None # Sitemap URLs to filter out
2695+ self._wildurl2 = None # Sitemap URLs to filter out
2696+ self._sitemaps = 0 # Number of output files
2697+ # We init _dup_max to 2 so the default priority is 0.5 instead of 1.0
2698+ self._dup_max = 2 # Max number of duplicate URLs
2699+ self._stat = PerURLStatistics() # Some simple stats
2700+ self._in_site = False # SAX: are we in a Site node?
2701+    self._in_site_ever = False    # SAX: were we ever in a Site?
2702+
2703+ self._default_enc = None # Best encoding to try on URLs
2704+ self._base_url = None # Prefix to all valid URLs
2705+ self._store_into = None # Output filepath
2706+ self._sitemap_type = None # Sitemap type (web, mobile or news)
2707+ self._suppress = suppress_notify # Suppress notify of servers
2708+ #end def __init__
2709+
2710+ def ValidateBasicConfig(self):
2711+ """ Verifies (and cleans up) the basic user-configurable options. """
2712+ all_good = True
2713+
2714+ if self._default_enc:
2715+ encoder.SetUserEncoding(self._default_enc)
2716+
2717+ # Canonicalize the base_url
2718+ if all_good and not self._base_url:
2719+ output.Error('A site needs a "base_url" attribute.')
2720+ all_good = False
2721+ if all_good and not URL.IsAbsolute(self._base_url):
2722+ output.Error('The "base_url" must be absolute, not relative: %s' %
2723+ self._base_url)
2724+ all_good = False
2725+ if all_good:
2726+ self._base_url = URL.Canonicalize(self._base_url)
2727+ if not self._base_url.endswith('/'):
2728+ self._base_url = self._base_url + '/'
2729+ output.Log('BaseURL is set to: %s' % self._base_url, 2)
2730+
2731+ # Load store_into into a generator
2732+ if all_good:
2733+ if self._store_into:
2734+ self._filegen = FilePathGenerator()
2735+ if not self._filegen.Preload(self._store_into):
2736+ all_good = False
2737+ else:
2738+ output.Error('A site needs a "store_into" attribute.')
2739+ all_good = False
2740+
2741+ # Ask the generator for patterns on what its output will look like
2742+ if all_good:
2743+ self._wildurl1 = self._filegen.GenerateWildURL(self._base_url)
2744+ self._wildurl2 = self._filegen.GenerateURL(SITEINDEX_SUFFIX,
2745+ self._base_url)
2746+
2747+ # Unify various forms of False
2748+ if all_good:
2749+ if self._suppress:
2750+ if (type(self._suppress) == types.StringType) or (type(self._suppress)
2751+ == types.UnicodeType):
2752+ if (self._suppress == '0') or (self._suppress.lower() == 'false'):
2753+ self._suppress = False
2754+
2755+ # Clean up the sitemap_type
2756+ if all_good:
2757+ match = False
2758+ # If sitemap_type is not specified, default to web sitemap
2759+ if not self._sitemap_type:
2760+ self._sitemap_type = 'web'
2761+ else:
2762+ self._sitemap_type = self._sitemap_type.lower()
2763+ for pattern in SITEMAP_TYPES:
2764+ if self._sitemap_type == pattern:
2765+ match = True
2766+ break
2767+ if not match:
2768+ output.Error('The "sitemap_type" value must be "web", "mobile" '
2769+ 'or "news": %s' % self._sitemap_type)
2770+ all_good = False
2771+ output.Log('The Sitemap type is %s Sitemap.' % \
2772+ self._sitemap_type.upper(), 0)
2773+
2774+ # Done
2775+ if not all_good:
2776+ output.Log('See "example_config.xml" for more information.', 0)
2777+ return all_good
2778+ #end def ValidateBasicConfig
2779+
2780+ def Generate(self):
2781+ """ Run over all the Inputs and ask them to Produce """
2782+ # Run the inputs
2783+ for input in self._inputs:
2784+ input.ProduceURLs(self.ConsumeURL)
2785+
2786+ # Do last flushes
2787+ if len(self._set):
2788+ self.FlushSet()
2789+ if not self._sitemaps:
2790+ output.Warn('No URLs were recorded, writing an empty sitemap.')
2791+ self.FlushSet()
2792+
2793+ # Write an index as needed
2794+ if self._sitemaps > 1:
2795+ self.WriteIndex()
2796+
2797+ # Notify
2798+ self.NotifySearch()
2799+
2800+ # Dump stats
2801+ self._stat.Log()
2802+ #end def Generate
2803+
2804+ def ConsumeURL(self, url, allow_fragment):
2805+ """
2806+ All per-URL processing comes together here, regardless of Input.
2807+ Here we run filters, remove duplicates, spill to disk as needed, etc.
2808+
2809+ """
2810+ if not url:
2811+ return
2812+
2813+ # Validate
2814+ if not url.Validate(self._base_url, allow_fragment):
2815+ return
2816+
2817+ # Run filters
2818+ accept = None
2819+ for filter in self._filters:
2820+ accept = filter.Apply(url)
2821+ if accept != None:
2822+ break
2823+ if not (accept or (accept == None)):
2824+ url.Log(prefix='FILTERED', level=2)
2825+ return
2826+
2827+    # Ignore our own output URLs
2828+ if fnmatch.fnmatchcase(url.loc, self._wildurl1) or fnmatch.fnmatchcase(
2829+ url.loc, self._wildurl2):
2830+ url.Log(prefix='IGNORED (output file)', level=2)
2831+ return
2832+
2833+ # Note the sighting
2834+ hash = url.MakeHash()
2835+ if self._urls.has_key(hash):
2836+ dup = self._urls[hash]
2837+ if dup > 0:
2838+ dup = dup + 1
2839+ self._urls[hash] = dup
2840+ if self._dup_max < dup:
2841+ self._dup_max = dup
2842+ url.Log(prefix='DUPLICATE')
2843+ return
2844+
2845+ # Acceptance -- add to set
2846+ self._urls[hash] = 1
2847+ self._set.append(url)
2848+ self._stat.Consume(url)
2849+ url.Log()
2850+
2851+ # Flush the set if needed
2852+ if len(self._set) >= MAXURLS_PER_SITEMAP:
2853+ self.FlushSet()
2854+ #end def ConsumeURL
2855+
2856+ def FlushSet(self):
2857+ """
2858+ Flush the current set of URLs to the output. This is a little
2859+ slow because we like to sort them all and normalize the priorities
2860+ before dumping.
2861+ """
2862+
2863+ # Determine what Sitemap header to use (News or General)
2864+ if self._sitemap_type == 'news':
2865+ sitemap_header = NEWS_SITEMAP_HEADER
2866+ else:
2867+ sitemap_header = GENERAL_SITEMAP_HEADER
2868+
2869+ # Sort and normalize
2870+ output.Log('Sorting and normalizing collected URLs.', 1)
2871+ self._set.sort()
2872+ for url in self._set:
2873+ hash = url.MakeHash()
2874+ dup = self._urls[hash]
2875+ if dup > 0:
2876+ self._urls[hash] = -1
2877+ if not url.priority:
2878+ url.priority = '%.4f' % (float(dup) / float(self._dup_max))
2879+
2880+ # Get the filename we're going to write to
2881+ filename = self._filegen.GeneratePath(self._sitemaps)
2882+ if not filename:
2883+ output.Fatal('Unexpected: Couldn\'t generate output filename.')
2884+ self._sitemaps = self._sitemaps + 1
2885+ output.Log('Writing Sitemap file "%s" with %d URLs' %
2886+ (filename, len(self._set)), 1)
2887+
2888+ # Write to it
2889+ frame = None
2890+ file = None
2891+
2892+ try:
2893+ if self._filegen.is_gzip:
2894+        basename = os.path.basename(filename)
2895+ frame = open(filename, 'wb')
2896+ file = gzip.GzipFile(fileobj=frame, filename=basename, mode='wt')
2897+ else:
2898+ file = open(filename, 'wt')
2899+
2900+ file.write(sitemap_header)
2901+ for url in self._set:
2902+ url.WriteXML(file)
2903+ file.write(SITEMAP_FOOTER)
2904+
2905+ file.close()
2906+ if frame:
2907+ frame.close()
2908+
2909+ frame = None
2910+ file = None
2911+ except IOError:
2912+ output.Fatal('Couldn\'t write out to file: %s' % filename)
2913+ os.chmod(filename, 0644)
2914+
2915+ # Flush
2916+ self._set = []
2917+ #end def FlushSet
2918+
2919+ def WriteIndex(self):
2920+ """ Write the master index of all Sitemap files """
2921+ # Make a filename
2922+ filename = self._filegen.GeneratePath(SITEINDEX_SUFFIX)
2923+ if not filename:
2924+ output.Fatal('Unexpected: Couldn\'t generate output index filename.')
2925+ output.Log('Writing index file "%s" with %d Sitemaps' %
2926+ (filename, self._sitemaps), 1)
2927+
2928+ # Determine what Sitemap index header to use (News or General)
2929+ if self._sitemap_type == 'news':
2930+ sitemap_index_header = NEWS_SITEMAP_HEADER
2931+ else:
2932+      sitemap_index_header = GENERAL_SITEMAP_HEADER
2933+
2934+ # Make a lastmod time
2935+ lastmod = TimestampISO8601(time.time())
2936+
2937+ # Write to it
2938+ try:
2939+ fd = open(filename, 'wt')
2940+ fd.write(sitemap_index_header)
2941+
2942+ for mapnumber in range(0,self._sitemaps):
2943+ # Write the entry
2944+ mapurl = self._filegen.GenerateURL(mapnumber, self._base_url)
2945+ mapattributes = { 'loc' : mapurl, 'lastmod' : lastmod }
2946+ fd.write(SITEINDEX_ENTRY % mapattributes)
2947+
2948+ fd.write(SITEINDEX_FOOTER)
2949+
2950+ fd.close()
2951+ fd = None
2952+ except IOError:
2953+ output.Fatal('Couldn\'t write out to file: %s' % filename)
2954+ os.chmod(filename, 0644)
2955+ #end def WriteIndex
2956+
2957+ def NotifySearch(self):
2958+ """ Send notification of the new Sitemap(s) to the search engines. """
2959+ if self._suppress:
2960+ output.Log('Search engine notification is suppressed.', 1)
2961+ return
2962+
2963+ output.Log('Notifying search engines.', 1)
2964+
2965+ # Override the urllib's opener class with one that doesn't ignore 404s
2966+ class ExceptionURLopener(urllib.FancyURLopener):
2967+ def http_error_default(self, url, fp, errcode, errmsg, headers):
2968+ output.Log('HTTP error %d: %s' % (errcode, errmsg), 2)
2969+ raise IOError
2970+ #end def http_error_default
2971+    #end class ExceptionURLopener
2972+ old_opener = urllib._urlopener
2973+ urllib._urlopener = ExceptionURLopener()
2974+
2975+ # Build the URL we want to send in
2976+ if self._sitemaps > 1:
2977+ url = self._filegen.GenerateURL(SITEINDEX_SUFFIX, self._base_url)
2978+ else:
2979+ url = self._filegen.GenerateURL(0, self._base_url)
2980+
2981+ # Test if we can hit it ourselves
2982+ try:
2983+ u = urllib.urlopen(url)
2984+ u.close()
2985+ except IOError:
2986+ output.Error('When attempting to access our generated Sitemap at the '
2987+ 'following URL:\n %s\n we failed to read it. Please '
2988+ 'verify the store_into path you specified in\n'
2989+                   ' your configuration file is web-accessible. Consult '
2990+ 'the FAQ for more\n information.' % url)
2991+      output.Warn('Proceeding to notify with an unverifiable URL.')
2992+
2993+ # Cycle through notifications
2994+ # To understand this, see the comment near the NOTIFICATION_SITES comment
2995+ for ping in NOTIFICATION_SITES:
2996+ query_map = ping[3]
2997+ query_attr = ping[5]
2998+ query_map[query_attr] = url
2999+ query = urllib.urlencode(query_map)
3000+ notify = urlparse.urlunsplit((ping[0], ping[1], ping[2], query, ping[4]))
3001+
3002+ # Send the notification
3003+ output.Log('Notifying: %s' % ping[1], 0)
3004+ output.Log('Notification URL: %s' % notify, 2)
3005+ try:
3006+ u = urllib.urlopen(notify)
3007+ u.read()
3008+ u.close()
3009+ except IOError:
3010+ output.Warn('Cannot contact: %s' % ping[1])
3011+
3012+ if old_opener:
3013+ urllib._urlopener = old_opener
3014+ #end def NotifySearch
3015+
3016+ def startElement(self, tag, attributes):
3017+ """ SAX processing, called per node in the config stream. """
3018+ if tag == 'site':
3019+ if self._in_site:
3020+ output.Error('Can not nest Site entries in the configuration.')
3021+ else:
3022+ self._in_site = True
3023+
3024+ if not ValidateAttributes('SITE', attributes,
3025+ ('verbose', 'default_encoding', 'base_url', 'store_into',
3026+ 'suppress_search_engine_notify', 'sitemap_type')):
3027+ return
3028+
3029+ verbose = attributes.get('verbose', 0)
3030+ if verbose:
3031+ output.SetVerbose(verbose)
3032+
3033+ self._default_enc = attributes.get('default_encoding')
3034+ self._base_url = attributes.get('base_url')
3035+ self._store_into = attributes.get('store_into')
3036+      self._sitemap_type = attributes.get('sitemap_type')
3037+ if not self._suppress:
3038+ self._suppress = attributes.get('suppress_search_engine_notify',
3039+ False)
3040+ self.ValidateBasicConfig()
3041+ elif tag == 'filter':
3042+ self._filters.append(Filter(attributes))
3043+
3044+ elif tag == 'url':
3046+ self._inputs.append(InputURL(attributes))
3047+
3048+ elif tag == 'urllist':
3049+ for attributeset in ExpandPathAttribute(attributes, 'path'):
3050+ if self._sitemap_type == 'news':
3051+ self._inputs.append(InputNewsURLList(attributeset))
3052+ else:
3053+ self._inputs.append(InputURLList(attributeset))
3054+
3055+ elif tag == 'directory':
3056+ self._inputs.append(InputDirectory(attributes, self._base_url))
3057+
3058+ elif tag == 'accesslog':
3059+ for attributeset in ExpandPathAttribute(attributes, 'path'):
3060+ self._inputs.append(InputAccessLog(attributeset))
3061+ else:
3062+ output.Error('Unrecognized tag in the configuration: %s' % tag)
3063+ #end def startElement
3064+
3065+ def endElement(self, tag):
3066+ """ SAX processing, called per node in the config stream. """
3067+ if tag == 'site':
3068+ assert self._in_site
3069+ self._in_site = False
3070+ self._in_site_ever = True
3071+ #end def endElement
3072+
3073+ def endDocument(self):
3074+ """ End of SAX, verify we can proceed. """
3075+ if not self._in_site_ever:
3076+ output.Error('The configuration must specify a "site" element.')
3077+ else:
3078+ if not self._inputs:
3079+ output.Warn('There were no inputs to generate a sitemap from.')
3080+ #end def endDocument
3081+#end class Sitemap
3082+
3083+
3084+def ValidateAttributes(tag, attributes, goodattributes):
3085+ """ Makes sure 'attributes' does not contain any attribute not
3086+ listed in 'goodattributes' """
3087+ all_good = True
3088+ for attr in attributes.keys():
3089+ if not attr in goodattributes:
3090+ output.Error('Unknown %s attribute: %s' % (tag, attr))
3091+ all_good = False
3092+ return all_good
3093+#end def ValidateAttributes
3094+
3095+def ExpandPathAttribute(src, attrib):
3096+ """ Given a dictionary of attributes, return a list of dictionaries
3097+ with all the same attributes except for the one named attrib.
3098+ That one, we treat as a file path and expand into all its possible
3099+ variations. """
3100+ # Do the path expansion. On any error, just return the source dictionary.
3101+ path = src.get(attrib)
3102+ if not path:
3103+ return [src]
3104+  path = encoder.MaybeNarrowPath(path)
3105+ pathlist = glob.glob(path)
3106+ if not pathlist:
3107+ return [src]
3108+
3109+ # If this isn't actually a dictionary, make it one
3110+ if type(src) != types.DictionaryType:
3111+ tmp = {}
3112+ for key in src.keys():
3113+ tmp[key] = src[key]
3114+ src = tmp
3115+ # Create N new dictionaries
3116+ retval = []
3117+ for path in pathlist:
3118+ dst = src.copy()
3119+ dst[attrib] = path
3120+ retval.append(dst)
3121+
3122+ return retval
3123+#end def ExpandPathAttribute
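
The same expansion can be written compactly in modern Python (a Python 3 sketch, not part of this patch; `expand_path_attribute` is an illustrative name):

```python
import glob

def expand_path_attribute(src, attrib):
    """Mirror ExpandPathAttribute above: expand the glob pattern stored
    under src[attrib] into one attribute dict per matching path.
    Falls back to [src] when the attribute is missing or nothing matches."""
    path = src.get(attrib)
    if not path:
        return [src]
    matches = glob.glob(path)
    if not matches:
        return [src]
    # One copy of the attribute dict per matched path
    return [dict(src, **{attrib: match}) for match in matches]
```

This is what lets a single `<urllist path="logs/*.txt" .../>` or `<accesslog path="logs/*.log" .../>` element in the config stand for one input per matching file.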
3124+
3125+def OpenFileForRead(path, logtext):
3126+ """ Opens a text file, be it GZip or plain """
3127+
3128+ frame = None
3129+ file = None
3130+
3131+ if not path:
3132+ return (frame, file)
3133+
3134+ try:
3135+ if path.endswith('.gz'):
3136+ frame = open(path, 'rb')
3137+ file = gzip.GzipFile(fileobj=frame, mode='rt')
3138+ else:
3139+ file = open(path, 'rt')
3140+
3141+ if logtext:
3142+ output.Log('Opened %s file: %s' % (logtext, path), 1)
3143+ else:
3144+ output.Log('Opened file: %s' % path, 1)
3145+ except IOError:
3146+ output.Error('Can not open file: %s' % path)
3147+
3148+ return (frame, file)
3149+#end def OpenFileForRead
3150+
3151+def TimestampISO8601(t):
3152+ """Seconds since epoch (1970-01-01) --> ISO 8601 time string."""
3153+ return time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime(t))
3154+#end def TimestampISO8601
3155+
3156+def CreateSitemapFromFile(configpath, suppress_notify):
3157+ """ Sets up a new Sitemap object from the specified configuration file. """
3158+
3159+ # Remember error count on the way in
3160+ num_errors = output.num_errors
3161+
3162+ # Rev up SAX to parse the config
3163+ sitemap = Sitemap(suppress_notify)
3164+ try:
3165+ output.Log('Reading configuration file: %s' % configpath, 0)
3166+ xml.sax.parse(configpath, sitemap)
3167+ except IOError:
3168+ output.Error('Cannot read configuration file: %s' % configpath)
3169+ except xml.sax._exceptions.SAXParseException, e:
3170+ output.Error('XML error in the config file (line %d, column %d): %s' %
3171+ (e._linenum, e._colnum, e.getMessage()))
3172+ except xml.sax._exceptions.SAXReaderNotAvailable:
3173+ output.Error('Some installs of Python 2.2 did not include complete support'
3174+ ' for XML.\n Please try upgrading your version of Python'
3175+ ' and re-running the script.')
3176+
3177+ # If we added any errors, return no sitemap
3178+ if num_errors == output.num_errors:
3179+ return sitemap
3180+ return None
3181+#end def CreateSitemapFromFile
3182+
3183+def ProcessCommandFlags(args):
3184+ """
3185+ Parse command line flags per specified usage, pick off key, value pairs
3186+ All flags of type "--key=value" will be processed as __flags[key] = value,
3187+ "--option" will be processed as __flags[option] = option
3188+ """
3189+
3190+ flags = {}
3191+ rkeyval = '--(?P<key>\S*)[=](?P<value>\S*)' # --key=val
3192+ roption = '--(?P<option>\S*)' # --key
3193+ r = '(' + rkeyval + ')|(' + roption + ')'
3194+ rc = re.compile(r)
3195+ for a in args:
3196+ try:
3197+ rcg = rc.search(a).groupdict()
3198+ if rcg.has_key('key'):
3199+ flags[rcg['key']] = rcg['value']
3200+ if rcg.has_key('option'):
3201+ flags[rcg['option']] = rcg['option']
3202+ except AttributeError:
3203+ return None
3204+ return flags
3205+#end def ProcessCommandFlags
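
The flag grammar above (`--key=value` becomes `{key: value}`, bare `--option` becomes `{option: option}`, and any malformed argument aborts parsing) can be reproduced as a small stand-alone sketch (Python 3; `parse_flags` is an illustrative name, not part of this patch):

```python
import re

# Either --key=value or a bare --option; anchored so stray text is rejected
FLAG_RE = re.compile(r'--(?:(?P<key>\S+)=(?P<value>\S*)|(?P<option>\S+))$')

def parse_flags(args):
    """Parse '--key=value' into {key: value} and '--option' into
    {option: option}; return None on the first malformed argument,
    matching the behavior of ProcessCommandFlags above."""
    flags = {}
    for arg in args:
        match = FLAG_RE.match(arg)
        if not match:
            return None
        if match.group('key') is not None:
            flags[match.group('key')] = match.group('value')
        else:
            flags[match.group('option')] = match.group('option')
    return flags
```

The `__main__` block below then only has to probe the resulting dictionary for the `config`, `help`, and `testing` keys.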
3206+
3207+
3208+#
3209+# __main__
3210+#
3211+
3212+if __name__ == '__main__':
3213+ flags = ProcessCommandFlags(sys.argv[1:])
3214+ if not flags or not flags.has_key('config') or flags.has_key('help'):
3215+ output.Log(__usage__, 0)
3216+ else:
3217+ suppress_notify = flags.has_key('testing')
3218+ sitemap = CreateSitemapFromFile(flags['config'], suppress_notify)
3219+ if not sitemap:
3220+ output.Log('Configuration file errors -- exiting.', 0)
3221+ else:
3222+ sitemap.Generate()
3223+ output.Log('Number of errors: %d' % output.num_errors, 1)
3224+ output.Log('Number of warnings: %d' % output.num_warns, 1)
3225
3226=== added file 'doc/sitemap_gen_config.xml'
3227--- doc/sitemap_gen_config.xml 1970-01-01 00:00:00 +0000
3228+++ doc/sitemap_gen_config.xml 2009-03-03 11:01:42 +0000
3229@@ -0,0 +1,14 @@
3230+<?xml version="1.0" encoding="UTF-8"?>
3231+<site
3232+ base_url="http://igraph.sourceforge.net/"
3233+ store_into="sitemap.xml"
3234+ verbose="1"
3235+ sitemap_type="web">
3236+ <directory path="homepage" url="http://igraph.sourceforge.net/" />
3237+ <directory path="html" url="http://igraph.sourceforge.net/doc/" />
3238+ <directory path="book/html" url="http://igraph.sourceforge.net/igraphbook/" />
3239+ <filter action="drop" type="wildcard" pattern="*~" />
3240+ <filter action="drop" type="wildcard" pattern="*.html.in" />
3241+ <filter action="drop" type="wildcard" pattern="*.py" />
3242+ <filter action="drop" type="regexp" pattern="/\.[^/]*" />
3243+</site>

Subscribers

People subscribed via source and target branches