Merge lp:~brunogirin/python-snippets/csv-snippets into lp:~jonobacon/python-snippets/trunk

Proposed by Bruno Girin
Status: Merged
Merged at revision: not available
Proposed branch: lp:~brunogirin/python-snippets/csv-snippets
Merge into: lp:~jonobacon/python-snippets/trunk
Diff against target: 304 lines (+262/-0) (has conflicts)
7 files modified
CATEGORIES (+5/-0)
csv/csv101.csv (+13/-0)
csv/csv101.py (+41/-0)
csv/csv2dict.csv (+13/-0)
csv/csv2dict.py (+66/-0)
csv/vmstat-reader.py (+97/-0)
csv/vmstat.log (+27/-0)
Text conflict in CATEGORIES
To merge this branch: bzr merge lp:~brunogirin/python-snippets/csv-snippets
Reviewer Review Type Date Requested Status
Jono Bacon Pending
Review via email: mp+22517@code.launchpad.net

Description of the change

Added a new CSV category to include snippets related to the Python csv module.
Added 3 snippets to the new category:
- csv101.py: a very simple example, reading the content of a CSV file into a list of lists.
- csv2dict.py: a more involved example that reads the content of a CSV file into a dictionary of dictionaries for easy querying and processing.
- vmstat-reader.py: an example that shows how to use a custom delimiter to read space-delimited files, as output by a number of command-line tools, in this case vmstat.
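
For readers who want to try the three techniques without checking out the branch, here is a minimal Python 3 sketch. It is not the code from the branch: the function names (read_rows, read_dict_of_dicts, read_space_delimited) are hypothetical, the data is a few rows copied from the companion files and held in memory, and the vmstat part is simplified to a flat column map rather than the major/minor header grouping used in vmstat-reader.py.

```python
import csv
import io

# In-memory samples standing in for csv2dict.csv and vmstat.log (assumption:
# only a few columns and rows are kept, to keep the sketch short).
TUBE_CSV = '''"Line","Peak Service","Stabling","Control Centre"
"Circle",7,"Hammersmith, Edgware Rd, Farringdon","Baker Street"
"Piccadilly",24,"Northfields, Cockfosters, Arnos Grove","Earls Court"
'''

VMSTAT_LOG = '''procs -----------memory----------
 r b swpd free
 5 1 16272 309800
 0 0 16272 309784
'''


def read_rows(text):
    """csv101 style: read every row into a list of lists."""
    return list(csv.reader(io.StringIO(text)))


def read_dict_of_dicts(text):
    """csv2dict style: key each row by the value in its first column."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)[1:]  # the first header names the key column
    content = {}
    for row in reader:
        entry = dict(zip(headers, row[1:]))
        # Stabling is itself a comma-separated list of names
        entry['Stabling'] = [s.strip() for s in entry['Stabling'].split(',')]
        content[row[0]] = entry
    return content


def read_space_delimited(text):
    """vmstat-reader style: space delimiter, runs of spaces collapsed."""
    reader = csv.reader(io.StringIO(text), delimiter=' ',
                        skipinitialspace=True)
    next(reader)           # skip the major header line
    minors = next(reader)  # the minor headers name the columns
    columns = {name: [] for name in minors}
    for row in reader:
        for name, value in zip(minors, row):
            columns[name].append(int(value))
    return columns


if __name__ == '__main__':
    print(read_rows(TUBE_CSV))
    print(read_dict_of_dicts(TUBE_CSV)['Piccadilly']['Control Centre'])
    print(min(read_space_delimited(VMSTAT_LOG)['free']))
```

Note that csv.reader returns every field as a string, so numeric columns such as Peak Service need an explicit conversion, as the vmstat helper does with int().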

Preview Diff

=== modified file 'CATEGORIES'
--- CATEGORIES 2010-03-31 04:07:19 +0000
+++ CATEGORIES 2010-03-31 08:18:21 +0000
@@ -16,7 +16,12 @@
 bzrlib Bazaar source control system Python module.
 Cairo Cairo drawing examples
 Clutter Clutter toolkit examples.
+<<<<<<< TREE
 dbus dbus messaging system
+=======
+ CSV csv Python module, reading and writing CSV files
+ dbus dbus messaging system
+>>>>>>> MERGE-SOURCE
 DesktopCouch DesktopCouch examples.
 feedparser Parsing RSS feeds.
 gobject The gobject library, part of GNOME.

=== added directory 'csv'
=== added file 'csv/csv101.csv'
--- csv/csv101.csv 1970-01-01 00:00:00 +0000
+++ csv/csv101.csv 2010-03-31 08:18:21 +0000
@@ -0,0 +1,13 @@
+"Line","Route Length","Tunnel","Open","Stations Served","No of Escalators","No of Lifts"
+"Bakerloo","23 kms","11 kms","12 kms",25,29,11
+"Central","74 kms","23 kms","51 kms",49,72,12
+"Circle","21 kms","18 kms","3 kms",27,"See Metropolitan","See Metropolitan"
+"District","64 kms","17 kms","47 kms",60,21,6
+"East London","8 kms","4 kms","4 kms",8,2,4
+"Hammersmith & City","27 kms","12 kms","15 kms",28,"See Metropolitan","See Metropolitan"
+"Jubilee","38 kms","19 kms","19 kms",27,127,34
+"Metropolitan","67 kms","10 kms","57 kms",34,30,2
+"Northern","58 kms","39 kms","19 kms",51,54,21
+"Piccadilly","71 kms","21 kms","50 kms",52,43,17
+"Victoria","21 kms","21 kms","0 kms",16,31,1
+"Waterloo & City","2 kms","2 kms","0 kms",2,"see Central","see Central"

=== added file 'csv/csv101.py'
--- csv/csv101.py 1970-01-01 00:00:00 +0000
+++ csv/csv101.py 2010-03-31 08:18:21 +0000
@@ -0,0 +1,41 @@
+#!/usr/bin/env python
+#
+# [SNIPPET_NAME: CSV 101]
+# [SNIPPET_CATEGORIES: CSV]
+# [SNIPPET_DESCRIPTION: Basic CSV file reading example]
+# [SNIPPET_AUTHOR: Bruno Girin <brunogirin@gmail.com>]
+# [SNIPPET_LICENSE: GPL]
+
+# This snippet demonstrates the basics of how to read CSV files using
+# the Python csv module.
+# The full documentation for the csv module is available here:
+# http://docs.python.org/library/csv.html
+#
+# The data used in the companion csv101.csv file was taken from here:
+# http://www.trainweb.org/tubeprune/Statistics.htm
+# See, you can even learn some interesting facts about the London Underground
+# network while learning Python.
+
+#
+# First things first, we need to import the csv module
+# Also import sys to get argv[0], which holds the name of the script
+#
+import csv
+import sys
+
+# Derive the name of the CSV file from the name of the script and initialise
+# the content list
+csvFile = sys.argv[0].replace('.py', '.csv')
+content = []
+
+print('Reading file %s' % csvFile)
+# And the rest is really easy as csv.reader can be iterated upon
+reader = csv.reader(open(csvFile))
+for row in reader:
+    """
+    For each row, we append the row to the content list, which will
+    produce a list of lists.
+    """
+    content.append(row)
+print(content)
+

=== added file 'csv/csv2dict.csv'
--- csv/csv2dict.csv 1970-01-01 00:00:00 +0000
+++ csv/csv2dict.csv 2010-03-31 08:18:21 +0000
@@ -0,0 +1,13 @@
+"Line","Peak Service","Off Peak Service","Trains required","Stabling","Control Centre"
+"Bakerloo",23,15,"32 x 7-cars","Stonebridge Pk, London Rd, Elephant, Queens Pk.","Baker Street"
+"Central",30,18,"72 x 8-cars","Hainault, White City, West Ruislip, Loughton, Woodford","Wood Lane"
+"Circle",7,7,"14 x 6-cars","Hammersmith, Edgware Rd, Farringdon","Baker Street"
+"District","23 plus 7 Circles","14 plus 7 Circles ","74 x 6-cars","Ealing Common, Parsons Green, Triangle Sdgs, Barking, Upminster","Earls Court"
+"East London",6,6,"6 x 4-cars","New Cross","New Cross Depot"
+"Hammersmith & City",7,7,"17 x 6-cars","Hammersmith, Barking","Baker Street"
+"Jubilee",24,15,"46 x 6-cars","Neasden, Stratford, Stanmore","Neasden"
+"Metropolitan","16 plus 14 C & H trains","20 incl C & H lines","44 x 8-cars","Uxbridge, Rickmansworth, Neasden","Neasden"
+"Northern","30 max. 20 (branches)","20 max. 15 (branches)","84 x 6-cars","Morden, Golders Green, Highgate, High Barnet, Edgware","Coburg Street"
+"Piccadilly",24,18,"76 x 6-cars","Northfields, Cockfosters, Arnos Grove","Earls Court"
+"Victoria",28.5,18,"37 x 8-cars","Northumberland Park, Brixton, Walthamstow","Coburg Street"
+"Waterloo & City",19,12,"4 x 4-cars","Waterloo, Bank","Waterloo"

=== added file 'csv/csv2dict.py'
--- csv/csv2dict.py 1970-01-01 00:00:00 +0000
+++ csv/csv2dict.py 2010-03-31 08:18:21 +0000
@@ -0,0 +1,66 @@
+#!/usr/bin/env python
+#
+# [SNIPPET_NAME: CSV to Dictionary]
+# [SNIPPET_CATEGORIES: CSV]
+# [SNIPPET_DESCRIPTION: Read a CSV file to a dictionary of dictionaries]
+# [SNIPPET_AUTHOR: Bruno Girin <brunogirin@gmail.com>]
+# [SNIPPET_LICENSE: GPL]
+
+# This snippet demonstrates how to read a CSV file into a dictionary of
+# dictionaries in order to be able to query it easily.
+# The full documentation for the csv module is available here:
+# http://docs.python.org/library/csv.html
+#
+# The data used in the companion csv2dict.csv file was taken from here:
+# http://www.trainweb.org/tubeprune/Statistics.htm
+# See, you can even learn some interesting facts about the London Underground
+# network while learning Python.
+
+#
+# First things first, we need to import the csv module
+# Also import sys to get argv[0], which holds the name of the script
+#
+import csv
+import sys
+
+# Derive the name of the CSV file from the name of the script and initialise
+# the headers list and content dictionary
+csvFile = sys.argv[0].replace('.py', '.csv')
+headers = None
+content = {}
+
+print('Reading file %s' % csvFile)
+reader = csv.reader(open(csvFile))
+for row in reader:
+    if reader.line_num == 1:
+        """
+        If we are on the first line, create the headers list from the first row
+        by taking a slice from item 1 as we don't need the very first header.
+        """
+        headers = row[1:]
+    else:
+        """
+        Otherwise, the key in the content dictionary is the first item in the
+        row and we can create the sub-dictionary by using the zip() function.
+        We also know that the stabling entry is a comma separated list of names
+        so we split it into a list for easier processing.
+        """
+        content[row[0]] = dict(zip(headers, row[1:]))
+        content[row[0]]['Stabling'] = [s.strip() for s in content[row[0]]['Stabling'].split(',')]
+
+# We can now get to the content by using the resulting dictionary, so to see
+# the list of lines, we can do:
+print("\nList of lines")
+print(content.keys())
+# To see the list of statistics available for each line
+print("\nAvailable statistics for each line")
+print(headers)
+# To see any statistic for a line, we can just request it by name
+print("\nPeak hourly train frequency for the Piccadilly line")
+print(content['Piccadilly']['Peak Service'])
+# Or we can use list comprehensions to filter the list
+print("\nThe list of lines that have Earl's Court as a control centre")
+print([k for k, v in content.items() if v['Control Centre'] == 'Earls Court'])
+print("\nThe list of lines that have Hammersmith as one of their stabling stations")
+print([k for k, v in content.items() if 'Hammersmith' in v['Stabling']])
+

=== added file 'csv/vmstat-reader.py'
--- csv/vmstat-reader.py 1970-01-01 00:00:00 +0000
+++ csv/vmstat-reader.py 2010-03-31 08:18:21 +0000
@@ -0,0 +1,97 @@
+#!/usr/bin/env python
+#
+# [SNIPPET_NAME: vmstat Reader]
+# [SNIPPET_CATEGORIES: CSV]
+# [SNIPPET_DESCRIPTION: Custom CSV reader to read files like vmstat output]
+# [SNIPPET_AUTHOR: Bruno Girin <brunogirin@gmail.com>]
+# [SNIPPET_LICENSE: GPL]
+
+# This snippet demonstrates how to use the csv module with a custom separator
+# in order to read space separated value files such as the output of the
+# vmstat command.
+# The full documentation for the csv module is available here:
+# http://docs.python.org/library/csv.html
+#
+# The data used in the companion vmstat.log file was taken by running the command:
+# vmstat -n 5
+
+#
+# First things first, we need to import the csv module
+# Also import sys to get argv[0], which holds the name of the script
+#
+import csv
+import sys
+
+# Derive the name of the log file from the name of the script
+csvFile = sys.argv[0].replace('-reader.py', '.log')
+
+# Create a map from minor to major header as the minor headers are easy to
+# associate to columns, which is not the case for major headers.
+minor2major = {
+    'r': 'procs',
+    'b': 'procs',
+    'swpd': 'memory',
+    'free': 'memory',
+    'buff': 'memory',
+    'cache': 'memory',
+    'inact': 'memory', # to support the vmstat -a option if required
+    'active': 'memory', # to support the vmstat -a option if required
+    'si': 'swap',
+    'so': 'swap',
+    'bi': 'io',
+    'bo': 'io',
+    'in': 'system',
+    'cs': 'system',
+    'us': 'cpu',
+    'sy': 'cpu',
+    'id': 'cpu',
+    'wa': 'cpu'
+}
+minors = []
+
+# Initialise the content map by creating an empty sub-map against each
+# unique major header
+content = dict([(h, {}) for h in set(minor2major.values())])
+
+print('Reading file %s' % csvFile)
+# Create the reader and specify the delimiter to be a space; also set the
+# skipinitialspace flag to true to ensure that several spaces are seen as a
+# single delimiter and that initial spaces in a line are ignored
+reader = csv.reader(open(csvFile), delimiter=' ', skipinitialspace=True)
+for row in reader:
+    if reader.line_num == 1:
+        """
+        Ignore the first line as it contains major headers.
+        """
+    elif reader.line_num == 2:
+        """
+        If we are on the second line, create the minor headers list from it.
+        We also keep a copy of the minor headers, in the order that they appear
+        in the file to ensure that we can map the values to the correct entry
+        in the content map.
+        """
+        minors = row
+        for h in row:
+            content[minor2major[h]][h] = []
+    elif row[0] != minors[0] and row[0] != minor2major[minors[0]]:
+        """
+        If the -n option was not specified when running the vmstat command,
+        major and minor headers are repeated so we need to ensure that we
+        ignore such lines and only deal with lines that contain actual data.
+        For each value in the row, we append it to the respective entry in
+        the content dictionary. In addition, we transform the value to an int
+        before appending it as we know that the content of the log should only
+        have integer values.
+        """
+        for i, v in enumerate(row):
+            content[minor2major[minors[i]]][minors[i]].append(int(v))
+
+print("\nThe minor headers read from the file")
+print(minors)
+print("\nThe CPU user process stats")
+print(content['cpu']['us'])
+print("\nMinimum free memory in the data set")
+print(min(content['memory']['free']))
+print("\nMaximum IO, either input or output")
+print(max([max(l) for l in content['io'].values()]))
+

=== added file 'csv/vmstat.log'
--- csv/vmstat.log 1970-01-01 00:00:00 +0000
+++ csv/vmstat.log 2010-03-31 08:18:21 +0000
@@ -0,0 +1,27 @@
+procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
+ r b swpd free buff cache si so bi bo in cs us sy id wa
+ 5 1 16272 309800 202736 712492 0 0 31 24 497 192 18 5 77 1
+ 0 0 16272 309784 202736 712552 0 0 7 0 103 224 1 1 98 0
+ 0 0 16272 309784 202752 712552 0 0 0 8 112 216 1 1 98 0
+ 0 0 16272 309412 202756 712552 0 0 0 4 407 675 11 5 84 0
+ 1 0 16272 302576 202756 716008 0 0 690 0 509 945 14 5 80 1
+ 0 0 16272 282292 202812 719208 0 0 574 5 474 3026 31 8 60 1
+ 0 0 16272 280184 202820 719316 0 0 22 26 433 996 27 5 68 0
+ 8 0 16272 260288 202828 720344 0 0 130 9 370 4638 32 9 59 0
+ 2 0 16268 226656 202852 722984 6 0 454 77 524 2484 47 12 41 0
+ 1 0 16268 208024 202860 730784 0 0 1550 11 571 6092 93 7 0 0
+ 2 0 16268 146360 202860 734772 0 0 794 0 705 3879 83 16 1 0
+ 0 1 16268 123132 202864 750356 0 0 3116 8 660 1356 29 62 4 4
+ 3 0 16264 93936 202864 764216 0 0 2714 5 640 1428 26 72 0 2
+ 1 0 17192 53500 188852 783932 0 186 6819 202 669 1049 25 69 2 4
+ 0 0 18028 55152 176676 786992 0 167 2153 191 703 1112 13 39 47 1
+ 5 0 18724 54216 158244 791824 0 139 3410 146 655 1434 11 30 47 12
+ 0 0 18948 53236 155644 791812 0 45 572 52 619 1277 6 14 79 0
+ 0 0 18948 53236 155656 791812 0 0 0 144 571 1265 5 3 91 0
+ 0 1 18948 52988 155660 792028 0 0 0 5 550 1217 4 3 91 3
+ 0 0 18948 52864 155668 792148 0 0 0 10 568 1291 5 4 73 18
+ 0 0 18948 52920 155676 792160 0 0 0 6 562 1292 6 5 89 0
+ 2 0 18960 56156 155688 788076 0 2 38 14 565 1149 16 53 29 1
+ 0 0 18960 264784 155696 786928 0 0 0 166 449 3601 18 16 66 0
+ 0 0 18960 264808 155708 786928 0 0 0 19 430 746 13 7 77 4
+ 0 0 18960 264776 155708 786928 0 0 0 2 236 389 5 2 94 0
