Python Snippets

Merge lp:~brunogirin/python-snippets/csv-snippets into lp:~jonobacon/python-snippets/trunk

Proposed by Bruno Girin on 2010-03-31

Status:	Merged
Merged at revision:	not available
Proposed branch:	lp:~brunogirin/python-snippets/csv-snippets
Merge into:	lp:~jonobacon/python-snippets/trunk
Diff against target:	304 lines (+262/-0) (has conflicts), 7 files modified
	CATEGORIES (+5/-0)
	csv/csv101.csv (+13/-0)
	csv/csv101.py (+41/-0)
	csv/csv2dict.csv (+13/-0)
	csv/csv2dict.py (+66/-0)
	csv/vmstat-reader.py (+97/-0)
	csv/vmstat.log (+27/-0)
	Text conflict in CATEGORIES
To merge this branch:	bzr merge lp:~brunogirin/python-snippets/csv-snippets

Reviewer	Review Type	Date Requested	Status
Jono Bacon		2010-03-31	Pending
Review via email: mp+22517@code.launchpad.net

Description of the change

Added a new CSV category to include snippets related to the Python csv module.
Added 3 snippets to the new category:
- csv101.py: a very simple example, reading the content of a CSV file into a list of lists.
- csv2dict.py: a more involved example that reads the content of a CSV file into a dictionary of dictionaries for easy querying and processing.
- vmstat-reader.py: an example that shows how to use a custom delimiter to read space-delimited files, as output by a number of command line tools, in this case vmstat.
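
As a rough illustration of the first two approaches, here is a minimal sketch. It is written for Python 3 (the snippets in this branch use Python 2 print statements) and reads from an in-memory string rather than the companion .csv files. The names SAMPLE, read_rows and read_dict_of_dicts are made up for this example and are not part of the branch.

```python
import csv
import io

# Two data rows borrowed from csv2dict.csv, with some columns left out for
# brevity. SAMPLE is a hypothetical stand-in for the companion file.
SAMPLE = '''"Line","Peak Service","Stabling","Control Centre"
"Circle",7,"Hammersmith, Edgware Rd, Farringdon","Baker Street"
"Piccadilly",24,"Northfields, Cockfosters, Arnos Grove","Earls Court"
'''


def read_rows(text):
    """Return the CSV content as a list of lists (the csv101.py approach)."""
    return list(csv.reader(io.StringIO(text, newline='')))


def read_dict_of_dicts(text):
    """Return {first column: {header: value}} (the csv2dict.py approach)."""
    reader = csv.reader(io.StringIO(text, newline=''))
    headers = next(reader)[1:]  # drop the first header; that column is the key
    content = {}
    for row in reader:
        entry = dict(zip(headers, row[1:]))
        # The stabling entry is itself a comma-separated list of names.
        entry['Stabling'] = [s.strip() for s in entry['Stabling'].split(',')]
        content[row[0]] = entry
    return content


rows = read_rows(SAMPLE)
lines = read_dict_of_dicts(SAMPLE)
print(rows[1][0])
print(lines['Piccadilly']['Peak Service'])
print([k for k, v in lines.items() if 'Hammersmith' in v['Stabling']])
```

Note that the csv module returns every field as a string, so numeric columns such as "Peak Service" need an explicit conversion before any arithmetic.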

Preview Diff


Subscribers

People subscribed via source and target branches

to all changes:

Akkana Peck

Bruno Girin

Jono Bacon

Nathan Handler

Oliver Marks

zpletan

 === modified file 'CATEGORIES'
 --- CATEGORIES	2010-03-31 04:07:19 +0000
 +++ CATEGORIES	2010-03-31 08:18:21 +0000
@@ -16,7 +16,12 @@
      bzrlib                  Bazaar source control system Python module.
      Cairo		            Cairo drawing examples
      Clutter                 Clutter toolkit examples.
++<<<<<<< TREE
      dbus        	        dbus messaging system
++=======
++    CSV                     csv Python module, reading and writing CSV files
++    dbus        	    dbus messaging system
++>>>>>>> MERGE-SOURCE
      DesktopCouch            DesktopCouch examples.
      feedparser	            Parsing RSS feeds.
      gobject		            The gobject library, part of GNOME.
 === added directory 'csv'
 === added file 'csv/csv101.csv'
 --- csv/csv101.csv	1970-01-01 00:00:00 +0000
 +++ csv/csv101.csv	2010-03-31 08:18:21 +0000
@@ -0,0 +1,13 @@
++"Line","Route Length","Tunnel","Open","Stations Served","No of Escalators","No of Lifts"
++"Bakerloo","23 kms","11 kms","12 kms",25,29,11
++"Central","74 kms","23 kms","51 kms",49,72,12
++"Circle","21 kms","18 kms","3 kms",27,"See Metropolitan","See Metropolitan"
++"District","64 kms","17 kms","47 kms",60,21,6
++"East London","8 kms","4 kms","4 kms",8,2,4
++"Hammersmith & City","27 kms","12 kms","15 kms",28,"See Metropolitan","See Metropolitan"
++"Jubilee","38 kms","19 kms","19 kms",27,127,34
++"Metropolitan","67 kms","10 kms","57 kms",34,30,2
++"Northern","58 kms","39 kms","19 kms",51,54,21
++"Piccadilly","71 kms","21 kms","50 kms",52,43,17
++"Victoria","21 kms","21 kms","0 kms",16,31,1
++"Waterloo & City","2 kms","2 kms","0 kms",2,"see Central","see Central"
 === added file 'csv/csv101.py'
 --- csv/csv101.py	1970-01-01 00:00:00 +0000
 +++ csv/csv101.py	2010-03-31 08:18:21 +0000
@@ -0,0 +1,41 @@
++#!/usr/bin/env python
++#
++# [SNIPPET_NAME: CSV 101]
++# [SNIPPET_CATEGORIES: CSV]
++# [SNIPPET_DESCRIPTION: Basic CSV file reading example]
++# [SNIPPET_AUTHOR: Bruno Girin <brunogirin@gmail.com>]
++# [SNIPPET_LICENSE: GPL]
++
++# This snippet demonstrates the basics of reading CSV files using
++# the Python csv module.
++# The full documentation for the csv module is available here:
++# http://docs.python.org/library/csv.html
++#
++# The data used in the companion csv101.csv file was taken from here:
++# http://www.trainweb.org/tubeprune/Statistics.htm
++# See, you can even learn some interesting facts about the London Underground
++# network while learning Python.
++
++#
++# First things first, we need to import the csv module
++# Also import sys to get argv[0], which holds the name of the script
++#
++import csv
++import sys
++
++# Derive the name of the CSV file from the name of the script and initialise
++# the content list
++csvFile = sys.argv[0].replace('.py', '.csv')
++content = []
++
++print('Reading file %s' % csvFile)
++# And the rest is really easy as csv.reader can be iterated upon
++reader=csv.reader(open(csvFile))
++for row in reader:
++    """
++    For each row, we append the row to the content list, which will
++    produce a list of lists.
++    """
++    content.append(row)
++print content
++
 === added file 'csv/csv2dict.csv'
 --- csv/csv2dict.csv	1970-01-01 00:00:00 +0000
 +++ csv/csv2dict.csv	2010-03-31 08:18:21 +0000
@@ -0,0 +1,13 @@
++"Line","Peak Service","Off Peak Service","Trains required","Stabling","Control Centre"
++"Bakerloo",23,15,"32 x 7-cars","Stonebridge Pk, London Rd, Elephant, Queens Pk.","Baker Street"
++"Central",30,18,"72 x 8-cars","Hainault, White City, West Ruislip, Loughton, Woodford","Wood Lane"
++"Circle",7,7,"14 x 6-cars","Hammersmith, Edgware Rd, Farringdon","Baker Street"
++"District","23 plus 7 Circles","14 plus 7 Circles ","74 x 6-cars","Ealing Common, Parsons Green, Triangle Sdgs, Barking, Upminster","Earls Court"
++"East London",6,6,"6 x 4-cars","New Cross","New Cross Depot"
++"Hammersmith & City",7,7,"17 x 6-cars","Hammersmith, Barking","Baker Street"
++"Jubilee",24,15,"46 x 6-cars","Neasden, Stratford, Stanmore","Neasden"
++"Metropolitan","16 plus 14 C & H trains","20 incl C & H lines","44 x 8-cars","Uxbridge, Rickmansworth, Neasden","Neasden"
++"Northern","30 max. 20 (branches)","20 max. 15 (branches)","84 x 6-cars","Morden, Golders Green, Highgate, High Barnet, Edgware","Coburg Street"
++"Piccadilly",24,18,"76 x 6-cars","Northfields, Cockfosters, Arnos Grove","Earls Court"
++"Victoria",28.5,18,"37 x 8-cars","Northumberland Park, Brixton, Walthamstow","Coburg Street"
++"Waterloo & City",19,12,"4 x 4-cars","Waterloo, Bank","Waterloo"
 === added file 'csv/csv2dict.py'
 --- csv/csv2dict.py	1970-01-01 00:00:00 +0000
 +++ csv/csv2dict.py	2010-03-31 08:18:21 +0000
@@ -0,0 +1,66 @@
++#!/usr/bin/env python
++#
++# [SNIPPET_NAME: CSV to Dictionary]
++# [SNIPPET_CATEGORIES: CSV]
++# [SNIPPET_DESCRIPTION: Read a CSV file to a dictionary of dictionaries]
++# [SNIPPET_AUTHOR: Bruno Girin <brunogirin@gmail.com>]
++# [SNIPPET_LICENSE: GPL]
++
++# This snippet demonstrates how to read a CSV file into a dictionary of
++# dictionaries in order to be able to query it easily.
++# The full documentation for the csv module is available here:
++# http://docs.python.org/library/csv.html
++#
++# The data used in the companion csv2dict.csv file was taken from here:
++# http://www.trainweb.org/tubeprune/Statistics.htm
++# See, you can even learn some interesting facts about the London Underground
++# network while learning Python.
++
++#
++# First things first, we need to import the csv module
++# Also import sys to get argv[0], which holds the name of the script
++#
++import csv
++import sys
++
++# Derive the name of the CSV file from the name of the script and initialise
++# the headers list and content dictionary
++csvFile = sys.argv[0].replace('.py', '.csv')
++headers = None
++content = {}
++
++print('Reading file %s' % csvFile)
++reader=csv.reader(open(csvFile))
++for row in reader:
++    if reader.line_num == 1:
++        """
++        If we are on the first line, create the headers list from the first row
++        by taking a slice from item 1, as we don't need the very first header.
++        """
++        headers = row[1:]
++    else:
++        """
++        Otherwise, the key in the content dictionary is the first item in the
++        row and we can create the sub-dictionary by using the zip() function.
++        We also know that the stabling entry is a comma separated list of names
++        so we split it into a list for easier processing.
++        """
++        content[row[0]] = dict(zip(headers, row[1:]))
++        content[row[0]]['Stabling'] = [s.strip() for s in content[row[0]]['Stabling'].split(',')]
++
++# We can now get to the content by using the resulting dictionary, so to see
++# the list of lines, we can do:
++print "\nList of lines"
++print content.keys()
++# To see the list of statistics available for each line
++print "\nAvailable statistics for each line"
++print headers
++# To see any statistic for a line, we can just request it by name
++print "\nPeak hourly train frequency for the Piccadilly line"
++print content['Piccadilly']['Peak Service']
++# Or we can use list comprehensions to filter the list
++print "\nThe list of lines that have Earl's Court as a control centre"
++print [k for k, v in content.items() if v['Control Centre'] == 'Earls Court']
++print "\nThe list of lines that have Hammersmith as one of their stabling stations"
++print [k for k, v in content.items() if 'Hammersmith' in v['Stabling']]
++
 === added file 'csv/vmstat-reader.py'
 --- csv/vmstat-reader.py	1970-01-01 00:00:00 +0000
 +++ csv/vmstat-reader.py	2010-03-31 08:18:21 +0000
@@ -0,0 +1,97 @@
++#!/usr/bin/env python
++#
++# [SNIPPET_NAME: vmstat Reader]
++# [SNIPPET_CATEGORIES: CSV]
++# [SNIPPET_DESCRIPTION: Custom CSV reader to read files like vmstat output]
++# [SNIPPET_AUTHOR: Bruno Girin <brunogirin@gmail.com>]
++# [SNIPPET_LICENSE: GPL]
++
++# This snippet demonstrates how to use the csv module with a custom separator
++# in order to read space separated value files such as the output of the
++# vmstat command.
++# The full documentation for the csv module is available here:
++# http://docs.python.org/library/csv.html
++#
++# The data used in the companion vmstat.log file was taken by running the command:
++# vmstat -n 5
++
++#
++# First things first, we need to import the csv module
++# Also import sys to get argv[0], which holds the name of the script
++#
++import csv
++import sys
++
++# Derive the name of the CSV file from the name of the script
++csvFile = sys.argv[0].replace('-reader.py', '.log')
++
++# Create a map from minor to major header as the minor headers are easy to
++# associate to columns, which is not the case for major headers.
++minor2major = {
++    'r': 'procs',
++    'b': 'procs',
++    'swpd': 'memory',
++    'free': 'memory',
++    'buff': 'memory',
++    'cache': 'memory',
++    'inact': 'memory',  # to support the vmstat -a option if required
++    'active': 'memory', # to support the vmstat -a option if required
++    'si': 'swap',
++    'so': 'swap',
++    'bi': 'io',
++    'bo': 'io',
++    'in': 'system',
++    'cs': 'system',
++    'us': 'cpu',
++    'sy': 'cpu',
++    'id': 'cpu',
++    'wa': 'cpu'
++}
++minors = []
++
++# Initialise the content map by creating an empty sub-map against each
++# unique major header
++content = dict([(h, {}) for h in set(minor2major.values())])
++
++print('Reading file %s' % csvFile)
++# Create the reader and specify the delimiter to be a space; also set the
++# skipinitialspace flag to true to ensure that several spaces are seen as a
++# single delimiter and that initial spaces in a line are ignored
++reader=csv.reader(open(csvFile), delimiter=' ', skipinitialspace=True)
++for row in reader:
++    if reader.line_num == 1:
++        """
++        Ignore the first line as it contains major headers.
++        """
++    elif reader.line_num == 2:
++        """
++        If we are on the second line, create the headers list from this row.
++        We also keep a copy of the minor headers, in the order that they appear
++        in the file to ensure that we can map the values to the correct entry
++        in the content map.
++        """
++        minors = row
++        for h in row:
++            content[minor2major[h]][h] = []
++    elif row[0] != minors[0] and row[0] != minor2major[minors[0]]:
++        """
++        If the -n option was not specified when running the vmstat command,
++        major and minor headers are repeated so we need to ensure that we
++        ignore such lines and only deal with lines that contain actual data.
++        For each value in the row, we append it to the respective entry in
++        the content dictionary. In addition, we transform the value to an int
++        before appending it as we know that the content of the log should only
++        have integer values.
++        """
++        for i, v in enumerate(row):
++            content[minor2major[minors[i]]][minors[i]].append(int(v))
++
++print "\nThe minor headers read from the file"
++print minors
++print "\nThe CPU user process stats"
++print content['cpu']['us']
++print "\nMinimum free memory in the data set"
++print min(content['memory']['free'])
++print "\nMaximum IO, either input or output"
++print max([max(l) for l in content['io'].values()])
++
 === added file 'csv/vmstat.log'
 --- csv/vmstat.log	1970-01-01 00:00:00 +0000
 +++ csv/vmstat.log	2010-03-31 08:18:21 +0000
@@ -0,0 +1,27 @@
++procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
++ r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
++ 5  1  16272 309800 202736 712492    0    0    31    24  497  192 18  5 77  1
++ 0  0  16272 309784 202736 712552    0    0     7     0  103  224  1  1 98  0
++ 0  0  16272 309784 202752 712552    0    0     0     8  112  216  1  1 98  0
++ 0  0  16272 309412 202756 712552    0    0     0     4  407  675 11  5 84  0
++ 1  0  16272 302576 202756 716008    0    0   690     0  509  945 14  5 80  1
++ 0  0  16272 282292 202812 719208    0    0   574     5  474 3026 31  8 60  1
++ 0  0  16272 280184 202820 719316    0    0    22    26  433  996 27  5 68  0
++ 8  0  16272 260288 202828 720344    0    0   130     9  370 4638 32  9 59  0
++ 2  0  16268 226656 202852 722984    6    0   454    77  524 2484 47 12 41  0
++ 1  0  16268 208024 202860 730784    0    0  1550    11  571 6092 93  7  0  0
++ 2  0  16268 146360 202860 734772    0    0   794     0  705 3879 83 16  1  0
++ 0  1  16268 123132 202864 750356    0    0  3116     8  660 1356 29 62  4  4
++ 3  0  16264  93936 202864 764216    0    0  2714     5  640 1428 26 72  0  2
++ 1  0  17192  53500 188852 783932    0  186  6819   202  669 1049 25 69  2  4
++ 0  0  18028  55152 176676 786992    0  167  2153   191  703 1112 13 39 47  1
++ 5  0  18724  54216 158244 791824    0  139  3410   146  655 1434 11 30 47 12
++ 0  0  18948  53236 155644 791812    0   45   572    52  619 1277  6 14 79  0
++ 0  0  18948  53236 155656 791812    0    0     0   144  571 1265  5  3 91  0
++ 0  1  18948  52988 155660 792028    0    0     0     5  550 1217  4  3 91  3
++ 0  0  18948  52864 155668 792148    0    0     0    10  568 1291  5  4 73 18
++ 0  0  18948  52920 155676 792160    0    0     0     6  562 1292  6  5 89  0
++ 2  0  18960  56156 155688 788076    0    2    38    14  565 1149 16 53 29  1
++ 0  0  18960 264784 155696 786928    0    0     0   166  449 3601 18 16 66  0
++ 0  0  18960 264808 155708 786928    0    0     0    19  430  746 13  7 77  4
++ 0  0  18960 264776 155708 786928    0    0     0     2  236  389  5  2 94  0
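
For readers who want to try the space-delimited technique from vmstat-reader.py outside the diff, the following is a simplified sketch under a few assumptions: it targets Python 3, embeds the first lines of vmstat.log as a string, and stores values in a flat dictionary keyed by minor header instead of grouping them under major headers as the snippet does. SAMPLE and read_vmstat are hypothetical names used only here.

```python
import csv
import io

# The first four lines of vmstat.log, embedded so the sketch is self-contained.
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 5  1  16272 309800 202736 712492    0    0    31    24  497  192 18  5 77  1
 0  0  16272 309784 202736 712552    0    0     7     0  103  224  1  1 98  0
"""


def read_vmstat(text):
    """Return {minor header: [int, ...]} for space-aligned vmstat output."""
    # delimiter=' ' splits on spaces; skipinitialspace=True makes runs of
    # spaces (and leading spaces) count as a single separator.
    reader = csv.reader(io.StringIO(text, newline=''),
                        delimiter=' ', skipinitialspace=True)
    next(reader)            # skip the major headers (procs, memory, ...)
    minors = next(reader)   # minor headers: r, b, swpd, ...
    columns = {name: [] for name in minors}
    for row in reader:
        if not row or not row[0].isdigit():
            continue        # skip headers repeated when vmstat ran without -n
        for name, value in zip(minors, row):
            columns[name].append(int(value))
    return columns


stats = read_vmstat(SAMPLE)
print(stats['free'])
print(min(stats['free']))
```

Checking that the first field is numeric is a simplification of the header comparison used in the snippet; it relies on vmstat data rows always starting with an integer.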
