Merge lp:~roadmr/checkbox/lightning into lp:checkbox

Proposed by Daniel Manrique
Status: Merged
Merged at revision: 1276
Proposed branch: lp:~roadmr/checkbox/lightning
Merge into: lp:checkbox
Diff against target: 180 lines (+53/-14)
6 files modified
checkbox/contrib/persist.py (+12/-9)
checkbox/lib/safe.py (+4/-3)
checkbox/message.py (+6/-2)
debian/changelog (+2/-0)
plugins/jobs_prompt.py (+15/-0)
plugins/persist_info.py (+14/-0)
To merge this branch: bzr merge lp:~roadmr/checkbox/lightning
Reviewer Review Type Date Requested Status
Marc Tardif (community) Approve
Review via email: mp+94284@code.launchpad.net

Description of the change

This branch adds a way for the two main objects dealing with persisting stuff to disk (MessageStore and Persist) to decide whether or not they should use "safe" file closing (which flushes the file buffer and syncs to disk to ensure data integrity).

The default (and very robust) behavior is to use safe file closing, just as it has been since this feature was introduced in rev 1090.

It then adds a few methods in the persist_info and jobs_prompt plugins to *disable* safeness prior to starting the gathering phase, and reenable it right after finishing gathering.

In principle this should cause no data loss, since the gathered data is obtained automatically and thus the process can be very easily repeated if needed.

The benefit of doing all this juggling is a significant speed increase in gathering. My system goes from 37-40 seconds on the gathering phase to just under 7 seconds with these changes.

I'm not entirely happy by the handling of Persist's _backend directly (in persist_info). Maybe adding a method to Persist proper to then set the backend's safeness would be more kosher. But this seemed so much easier!

To post a comment you must log in.
Revision history for this message
Marc Tardif (cr3) wrote :

Very useful, thanks man! Just a couple comments about your code:

1. It is more Pythonish (or Pythonic?) to pass keyword arguments to function without spaces:

  foo(a=1, b=2).

2. In plugins/persist_info.py, you access private variables from the persist object:

  self.persist._backend.safe_file_closing = True

Instead, you might have liked to consider adding a public getter and setter on the Persist class:

  self.persist.setSafeFileClosing(True)

However, this doesn't prevent the code from working so I'm merging. All good!

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'checkbox/contrib/persist.py'
2--- checkbox/contrib/persist.py 2011-10-07 07:16:47 +0000
3+++ checkbox/contrib/persist.py 2012-02-22 22:10:23 +0000
4@@ -506,8 +506,11 @@
5 def save(self, filepath, map):
6 self._store[filepath] = map
7
8-
9-class PickleBackend(Backend):
10+class DiskBackend(Backend):
11+
12+ safe_file_closing = True
13+
14+class PickleBackend(DiskBackend):
15
16 def __init__(self):
17 import cPickle
18@@ -518,17 +521,17 @@
19 try:
20 return self._pickle.load(file)
21 finally:
22- safe_close(file)
23+ safe_close(file, self.safe_file_closing)
24
25 def save(self, filepath, map):
26 file = open(filepath, "w")
27 try:
28 self._pickle.dump(map, file, 2)
29 finally:
30- safe_close(file)
31-
32-
33-class BPickleBackend(Backend):
34+ safe_close(file, self.safe_file_closing)
35+
36+
37+class BPickleBackend(DiskBackend):
38
39 def __init__(self):
40 from checkbox.contrib import bpickle
41@@ -539,11 +542,11 @@
42 try:
43 return self._bpickle.loads(file.read())
44 finally:
45- safe_close(file)
46+ safe_close(file, self.safe_file_closing)
47
48 def save(self, filepath, map):
49 file = open(filepath, "w")
50 try:
51 file.write(self._bpickle.dumps(map))
52 finally:
53- safe_close(file)
54+ safe_close(file, self.safe_file_closing)
55
56=== modified file 'checkbox/lib/safe.py'
57--- checkbox/lib/safe.py 2011-10-07 07:16:47 +0000
58+++ checkbox/lib/safe.py 2012-02-22 22:10:23 +0000
59@@ -95,7 +95,8 @@
60
61 return md5sum
62
63-def safe_close(file):
64- file.flush()
65- os.fsync(file.fileno())
66+def safe_close(file, safe = True):
67+ if safe:
68+ file.flush()
69+ os.fsync(file.fileno())
70 file.close()
71
72=== modified file 'checkbox/message.py'
73--- checkbox/message.py 2012-02-14 20:32:39 +0000
74+++ checkbox/message.py 2012-02-22 22:10:23 +0000
75@@ -44,6 +44,10 @@
76 #This caches everything but a message's data, making it manageable to keep in memory.
77 _message_cache = {}
78
79+ #Setting this to False speeds things up considerably, at the expense
80+ #of a higher risk of data loss during a crash
81+ safe_file_closing = True
82+
83 def __init__(self, persist, directory, directory_size=1000):
84 self._directory = directory
85 self._directory_size = directory_size
86@@ -219,7 +223,7 @@
87 try:
88 return file.read()
89 finally:
90- safe_close(file)
91+ safe_close(file, safe = self.safe_file_closing)
92
93 def _get_flags(self, path):
94 basename = posixpath.basename(path)
95@@ -261,7 +265,7 @@
96
97 file = open(filename + ".tmp", "w")
98 file.write(message_data)
99- safe_close(file)
100+ safe_close(file, safe = self.safe_file_closing)
101
102 os.rename(filename + ".tmp", filename)
103
104
105=== modified file 'debian/changelog'
106--- debian/changelog 2012-02-22 21:05:54 +0000
107+++ debian/changelog 2012-02-22 22:10:23 +0000
108@@ -5,6 +5,8 @@
109
110 [Daniel Manrique]
111 * Use GObject from gi.repository instead of gobject (LP: #937099)
112+ * Disable flushing to disk after every file access during gathering phase for
113+ a significant speed boost. (LP: #939019)
114
115 [Javier Collado]
116 * Fixed running of disk/read_performance tests (LP: #933528)
117
118=== modified file 'plugins/jobs_prompt.py'
119--- plugins/jobs_prompt.py 2012-02-15 16:31:04 +0000
120+++ plugins/jobs_prompt.py 2012-02-22 22:10:23 +0000
121@@ -65,6 +65,13 @@
122 ("report-job", self.report_job)]:
123 self._manager.reactor.call_on(rt, rh)
124
125+ #This should fire first thing during the gathering phase.
126+ self._manager.reactor.call_on("gather", self.begin_gather, -900)
127+
128+ #This should fire last during gathering (i.e. after
129+ #all other gathering callbacks are finished)
130+ self._manager.reactor.call_on("gather", self.end_gather, 900)
131+
132 def begin_persist(self, persist):
133 self._persist = persist
134
135@@ -72,6 +79,14 @@
136 if not recover:
137 self.store.delete_all_messages()
138
139+ def begin_gather(self):
140+ #Speed boost during the gathering phase. Not critical data anyway.
141+ self.store.safe_file_closing = False
142+
143+ def end_gather(self):
144+ #Back to saving data very carefully once gathering is done.
145+ self.store.safe_file_closing = True
146+
147 def ignore_jobs(self, jobs):
148 self._ignore = jobs
149
150
151=== modified file 'plugins/persist_info.py'
152--- plugins/persist_info.py 2011-10-03 10:50:20 +0000
153+++ plugins/persist_info.py 2012-02-22 22:10:23 +0000
154@@ -42,6 +42,13 @@
155 # Save persist data last
156 self._manager.reactor.call_on("stop", self.save, 1000)
157
158+ #This should fire first thing during the gathering phase.
159+ self._manager.reactor.call_on("gather", self.begin_gather, -900)
160+
161+ #This should fire last during gathering (i.e. after
162+ #all other gathering callbacks are finished)
163+ self._manager.reactor.call_on("gather", self.end_gather, 900)
164+
165 def begin(self, interface=None):
166 if self.persist is None:
167 self.persist = Persist(self.filename)
168@@ -52,5 +59,12 @@
169 if self.persist:
170 self.persist.save()
171
172+ def begin_gather(self):
173+ #Speed boost during the gathering phase. Not critical data anyway.
174+ self.persist._backend.safe_file_closing = False
175+
176+ def end_gather(self):
177+ #Back to saving data very carefully once gathering is done.
178+ self.persist._backend.safe_file_closing = True
179
180 factory = PersistInfo

Subscribers

People subscribed via source and target branches