Merge lp:~1chb1n/charm-helpers/amulet-svc-restart-race into lp:charm-helpers
- amulet-svc-restart-race
- Merge into devel
Status: | Merged |
---|---|
Merged at revision: | 444 |
Proposed branch: | lp:~1chb1n/charm-helpers/amulet-svc-restart-race |
Merge into: | lp:charm-helpers |
Diff against target: |
241 lines (+101/-53) 1 file modified
charmhelpers/contrib/amulet/utils.py (+101/-53) |
To merge this branch: | bzr merge lp:~1chb1n/charm-helpers/amulet-svc-restart-race |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Billy Olsen | Approve | ||
Adam Collard (community) | Abstain | ||
charmers | Pending | ||
Review via email: mp+269098@code.launchpad.net |
Commit message
Description of the change
Add exception handling to the retry logic in service_
Add deprecation warn re: service_restarted amulet helper, in favor of validate_
FYI - The following charm MPs for updated tests are dependent on this C-H change landing first:
https:/
https:/
https:/
Ryan Beisner (1chb1n) wrote : | # |
See comment on the same bug. Thank you.
Ryan Beisner (1chb1n) wrote : | # |
Since pgrep -p doesn't work on Precise, and we already have another helper to get a process ID, I've refactored _get_proc_
Note that that renders pgrep_full a no-op, and I've also updated the calling helpers to ensure the user receives a deprecation WARN about that, while still ensuring that the default use of the refactored _get_proc_
Ryan Beisner (1chb1n) wrote : | # |
FYI, the proposed charm-helpers changes are working well in 2 charms where the race was prevalent, and in 1 charm where the race was seldom seen (keystone):
# openstack-dashboard
https:/
# neutron-api
https:/
# keystone
https:/
Adam Collard (adam-collard) : | # |
Ryan Beisner (1chb1n) wrote : | # |
As we are needing to move forward with this and the child MPs, and we have addressed the original concerns, and Adam isn't in a position to be able to fully review at this time, I'm adding Wolsen back for a review.
PS. Thanks to Adam for raising that issue, so that we could propose something even more robust.
Billy Olsen (billy-olsen) wrote : | # |
So I've made a few (mostly) editorial comments in-line. I think at the root of whether this should go ahead and go forward or not is based upon whether there exists any implementation dependent upon the pgrep_full behavior. I'm a big +1 on the switch to the use of pidof rather than pgrep because the pgrep implementation allows charm authors to fall into an obscure bug here if they were not careful enough (though admittedly, it'd be due to some lazy coding which could be fixed via being more explicit).
The intention of the pgrep_full flag appears to be to allow amulet test authors to also search script processes rather than just executable names. The -x parameter for pidof allows the same thing to occur. If we define the pgrep_full flag as 'also match script names' rather than the literal interpretation of pgrep_full, then the flag doesn't need to be deprecated.
Most of my in-line comments are based upon the parameter being deprecated. After thinking hard, I'm not entirely sure its required to be deprecated if there is a successful argument in the intent outweighing what is conveyed by the name of the parameter.
Ryan Beisner (1chb1n) wrote : | # |
I totally know what you're saying, this is a tricky one for sure. There are other methods there, intended to check the same thing, but they are also broken in some way, and are basically dying on the vine (and without user warnings).
Deprecation reasoning:
The pgrep_full option will be ill-named after switching to pidof behind the scenes.
We could preserve the parameter and just re-plumb it to pidof's -x, but that would be confusing to consume, and no longer self-descriptive, if not actually misleading.
We could add a new option, such as check_scripts_too, and perhaps just aliased pgrep_full for backward compatibility. That would still be confusing in that the pgrep_full option is out of context, and in that we are inviting user error by exposing that control in the first place.
Which brought me to: can we just get to the core of the intention of the method, and make it smart enough to just do-the-
ie. my_thing_
We are already using pidof -x reliably in an existing is_my_service_
And that is the journey of my arrival at: just deprecate it and announce in any case that it is defined.
I did backwards-
I recognize that there is a remote possibility that someone outside our awareness is using that helper function, and this could possibly break them. I think we've minimized that risk, and we've given user feedback appropriately. That's always debatable of course.
So..
What I want to avoid is adding yet-another rewrite of a my_thing_
Ideas?
Ryan Beisner (1chb1n) wrote : | # |
See replies inline. The more I look over this, and it's ancestors, the more I want to just revert all of this and add a new helper that just works.
Talk me out of doing that?
Billy Olsen (billy-olsen) wrote : | # |
Okay, let's go deprecation warnings for the parameter, which is where you went anyway. I probably should not have gone in the direction that I went in with the possibility of not deprecating the parameter. The deprecation warnings serve to notify charm authors but also to allow for this to be a drop-in replacement with as little backwards compatibility issues as possible, so it gets my upvote for that.
Please do NOT add a new option for checking scripts and just use the pidof -x. By deprecating the pgrep_full option we're favoring the desirable approach. If someone can make a really strong use case for adding another flag lets consider adding the flag or something at that point.
I haven't gone through all the ancestry for the method yet, but if we add yet another way then it becomes to confusing as to which method to use. So let's just do the right thing and fix the behavior to work as advertised/expected as this MP is doing.
Ryan Beisner (1chb1n) wrote : | # |
Thanks again for the good questions and replies and advice.
Add'l replies in-line.
Billy Olsen (billy-olsen) wrote : | # |
Ah, thanks for the updates Ryan - I think we're now fully on the same page. Just a few minor edits then I guess and it gets my +1.
- 438. By Ryan Beisner
-
update comments, pgrep_full conditional
Billy Olsen (billy-olsen) wrote : | # |
Thanks for making the changes. LGTM - I added an additional comment for follow up later but it shouldn't hold this MP up in my mind.
Ryan Beisner (1chb1n) wrote : | # |
Replied inline. Thanks again!
Billy Olsen (billy-olsen) wrote : | # |
One more back at ya. Let's get it merged.
Ryan Beisner (1chb1n) wrote : | # |
Thanks again, are we ready to land this, so we can also land the related charm test updates?
Billy Olsen (billy-olsen) wrote : | # |
Yes, I think this is ready to be landed.
Preview Diff
1 | === modified file 'charmhelpers/contrib/amulet/utils.py' |
2 | --- charmhelpers/contrib/amulet/utils.py 2015-08-17 10:47:36 +0000 |
3 | +++ charmhelpers/contrib/amulet/utils.py 2015-09-02 01:09:48 +0000 |
4 | @@ -114,7 +114,7 @@ |
5 | # /!\ DEPRECATION WARNING (beisner): |
6 | # New and existing tests should be rewritten to use |
7 | # validate_services_by_name() as it is aware of init systems. |
8 | - self.log.warn('/!\\ DEPRECATION WARNING: use ' |
9 | + self.log.warn('DEPRECATION WARNING: use ' |
10 | 'validate_services_by_name instead of validate_services ' |
11 | 'due to init system differences.') |
12 | |
13 | @@ -269,33 +269,52 @@ |
14 | """Get last modification time of directory.""" |
15 | return sentry_unit.directory_stat(directory)['mtime'] |
16 | |
17 | - def _get_proc_start_time(self, sentry_unit, service, pgrep_full=False): |
18 | - """Get process' start time. |
19 | - |
20 | - Determine start time of the process based on the last modification |
21 | - time of the /proc/pid directory. If pgrep_full is True, the process |
22 | - name is matched against the full command line. |
23 | - """ |
24 | - if pgrep_full: |
25 | - cmd = 'pgrep -o -f {}'.format(service) |
26 | - else: |
27 | - cmd = 'pgrep -o {}'.format(service) |
28 | - cmd = cmd + ' | grep -v pgrep || exit 0' |
29 | - cmd_out = sentry_unit.run(cmd) |
30 | - self.log.debug('CMDout: ' + str(cmd_out)) |
31 | - if cmd_out[0]: |
32 | - self.log.debug('Pid for %s %s' % (service, str(cmd_out[0]))) |
33 | - proc_dir = '/proc/{}'.format(cmd_out[0].strip()) |
34 | - return self._get_dir_mtime(sentry_unit, proc_dir) |
35 | + def _get_proc_start_time(self, sentry_unit, service, pgrep_full=None): |
36 | + """Get start time of a process based on the last modification time |
37 | + of the /proc/pid directory. |
38 | + |
39 | + :sentry_unit: The sentry unit to check for the service on |
40 | + :service: service name to look for in process table |
41 | + :pgrep_full: [Deprecated] Use full command line search mode with pgrep |
42 | + :returns: epoch time of service process start |
43 | + :param commands: list of bash commands |
44 | + :param sentry_units: list of sentry unit pointers |
45 | + :returns: None if successful; Failure message otherwise |
46 | + """ |
47 | + if pgrep_full is not None: |
48 | + # /!\ DEPRECATION WARNING (beisner): |
49 | + # No longer implemented, as pidof is now used instead of pgrep. |
50 | + # https://bugs.launchpad.net/charm-helpers/+bug/1474030 |
51 | + self.log.warn('DEPRECATION WARNING: pgrep_full bool is no ' |
52 | + 'longer implemented re: lp 1474030.') |
53 | + |
54 | + pid_list = self.get_process_id_list(sentry_unit, service) |
55 | + pid = pid_list[0] |
56 | + proc_dir = '/proc/{}'.format(pid) |
57 | + self.log.debug('Pid for {} on {}: {}'.format( |
58 | + service, sentry_unit.info['unit_name'], pid)) |
59 | + |
60 | + return self._get_dir_mtime(sentry_unit, proc_dir) |
61 | |
62 | def service_restarted(self, sentry_unit, service, filename, |
63 | - pgrep_full=False, sleep_time=20): |
64 | + pgrep_full=None, sleep_time=20): |
65 | """Check if service was restarted. |
66 | |
67 | Compare a service's start time vs a file's last modification time |
68 | (such as a config file for that service) to determine if the service |
69 | has been restarted. |
70 | """ |
71 | + # /!\ DEPRECATION WARNING (beisner): |
72 | + # This method is prone to races in that no before-time is known. |
73 | + # Use validate_service_config_changed instead. |
74 | + |
75 | + # NOTE(beisner) pgrep_full is no longer implemented, as pidof is now |
76 | + # used instead of pgrep. pgrep_full is still passed through to ensure |
77 | + # deprecation WARNS. lp1474030 |
78 | + self.log.warn('DEPRECATION WARNING: use ' |
79 | + 'validate_service_config_changed instead of ' |
80 | + 'service_restarted due to known races.') |
81 | + |
82 | time.sleep(sleep_time) |
83 | if (self._get_proc_start_time(sentry_unit, service, pgrep_full) >= |
84 | self._get_file_mtime(sentry_unit, filename)): |
85 | @@ -304,15 +323,15 @@ |
86 | return False |
87 | |
88 | def service_restarted_since(self, sentry_unit, mtime, service, |
89 | - pgrep_full=False, sleep_time=20, |
90 | - retry_count=2): |
91 | + pgrep_full=None, sleep_time=20, |
92 | + retry_count=2, retry_sleep_time=30): |
93 | """Check if service was been started after a given time. |
94 | |
95 | Args: |
96 | sentry_unit (sentry): The sentry unit to check for the service on |
97 | mtime (float): The epoch time to check against |
98 | service (string): service name to look for in process table |
99 | - pgrep_full (boolean): Use full command line search mode with pgrep |
100 | + pgrep_full: [Deprecated] Use full command line search mode with pgrep |
101 | sleep_time (int): Seconds to sleep before looking for process |
102 | retry_count (int): If service is not found, how many times to retry |
103 | |
104 | @@ -321,30 +340,44 @@ |
105 | False if service is older than mtime or if service was |
106 | not found. |
107 | """ |
108 | - self.log.debug('Checking %s restarted since %s' % (service, mtime)) |
109 | + # NOTE(beisner) pgrep_full is no longer implemented, as pidof is now |
110 | + # used instead of pgrep. pgrep_full is still passed through to ensure |
111 | + # deprecation WARNS. lp1474030 |
112 | + |
113 | + unit_name = sentry_unit.info['unit_name'] |
114 | + self.log.debug('Checking that %s service restarted since %s on ' |
115 | + '%s' % (service, mtime, unit_name)) |
116 | time.sleep(sleep_time) |
117 | - proc_start_time = self._get_proc_start_time(sentry_unit, service, |
118 | - pgrep_full) |
119 | - while retry_count > 0 and not proc_start_time: |
120 | - self.log.debug('No pid file found for service %s, will retry %i ' |
121 | - 'more times' % (service, retry_count)) |
122 | - time.sleep(30) |
123 | - proc_start_time = self._get_proc_start_time(sentry_unit, service, |
124 | - pgrep_full) |
125 | - retry_count = retry_count - 1 |
126 | + proc_start_time = None |
127 | + tries = 0 |
128 | + while tries <= retry_count and not proc_start_time: |
129 | + try: |
130 | + proc_start_time = self._get_proc_start_time(sentry_unit, |
131 | + service, |
132 | + pgrep_full) |
133 | + self.log.debug('Attempt {} to get {} proc start time on {} ' |
134 | + 'OK'.format(tries, service, unit_name)) |
135 | + except IOError: |
136 | + # NOTE(beisner) - race avoidance, proc may not exist yet. |
137 | + # https://bugs.launchpad.net/charm-helpers/+bug/1474030 |
138 | + self.log.debug('Attempt {} to get {} proc start time on {} ' |
139 | + 'failed'.format(tries, service, unit_name)) |
140 | + time.sleep(retry_sleep_time) |
141 | + tries += 1 |
142 | |
143 | if not proc_start_time: |
144 | self.log.warn('No proc start time found, assuming service did ' |
145 | 'not start') |
146 | return False |
147 | if proc_start_time >= mtime: |
148 | - self.log.debug('proc start time is newer than provided mtime' |
149 | - '(%s >= %s)' % (proc_start_time, mtime)) |
150 | + self.log.debug('Proc start time is newer than provided mtime' |
151 | + '(%s >= %s) on %s (OK)' % (proc_start_time, |
152 | + mtime, unit_name)) |
153 | return True |
154 | else: |
155 | - self.log.warn('proc start time (%s) is older than provided mtime ' |
156 | - '(%s), service did not restart' % (proc_start_time, |
157 | - mtime)) |
158 | + self.log.warn('Proc start time (%s) is older than provided mtime ' |
159 | + '(%s) on %s, service did not ' |
160 | + 'restart' % (proc_start_time, mtime, unit_name)) |
161 | return False |
162 | |
163 | def config_updated_since(self, sentry_unit, filename, mtime, |
164 | @@ -361,12 +394,15 @@ |
165 | bool: True if file was modified more recently than mtime, False if |
166 | file was modified before mtime, |
167 | """ |
168 | - self.log.debug('Checking %s updated since %s' % (filename, mtime)) |
169 | + self.log.debug('Checking that %s file updated since ' |
170 | + '%s' % (filename, mtime)) |
171 | + unit_name = sentry_unit.info['unit_name'] |
172 | time.sleep(sleep_time) |
173 | file_mtime = self._get_file_mtime(sentry_unit, filename) |
174 | if file_mtime >= mtime: |
175 | self.log.debug('File mtime is newer than provided mtime ' |
176 | - '(%s >= %s)' % (file_mtime, mtime)) |
177 | + '(%s >= %s) on %s (OK)' % (file_mtime, mtime, |
178 | + unit_name)) |
179 | return True |
180 | else: |
181 | self.log.warn('File mtime %s is older than provided mtime %s' |
182 | @@ -374,8 +410,9 @@ |
183 | return False |
184 | |
185 | def validate_service_config_changed(self, sentry_unit, mtime, service, |
186 | - filename, pgrep_full=False, |
187 | - sleep_time=20, retry_count=2): |
188 | + filename, pgrep_full=None, |
189 | + sleep_time=20, retry_count=2, |
190 | + retry_sleep_time=30): |
191 | """Check service and file were updated after mtime |
192 | |
193 | Args: |
194 | @@ -383,9 +420,10 @@ |
195 | mtime (float): The epoch time to check against |
196 | service (string): service name to look for in process table |
197 | filename (string): The file to check mtime of |
198 | - pgrep_full (boolean): Use full command line search mode with pgrep |
199 | - sleep_time (int): Seconds to sleep before looking for process |
200 | + pgrep_full: [Deprecated] Use full command line search mode with pgrep |
201 | + sleep_time (int): Initial sleep in seconds to pass to test helpers |
202 | retry_count (int): If service is not found, how many times to retry |
203 | + retry_sleep_time (int): Time in seconds to wait between retries |
204 | |
205 | Typical Usage: |
206 | u = OpenStackAmuletUtils(ERROR) |
207 | @@ -402,15 +440,25 @@ |
208 | mtime, False if service is older than mtime or if service was |
209 | not found or if filename was modified before mtime. |
210 | """ |
211 | - self.log.debug('Checking %s restarted since %s' % (service, mtime)) |
212 | - time.sleep(sleep_time) |
213 | - service_restart = self.service_restarted_since(sentry_unit, mtime, |
214 | - service, |
215 | - pgrep_full=pgrep_full, |
216 | - sleep_time=0, |
217 | - retry_count=retry_count) |
218 | - config_update = self.config_updated_since(sentry_unit, filename, mtime, |
219 | - sleep_time=0) |
220 | + |
221 | + # NOTE(beisner) pgrep_full is no longer implemented, as pidof is now |
222 | + # used instead of pgrep. pgrep_full is still passed through to ensure |
223 | + # deprecation WARNS. lp1474030 |
224 | + |
225 | + service_restart = self.service_restarted_since( |
226 | + sentry_unit, mtime, |
227 | + service, |
228 | + pgrep_full=pgrep_full, |
229 | + sleep_time=sleep_time, |
230 | + retry_count=retry_count, |
231 | + retry_sleep_time=retry_sleep_time) |
232 | + |
233 | + config_update = self.config_updated_since( |
234 | + sentry_unit, |
235 | + filename, |
236 | + mtime, |
237 | + sleep_time=0) |
238 | + |
239 | return service_restart and config_update |
240 | |
241 | def get_sentry_time(self, sentry_unit): |
See https:/ /bugs.launchpad .net/charm- helpers/ +bug/1474030/ comments/ 2 for explanation on why this isn't sufficient