Merge ~sylvain-pineau/checkbox-ng:remote_resume_after_reboot_V2 into checkbox-ng:master

Proposed by Sylvain Pineau
Status: Merged
Approved by: Sylvain Pineau
Approved revision: ec339fae4f096036203cf1b579c56ac71054b04f
Merged at revision: 321808ffa9f1a1bf05f6eeacb43835b01cb61891
Proposed branch: ~sylvain-pineau/checkbox-ng:remote_resume_after_reboot_V2
Merge into: checkbox-ng:master
Diff against target: 436 lines (+216/-22)
5 files modified
checkbox_ng/launcher/remote.py (+48/-5)
checkbox_ng/urwid_ui.py (+55/-0)
plainbox/impl/session/assistant.py (+2/-1)
plainbox/impl/session/remote_assistant.py (+54/-14)
plainbox/impl/session/restart.py (+57/-2)
Reviewer Review Type Date Requested Status
Jonathan Cave (community) Approve
Sylvain Pineau (community) Needs Resubmitting
Maciej Kisielewski (community) Approve
Review via email: mp+362852@code.launchpad.net

Description of the change

Improved version of https://code.launchpad.net/~sylvain-pineau/checkbox-ng/+git/checkbox-ng/+merge/362546

There's no Resumed state, the resume event info happening on the slave is sent via the payload data.
the run_jobs method gets a new argument to process it to essentially display the resumed job id and its outcome.

On slave side, the new version of this MR saves the launcher settings in order to set them again when resuming (i.g auto_retry parameters). It gets also support for jobs altering the result via $CHECKBOX_DATA/__result.

One bonus since I bumped the API, stderr streams are colored in RED.

Tested using the same env/methodology as the previous MR (and to see how the new one evolved).

Warning: I still consider this version as a first set of commits to fully support reboots/autorestart/noreturn jobs. rebooting once works fine but for jobs like 30 suspends + 3 reboots handled by the job itself it won't as the service will start on boot and try to resume the session. For such complex scenario disabling the systemd unit and activating it again via the __respawn way is probably the best solution but deserves its own MR.

To post a comment you must log in.
Revision history for this message
Maciej Kisielewski (kissiel) wrote :

Amazing work, huge +1.
I’d only change that reraising. Good to land after testing that we in fact don’t need that try/except block

review: Approve
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

I've removed the re raised exception. tested, ctrl-c lead exactly to the same screen. Good catch!

review: Needs Resubmitting
Revision history for this message
Jonathan Cave (jocave) wrote :

One tiny flake warning and a question if I may...

review: Needs Information
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

flake error fixed.

Regarding your question, of course /tmp will go away if you reboot your development system. But the debug mode was put in place to actually avoid having to reboot the system. Interrupting the slave with a CTRL-C will do the same (i.e going through all the strategy methods, prime&diffuse). So yes, the debug mode is not meant to validate the restart strategy with "real" reboots. It's just a convenience way to simulate them and speed up the debugging/development of features related to UC restarts on your development machine (i.e your laptop running classic)

Revision history for this message
Jonathan Cave (jocave) wrote :

Understood, thanks.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/checkbox_ng/launcher/remote.py b/checkbox_ng/launcher/remote.py
2index 03c600c..b884a4f 100644
3--- a/checkbox_ng/launcher/remote.py
4+++ b/checkbox_ng/launcher/remote.py
5@@ -44,11 +44,13 @@ from plainbox.impl.color import Colorizer
6 from plainbox.impl.launcher import DefaultLauncherDefinition
7 from plainbox.impl.secure.sudo_broker import SudoProvider
8 from plainbox.impl.session.remote_assistant import RemoteSessionAssistant
9+from plainbox.impl.session.restart import RemoteSnappyRestartStrategy
10 from plainbox.vendor import rpyc
11 from plainbox.vendor.rpyc.utils.server import ThreadedServer
12 from checkbox_ng.urwid_ui import test_plan_browser
13 from checkbox_ng.urwid_ui import CategoryBrowser
14 from checkbox_ng.urwid_ui import interrupt_dialog
15+from checkbox_ng.urwid_ui import resume_dialog
16 from checkbox_ng.launcher.run import NormalUI
17 from checkbox_ng.launcher.stages import MainLoopStage
18 from checkbox_ng.launcher.stages import ReportsStage
19@@ -82,7 +84,10 @@ class SimpleUI(NormalUI, MainLoopStage):
20 print(SimpleUI.C.header(header, fill='-'))
21
22 def green_text(text, end='\n'):
23- print(SimpleUI.C.GREEN(text), end)
24+ print(SimpleUI.C.GREEN(text), end=end)
25+
26+ def red_text(text, end='\n'):
27+ print(SimpleUI.C.RED(text), end=end)
28
29 def horiz_line():
30 print(SimpleUI.C.WHITE('-' * 80))
31@@ -124,7 +129,22 @@ class RemoteSlave(Command):
32
33 SessionAssistantSlave.session_assistant = RemoteSessionAssistant(
34 lambda s: [sys.argv[0] + ' remote-service --resume'])
35- if ctx.args.resume:
36+ snap_data = os.getenv('SNAP_DATA')
37+ remote_restart_stragegy_debug = os.getenv('REMOTE_RESTART_DEBUG')
38+ if snap_data or remote_restart_stragegy_debug:
39+ if remote_restart_stragegy_debug:
40+ strategy = RemoteSnappyRestartStrategy(debug=True)
41+ else:
42+ strategy = RemoteSnappyRestartStrategy()
43+ if os.path.exists(strategy.session_resume_filename):
44+ with open(strategy.session_resume_filename, 'rt') as f:
45+ session_id = f.readline()
46+ try:
47+ SessionAssistantSlave.session_assistant.resume_by_id(
48+ session_id)
49+ except StopIteration:
50+ print("Couldn't resume the session")
51+ elif ctx.args.resume:
52 try:
53 SessionAssistantSlave.session_assistant.resume_last()
54 except StopIteration:
55@@ -236,7 +256,8 @@ class RemoteMaster(Command, ReportsStage, MainLoopStage):
56 'idle': self.new_session,
57 'running': self.wait_and_continue,
58 'finalizing': self.finish_session,
59- 'testsselected': self.run_jobs,
60+ 'testsselected': partial(
61+ self.run_jobs, resumed_session_info=payload),
62 'bootstrapping': self.restart,
63 'bootstrapped': partial(
64 self.select_jobs, all_jobs=payload),
65@@ -363,6 +384,7 @@ class RemoteMaster(Command, ReportsStage, MainLoopStage):
66 return True
67
68 def finish_session(self):
69+ print(self.C.header("Results"))
70 if self.launcher.local_submission:
71 # Disable SIGINT while we save local results
72 tmp_sig = signal.signal(signal.SIGINT, signal.SIG_IGN)
73@@ -379,7 +401,24 @@ class RemoteMaster(Command, ReportsStage, MainLoopStage):
74 self.wait_for_job()
75 self.run_jobs()
76
77- def run_jobs(self):
78+ def _handle_last_job_after_resume(self, resumed_session_info):
79+ if self.launcher.ui_type == 'silent':
80+ time.sleep(20)
81+ else:
82+ resume_dialog(10)
83+ jobs_repr = self.sa.get_jobs_repr([resumed_session_info['last_job']])
84+ job = jobs_repr[-1]
85+ SimpleUI.header(job['name'])
86+ print(_("ID: {0}").format(job['id']))
87+ print(_("Category: {0}").format(job['category_name']))
88+ SimpleUI.horiz_line()
89+ print(
90+ _("Outcome") + ": " +
91+ SimpleUI.C.result(self.sa.get_job_result(job['id'])))
92+
93+ def run_jobs(self, resumed_session_info=None):
94+ if resumed_session_info:
95+ self._handle_last_job_after_resume(resumed_session_info)
96 _logger.info("master: Running jobs.")
97 jobs = self.sa.get_session_progress()
98 _logger.debug("master: Jobs to be run:\n%s",
99@@ -406,7 +445,11 @@ class RemoteMaster(Command, ReportsStage, MainLoopStage):
100 while True:
101 state, payload = self.sa.monitor_job()
102 if payload and not self._is_bootstrapping:
103- SimpleUI.green_text(payload, end='')
104+ for stream, line in payload:
105+ if stream == 'stderr':
106+ SimpleUI.red_text(line, end='')
107+ else:
108+ SimpleUI.green_text(line, end='')
109 if state == 'running':
110 time.sleep(0.5)
111 while True:
112diff --git a/checkbox_ng/urwid_ui.py b/checkbox_ng/urwid_ui.py
113index 8820b6b..992a471 100644
114--- a/checkbox_ng/urwid_ui.py
115+++ b/checkbox_ng/urwid_ui.py
116@@ -22,6 +22,8 @@
117 ============================================================
118 """
119
120+import time
121+
122 from gettext import gettext as _
123 import urwid
124
125@@ -621,6 +623,59 @@ def interrupt_dialog(host):
126 return None
127
128
129+class CountdownWidget(urwid.BigText):
130+
131+ def __init__(self, duration):
132+ self._started = time.time()
133+ self._duration = duration
134+ self.set_text('{0:.1f}'.format(duration))
135+ self.font = urwid.HalfBlock6x5Font()
136+ super().__init__(self.get_text()[0], self.font)
137+
138+ def update(self):
139+ remaining = self._duration + self._started - time.time()
140+ if remaining <= 0:
141+ remaining = 0
142+ text = '{0:.1f}'.format(remaining)
143+ self.set_text(text)
144+ print('\33]2;Auto resume remote session in %s\007' % text, end='')
145+ if remaining:
146+ return True
147+ else:
148+ raise urwid.ExitMainLoop
149+
150+
151+def resume_dialog(duration):
152+ palette = [
153+ ('body', 'light gray', 'black', 'standout'),
154+ ('header', 'black', 'light gray', 'bold'),
155+ ('buttnf', 'black', 'light gray'),
156+ ('buttn', 'light gray', 'black', 'bold'),
157+ ('foot', 'light gray', 'black'),
158+ ('start', 'dark green,bold', 'black'),
159+ ]
160+ footer_text = [
161+ ('Press '), ('<CTRL + C>'),
162+ (" to open the cancellation menu")]
163+ timer = CountdownWidget(duration)
164+ timer_pad = urwid.Padding(timer, align='center', width='clip')
165+ timer_fill = urwid.Filler(timer_pad)
166+ title = _("Checkbox slave is about to resume the session!")
167+ header = urwid.AttrWrap(urwid.Padding(urwid.Text(title), left=1), 'header')
168+ footer = urwid.AttrWrap(
169+ urwid.Padding(urwid.Text(footer_text), left=1), 'foot')
170+ frame = urwid.Frame(urwid.AttrWrap(urwid.LineBox(timer_fill), 'body'),
171+ header=header, footer=footer)
172+
173+ def update_timer(loop, timer):
174+ if timer.update():
175+ loop.set_alarm_in(0.1, update_timer, timer)
176+
177+ loop = urwid.MainLoop(frame, palette)
178+ update_timer(loop, timer)
179+ loop.run()
180+
181+
182 def add_widget(id, widget):
183 """Add the widget for a given id."""
184 _widget_cache[id] = widget
185diff --git a/plainbox/impl/session/assistant.py b/plainbox/impl/session/assistant.py
186index 1f67655..9e0c652 100644
187--- a/plainbox/impl/session/assistant.py
188+++ b/plainbox/impl/session/assistant.py
189@@ -675,7 +675,6 @@ class SessionAssistant:
190 io_log_filename=self._runner.get_record_path_for_job(job),
191 ).get_result()
192 self._context.state.update_job_result(job, result)
193- self._metadata.running_job_name = None
194 self._manager.checkpoint()
195 if self._restart_strategy is not None:
196 self._restart_strategy.diffuse_application_restart(self._app_id)
197@@ -686,6 +685,8 @@ class SessionAssistant:
198 else:
199 UsageExpectation.of(self).allowed_calls = {
200 self.select_test_plan: "to save test plan selection",
201+ self.use_alternate_configuration: (
202+ "use an alternate configuration system"),
203 }
204 return self._metadata
205
206diff --git a/plainbox/impl/session/remote_assistant.py b/plainbox/impl/session/remote_assistant.py
207index abd6233..96786d1 100644
208--- a/plainbox/impl/session/remote_assistant.py
209+++ b/plainbox/impl/session/remote_assistant.py
210@@ -42,7 +42,7 @@ from checkbox_ng.launcher.run import SilentUI
211
212 _ = gettext.gettext
213
214-_logger = logging.getLogger("plainbox.session.assistant2")
215+_logger = logging.getLogger("plainbox.session.remote_assistant")
216
217 Interaction = namedtuple('Interaction', ['kind', 'message', 'extra'])
218
219@@ -74,7 +74,8 @@ class BufferedUI(SilentUI):
220 self.clear_buffers()
221
222 def got_program_output(self, stream_name, line):
223- self._queue.put(line.decode(sys.stdout.encoding, 'replace'))
224+ self._queue.put(
225+ (stream_name, line.decode(sys.stdout.encoding, 'replace')))
226 self._whole_queue.put(line)
227
228 def whole_output(self):
229@@ -85,9 +86,9 @@ class BufferedUI(SilentUI):
230
231 def get_output(self):
232 """Returns all the output queued up since previous call."""
233- output = ''
234+ output = []
235 while not self._queue.empty():
236- output += self._queue.get()
237+ output.append(self._queue.get())
238 return output
239
240 def clear_buffers(self):
241@@ -126,7 +127,7 @@ class BackgroundExecutor(Thread):
242 class RemoteSessionAssistant():
243 """Remote execution enabling wrapper for the SessionAssistant"""
244
245- REMOTE_API_VERSION = 4
246+ REMOTE_API_VERSION = 5
247
248 def __init__(self, cmd_callback):
249 _logger.debug("__init__()")
250@@ -154,6 +155,7 @@ class RemoteSessionAssistant():
251 self._jobs_count = 0
252 self._job_index = 0
253 self._currently_running_job = None # XXX: yuck!
254+ self._last_job = None
255 self._current_comments = ""
256 self._last_response = None
257 self._normal_user = ''
258@@ -234,6 +236,8 @@ class RemoteSessionAssistant():
259
260 self._sa.update_app_blob(json.dumps(
261 {'description': session_desc, }).encode("UTF-8"))
262+ self._sa.update_app_blob(json.dumps(
263+ {'launcher': configuration['launcher'], }).encode("UTF-8"))
264
265 self._session_id = self._sa.get_session_id()
266 tps = self._sa.get_test_plans()
267@@ -393,6 +397,8 @@ class RemoteSessionAssistant():
268 payload = (
269 self._job_index, self._jobs_count, self._currently_running_job
270 )
271+ if self._state == TestsSelected and not self._currently_running_job:
272+ payload = {'last_job': self._last_job}
273 elif self._state == Started:
274 payload = self._available_testplans
275 elif self._state == Interacting:
276@@ -491,6 +497,9 @@ class RemoteSessionAssistant():
277 self._state = TestsSelected
278 return candidates
279
280+ def get_job_result(self, job_id):
281+ return self._sa.get_job_state(job_id).result
282+
283 def get_jobs_repr(self, job_ids, offset=0):
284 """
285 Translate jobs into a {'field': 'val'} representations.
286@@ -535,24 +544,55 @@ class RemoteSessionAssistant():
287 return test_info_list
288
289 def resume_last(self):
290+ last = next(self._sa.get_resumable_sessions())
291+ self.resume_by_id(last.id)
292+
293+ def resume_by_id(self, session_id):
294 self._launcher = DefaultLauncherDefinition()
295 self._sa.select_providers(*self._launcher.providers)
296- last = next(self._sa.get_resumable_sessions())
297- meta = self._sa.resume_session(last.id)
298+ resume_candidates = list(self._sa.get_resumable_sessions())
299+ _logger.warning("Resuming session: %r", session_id)
300+ meta = self._sa.resume_session(session_id)
301 app_blob = json.loads(meta.app_blob.decode("UTF-8"))
302+ launcher = app_blob['launcher']
303+ self._launcher.read_string(launcher)
304+ self._sa.use_alternate_configuration(self._launcher)
305 test_plan_id = app_blob['testplan_id']
306 self._sa.select_test_plan(test_plan_id)
307 self._sa.bootstrap()
308- result = MemoryJobResult({
309+ self._last_job = meta.running_job_name
310+
311+ result_dict = {
312 'outcome': IJobResult.OUTCOME_PASS,
313- 'comments': _("Passed after resuming execution")
314- })
315- last_job = meta.running_job_name
316- if last_job:
317+ 'comments': _("Automatically passed after resuming execution"),
318+ }
319+ result_path = os.path.join(
320+ self._sa.get_session_dir(), 'CHECKBOX_DATA', '__result')
321+ if os.path.exists(result_path):
322+ try:
323+ with open(result_path, 'rt') as f:
324+ result_dict = json.load(f)
325+ # the only really important field in the result is
326+ # 'outcome' so let's make sure it doesn't contain
327+ # anything stupid
328+ if result_dict.get('outcome') not in [
329+ 'pass', 'fail', 'skip']:
330+ result_dict['outcome'] = IJobResult.OUTCOME_PASS
331+ except json.JSONDecodeError as e:
332+ pass
333+ result = MemoryJobResult(result_dict)
334+ if self._last_job:
335 try:
336- self._sa.use_job_result(last_job, result)
337+ self._sa.use_job_result(self._last_job, result)
338 except KeyError:
339- raise SystemExit(last_job)
340+ raise SystemExit(self._last_job)
341+
342+ if self._launcher.auto_retry:
343+ for job_id in [job.id for job in self.get_auto_retry_candidates()]:
344+ job_state = self._sa.get_job_state(job_id)
345+ job_state.attempts = self._launcher.max_attempts - len(
346+ job_state.result_history)
347+
348 self._state = TestsSelected
349
350 def finalize_session(self):
351diff --git a/plainbox/impl/session/restart.py b/plainbox/impl/session/restart.py
352index eaa1f4d..0b497e5 100644
353--- a/plainbox/impl/session/restart.py
354+++ b/plainbox/impl/session/restart.py
355@@ -197,6 +197,37 @@ class SnappyRestartStrategy(IRestartStrategy):
356 subprocess.call(['sudo', 'rm', filename])
357
358
359+class RemoteSnappyRestartStrategy(IRestartStrategy):
360+
361+ """
362+ Remote Restart strategy for checkbox snaps.
363+ """
364+
365+ def __init__(self, debug=False):
366+ self.debug = debug
367+ self.session_resume_filename = self.get_session_resume_filename()
368+
369+ def get_session_resume_filename(self) -> str:
370+ if self.debug:
371+ return '/tmp/session_resume'
372+ snap_data = os.getenv('SNAP_DATA')
373+ return os.path.join(snap_data, 'session_resume')
374+
375+ def prime_application_restart(self, app_id: str,
376+ session_id: str, cmd: str) -> None:
377+ with open(self.session_resume_filename, 'wt') as f:
378+ f.write(session_id)
379+
380+ def diffuse_application_restart(self, app_id: str) -> None:
381+ try:
382+ os.remove(self.session_resume_filename)
383+ except OSError as exc:
384+ if exc.errno == errno.ENOENT:
385+ pass
386+ else:
387+ raise
388+
389+
390 def detect_restart_strategy() -> IRestartStrategy:
391 """
392 Detect the restart strategy for the current environment.
393@@ -206,17 +237,41 @@ def detect_restart_strategy() -> IRestartStrategy:
394 :raises LookupError:
395 When no such object can be found.
396 """
397+ # XXX: RemoteSnappyRestartStrategy debug
398+ remote_restart_stragegy_debug = os.getenv('REMOTE_RESTART_DEBUG')
399+ if remote_restart_stragegy_debug:
400+ return RemoteSnappyRestartStrategy(debug=True)
401 # If we are running as a confined Snappy app this variable will have been
402 # set by the launcher script
403 if on_ubuntucore():
404- return SnappyRestartStrategy()
405-
406+ try:
407+ slave_status = subprocess.check_output(
408+ ['snapctl', 'get', 'slave'], universal_newlines=True).rstrip()
409+ if slave_status == 'disabled':
410+ return SnappyRestartStrategy()
411+ else:
412+ return RemoteSnappyRestartStrategy()
413+ except subprocess.CalledProcessError:
414+ return SnappyRestartStrategy()
415+
416+ # Classic + remote service enabled
417+ snap_data = os.getenv('SNAP_DATA')
418+ if snap_data:
419+ try:
420+ slave_status = subprocess.check_output(
421+ ['snapctl', 'get', 'slave'], universal_newlines=True).rstrip()
422+ if slave_status == 'enabled':
423+ return RemoteSnappyRestartStrategy()
424+ except subprocess.CalledProcessError:
425+ pass
426+
427 if os.path.isdir('/etc/xdg/autostart'):
428 # NOTE: Assume this is a terminal application
429 return XDGRestartStrategy(app_terminal=True)
430
431 raise LookupError("Unable to find appropriate strategy.""")
432
433+
434 def get_strategy_by_name(name: str) -> type:
435 """
436 Get restart strategy class identified by a string.

Subscribers

People subscribed via source and target branches