Merge lp:~vila/uci-engine/ideas into lp:uci-engine

Proposed by Vincent Ladeuil
Status: Work in progress
Proposed branch: lp:~vila/uci-engine/ideas
Merge into: lp:uci-engine
Diff against target: 193 lines (+158/-1)
4 files modified
.bzrignore (+4/-0)
docs/Makefile (+8/-1)
docs/architecture.rst (+134/-0)
docs/images/ticket-worker.dot (+12/-0)
To merge this branch: bzr merge lp:~vila/uci-engine/ideas
Reviewer: Canonical CI Engineering (status: Pending)
Review via email: mp+213601@code.launchpad.net

Description of the change

Throwing ideas to fuel discussion, not proposing to merge.

Best read with:

$ (cd docs ; make html)
$ firefox docs/_build/html/architecture.html

so you get the pretty picture.

The ticket worker is intended to implement various workflows defining isolated tasks. It could replace the lander's use of Jenkins and provide a more agile architecture.

The main targeted change is to define the API via the messages exchanged between the workers.
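As a sketch of what a message-defined API could look like, here is a hypothetical task message. The field names are illustrative assumptions only, not taken from the branch:

```python
# Hypothetical shape for the messages exchanged between workers.
# Field names (ticket_id, task_id, task_number, inputs) are illustrative;
# the branch does not fix a schema yet.
import json

def make_task_message(ticket_id, task_id, task_number, inputs):
    """Build the incoming message that fully defines one task."""
    return json.dumps({
        "ticket_id": ticket_id,      # which ticket this task belongs to
        "task_id": task_id,          # which task in the workflow
        "task_number": task_number,  # attempt counter, unique per ticket
        "inputs": inputs,            # everything the task needs to set up
    })

msg = make_task_message(9, "build_image", 1, {"series": "trusty"})
print(json.loads(msg)["task_id"])  # -> build_image
```

Defining the API this way means a worker can be replaced by anything that consumes and produces the same messages.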

Revision history for this message
Francis Ginther (fginther) wrote :

This is nice and straightforward. Well thought out and described (not half-baked like mine have been :-) ).

To perform different workflows, can I assume that the ticket worker is created with that knowledge as an input from the ticket system? For example, ticket 9 needs to perform task A/B/C and ticket 10 needs to do just A/B. Also, does this convey what tasks can be retried and which ones fail the ticket?

You mention "the task send an outgoing message listing the outputs to another queue." What (or whose) queue is this sent to? Is it a queue owned by this ticket worker? Or does the imagebuilder send a message to the test-runner's queue? Or...

I've also been thinking about the possibility of duplicate tasks. I think with a lot of what we're doing, there's the possibility of a worker or task that is running but unable to communicate, and so just merrily proceeds. Meanwhile, the owner of the worker declares it dead and starts up a new one to replace it. I'm not confident we can completely eradicate duplicates, so I am considering how to live with these in the back of my mind.

Revision history for this message
Vincent Ladeuil (vila) wrote :

Thanks for the thoughtful review, I have some answers below and will
incorporate them in the proposal asap.

> To perform different workflows, can I assume that the ticket worker is created
> with that knowledge as an input from the ticket system? For example, ticket 9
> needs to perform task A/B/C and ticket 10 needs to do just A/B. Also, does
> this convey what tasks can be retried and which ones fail the ticket?

A ticket is associated with a single workflow. The workflow defines which
tasks should be done, their order and how/if they are retried. I first
thought that a task could only be retried, but I think we may want to allow
the case where a previous task is retried instead. For example, the test run
fails but we re-try the image building. I think we still want to keep the
workflow as an ordered list though.
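The ordered-list idea could be sketched like this (the task names and the retry-target column are purely illustrative assumptions, not part of the proposal):

```python
# Sketch of a workflow as an ordered list of tasks.  Assumption: each entry
# carries the task name, a retry budget, and which task to re-run on failure
# -- possibly an *earlier* one, as in the "retry the image build when the
# test run fails" example.  None of these names come from the branch.
WORKFLOW = [
    # (task name, max retries, task to re-run on failure)
    ("build_package", 2, "build_package"),
    ("build_image",   1, "build_image"),
    ("run_tests",     1, "build_image"),  # retry an earlier task
]

def next_task_on_failure(failed_task):
    """Return the task to (re)schedule when failed_task fails, or None."""
    for name, retries, retry_from in WORKFLOW:
        if name == failed_task:
            return retry_from if retries > 0 else None
    return None

print(next_task_on_failure("run_tests"))  # -> build_image
```

Keeping the workflow as a flat ordered list keeps the ticket worker simple while still allowing "go back and redo an earlier task" retries.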

> You mention "the task send an outgoing message listing the outputs to
> another queue." What (or whose) queue is this sent to?

Right, each worker has two queues:
- the incoming queue is shared across workers,
- the output queue is unique.

Just like we do today (and just like Evan did).

Or maybe we don't need to make the output queue unique? I.e. the ticket id
can be either part of the queue name or part of the message.

> Is it a queue owned by this ticket worker?

If it's unique, yes. Or rather, it's a queue between the ticket worker and
the task worker.

> Or does the imagebuilder send a message to the test-runner's queue?

No, task workers communicate via the ticket worker, never directly. That's
exactly the coupling we want to avoid.

> Or...

Or we define a single queue between the classes of workers instead of having
them specific to a ticket. Now that you've asked... I think this may be
simpler as it would significantly reduce the number of queues, making the
controller work easier.
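A minimal sketch of that naming scheme, assuming one shared queue per worker class and the ticket id travelling in the message body (queue names are illustrative):

```python
# Sketch of queue naming if queues are shared per worker class rather than
# being specific to a ticket.  The ".in"/".out" suffixes are an assumption,
# not something the branch defines.
def incoming_queue(worker_class):
    # One shared queue per class of task worker, e.g. "imagebuilder.in".
    return "%s.in" % worker_class

def output_queue(worker_class):
    # Results go back to the ticket workers; the ticket id travels in the
    # message body instead of in the queue name.
    return "%s.out" % worker_class

print(incoming_queue("imagebuilder"))  # -> imagebuilder.in
print(output_queue("imagebuilder"))    # -> imagebuilder.out
```

With this scheme the number of queues grows with the number of worker classes, not with the number of in-flight tickets.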

>
> I've also been thinking about the possibility of duplicates tasks. I think
> with a lot of what we're doing, there's the possibility of a worker or task
> running but it's unable to communicate and so just merrily proceeds.

/me nods

> Meanwhile, the owner of the worker declares it dead and starts up a new
> one to replace it. I'm not confident we can completely eradicate
> duplicates so am considering how to live with these in the back of my
> mind.

We cannot completely avoid duplicate tasks at the system level, so it's
possible (after some network issue or worker death) that the same task is
executed twice in parallel or that some artifacts have already been created
(though we may want to constrain the task worker to upload artifacts only
when the job is done, to reduce potential issues).

In that case, the controller is responsible for ignoring the duplicate when
detected.
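A minimal sketch of such duplicate detection, assuming each message carries the (ticket id, task id, task number) triple:

```python
# Minimal sketch of duplicate detection on the controller side, assuming
# each message carries a (ticket id, task id, task number) triple.  The
# class and its API are illustrative, not code from the branch.
class DuplicateFilter:
    def __init__(self):
        self._seen = set()

    def accept(self, ticket_id, task_id, task_number):
        """Return True the first time a triple is seen, False after that."""
        key = (ticket_id, task_id, task_number)
        if key in self._seen:
            return False  # duplicate delivery: ignore it
        self._seen.add(key)
        return True

f = DuplicateFilter()
print(f.accept(9, "build_image", 1))  # first delivery -> True
print(f.accept(9, "build_image", 1))  # replay after a worker death -> False
```

A real controller would persist the seen set so that it survives its own restarts, but the idempotency principle is the same.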

I was thinking that the ticket worker would create a message with (ticket
id, task id, task number), the task number being incremented to make it
unique for a ticket. This gives a way to represent the multiple attempts.

But now that you mention duplicate tasks, I realized this may not be enough
to create unique identifiers for the data store. So, the task worker should
create its artifacts with its node id (in addition to the above) and the
ticket worker will h...


Unmerged revisions

419. By Vincent Ladeuil

Add a state automaton diagram for the ticket worker.

418. By Vincent Ladeuil

Pointers to rabbit for the never failing cluster configuration.

417. By Vincent Ladeuil

Brain dump for phase-1.

Preview Diff

=== modified file '.bzrignore'
--- .bzrignore 2014-03-10 22:25:00 +0000
+++ .bzrignore 2014-04-01 06:59:36 +0000
@@ -1,6 +1,10 @@
 # For all dependencies running from source
 ./branches/*
+# Where sphinx puts all its produced files
 ./docs/_build
+# We ignore the new .png files created for dot, so they will have to be bzr
+# added explicitly.
+./docs/images/*.png
 .deps
 *.egg-info
 *.pyc
 
=== modified file 'docs/Makefile'
--- docs/Makefile 2013-11-16 10:12:08 +0000
+++ docs/Makefile 2014-04-01 06:59:36 +0000
@@ -38,10 +38,17 @@
 	@echo "  linkcheck  to check all external links for integrity"
 	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
 
+SCHEMAS=$(wildcard images/*.dot)
+PNGS=${SCHEMAS:images/%.dot=images/%.png}
+
+%.png : %.dot
+	dot -Tpng $< -o$@
+
 clean:
 	-rm -rf $(BUILDDIR)/*
 
 html:
+html: ${PNGS}
 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
 	@echo
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 
=== added file 'docs/architecture.rst'
--- docs/architecture.rst 1970-01-01 00:00:00 +0000
+++ docs/architecture.rst 2014-04-01 06:59:36 +0000
@@ -0,0 +1,134 @@
+======
+Engine
+======
+
+The engine accepts tickets that follow a task workflow. Each task is atomic
+and succeeds or fails. Some tasks can be retried, allowing the ticket to
+complete successfully. Other tasks can't be retried, in which case the ticket
+fails.
+
+The engine outputs include:
+- binary packages,
+- images,
+- test failures,
+- logs and metrics associated with any of the above.
+
+========
+Workflow
+========
+
+A workflow describes a list of tasks that should succeed for a ticket to
+succeed. The ticket state represents the place where a ticket is in the
+workflow at a given point in time.
+
+A single ticket worker is responsible for a given ticket; it schedules and
+monitors tasks according to the ticket workflow.
+
+From the ticket worker, a task can:
+
+- succeed and change the place of the ticket,
+
+- fail and not change the place of the ticket; if needed, a new task is
+  created,
+
+- hang and therefore not change the place of the ticket. The ticket worker
+  will kill the task when a timeout is reached; a killed task fails.
+
+The following state automaton captures the above definition:
+
+.. image:: images/ticket-worker.png
+
+
+
+The ticket worker owns the ticket and its state. A ticket state changes
+under the worker's responsibility in an atomic (and persistent) way when a
+task completes (success or failure).
+
+
+If a ticket worker dies, another ticket worker will take ownership of the
+ticket and acquire the ticket state from the persistent storage.
+
+====
+Task
+====
+
+A task:
+
+- has a task id including the ticket id,
+
+- receives an incoming message and produces an outgoing message,
+
+- acquires an incoming message uniquely defining the task, including the task
+  id,
+
+- sets up its environment from the message content only. If this
+  fails, the incoming message is nacked,
+
+- does its core job (build a package, an image, run tests). If this
+  fails, the incoming message is nacked,
+
+- uploads its outputs (uniquely identified with the task id) to
+  swift. If this fails, the incoming message is nacked,
+
+- sends an outgoing message listing the outputs to another queue. If
+  that fails, the incoming message is nacked,
+
+- tears down its environment. We don't care if that fails. If that
+  leads to a worker dying, another worker will step up.
+
+
+========
+RabbitMQ
+========
+
+Rabbit provides support for a message store-and-forward protocol.
+
+This guarantees that no messages are lost once they enter the queues
+("store"). It also guarantees that a message is not stuck in the queues as
+long as consumers exist or appear after a reasonable time ("forward").
+
+We have two use cases:
+
+- a single server that never fails,
+
+- a cluster of servers that never fails
+  (`High availability <http://www.rabbitmq.com/ha.html>`_, one AZ, no
+  `net partitions <http://www.rabbitmq.com/partitions.html>`_).
+
+The first one is what we have for phase-0 and is enough for most of our
+tests. We'll need some specific tests to cover the scenarios we care about
+in the cluster case.
+
+The outcome is that we can rely on the following properties:
+
+- a message that entered a queue will never be lost,
+
+- a message that left a queue will never be lost.
+
+The latter case has two applications:
+
+- an output message guarantees that a task is done; if that fails, the message
+  stays in the queue,
+
+- a worker acquiring an output message will always produce an input message
+  in another queue. If that fails, the output message stays in the
+  queue. There is a caveat here, as the input message in the other queue
+  won't be deleted if the output message cannot be acked.
+
+In summary, while we have the guarantee that a message will never be lost, we
+may encounter cases where duplicate messages appear in the system.
+
+To address duplicate messages, we need a way to identify their intent
+uniquely. In our case, this is the ticket id and the task id.
+
+At the workflow level, for a given ticket, we can identify and ignore
+duplicate messages.
+
+=====
+Swift
+=====
+
+Tasks produce artifacts, logs and results that are stored securely in swift.
+
+If a task fails to upload an object, it fails and nacks its incoming message.
+
=== added file 'docs/images/ticket-worker.dot'
--- docs/images/ticket-worker.dot 1970-01-01 00:00:00 +0000
+++ docs/images/ticket-worker.dot 2014-04-01 06:59:36 +0000
@@ -0,0 +1,12 @@
+digraph "test worker state automaton" {
+    "created" [peripheries=2]
+    "done" [peripheries=2]
+    "created" -> "started" [label="setup"]
+    "started" -> "waiting on task" [label="do_task"]
+    "waiting on task" -> "task succeeded" [label="success"]
+    "task succeeded" -> "done" [label="tears_down"]
+    "waiting on task" -> "task failed" [label="fails"]
+    "waiting on task" -> "task failed" [label="task_times_out"]
+    "task failed" -> "waiting on task" [label="do_task"]
+    "task failed" -> "done" [label="tears_down"]
+}
