Merge lp:~hazmat/pyjuju/security-specification into lp:pyjuju

Proposed by Kapil Thangavelu
Status: Work in progress
Proposed branch: lp:~hazmat/pyjuju/security-specification
Merge into: lp:pyjuju
Diff against target: 264 lines (+260/-0)
1 file modified
docs/source/drafts/security.rst (+260/-0)
To merge this branch: bzr merge lp:~hazmat/pyjuju/security-specification
Reviewer: Gustavo Niemeyer
Review status: Needs Fixing
Review via email: mp+63921@code.launchpad.net

Description of the change

A basic overview of the security architecture, attack vectors, and next steps. It's a bit rambling, but hopefully it makes a good start.

Gustavo Niemeyer (niemeyer) wrote :

This is going in a very good direction.

Here are some comments on a first read:

[1]

47 +to each node. Every zookeeper connection can associate principal
48 +credentials to its connection, and all access by that connection is
49 +validated against the per node ACL mapping.

This would be a good place to link some upstream documentation for zk.

[2]

Just in terms of style, on the big reorganization I've standardized the headings
on the three styles:

  1) ===
  2) ---
  3) ~~~

I suggest we try to keep this convention across all documents, otherwise it's
hard to follow what's a sub-header or not (e.g. I wouldn't expect === as a
sub-header of ---)

[3]

55 +An additional zookeeper connected actor responsible for creating principals
56 +and providing an up to date token database.

Do we need to manage an external agent for that?

[4]

65 +Each actor employs a security policy, to determine the ACL map for a given
66 +node path that may create. The policy simply takes the path to the node
67 +to be created, and returns back an ACL map that can be set on the node.

This wasn't very clear on a first read. Some ideas:

- "employs" could mean different things in that context, so I suggest replacing by
  a less open synonym ("uses"?).

- "that may create" feels like missing the subject ("that the actor may
  create?")

- At least the first comma should be dropped.

- s/can be set/must be set/?

[5]

73 Every actor in the system needs its own
74 +unique principal, to provide an auth identity, the credentials for a
75 +principal are known only to the actor utilizing them and transiently
76 +the security agent when they are created.

This sentence isn't reading well.

[6]

78 +Instead of passing principals credentials directly via insecure
79 +channels, an actor creating another actor also establishes a principal
80 +creation token via the security agent. The principal creation token is
81 +a one time use string which can be used to create a principal and its
82 +password, and update the token database.
83 +
84 +The security agent has a simple policy in place regarding principal
85 +names and which actors can create them, ie. a provisioning agent can
86 +create machine principals, but not service unit principals.

It's not clear from the document how that's going to work, and it also feels a
little bit like delegating permissions could be more complexity than is
necessary (e.g. what about the agent itself creating these identities instead
of providing a one time token?). This area needs some debate.

[7]

88 +If a malicious user intercepts the token and uses it, compared with
89 +passing credentials directly it minimizes the time that a third party
90 +has to perform such an interception. Moreover invalid use of a token
91 +can be logged as foresenic information.

This assumes knowledge about how such interceptions would take place,
or why it minimizes the time, that isn't available in the document up to
that point.

Also, we should design the security system in a way that such interceptions
can't take place, rather than minimizing the time an interception can be
made (ssh vs. telnet).

[8]

118 +Additionally services utilize relations to communic...


review: Needs Fixing
Clint Byrum (clint-fewbar) wrote :

Hi Kapil.

I noticed you're suggesting MD5 for the password hashes. I'd suggest going one step further and using multiple iterations of MD5. Grid computing has made cracking a single MD5 password trivial. Hash 200,000 times, and at least you require 200,000 times more power to do a mass dictionary attack (and it shouldn't add much time, considering how seldom the actual password will need to be checked).

Gustavo Niemeyer (niemeyer) wrote :

200k+MD5 is a bit too much, but I agree that using HMAC (perhaps HMAC+SHA1) rather than plain MD5 would be good. HMAC is targeted at specifically this use case too.

Kapil Thangavelu (hazmat) wrote :

Excerpts from Clint Byrum's message of Thu Jun 09 23:51:31 UTC 2011:
> Hi Kapil.
>
> I noticed you're suggesting MD5 for the password hashes. I'd suggest going one step further and using multiple iterations of MD5. Grid computing has made cracking a single MD5 password trivial. Hash 200,000 times, and at least you require 200,000 times more power to do a mass dictionary attack (and it shouldn't add much time, considering how seldom the actual password will need to be checked).

It's actually not something we control explicitly, as what we set for the ACL identity token (the username:md5hash) needs to match the enforcement side, which is provided by zookeeper. We could in future play around with a zk authentication plugin that we could provide custom logic for, but I think we've deemed modification of zk out of scope for the moment. I agree, though, that things like CUDA/GPGPU programming, and the fact that we're manipulating cloud environments, make brute force attacks more likely, and it would be nice to have a more resistant approach.

Gustavo Niemeyer (niemeyer) wrote :

Good point. On the bright side, these tokens are going to be automatically generated. We can easily make them long and random enough to invalidate any brute force attacks (not counting fancy quantum computing advancements ;-).

242. By Kapil Thangavelu

address some formatting comments

243. By Kapil Thangavelu

merge trunk and address some formatting comments

Unmerged revisions

243. By Kapil Thangavelu

merge trunk and address some formatting comments

242. By Kapil Thangavelu

address some formatting comments

241. By Kapil Thangavelu

additional escalation scenarios, outline next steps broadly

240. By Kapil Thangavelu

security overview and todo spec draft

Preview Diff

=== added file 'docs/source/drafts/security.rst'
--- docs/source/drafts/security.rst 1970-01-01 00:00:00 +0000
+++ docs/source/drafts/security.rst 2012-06-22 15:29:20 +0000
@@ -0,0 +1,260 @@
1Security Overview
2=================
3
4Ensemble is committed to providing reliable, secure mechanisms for
5deploying services. What follows is an overview of these
6mechanisms and how they contribute to keeping an ensemble environment
7secure.
8
9Glossary
10--------
11
12First, a glossary of terms used in this document.
13
14
15Principal
16~~~~~~~~~
17
18A principal in the context of ensemble can represent any actor or
19group of actors within the system. Each principal is authenticated via
20a user name/principal id and password. For example, a service unit
21agent would associate its own unique principal information with its
22connection, and would thus have access to all nodes whose ACL mapping
23explicitly grants access to that principal.
24
25Token Database
26~~~~~~~~~~~~~~
27
28A mapping of principal ids to their ACL identity tokens. The identity
29token is an md5 checksum of the username and password, in the form
30username:identity_scheme:checksum, where identity_scheme is md5 in the
31case of ensemble. The mapping is stored in a zookeeper node which is
32world readable but only writable by the security (ticket granting)
33agent, which is responsible for creating principals.
34
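As a rough illustration of the token format described above, the following Python sketch builds an identity token and a token database entry. It assumes the checksum is the hex md5 digest of ``username:password``; the draft does not spell out the exact hash input or encoding, and zookeeper's built-in digest scheme may use a different hash and encoding. The function name and the example principal are hypothetical.

```python
import hashlib


def make_identity_token(principal_id, password, scheme="md5"):
    """Return a token of the form username:identity_scheme:checksum.

    Assumption: the checksum is the hex digest of "username:password".
    """
    material = f"{principal_id}:{password}".encode("utf-8")
    checksum = hashlib.new(scheme, material).hexdigest()
    return f"{principal_id}:{scheme}:{checksum}"


# Token database: principal id -> identity token (illustrative in-memory form;
# the draft stores this mapping in a world readable zookeeper node).
token_db = {"unit-wordpress-0": make_identity_token("unit-wordpress-0", "s3cret")}
```

Only the checksum is published in the token database, so readers of the node learn which principals exist but not their passwords.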
35Zookeeper ACLs
36~~~~~~~~~~~~~~
37
38Ensemble relies on the security facilities provided by zookeeper's
39coordination storage, whereby zookeeper automatically restricts access
40to each node based on the ACL permission map on that node. This ACL
41facility maps permissions to principal identity tokens. Zookeeper
42provides permissions for read, write, delete, create, and admin access
43to each node. Every zookeeper connection can associate principal
44credentials with its connection, and all access by that connection is
45validated against the per node ACL mapping.
46
47
48Additional documentation is available here:
49http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_ZooKeeperAccessControl
50
51
52Security Agent
53~~~~~~~~~~~~~~
54
55An additional zookeeper connected actor responsible for creating principals
56and providing an up to date token database.
57
58The security agent manages the token database (defined above),
59and provides for the creation of new principals and the handing out of
60their hash tokens to inquiring parties.
61
62Security Policy
63~~~~~~~~~~~~~~~
64
65Each actor uses a security policy to determine the ACL map for a given
66node path that the actor may create. The policy simply takes the path to
67the node to be created, and returns an ACL map to set on the node.
68
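A minimal sketch of such a path based policy, in Python, might look like the following. The class name, the prefix matching rule, and the choice to always give the creating actor full control are assumptions made for illustration; the draft only states that a path goes in and an ACL map comes out.

```python
ALL_PERMS = frozenset({"read", "write", "create", "delete", "admin"})


class SecurityPolicy:
    """Maps a node path to an ACL map of {identity token: permissions}."""

    def __init__(self, owner_token):
        self.owner_token = owner_token
        self._rules = []  # list of (path prefix, {identity token: perms})

    def add_rule(self, prefix, grants):
        self._rules.append((prefix, grants))

    def __call__(self, path):
        # Assumption: the creating actor keeps full control of its own nodes.
        acl = {self.owner_token: set(ALL_PERMS)}
        for prefix, grants in self._rules:
            # Match the prefix itself or any node below it, on path boundaries.
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                for token, perms in grants.items():
                    acl.setdefault(token, set()).update(perms)
        return acl


# Hypothetical paths and tokens, for illustration only.
policy = SecurityPolicy("provisioner:md5:aaaa")
policy.add_rule("/machines", {"machine-0:md5:bbbb": {"read"}})
acl = policy("/machines/machine-0")
```

Keeping the policy a pure function of the path means the same ACL map can be recomputed and audited later without consulting any other state.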
69
70Identity
71--------
72
73How the system passes credentials to an actor is a critical aspect of
74managing principals securely. Every actor in the system needs its own
75unique principal to provide an auth identity. The credentials for a
76principal are known only to the actor using them and, transiently, to
77the security agent when they are created.
78
79Instead of passing principal credentials directly via insecure
80channels, an actor creating another actor also establishes a principal
81creation token via the security agent. The principal creation token is
82a one time use string which can be used to create a principal and its
83password, and update the token database.
84
85The security agent has a simple policy in place regarding principal
86names and which actors can create them, e.g. a provisioning agent can
87create machine principals, but not service unit principals.
88
89Compared with passing credentials directly, a one time token minimizes
90the time window in which a malicious third party can intercept and use
91it. Moreover, invalid use of a token can be logged as forensic
92information.
93
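The token flow described above can be sketched in Python roughly as follows. This is an in-memory illustration under stated assumptions, not the security agent's actual interface: the class and method names are hypothetical, and the ``machine-agent`` entry in the creation policy is invented for the example (the draft only states that a provisioning agent may create machine principals but not service unit principals).

```python
import hashlib
import secrets

# Hypothetical policy: which kind of actor may create which kind of principal.
CREATION_POLICY = {
    "provisioning-agent": {"machine"},
    "machine-agent": {"unit"},
}


class SecurityAgent:
    def __init__(self):
        self._pending = {}   # one time token -> principal name
        self.token_db = {}   # principal name -> identity token
        self.audit_log = []  # invalid token uses, kept as forensic information

    def issue_token(self, creator_kind, principal_kind, principal_name):
        """Called by the creating actor; returns a one time creation token."""
        if principal_kind not in CREATION_POLICY.get(creator_kind, set()):
            raise PermissionError(
                f"{creator_kind} may not create {principal_kind} principals")
        otp = secrets.token_urlsafe(32)
        self._pending[otp] = principal_name
        return otp

    def redeem(self, otp):
        """Called by the new actor; creates the principal and its password."""
        name = self._pending.pop(otp, None)  # pop makes the token single use
        if name is None:
            self.audit_log.append(("invalid-token", otp))
            raise ValueError("unknown or already used token")
        password = secrets.token_urlsafe(32)
        checksum = hashlib.md5(f"{name}:{password}".encode("utf-8")).hexdigest()
        self.token_db[name] = f"{name}:md5:{checksum}"
        return name, password
```

In this sketch the password is generated at redemption time, so it never travels alongside the token; a second attempt to redeem the same token fails and leaves an audit record.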
94One question that emerges with the use of a separate agent for creating
95identities is how the agents needed for bootstrap receive their credentials.
96
97 - The bootstrap can utilize a specialized OTP interface with a precreated
98 known value, which it can use to initialize the tree.
99
100Transport level security
101------------------------
102
103As zookeeper does not currently support SSL/TLS transport level
104security, Ensemble utilizes SSH port forwarding to ensure encrypted
105communications to zookeeper. One significant shortcoming of this approach
106is that any process on the set of ensemble machines can attempt to
107connect to zookeeper to brute force principal passwords.
108
109See Also Alternatives#NodeEncryption
110
111Privileged Data
112---------------
113
114Certain data stored within zookeeper is by its nature privileged and
115should only be shared with agents requiring it for their function. For
116example, the Ensemble provider credentials should only be exposed to
117the provisioning agent, as they are required for it to function. Any
118additional access to the data would be regarded as a data escalation
119vulnerability.
120
121Additionally, services utilize relations to communicate with each
122other. Every service unit of the services participating within a
123relation gets write access only to its own node within the relation,
124and has read access to all service unit relation settings. An
125unrelated service unit from a different service is not allowed to
126read any settings from the relation.
127
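The relation access rule above can be expressed as a small Python sketch. The function name and the shape of its input are hypothetical; the returned dictionaries stand in for the ACL maps that would be set on each unit's relation node.

```python
def relation_acls(members):
    """Build per node ACL maps for one relation.

    members maps each participating unit name to its identity token.
    Returns {unit name: {identity token: set of permissions}}.
    """
    acls = {}
    for unit, token in members.items():
        # Every participating unit may read every settings node...
        acl = {other: {"read"} for other in members.values()}
        # ...but only the owning unit may write to its own node.
        acl[token] = {"read", "write"}
        acls[unit] = acl
    return acls


# Hypothetical units and tokens, for illustration only.
acls = relation_acls({
    "wordpress/0": "wordpress-0:md5:aaaa",
    "mysql/0": "mysql-0:md5:bbbb",
})
```

A unit outside the relation has no entry in any of these maps, so it gets no access at all.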
128
129Data Security
130-------------
131
132A pub key/priv key can be associated to the OTP to
133
134
135OTP Security
136------------
137
138The OTP is not secure without additional enforcement, as it does
139not exist as a native capability of the zk interface. An additional
140actor responsible for creating identities and processing OTP tokens
141would be an alternative (see futures).
142
143
144Relations attacks
145-----------------
146
147Ensemble is composed of a number of actors connecting to and
148communicating via a shared storage. When two services enter into a
149relation, a private bidirectional channel is created for them to
150exchange data.
151
152Ensemble ensures that the zookeeper nodes used for this communication
153are subject to the proper ACL constraints such that unrelated services
154are unable to access them.
155
156But these relations represent ad hoc inter machine communication
157protocols, which are formula defined. A malicious agent could abuse one
158of these protocols to compromise additional agents. Unlike other
159attack vectors in ensemble, this is one for which ensemble can make only
160minimal safety guarantees, beyond perhaps a simple
161validation of relation data (currently treated as a binary blob) against
162schemas associated with the relation type.
163
164The formulas executed by the unit agent are user supplied code
165run within an lxc container (with root privileges). LXC provides
166limited support for security against root in a container, so a
167container compromise can escalate to a compromise of the machine and
168of the other units on that machine.
169
170
171Privilege Escalation Scenarios
172------------------------------
173
174We have several different levels of escalation within ensemble for
175malicious code that need to be considered.
176
177Container Escalation
178~~~~~~~~~~~~~~~~~~~~
179
180All formula hooks are executed within an lxc container to give a
181minimally isolated environment. This lxc container is rather trivially
182exploitable to gain root access on the machine, as formulas execute
183as root within the container and lxc currently provides minimal security
184guarantees, which leads to the next escalation level.
185
186Future work is needed to provide better security around lxc
187integration, perhaps via integration of apparmor and ongoing lxc
188isolation work.
189
190Machine Escalation
191~~~~~~~~~~~~~~~~~~
192
193A machine is considered compromised if malicious code has root access
194on the machine. All service units colocated on the machine are also
195considered compromised if this occurs.
196
197Agent Escalation
198~~~~~~~~~~~~~~~~
199
200An agent is considered compromised if malicious code has an open zookeeper
201connection with a valid actor principal identity. The malicious code
202has access to all data exposed via ACL to the compromised identity.
203
204Beyond these generic scenarios we have particular escalations which
205are effectively fatal, as they entail access to sensitive data that
206spans the ensemble environment or machine provider.
207
208A bootstrap machine compromise which allows for disk access could be
209considered fatal, as the Ensemble shared state (zookeeper) data is
210resident on disk.
211
212Similarly, compromise of the identity of certain agents, like the provisioning
213agent, would allow malicious code to utilize the machine provider credentials.
214
215
216Access to Deployed services
217----------------------------
218
219A plan for controlled public access to deployed services is provided
220separately by the expose-services specification.
221
222Currently all internal access within a machine provider environment
223like ec2 is unfiltered.
224
225In future we should have machine level firewalling to allow access
226between services based on their relations.
227
228Alternatives
229------------
230
231
232Node Encryption
233~~~~~~~~~~~~~~~
234
235Principal Agent
236~~~~~~~~~~~~~~~
237
238A security agent responsible for
239transport security via node encryption.
240
241Next Steps
242----------
243
244SSH Host Identity Checks
245
246We should pull the ssh host key of each machine into zk, so connections to a
247given machine can be verified against the valid keys of environment machines.
248
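One way such a check could look is sketched below in Python. It assumes that fingerprints, rather than full keys, are stored, and it uses a plain dictionary as a stand-in for the zk node holding them; the function names and the registry layout are hypothetical.

```python
import base64
import hashlib
import hmac


def fingerprint(public_key_line):
    """OpenSSH style SHA256 fingerprint of a 'type base64blob [comment]' line."""
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_is_known(machine_id, presented_key_line, registry):
    """Check a presented host key against the fingerprint recorded for a machine.

    registry maps machine id -> expected fingerprint (stand-in for a zk node).
    """
    expected = registry.get(machine_id)
    if expected is None:
        return False
    return hmac.compare_digest(fingerprint(presented_key_line), expected)
```

A connection to a machine with no recorded key is rejected rather than trusted on first use, which is the point of publishing the keys ahead of time.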
249Formula Storage URLs
250
251Currently the formula storage access is referenced by a storage key
252which is retrieved via the machine provider storage interface. This
253requires machine agents to have access to the machine provider
254credentials, which they shouldn't need.
255
256- Security Agent & Token Database
257- Security Policy (Path Based ACL generator)
258- Connections w/ Principal
259
260
