This is going in a very good direction. Here are some comments on a first read:


[1]

47 +to each node. Every zookeeper connection can associate principal
48 +credentials to its connection, and all access by that connection is
49 +validated against the per node ACL mapping.

This would be a good place to link to the upstream ZooKeeper documentation.


[2]

Just in terms of style: in the big reorganization I standardized the headings on three styles:

1) ===
2) ---
3) ~~~

I suggest we keep this convention across all documents. Otherwise it's hard to tell what is a sub-header and what isn't (e.g. I wouldn't expect === as a sub-header of ---).


[3]

55 +An additional zookeeper connected actor responsible for creating principals
56 +and providing an up to date token database.

Do we need to manage an external agent for that?


[4]

65 +Each actor employs a security policy, to determine the ACL map for a given
66 +node path that may create. The policy simply takes the path to the node
67 +to be created, and returns back an ACL map that can be set on the node.

This wasn't very clear on a first read. Some ideas:

- "employs" could mean different things in that context, so I suggest replacing it with a less open synonym ("uses"?).
- "that may create" seems to be missing its subject ("that the actor may create"?).
- At least the first comma should be dropped.
- s/can be set/must be set/?


[5]

73 Every actor in the system needs its own
74 +unique principal, to provide an auth identity, the credentials for a
75 +principal are known only to the actor utilizing them and transiently
76 +the security agent when they are created.

This sentence doesn't read well.


[6]

78 +Instead of passing principals credentials directly via insecure
79 +channels, an actor creating another actor also establishes a principal
80 +creation token via the security agent. The principal creation token is
81 +a one time use string which can be used to create a principal and its
82 +password, and update the token database.
83 +
84 +The security agent has a simple policy in place regarding principal
85 +names and which actors can create them, ie. a provisioning agent can
86 +create machine principals, but not service unit principals.

It's not clear from the document how that's going to work. It also feels like delegating permissions could be more complex than necessary (e.g. what about the agent itself creating these identities instead of providing a one-time token?). This area needs some debate.


[7]

88 +If a malicious user intercepts the token and uses it, compared with
89 +passing credentials directly it minimizes the time that a third party
90 +has to perform such an interception. Moreover invalid use of a token
91 +can be logged as foresenic information.

This assumes knowledge about how such interceptions would take place, and why the token minimizes the time available for them, that the document hasn't provided up to that point.

Also, we should design the security system so that such interceptions can't take place at all, rather than minimizing the time in which an interception can be made (ssh vs. telnet).


[8]

118 +Additionally services utilize relations to communicate with each
119 +other, every service unit of the services participating within a
120 +relation gets write access only to its own node within the relation,
121 +and has read access to all service unit relation settings. An
122 +unrelated service unit from a different service, is not allowed to
123 +read any settings from the relation.

Well written, just a few punctuation issues:

s/Additionally/Additionally,/
s/other, every/other. Every/
s/service, is/service is/


[9]

136 +are unable to access them.
137 +
138 +But these relations represent adhoc inter machine communication, which

s/them.\n\nBut/them. But/


[10]

146 +The formulas executed by the unit agent provide for user executed code
147 +done within an lxc container (with root privileges). LXC provides
148 +limited support for security against root in a container, so a
149 +container compromise can escalate to a machine level compromise and
150 +those of the other units on a machine.

This is sensible info, but it's unrelated to relation attacks.


[11]

s/lxc/LXC/ throughout the whole document.


[12]

153 +Privilege Escalation Scenarios
154 +------------------------------

159 +container escalation
160 +++++++++++++++++++++

198 +Access to Deployed services
199 +----------------------------

The document should use "Title case topics" consistently.


[13]

+like ec2 is unfiltered.

s/ec2/EC2/


[14]

+In future we should have machine level firewalling to allow access

s/future/the future/


[15]

+Next Steps

This section needs further love. We can leave it unfinished, but it needs to be at least polished enough to look basically OK (proper headers, sentences starting with capitals, etc).

Also, I think we should consider splitting that section into a separate document that is specific to the implementation details of the high-level spec, and keep this text as a pleasant read for someone who wants to understand it from a high-level perspective without diving into details.


Good work, thanks!