Merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk
- Trusty Tahr (14.04)
Proposed by: Cory Johns
Status: | Merged |
---|---|
Merged at revision: | 42 |
Proposed branch: | lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme |
Merge into: | lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk |
Diff against target: |
278 lines (+110/-110) 3 files modified
README.dev.md (+72/-0) README.md (+36/-108) resources.yaml (+2/-2) |
To merge this branch: | bzr merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
amir sanjar (community) | Approve | ||
Review via email: mp+252617@code.launchpad.net |
Commit message
Description of the change
New READMEs and minor relation cleanups
- 42. By Cory Johns: Disambiguate the interface in README.dev
Preview Diff
1 | === added file 'README.dev.md' | |||
2 | --- README.dev.md 1970-01-01 00:00:00 +0000 | |||
3 | +++ README.dev.md 2015-03-11 18:02:10 +0000 | |||
4 | @@ -0,0 +1,72 @@ | |||
5 | 1 | ## Overview | ||
6 | 2 | |||
7 | 3 | This charm provides computation and storage resources for an Apache Hadoop | ||
8 | 4 | deployment, and is intended to be used only as a part of that deployment. | ||
9 | 5 | This document describes how this charm connects to and interacts with the | ||
10 | 6 | other components of the deployment. | ||
11 | 7 | |||
12 | 8 | |||
13 | 9 | ## Provided Relations | ||
14 | 10 | |||
15 | 11 | ### datanode (interface: dfs-slave) | ||
16 | 12 | |||
17 | 13 | This relation connects this charm to the apache-hadoop-hdfs-master charm. | ||
18 | 14 | It is a bi-directional interface, with the following keys being exchanged: | ||
19 | 15 | |||
20 | 16 | * Sent to hdfs-master: | ||
21 | 17 | |||
22 | 18 | * `private-address`: Address of this unit, to be registered as a DataNode | ||
23 | 19 | |||
24 | 20 | * Received from hdfs-master: | ||
25 | 21 | |||
26 | 22 | * `private-address`: Address of the HDFS master unit, to provide the NameNode | ||
27 | 23 | * `ready`: A flag indicating that HDFS is ready to register DataNodes | ||
28 | 24 | |||
29 | 25 | Ports will soon be added to both of these. | ||
30 | 26 | |||
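The datanode handshake described above (and the matching nodemanager handshake below) can be sketched in plain Python. This is an illustrative sketch only, not the charm's actual hook code: the `should_register` helper is hypothetical, and the dict stands in for data a hook would read with the `relation-get` hook tool.

```python
# Illustrative sketch of the dfs-slave handshake described above.
# The dict stands in for values a charm hook would read via the
# `relation-get` hook tool; `should_register` is a hypothetical
# helper, not part of the charm.

def should_register(relation_data):
    """Register as a DataNode only once hdfs-master reports readiness."""
    return bool(relation_data.get("ready")) and "private-address" in relation_data

# hdfs-master has not yet set the `ready` flag: wait.
print(should_register({"private-address": "10.0.0.5"}))                    # False
# `ready` flag present: safe to point the DataNode at the NameNode.
print(should_register({"private-address": "10.0.0.5", "ready": "true"}))   # True
```

The nodemanager (mapred-slave) relation follows the same pattern, with yarn-master's `ready` flag gating NodeManager registration.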
31 | 27 | |||
32 | 28 | ### nodemanager (interface: mapred-slave) | ||
33 | 29 | |||
34 | 30 | This relation connects this charm to the apache-hadoop-yarn-master charm. | ||
35 | 31 | It is a bi-directional interface, with the following keys being exchanged: | ||
36 | 32 | |||
37 | 33 | * Sent to yarn-master: | ||
38 | 34 | |||
39 | 35 | * `private-address`: Address of this unit, to be registered as a NodeManager | ||
40 | 36 | |||
41 | 37 | * Received from yarn-master: | ||
42 | 38 | |||
43 | 39 | * `private-address`: Address of the YARN master unit, to provide the ResourceManager | ||
44 | 40 | * `ready`: A flag indicating that YARN is ready to register NodeManagers | ||
45 | 41 | |||
46 | 42 | Ports will soon be added to both of these. | ||
47 | 43 | |||
48 | 44 | |||
49 | 45 | ## Required Relations | ||
50 | 46 | |||
51 | 47 | *There are no required relations for this charm.* | ||
52 | 48 | |||
53 | 49 | |||
54 | 50 | ## Manual Deployment | ||
55 | 51 | |||
56 | 52 | The easiest way to deploy the core Apache Hadoop platform is to use one of | ||
57 | 53 | the [apache-core-batch-processing-* bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle). | ||
58 | 54 | However, to manually deploy the base Apache Hadoop platform without using one of the | ||
59 | 55 | bundles, you can use the following: | ||
60 | 56 | |||
61 | 57 | juju deploy apache-hadoop-hdfs-master hdfs-master | ||
62 | 58 | juju deploy apache-hadoop-hdfs-secondary secondary-namenode | ||
63 | 59 | juju deploy apache-hadoop-yarn-master yarn-master | ||
64 | 60 | juju deploy apache-hadoop-compute-slave compute-slave -n3 | ||
65 | 61 | juju deploy apache-hadoop-client client | ||
66 | 62 | juju add-relation yarn-master hdfs-master | ||
67 | 63 | juju add-relation secondary-namenode hdfs-master | ||
68 | 64 | juju add-relation compute-slave yarn-master | ||
69 | 65 | juju add-relation compute-slave hdfs-master | ||
70 | 66 | juju add-relation client yarn-master | ||
71 | 67 | juju add-relation client hdfs-master | ||
72 | 68 | |||
73 | 69 | This will create a scalable deployment with separate nodes for each master, | ||
74 | 70 | and a three unit compute slave (NodeManager and DataNode) cluster. The master | ||
75 | 71 | charms also support co-locating using the `--to` option to `juju deploy` for | ||
76 | 72 | more dense deployments. | ||
77 | 0 | 73 | ||
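The manual deployment steps above are just a topology: five services and six relations. As a sketch, that topology can be expressed as data and the `juju` commands generated from it. The `juju_commands` helper is hypothetical; the charm and service names come from the README text above.

```python
# Sketch: generate the deploy/add-relation commands shown above from
# a simple topology description. Purely illustrative; the helper is
# hypothetical, the names are from the README.

services = {
    "hdfs-master": ("apache-hadoop-hdfs-master", 1),
    "secondary-namenode": ("apache-hadoop-hdfs-secondary", 1),
    "yarn-master": ("apache-hadoop-yarn-master", 1),
    "compute-slave": ("apache-hadoop-compute-slave", 3),
    "client": ("apache-hadoop-client", 1),
}
relations = [
    ("yarn-master", "hdfs-master"),
    ("secondary-namenode", "hdfs-master"),
    ("compute-slave", "yarn-master"),
    ("compute-slave", "hdfs-master"),
    ("client", "yarn-master"),
    ("client", "hdfs-master"),
]

def juju_commands(services, relations):
    cmds = []
    for name, (charm, units) in services.items():
        scale = " -n{}".format(units) if units > 1 else ""
        cmds.append("juju deploy {} {}{}".format(charm, name, scale))
    for a, b in relations:
        cmds.append("juju add-relation {} {}".format(a, b))
    return cmds

for cmd in juju_commands(services, relations):
    print(cmd)
```

This makes the shape of the deployment explicit: every slave relates to both masters, and the client relates to both masters as well.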
78 | === modified file 'README.md' | |||
79 | --- README.md 2015-02-13 22:34:21 +0000 | |||
80 | +++ README.md 2015-03-11 18:02:10 +0000 | |||
81 | @@ -1,118 +1,39 @@ | |||
82 | 1 | ## Overview | 1 | ## Overview |
83 | 2 | 2 | ||
84 | 3 | This charm is a component of the Apache Hadoop platform. It is intended | ||
85 | 4 | to be deployed with the other components using the bundle: | ||
86 | 5 | `bundle:~bigdata-charmers/apache-hadoop` | ||
87 | 6 | |||
88 | 7 | **What is Apache Hadoop?** | ||
89 | 8 | |||
90 | 9 | The Apache Hadoop software library is a framework that allows for the | 3 | The Apache Hadoop software library is a framework that allows for the |
91 | 10 | distributed processing of large data sets across clusters of computers | 4 | distributed processing of large data sets across clusters of computers |
92 | 11 | using a simple programming model. | 5 | using a simple programming model. |
93 | 12 | 6 | ||
136 | 13 | It is designed to scale up from single servers to thousands of machines, | 7 | This charm deploys a compute / slave node running the NodeManager |
137 | 14 | each offering local computation and storage. Rather than rely on hardware | 8 | and DataNode components of |
138 | 15 | to deliver high-availability, the library itself is designed to detect | 9 | [Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/), |
139 | 16 | and handle failures at the application layer, so delivering a | 10 | which provides computation and storage resources to the platform. |
98 | 17 | highly-available service on top of a cluster of computers, each of | ||
99 | 18 | which may be prone to failures. | ||
100 | 19 | |||
101 | 20 | Apache Hadoop 2.4.1 consists of significant improvements over the previous stable | ||
102 | 21 | release (hadoop-1.x). | ||
103 | 22 | |||
104 | 23 | Here is a short overview of the improvements to both HDFS and MapReduce. | ||
105 | 24 | |||
106 | 25 | - **HDFS Federation** | ||
107 | 26 | In order to scale the name service horizontally, federation uses multiple | ||
108 | 27 | independent Namenodes/Namespaces. The Namenodes are federated, that is, the | ||
109 | 28 | Namenodes are independent and don't require coordination with each other. | ||
110 | 29 | The datanodes are used as common storage for blocks by all the Namenodes. | ||
111 | 30 | Each datanode registers with all the Namenodes in the cluster. Datanodes | ||
112 | 31 | send periodic heartbeats and block reports and handle commands from the | ||
113 | 32 | Namenodes. | ||
114 | 33 | |||
115 | 34 | More details are available in the HDFS Federation document: | ||
116 | 35 | <http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/Federation.html> | ||
117 | 36 | |||
118 | 37 | - **MapReduce NextGen aka YARN aka MRv2** | ||
119 | 38 | The new architecture introduced in hadoop-0.23, divides the two major functions of the | ||
120 | 39 | JobTracker: resource management and job life-cycle management into separate components. | ||
121 | 40 | The new ResourceManager manages the global assignment of compute resources to | ||
122 | 41 | applications and the per-application ApplicationMaster manages the application's | ||
123 | 42 | scheduling and coordination. | ||
124 | 43 | An application is either a single job in the sense of classic MapReduce jobs or a DAG of | ||
125 | 44 | such jobs. | ||
126 | 45 | |||
127 | 46 | The ResourceManager and per-machine NodeManager daemon, which manages the user | ||
128 | 47 | processes on that machine, form the computation fabric. | ||
129 | 48 | |||
130 | 49 | The per-application ApplicationMaster is, in effect, a framework specific | ||
131 | 50 | library and is tasked with negotiating resources from the ResourceManager and | ||
132 | 51 | working with the NodeManager(s) to execute and monitor the tasks. | ||
133 | 52 | |||
134 | 53 | More details are available in the YARN document: | ||
135 | 54 | <http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html> | ||
140 | 55 | 11 | ||
141 | 56 | ## Usage | 12 | ## Usage |
142 | 57 | 13 | ||
201 | 58 | This charm manages the compute slave nodes, which include the DataNode and | 14 | This charm is intended to be deployed via one of the |
202 | 59 | NodeManager components. It is intended to be used with `apache-hadoop-hdfs-master` | 15 | [bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle). |
203 | 60 | and `apache-hadoop-yarn-master`. | 16 | For example: |
204 | 61 | 17 | ||
205 | 62 | ### Simple Usage: Single YARN / HDFS master deployment | 18 | juju quickstart u/bigdata-dev/apache-analytics-sql |
206 | 63 | 19 | ||
207 | 64 | In this configuration, the YARN and HDFS master components run on the same | 20 | This will deploy the Apache Hadoop platform with Apache Hive available to |
208 | 65 | machine. This is useful for lower-resource deployments:: | 21 | perform SQL-like queries against your data. |
209 | 66 | 22 | ||
210 | 67 | juju deploy apache-hadoop-hdfs-master hdfs-master | 23 | You can also manually load and run map-reduce jobs via the client: |
211 | 68 | juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode --to 1 | 24 | |
212 | 69 | juju deploy apache-hadoop-yarn-master yarn-master --to 1 | 25 | juju scp my-job.jar client/0: |
213 | 70 | juju deploy apache-hadoop-compute-slave compute-slave | 26 | juju ssh client/0 |
214 | 71 | juju deploy apache-hadoop-client client | 27 | hadoop jar my-job.jar |
215 | 72 | juju add-relation yarn-master hdfs-master | 28 | |
216 | 73 | juju add-relation secondary-namenode hdfs-master | 29 | |
217 | 74 | juju add-relation compute-slave yarn-master | 30 | ### Scaling |
218 | 75 | juju add-relation compute-slave hdfs-master | 31 | |
219 | 76 | juju add-relation client yarn-master | 32 | The compute-slave node is the "workhorse" of the Apache Hadoop platform. |
220 | 77 | juju add-relation client hdfs-master | 33 | To scale your deployment's performance, you can simply add more compute-slave |
221 | 78 | 34 | units. For example, to add three more units: | 35 | |
222 | 79 | Note that the machine number (`--to 1`) should match the machine number | 35 | |
223 | 80 | for the `hdfs-master` charm. If you previously deployed other services | 36 | juju add-unit compute-slave -n 3 |
166 | 81 | in your environment, you may need to adjust the machine number appropriately. | ||
167 | 82 | |||
168 | 83 | |||
169 | 84 | ### Scale Out Usage: Separate HDFS, YARN, and compute nodes | ||
170 | 85 | |||
171 | 86 | In this configuration the HDFS and YARN deployments operate on | ||
172 | 87 | different service units as separate services:: | ||
173 | 88 | |||
174 | 89 | juju deploy apache-hadoop-hdfs-master hdfs-master | ||
175 | 90 | juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode | ||
176 | 91 | juju deploy apache-hadoop-yarn-master yarn-master | ||
177 | 92 | juju deploy apache-hadoop-compute-slave compute-slave -n 3 | ||
178 | 93 | juju deploy apache-hadoop-client client | ||
179 | 94 | juju add-relation yarn-master hdfs-master | ||
180 | 95 | juju add-relation secondary-namenode hdfs-master | ||
181 | 96 | juju add-relation compute-slave yarn-master | ||
182 | 97 | juju add-relation compute-slave hdfs-master | ||
183 | 98 | juju add-relation client yarn-master | ||
184 | 99 | juju add-relation client hdfs-master | ||
185 | 100 | |||
186 | 101 | The `-n 3` option can be adjusted according to the number of compute nodes | ||
187 | 102 | you need. You can also add additional compute nodes later:: | ||
188 | 103 | |||
189 | 104 | juju add-unit compute-slave -n 2 | ||
190 | 105 | |||
191 | 106 | |||
192 | 107 | ### To deploy a Hadoop service with elasticsearch service:: | ||
193 | 108 | # deploy ElasticSearch locally: | ||
194 | 109 | **juju deploy elasticsearch elasticsearch** | ||
195 | 110 | # elasticsearch-hadoop.jar file will be added to LIBJARS path | ||
196 | 111 | # Recommended to use the hadoop -libjars option to include the elk jar file | ||
197 | 112 | **juju add-unit -n elasticsearch** | ||
198 | 113 | # deploy the hive service by any of the scenarios mentioned above | ||
199 | 114 | # associate Hive with elasticsearch | ||
200 | 115 | **juju add-relation {hadoop master}:elasticsearch elasticsearch:client** | ||
224 | 116 | 37 | ||
225 | 117 | 38 | ||
226 | 118 | ## Deploying in Network-Restricted Environments | 39 | ## Deploying in Network-Restricted Environments |
227 | @@ -121,12 +42,14 @@ | |||
228 | 121 | access. To deploy in this environment, you will need a local mirror to serve | 42 | access. To deploy in this environment, you will need a local mirror to serve |
229 | 122 | the packages and resources required by these charms. | 43 | the packages and resources required by these charms. |
230 | 123 | 44 | ||
231 | 45 | |||
232 | 124 | ### Mirroring Packages | 46 | ### Mirroring Packages |
233 | 125 | 47 | ||
234 | 126 | You can set up a local mirror for apt packages using squid-deb-proxy. | 48 | You can set up a local mirror for apt packages using squid-deb-proxy. |
235 | 127 | For instructions on configuring juju to use this, see the | 49 | For instructions on configuring juju to use this, see the |
236 | 128 | [Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html). | 50 | [Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html). |
237 | 129 | 51 | ||
238 | 52 | |||
239 | 130 | ### Mirroring Resources | 53 | ### Mirroring Resources |
240 | 131 | 54 | ||
241 | 132 | In addition to apt packages, the Apache Hadoop charms require a few binary | 55 | In addition to apt packages, the Apache Hadoop charms require a few binary |
242 | @@ -144,14 +67,19 @@ | |||
243 | 144 | 67 | ||
244 | 145 | You can fetch the resources for all of the Apache Hadoop charms | 68 | You can fetch the resources for all of the Apache Hadoop charms |
245 | 146 | (`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`, | 69 | (`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`, |
247 | 147 | `apache-hadoop-compute-slave`, `apache-hadoop-client`, etc) into a single | 70 | `apache-hadoop-hdfs-secondary`, `apache-hadoop-client`, etc) into a single |
248 | 148 | directory and serve them all with a single `juju resources serve` instance. | 71 | directory and serve them all with a single `juju resources serve` instance. |
249 | 149 | 72 | ||
250 | 150 | 73 | ||
251 | 151 | ## Contact Information | 74 | ## Contact Information |
253 | 152 | amir sanjar <amir.sanjar@canonical.com> | 75 | |
254 | 76 | * Amir Sanjar <amir.sanjar@canonical.com> | ||
255 | 77 | * Cory Johns <cory.johns@canonical.com> | ||
256 | 78 | * Kevin Monroe <kevin.monroe@canonical.com> | ||
257 | 79 | |||
258 | 153 | 80 | ||
259 | 154 | ## Hadoop | 81 | ## Hadoop |
260 | 82 | |||
261 | 155 | - [Apache Hadoop](http://hadoop.apache.org/) home page | 83 | - [Apache Hadoop](http://hadoop.apache.org/) home page |
262 | 156 | - [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html) | 84 | - [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html) |
263 | 157 | - [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html) | 85 | - [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html) |
264 | 158 | 86 | ||
265 | === modified file 'resources.yaml' | |||
266 | --- resources.yaml 2015-03-06 22:28:48 +0000 | |||
267 | +++ resources.yaml 2015-03-11 18:02:10 +0000 | |||
268 | @@ -8,8 +8,8 @@ | |||
269 | 8 | six: | 8 | six: |
270 | 9 | pypi: six | 9 | pypi: six |
271 | 10 | charmhelpers: | 10 | charmhelpers: |
274 | 11 | pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150306222841-0ibvrfaungfdtkn8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz | 11 | pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz |
275 | 12 | hash: 787fc1cc70fc89e653b08dd192ec702d844709e450ff67577b7c5e99b6bbf39b | 12 | hash: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c |
276 | 13 | hash_type: sha256 | 13 | hash_type: sha256 |
277 | 14 | optional_resources: | 14 | optional_resources: |
278 | 15 | hadoop-aarch64: | 15 | hadoop-aarch64: |
We will also need to return the hostname of the compute nodes in the future.