Merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk

Proposed by Cory Johns
Status: Merged
Merged at revision: 42
Proposed branch: lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme
Merge into: lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk
Diff against target: 278 lines (+110/-110)
3 files modified
README.dev.md (+72/-0)
README.md (+36/-108)
resources.yaml (+2/-2)
To merge this branch: bzr merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme
Reviewer: amir sanjar (community)
Review status: Approve
Review via email: mp+252617@code.launchpad.net

Description of the change

New READMEs and minor relation cleanups

42. By Cory Johns

Disambiguate the interface in README.dev

Revision history for this message
amir sanjar (asanjar) wrote:

We will also need to return the hostname of the compute nodes in the future.

review: Approve

Preview Diff

=== added file 'README.dev.md'
--- README.dev.md 1970-01-01 00:00:00 +0000
+++ README.dev.md 2015-03-11 18:02:10 +0000
@@ -0,0 +1,72 @@
+## Overview
+
+This charm provides computation and storage resources for an Apache Hadoop
+deployment, and is intended to be used only as a part of that deployment.
+This document describes how this charm connects to and interacts with the
+other components of the deployment.
+
+
+## Provided Relations
+
+### datanode (interface: dfs-slave)
+
+This relation connects this charm to the apache-hadoop-hdfs-master charm.
+It is a bi-directional interface, with the following keys being exchanged:
+
+* Sent to hdfs-master:
+
+  * `private-address`: Address of this unit, to be registered as a DataNode
+
+* Received from hdfs-master:
+
+  * `private-address`: Address of the HDFS master unit, to provide the NameNode
+  * `ready`: A flag indicating that HDFS is ready to register DataNodes
+
+Ports will soon be added to both of these.
+
+
+### nodemanager (interface: mapred-slave)
+
+This relation connects this charm to the apache-hadoop-yarn-master charm.
+It is a bi-directional interface, with the following keys being exchanged:
+
+* Sent to yarn-master:
+
+  * `private-address`: Address of this unit, to be registered as a NodeManager
+
+* Received from yarn-master:
+
+  * `private-address`: Address of the YARN master unit, to provide the ResourceManager
+  * `ready`: A flag indicating that YARN is ready to register NodeManagers
+
+Ports will soon be added to both of these.
+
+
+## Required Relations
+
+*There are no required relations for this charm.*
+
+
+## Manual Deployment
+
+The easiest way to deploy the core Apache Hadoop platform is to use one of
+the [apache-core-batch-processing-* bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle).
+However, to manually deploy the base Apache Hadoop platform without using one of the
+bundles, you can use the following:
+
+    juju deploy apache-hadoop-hdfs-master hdfs-master
+    juju deploy apache-hadoop-hdfs-secondary secondary-namenode
+    juju deploy apache-hadoop-yarn-master yarn-master
+    juju deploy apache-hadoop-compute-slave compute-slave -n3
+    juju deploy apache-hadoop-client client
+    juju add-relation yarn-master hdfs-master
+    juju add-relation secondary-namenode hdfs-master
+    juju add-relation compute-slave yarn-master
+    juju add-relation compute-slave hdfs-master
+    juju add-relation client yarn-master
+    juju add-relation client hdfs-master
+
+This will create a scalable deployment with separate nodes for each master,
+and a three unit compute slave (NodeManager and DataNode) cluster. The master
+charms also support co-locating using the `--to` option to `juju deploy` for
+more dense deployments.
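
The datanode and nodemanager relations described in README.dev.md above follow the same pattern: the slave publishes its `private-address`, then waits for the master's `ready` flag before configuring itself against the master's address. As an illustrative sketch only (not part of this merge, and not the charm's actual hook code), a `datanode-relation-changed` hook using the standard Juju hook tools might look like the following; `start_datanode` is a hypothetical helper:

    #!/bin/bash
    # datanode-relation-changed -- illustrative sketch, not the real charm hook.
    set -e

    # Juju publishes this unit's private-address to hdfs-master automatically,
    # which is how this unit gets registered as a DataNode.

    # Wait for hdfs-master to signal that the NameNode can accept DataNodes.
    if [ -z "$(relation-get ready)" ]; then
        juju-log "hdfs-master not ready; deferring DataNode configuration"
        exit 0
    fi

    # Point the DataNode at the NameNode via the master's address.
    namenode=$(relation-get private-address)
    start_datanode "$namenode"  # hypothetical helper: write config, start service

The nodemanager relation would be handled the same way, substituting the yarn-master address for the ResourceManager.
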
=== modified file 'README.md'
--- README.md 2015-02-13 22:34:21 +0000
+++ README.md 2015-03-11 18:02:10 +0000
@@ -1,118 +1,39 @@
 ## Overview
 
-This charm is a component of the Apache Hadoop platform. It is intended
-to be deployed with the other components using the bundle:
-`bundle:~bigdata-charmers/apache-hadoop`
-
-**What is Apache Hadoop?**
-
 The Apache Hadoop software library is a framework that allows for the
 distributed processing of large data sets across clusters of computers
 using a simple programming model.
 
-It is designed to scale up from single servers to thousands of machines,
-each offering local computation and storage. Rather than rely on hardware
-to deliver high-avaiability, the library itself is designed to detect
-and handle failures at the application layer, so delivering a
-highly-availabile service on top of a cluster of computers, each of
-which may be prone to failures.
-
-Apache Hadoop 2.4.1 consists of significant improvements over the previous stable
-release (hadoop-1.x).
-
-Here is a short overview of the improvments to both HDFS and MapReduce.
-
- - **HDFS Federation**
-   In order to scale the name service horizontally, federation uses multiple
-   independent Namenodes/Namespaces. The Namenodes are federated, that is, the
-   Namenodes are independent and don't require coordination with each other.
-   The datanodes are used as common storage for blocks by all the Namenodes.
-   Each datanode registers with all the Namenodes in the cluster. Datanodes
-   send periodic heartbeats and block reports and handles commands from the
-   Namenodes.
-
-   More details are available in the HDFS Federation document:
-   <http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/Federation.html>
-
- - **MapReduce NextGen aka YARN aka MRv2**
-   The new architecture introduced in hadoop-0.23, divides the two major functions of the
-   JobTracker: resource management and job life-cycle management into separate components.
-   The new ResourceManager manages the global assignment of compute resources to
-   applications and the per-application ApplicationMaster manages the application's
-   scheduling and coordination.
-   An application is either a single job in the sense of classic MapReduce jobs or a DAG of
-   such jobs.
-
-   The ResourceManager and per-machine NodeManager daemon, which manages the user
-   processes on that machine, form the computation fabric.
-
-   The per-application ApplicationMaster is, in effect, a framework specific
-   library and is tasked with negotiating resources from the ResourceManager and
-   working with the NodeManager(s) to execute and monitor the tasks.
-
-   More details are available in the YARN document:
-   <http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html>
+This charm deploys a compute / slave node running the NodeManager
+and DataNode components of
+[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/),
+which provides computation and storage resources to the platform.
 
 ## Usage
 
-This charm manages the compute slave nodes, which include the DataNode and
-NodeManager components. It is intended to be used with `apache-hadoop-hdfs-master`
-and `apache-hadoop-yarn-master`.
-
-### Simple Usage: Single YARN / HDFS master deployment
-
-In this configuration, the YARN and HDFS master components run on the same
-machine. This is useful for lower-resource deployments::
-
-    juju deploy apache-hadoop-hdfs-master hdfs-master
-    juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode --to 1
-    juju deploy apache-hadoop-yarn-master yarn-master --to 1
-    juju deploy apache-hadoop-compute-slave compute-slave
-    juju deploy apache-hadoop-client client
-    juju add-relation yarn-master hdfs-master
-    juju add-relation secondary-namenode hdfs-master
-    juju add-relation compute-slave yarn-master
-    juju add-relation compute-slave hdfs-master
-    juju add-relation client yarn-master
-    juju add-relation client hdfs-master
-
-Note that the machine number (`--to 1`) should match the machine number
-for the `hdfs-master` charm. If you previously deployed other services
-in your environment, you may need to adjust the machine number appropriately.
-
-
-### Scale Out Usage: Separate HDFS, YARN, and compute nodes
-
-In this configuration the HDFS and YARN deployments operate on
-different service units as separate services::
-
-    juju deploy apache-hadoop-hdfs-master hdfs-master
-    juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode
-    juju deploy apache-hadoop-yarn-master yarn-master
-    juju deploy apache-hadoop-compute-slave compute-slave -n 3
-    juju deploy apache-hadoop-client client
-    juju add-relation yarn-master hdfs-master
-    juju add-relation secondary-namenode hdfs-master
-    juju add-relation compute-slave yarn-master
-    juju add-relation compute-slave hdfs-master
-    juju add-relation client yarn-master
-    juju add-relation client hdfs-master
-
-The `-n 3` option can be adjusted according to the number of compute nodes
-you need. You can also add additional compute nodes later::
-
-    juju add-unit compute-slave -n 2
-
-
-### To deploy a Hadoop service with elasticsearch service::
-    # deploy ElasticSearch locally:
-    **juju deploy elasticsearch elasticsearch**
-    # elasticsearch-hadoop.jar file will be added to LIBJARS path
-    # Recommanded to use hadoop -libjars option to included elk jar file
-    **juju add-unit -n elasticsearch**
-    # deploy hive service by any senarios mentioned above
-    # associate Hive with elasticsearch
-    **juju add-relation {hadoop master}:elasticsearch elasticsearch:client**
 
 
 ## Deploying in Network-Restricted Environments
@@ -121,12 +42,14 @@
 access. To deploy in this environment, you will need a local mirror to serve
 the packages and resources required by these charms.
 
+
 ### Mirroring Packages
 
 You can setup a local mirror for apt packages using squid-deb-proxy.
 For instructions on configuring juju to use this, see the
 [Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html).
 
+
 ### Mirroring Resources
 
 In addition to apt packages, the Apache Hadoop charms require a few binary
@@ -144,14 +67,19 @@
 
 You can fetch the resources for all of the Apache Hadoop charms
 (`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`,
-`apache-hadoop-compute-slave`, `apache-hadoop-client`, etc) into a single
+`apache-hadoop-hdfs-secondary`, `apache-hadoop-client`, etc) into a single
 directory and serve them all with a single `juju resources serve` instance.
 
 
 ## Contact Information
-amir sanjar <amir.sanjar@canonical.com>
+
+* Amir Sanjar <amir.sanjar@canonical.com>
+* Cory Johns <cory.johns@canonical.com>
+* Kevin Monroe <kevin.monroe@canonical.com>
+
 
 ## Hadoop
+
 - [Apache Hadoop](http://hadoop.apache.org/) home page
 - [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html)
 - [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html)
 
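
For the resource mirroring described in the README.md changes above, these charms are normally paired with the `jujuresources` tool from PyPI; the exact commands below are an assumption and should be checked against that tool's documentation:

    # Assumed mirroring workflow (verify flags against jujuresources docs):
    sudo pip install jujuresources
    juju-resources fetch --all /path/to/resources.yaml -d /tmp/resources
    juju-resources serve -d /tmp/resources

Once served, the charms can be pointed at the local mirror instead of fetching resources from the internet.
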
=== modified file 'resources.yaml'
--- resources.yaml 2015-03-06 22:28:48 +0000
+++ resources.yaml 2015-03-11 18:02:10 +0000
@@ -8,8 +8,8 @@
   six:
     pypi: six
   charmhelpers:
-    pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150306222841-0ibvrfaungfdtkn8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
-    hash: 787fc1cc70fc89e653b08dd192ec702d844709e450ff67577b7c5e99b6bbf39b
+    pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz
+    hash: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c
     hash_type: sha256
 optional_resources:
   hadoop-aarch64:
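
The resources.yaml change above points `charmhelpers` at a new tarball and updates its sha256 hash. To confirm that a downloaded or mirrored copy matches the hash recorded in resources.yaml, standard coreutils is enough (the filename here is taken from the URL in the diff):

    # Verify the fetched tarball against the expected sha256 from resources.yaml.
    echo "a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c  charmhelpers-0.2.2.tar.gz" | sha256sum -c -
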
