Merge lp:~bigdata-dev/charms/trusty/apache-zeppelin/trunk into lp:charms/trusty/apache-zeppelin

Proposed by Kevin W Monroe
Status: Merged
Merged at revision: 19
Proposed branch: lp:~bigdata-dev/charms/trusty/apache-zeppelin/trunk
Merge into: lp:charms/trusty/apache-zeppelin
Diff against target: 341 lines (+80/-60)
2 files modified
resources/flume-tutorial/note.json (+79/-59)
tests/100-deploy-spark-hdfs-yarn (+1/-1)
To merge this branch: bzr merge lp:~bigdata-dev/charms/trusty/apache-zeppelin/trunk
Reviewer: Kevin W Monroe
Status: Approve
Review via email: mp+271903@code.launchpad.net
Revision history for this message
Kevin W Monroe (kwmonroe):
review: Approve

Preview Diff

1=== modified file 'resources/flume-tutorial/note.json'
2--- resources/flume-tutorial/note.json 2015-08-26 12:27:56 +0000
3+++ resources/flume-tutorial/note.json 2015-09-22 03:37:16 +0000
4@@ -1,7 +1,7 @@
5 {
6 "paragraphs": [
7 {
8- "text": "%md\n## Welcome to Realtime Syslog Analytic Tutorial Powered by Juju.\n### In this live tutorial we will demonstrat three main phases of any big data solution:\n#### 1. Data Ingestion - Apache Flume-syslog -\u003e Apache flume-hdfs\n#### 2. Data Processing - Apache Spark+YARN\n#### 3. Data Visualization - SparkSQL",
9+ "text": "%md\n## Welcome to the Realtime Syslog Analytics tutorial, powered by Juju.\n### In this live tutorial we will demonstrate three phases of a big data solution:\n#### 1. Data Ingestion: Flume-Syslog -\u003e Flume-HDFS\n#### 2. Data Processing: Spark+YARN\n#### 3. Data Visualization: SparkSQL+Zeppelin",
10 "config": {
11 "colWidth": 12.0,
12 "graph": {
13@@ -12,7 +12,8 @@
14 "values": [],
15 "groups": [],
16 "scatter": {}
17- }
18+ },
19+ "tableHide": false
20 },
21 "settings": {
22 "params": {},
23@@ -23,17 +24,17 @@
24 "result": {
25 "code": "SUCCESS",
26 "type": "HTML",
27- "msg": "\u003ch2\u003eWelcome to Realtime Syslog Analytic Tutorial Powered by Juju.\u003c/h2\u003e\n\u003ch3\u003eIn this live tutorial we will demonstrat three main phases of any big data solution:\u003c/h3\u003e\n\u003ch4\u003e1. Data Ingestion - Apache Flume-syslog -\u003e Apache flume-hdfs\u003c/h4\u003e\n\u003ch4\u003e2. Data Processing - Apache Spark+YARN\u003c/h4\u003e\n\u003ch4\u003e3. Data Visualization - SparkSQL\u003c/h4\u003e\n"
28+ "msg": "\u003ch2\u003eWelcome to the Realtime Syslog Analytics tutorial, powered by Juju.\u003c/h2\u003e\n\u003ch3\u003eIn this live tutorial we will demonstrate three phases of a big data solution:\u003c/h3\u003e\n\u003ch4\u003e1. Data Ingestion: Flume-Syslog -\u003e Flume-HDFS\u003c/h4\u003e\n\u003ch4\u003e2. Data Processing: Spark+YARN\u003c/h4\u003e\n\u003ch4\u003e3. Data Visualization: SparkSQL+Zeppelin\u003c/h4\u003e\n"
29 },
30 "dateCreated": "Aug 20, 2015 3:14:39 PM",
31- "dateStarted": "Aug 25, 2015 9:34:23 AM",
32- "dateFinished": "Aug 25, 2015 9:34:23 AM",
33+ "dateStarted": "Sep 18, 2015 6:25:43 PM",
34+ "dateFinished": "Sep 18, 2015 6:25:43 PM",
35 "status": "FINISHED",
36 "progressUpdateIntervalMs": 500
37 },
38 {
39- "title": "Data Ingestion",
40- "text": "import sys.process._\n// Generate syslog messages by running an spakk\n\"/home/ubuntu/sparkpi.sh\" !!\n// Verify that FLume has collected and sent the syslog messages to HDFS\n\"hadoop fs -ls -R /user/flume/flume-syslog\" !!",
41+ "title": "Generate Data and Verify Ingestion",
42+ "text": "%sh\n# Generate syslog messages by trying to ssh to the hdfs-master unit.\n# This will likely result in a \u0027publickey denied\u0027 error, but it will\n# be enough to trigger a syslog event on the hdfs-master.\nfor i in `seq 1 10`;\ndo\n ssh -oStrictHostKeyChecking\u003dno hdfs-master-0 uptime \u003e/dev/null 2\u003e\u00261\n sleep 1\ndone\n\n# Check if Flume has collected and sent the syslog messages to HDFS.\n# If no output is seen from this command, wait a few minutes and try\n# again. The amount of time between Flume ingesting the event and it\n# being available in HDFS is controlled by the \u0027roll_interval\u0027\n# configuration option in the flume-hdfs charm.\nhadoop fs -ls -R /user/flume/flume-syslog | tail",
43 "config": {
44 "colWidth": 12.0,
45 "graph": {
46@@ -45,7 +46,9 @@
47 "groups": [],
48 "scatter": {}
49 },
50- "title": true
51+ "title": true,
52+ "tableHide": false,
53+ "editorHide": false
54 },
55 "settings": {
56 "params": {},
57@@ -56,16 +59,17 @@
58 "result": {
59 "code": "SUCCESS",
60 "type": "TEXT",
61- "msg": "" },
62+ "msg": "drwxr-xr-x - flume supergroup 0 2015-09-22 03:19 /user/flume/flume-syslog/2015-09-22\n-rw-r--r-- 3 flume supergroup 302 2015-09-22 03:12 /user/flume/flume-syslog/2015-09-22/FlumeData.1442891213622\n-rw-r--r-- 3 flume supergroup 2328 2015-09-22 03:19 /user/flume/flume-syslog/2015-09-22/FlumeData.1442891678998\n"
63+ },
64 "dateCreated": "Aug 20, 2015 6:09:43 PM",
65- "dateStarted": "Aug 24, 2015 10:51:34 PM",
66- "dateFinished": "Aug 24, 2015 10:52:11 PM",
67+ "dateStarted": "Sep 22, 2015 3:29:15 AM",
68+ "dateFinished": "Sep 22, 2015 3:29:28 AM",
69 "status": "FINISHED",
70 "progressUpdateIntervalMs": 500
71 },
72 {
73- "title": "Data Processing in python",
74- "text": "%pyspark\nsc.textFile(\"/user/flume/flume-syslog/*/*/*\").filter(lambda l: \"sshd\" in l).collect()",
75+ "title": "Simple Data Processing with Scala",
76+ "text": "// Output the number of sshd syslog events\nsc.textFile(\"/user/flume/flume-syslog/*/*\").filter(line \u003d\u003e line.contains(\"sshd\")).count()",
77 "config": {
78 "colWidth": 12.0,
79 "graph": {
80@@ -90,16 +94,17 @@
81 "result": {
82 "code": "SUCCESS",
83 "type": "TEXT",
84- "msg": "" },
85+ "msg": "res12: Long \u003d 40\n"
86+ },
87 "dateCreated": "Aug 20, 2015 6:11:00 PM",
88- "dateStarted": "Aug 24, 2015 10:54:10 PM",
89- "dateFinished": "Aug 24, 2015 10:54:15 PM",
90+ "dateStarted": "Sep 22, 2015 3:29:45 AM",
91+ "dateFinished": "Sep 22, 2015 3:29:46 AM",
92 "status": "FINISHED",
93 "progressUpdateIntervalMs": 500
94 },
95 {
96- "title": "Data Processing In Scala",
97- "text": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nval reSystemLog \u003d \"\"\"^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\"\"\".r\ncase class SyslogMessage(timestamp: String, host: Option[String], process: String, pid: Int, message: String)\n\nval lines \u003d sc.textFile(\"/user/flume/flume-syslog/*/*/*\")\nval events \u003d lines.flatMap {\n case reSystemLog(timestamp,hostname, proc, pidS, msg) \u003d\u003e\n for {pid \u003c- Try(pidS.toInt).toOption} yield SyslogMessage(timestamp,Some(hostname), proc, pid, msg)\n case _ \u003d\u003e None\n }.toDF()\n\nevents.registerTempTable(\"syslog\")\n",
98+ "title": "Data processing to enable future queries",
99+ "text": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nval reSystemLog \u003d \"\"\"^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\"\"\".r\ncase class SyslogMessage(timestamp: String, host: Option[String], process: String, pid: Int, message: String)\n\nval lines \u003d sc.textFile(\"/user/flume/flume-syslog/*/*\")\nval events \u003d lines.flatMap {\n case reSystemLog(timestamp,hostname, proc, pidS, msg) \u003d\u003e\n for {pid \u003c- Try(pidS.toInt).toOption} yield SyslogMessage(timestamp,Some(hostname), proc, pid, msg)\n case _ \u003d\u003e None\n }.toDF()\n\nevents.registerTempTable(\"syslog\")\n",
100 "config": {
101 "colWidth": 12.0,
102 "graph": {
103@@ -124,11 +129,11 @@
104 "result": {
105 "code": "SUCCESS",
106 "type": "TEXT",
107- "msg": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nreSystemLog: scala.util.matching.Regex \u003d ^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\ndefined class SyslogMessage\nlines: org.apache.spark.rdd.RDD[String] \u003d /user/flume/flume-syslog/*/*/* MapPartitionsRDD[509] at textFile at \u003cconsole\u003e:73\nevents: org.apache.spark.sql.DataFrame \u003d [timestamp: string, host: string, process: string, pid: int, message: string]\n"
108+ "msg": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nreSystemLog: scala.util.matching.Regex \u003d ^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\ndefined class SyslogMessage\nlines: org.apache.spark.rdd.RDD[String] \u003d /user/flume/flume-syslog/*/* MapPartitionsRDD[50] at textFile at \u003cconsole\u003e:31\nevents: org.apache.spark.sql.DataFrame \u003d [timestamp: string, host: string, process: string, pid: int, message: string]\n"
109 },
110 "dateCreated": "Aug 21, 2015 12:03:17 AM",
111- "dateStarted": "Aug 24, 2015 10:54:28 PM",
112- "dateFinished": "Aug 24, 2015 10:54:29 PM",
113+ "dateStarted": "Sep 22, 2015 3:23:23 AM",
114+ "dateFinished": "Sep 22, 2015 3:23:26 AM",
115 "status": "FINISHED",
116 "progressUpdateIntervalMs": 500
117 },
118@@ -169,7 +174,9 @@
119 }
120 }
121 },
122- "title": true
123+ "title": true,
124+ "tableHide": false,
125+ "editorHide": false
126 },
127 "settings": {
128 "params": {},
129@@ -180,26 +187,26 @@
130 "result": {
131 "code": "SUCCESS",
132 "type": "TABLE",
133- "msg": "process\tvalue\nCRON\t180\nntpdate\t1\nsshd\t6\nsu\t1\nsystemd-logind\t1\n"
134+ "msg": "process\tvalue\nCRON\t3\nsshd\t20\n"
135 },
136 "dateCreated": "Aug 24, 2015 10:31:38 PM",
137- "dateStarted": "Aug 24, 2015 10:54:37 PM",
138- "dateFinished": "Aug 24, 2015 10:54:41 PM",
139+ "dateStarted": "Sep 22, 2015 3:29:54 AM",
140+ "dateFinished": "Sep 22, 2015 3:29:57 AM",
141 "status": "FINISHED",
142 "progressUpdateIntervalMs": 500
143 },
144 {
145 "title": "Data Visualization",
146- "text": "%sql \nselect pid, count(1) value\nfrom syslog\nwhere pid \u003e 5000 and pid \u003c 20000 and timestamp \u003e ${maxDate\u003d\"Aug 24\"}\ngroup by pid \norder by pid\n",
147+ "text": "%sql \nselect host, count(1) value\nfrom syslog\nwhere timestamp \u003e ${maxDate\u003d\"Sep 15\"}\ngroup by host\n",
148 "config": {
149 "colWidth": 4.0,
150 "graph": {
151- "mode": "pieChart",
152+ "mode": "table",
153 "height": 300.0,
154 "optionOpen": false,
155 "keys": [
156 {
157- "name": "pid",
158+ "name": "host",
159 "index": 0.0,
160 "aggr": "sum"
161 }
162@@ -213,11 +220,6 @@
163 ],
164 "groups": [],
165 "scatter": {
166- "xAxis": {
167- "name": "pid",
168- "index": 0.0,
169- "aggr": "sum"
170- },
171 "yAxis": {
172 "name": "value",
173 "index": 1.0,
174@@ -225,14 +227,17 @@
175 }
176 }
177 },
178- "title": true
179+ "title": true,
180+ "tableHide": false
181 },
182 "settings": {
183- "params": {},
184+ "params": {
185+ "maxDate": "\"Sep 15\""
186+ },
187 "forms": {
188 "maxDate": {
189 "name": "maxDate",
190- "defaultValue": "\"Aug 24\"",
191+ "defaultValue": "\"Sep 15\"",
192 "hidden": false
193 }
194 }
195@@ -242,33 +247,33 @@
196 "result": {
197 "code": "SUCCESS",
198 "type": "TABLE",
199- "msg": "pid\tvalue\n5073\t2\n5074\t1\n5218\t2\n5219\t1\n5374\t2\n5375\t1\n5485\t2\n5881\t2\n5882\t1\n"
200+ "msg": "host\tvalue\nhdfs-master-0\t23\n"
201 },
202 "dateCreated": "Aug 21, 2015 1:11:17 AM",
203- "dateStarted": "Aug 24, 2015 10:54:43 PM",
204- "dateFinished": "Aug 24, 2015 10:54:45 PM",
205+ "dateStarted": "Sep 22, 2015 3:30:03 AM",
206+ "dateFinished": "Sep 22, 2015 3:30:05 AM",
207 "status": "FINISHED",
208 "progressUpdateIntervalMs": 500
209 },
210 {
211 "title": "Data Visualization",
212- "text": "%sql \nselect timestamp, count(1) value\nfrom syslog\nwhere timestamp \u003e ${maxDate\u003d\"Aug 24\"} and process \u003d\u003d \"sshd\"\ngroup by timestamp\norder by timestamp",
213+ "text": "%sql \nselect process, timestamp, message\nfrom syslog\nwhere timestamp \u003e ${maxDate\u003d\"Sep 15\"}\n",
214 "config": {
215 "colWidth": 4.0,
216 "graph": {
217- "mode": "pieChart",
218+ "mode": "table",
219 "height": 300.0,
220 "optionOpen": false,
221 "keys": [
222 {
223- "name": "timestamp",
224+ "name": "process",
225 "index": 0.0,
226 "aggr": "sum"
227 }
228 ],
229 "values": [
230 {
231- "name": "value",
232+ "name": "timestamp",
233 "index": 1.0,
234 "aggr": "sum"
235 }
236@@ -276,27 +281,23 @@
237 "groups": [],
238 "scatter": {
239 "xAxis": {
240- "name": "timestamp",
241+ "name": "process",
242 "index": 0.0,
243 "aggr": "sum"
244- },
245- "yAxis": {
246- "name": "value",
247- "index": 1.0,
248- "aggr": "sum"
249 }
250 }
251 },
252- "title": true
253+ "title": true,
254+ "tableHide": false
255 },
256 "settings": {
257 "params": {
258- "maxDate": "\"Aug 20\""
259+ "maxDate": "\"Sep 15\""
260 },
261 "forms": {
262 "maxDate": {
263 "name": "maxDate",
264- "defaultValue": "\"Aug 24\"",
265+ "defaultValue": "\"Sep 15\"",
266 "hidden": false
267 }
268 }
269@@ -306,32 +307,51 @@
270 "result": {
271 "code": "SUCCESS",
272 "type": "TABLE",
273- "msg": "timestamp\tvalue\nAug 21 11:20:45\t2\nAug 21 19:58:30\t2\nAug 24 21:59:47\t2\n"
274+ "msg": "process\ttimestamp\tmessage\nsshd\tSep 22 03:14:23\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:23\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:24\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:24\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:25\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:25\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:26\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:26\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:27\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:27\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:28\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:28\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:29\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:29\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:30\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:30\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:31\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:32\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:33\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:33\tConnection closed by 172.31.13.239 [preauth]\nCRON\tSep 22 03:17:01\tpam_unix(cron:session): session opened for user root by (uid\u003d0)\nCRON\tSep 22 03:17:01\t(root) CMD ( cd / \u0026\u0026 run-parts --report /etc/cron.hourly)\nCRON\tSep 22 03:17:01\tpam_unix(cron:session): session closed for user root\n"
275 },
276 "dateCreated": "Aug 21, 2015 8:29:46 AM",
277- "dateStarted": "Aug 24, 2015 10:54:54 PM",
278- "dateFinished": "Aug 24, 2015 10:54:55 PM",
279+ "dateStarted": "Sep 22, 2015 3:30:26 AM",
280+ "dateFinished": "Sep 22, 2015 3:30:26 AM",
281 "status": "FINISHED",
282 "progressUpdateIntervalMs": 500
283 },
284 {
285- "config": {},
286+ "text": "",
287+ "config": {
288+ "colWidth": 12.0,
289+ "graph": {
290+ "mode": "table",
291+ "height": 300.0,
292+ "optionOpen": false,
293+ "keys": [],
294+ "values": [],
295+ "groups": [],
296+ "scatter": {}
297+ },
298+ "tableHide": false
299+ },
300 "settings": {
301 "params": {},
302 "forms": {}
303 },
304 "jobName": "paragraph_1440473909272_653880463",
305 "id": "20150824-223829_186145308",
306+ "result": {
307+ "code": "SUCCESS",
308+ "type": "TEXT"
309+ },
310 "dateCreated": "Aug 24, 2015 10:38:29 PM",
311- "status": "READY",
312+ "dateStarted": "Sep 18, 2015 5:59:44 PM",
313+ "dateFinished": "Sep 18, 2015 6:03:23 PM",
314+ "status": "FINISHED",
315 "progressUpdateIntervalMs": 500
316 }
317 ],
318- "name": "Real-time Analytic Tutorial",
319+ "name": "Zeppelin Flume/HDFS Tutorial",
320 "id": "flume-tutorial",
321 "angularObjects": {},
322 "config": {
323 "looknfeel": "default"
324 },
325 "info": {}
326-}
327+}
328\ No newline at end of file
329
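
For readability: the data-processing paragraph changed above is stored as an escaped JSON string. Unescaped, the Scala it now runs is roughly the sketch below, with the two updated %sql queries expressed through sqlContext.sql and the ${maxDate="Sep 15"} dynamic form replaced by its default value. This assumes Zeppelin's Spark interpreter, which predefines sc and sqlContext and (as the SUCCESS output in the diff shows) already provides the implicits needed for toDF(); the paragraph's joda-time imports are omitted here since the parse does not use them.

import scala.util.Try

// Parse raw syslog lines, e.g. "<13>Sep 22 03:14:23 hdfs-master-0 sshd[1234]: ..."
val reSystemLog = """^\<\d+\>([A-Za-z0-9, ]+\d{2}:\d{2}:\d{2}(?:\.\d{3})?)\s+(\S+)\s+([^\[]+)\[(\d+)\]\s*:?\s*(.*)""".r
case class SyslogMessage(timestamp: String, host: Option[String], process: String, pid: Int, message: String)

// The glob is now two levels deep (/user/flume/flume-syslog/<date>/FlumeData.*).
val lines = sc.textFile("/user/flume/flume-syslog/*/*")
val events = lines.flatMap {
  case reSystemLog(timestamp, hostname, proc, pidS, msg) =>
    for (pid <- Try(pidS.toInt).toOption) yield SyslogMessage(timestamp, Some(hostname), proc, pid, msg)
  case _ => None
}.toDF()
events.registerTempTable("syslog")

// The two updated visualization paragraphs, with the default maxDate inlined.
sqlContext.sql("select host, count(1) value from syslog where timestamp > 'Sep 15' group by host").show()
sqlContext.sql("select process, timestamp, message from syslog where timestamp > 'Sep 15'").show()
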
330=== modified file 'tests/100-deploy-spark-hdfs-yarn'
331--- tests/100-deploy-spark-hdfs-yarn 2015-09-16 21:28:31 +0000
332+++ tests/100-deploy-spark-hdfs-yarn 2015-09-22 03:37:16 +0000
333@@ -34,7 +34,7 @@
334
335 cls.d.setup(timeout=3600)
336 cls.d.sentry.wait(timeout=3600)
337- cls.unit = cls.d.sentry.unit['zeppelin/0']
338+ cls.unit = cls.d.sentry.unit['spark/0']
339
340 ###########################################################################
341 # Validate that the Spark HistoryServer is running
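
The ingestion-check paragraph in note.json above shells out to "hadoop fs -ls -R /user/flume/flume-syslog | tail" and notes that files only appear once the flume-hdfs roll_interval has elapsed. The same check could also be done from the Scala interpreter via the Hadoop FileSystem API rather than the CLI; the snippet below is only a sketch, assuming Zeppelin's sc carries the cluster's Hadoop configuration.

import org.apache.hadoop.fs.{FileSystem, Path}

// Recursively list whatever Flume has rolled into HDFS so far. A
// FileNotFoundException or an empty listing usually just means the
// roll_interval has not elapsed yet; wait a few minutes and re-run.
val fs = FileSystem.get(sc.hadoopConfiguration)
val files = fs.listFiles(new Path("/user/flume/flume-syslog"), true) // true = recursive
while (files.hasNext) {
  val status = files.next()
  println(f"${status.getLen}%10d  ${status.getPath}")
}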
