Merge lp:~bigdata-dev/charms/trusty/apache-zeppelin/trunk into lp:charms/trusty/apache-zeppelin

Proposed by Kevin W Monroe
Status: Merged
Merged at revision: 19
Proposed branch: lp:~bigdata-dev/charms/trusty/apache-zeppelin/trunk
Merge into: lp:charms/trusty/apache-zeppelin
Diff against target: 341 lines (+80/-60)
2 files modified
resources/flume-tutorial/note.json (+79/-59)
tests/100-deploy-spark-hdfs-yarn (+1/-1)
To merge this branch: bzr merge lp:~bigdata-dev/charms/trusty/apache-zeppelin/trunk
Reviewer: Kevin W Monroe
Status: Approve
Review via email: mp+271903@code.launchpad.net
Revision history for this message
Kevin W Monroe (kwmonroe):
review: Approve

Preview Diff

1=== modified file 'resources/flume-tutorial/note.json'
2--- resources/flume-tutorial/note.json 2015-08-26 12:27:56 +0000
3+++ resources/flume-tutorial/note.json 2015-09-22 03:37:16 +0000
4@@ -1,7 +1,7 @@
5 {
6 "paragraphs": [
7 {
8- "text": "%md\n## Welcome to Realtime Syslog Analytic Tutorial Powered by Juju.\n### In this live tutorial we will demonstrat three main phases of any big data solution:\n#### 1. Data Ingestion - Apache Flume-syslog -\u003e Apache flume-hdfs\n#### 2. Data Processing - Apache Spark+YARN\n#### 3. Data Visualization - SparkSQL",
9+ "text": "%md\n## Welcome to the Realtime Syslog Analytics tutorial, powered by Juju.\n### In this live tutorial we will demonstrate three phases of a big data solution:\n#### 1. Data Ingestion: Flume-Syslog -\u003e Flume-HDFS\n#### 2. Data Processing: Spark+YARN\n#### 3. Data Visualization: SparkSQL+Zeppelin",
10 "config": {
11 "colWidth": 12.0,
12 "graph": {
13@@ -12,7 +12,8 @@
14 "values": [],
15 "groups": [],
16 "scatter": {}
17- }
18+ },
19+ "tableHide": false
20 },
21 "settings": {
22 "params": {},
23@@ -23,17 +24,17 @@
24 "result": {
25 "code": "SUCCESS",
26 "type": "HTML",
27- "msg": "\u003ch2\u003eWelcome to Realtime Syslog Analytic Tutorial Powered by Juju.\u003c/h2\u003e\n\u003ch3\u003eIn this live tutorial we will demonstrat three main phases of any big data solution:\u003c/h3\u003e\n\u003ch4\u003e1. Data Ingestion - Apache Flume-syslog -\u003e Apache flume-hdfs\u003c/h4\u003e\n\u003ch4\u003e2. Data Processing - Apache Spark+YARN\u003c/h4\u003e\n\u003ch4\u003e3. Data Visualization - SparkSQL\u003c/h4\u003e\n"
28+ "msg": "\u003ch2\u003eWelcome to the Realtime Syslog Analytics tutorial, powered by Juju.\u003c/h2\u003e\n\u003ch3\u003eIn this live tutorial we will demonstrate three phases of a big data solution:\u003c/h3\u003e\n\u003ch4\u003e1. Data Ingestion: Flume-Syslog -\u003e Flume-HDFS\u003c/h4\u003e\n\u003ch4\u003e2. Data Processing: Spark+YARN\u003c/h4\u003e\n\u003ch4\u003e3. Data Visualization: SparkSQL+Zeppelin\u003c/h4\u003e\n"
29 },
30 "dateCreated": "Aug 20, 2015 3:14:39 PM",
31- "dateStarted": "Aug 25, 2015 9:34:23 AM",
32- "dateFinished": "Aug 25, 2015 9:34:23 AM",
33+ "dateStarted": "Sep 18, 2015 6:25:43 PM",
34+ "dateFinished": "Sep 18, 2015 6:25:43 PM",
35 "status": "FINISHED",
36 "progressUpdateIntervalMs": 500
37 },
38 {
39- "title": "Data Ingestion",
40- "text": "import sys.process._\n// Generate syslog messages by running an spakk\n\"/home/ubuntu/sparkpi.sh\" !!\n// Verify that FLume has collected and sent the syslog messages to HDFS\n\"hadoop fs -ls -R /user/flume/flume-syslog\" !!",
41+ "title": "Generate Data and Verify Ingestion",
42+ "text": "%sh\n# Generate syslog messages by trying to ssh to the hdfs-master unit.\n# This will likely result in a \u0027publickey denied\u0027 error, but it will\n# be enough to trigger a syslog event on the hdfs-master.\nfor i in `seq 1 10`;\ndo\n ssh -oStrictHostKeyChecking\u003dno hdfs-master-0 uptime \u003e/dev/null 2\u003e\u00261\n sleep 1\ndone\n\n# Check if Flume has collected and sent the syslog messages to HDFS.\n# If no output is seen from this command, wait a few minutes and try\n# again. The amount of time between Flume ingesting the event and it\n# being available in HDFS is controlled by the \u0027roll_interval\u0027\n# configuration option in the flume-hdfs charm.\nhadoop fs -ls -R /user/flume/flume-syslog | tail",
43 "config": {
44 "colWidth": 12.0,
45 "graph": {
46@@ -45,7 +46,9 @@
47 "groups": [],
48 "scatter": {}
49 },
50- "title": true
51+ "title": true,
52+ "tableHide": false,
53+ "editorHide": false
54 },
55 "settings": {
56 "params": {},
57@@ -56,16 +59,17 @@
58 "result": {
59 "code": "SUCCESS",
60 "type": "TEXT",
61- "msg": "" },
62+ "msg": "drwxr-xr-x - flume supergroup 0 2015-09-22 03:19 /user/flume/flume-syslog/2015-09-22\n-rw-r--r-- 3 flume supergroup 302 2015-09-22 03:12 /user/flume/flume-syslog/2015-09-22/FlumeData.1442891213622\n-rw-r--r-- 3 flume supergroup 2328 2015-09-22 03:19 /user/flume/flume-syslog/2015-09-22/FlumeData.1442891678998\n"
63+ },
64 "dateCreated": "Aug 20, 2015 6:09:43 PM",
65- "dateStarted": "Aug 24, 2015 10:51:34 PM",
66- "dateFinished": "Aug 24, 2015 10:52:11 PM",
67+ "dateStarted": "Sep 22, 2015 3:29:15 AM",
68+ "dateFinished": "Sep 22, 2015 3:29:28 AM",
69 "status": "FINISHED",
70 "progressUpdateIntervalMs": 500
71 },
72 {
73- "title": "Data Processing in python",
74- "text": "%pyspark\nsc.textFile(\"/user/flume/flume-syslog/*/*/*\").filter(lambda l: \"sshd\" in l).collect()",
75+ "title": "Simple Data Processing with Scala",
76+ "text": "// Output the number of sshd syslog events\nsc.textFile(\"/user/flume/flume-syslog/*/*\").filter(line \u003d\u003e line.contains(\"sshd\")).count()",
77 "config": {
78 "colWidth": 12.0,
79 "graph": {
80@@ -90,16 +94,17 @@
81 "result": {
82 "code": "SUCCESS",
83 "type": "TEXT",
84- "msg": "" },
85+ "msg": "res12: Long \u003d 40\n"
86+ },
87 "dateCreated": "Aug 20, 2015 6:11:00 PM",
88- "dateStarted": "Aug 24, 2015 10:54:10 PM",
89- "dateFinished": "Aug 24, 2015 10:54:15 PM",
90+ "dateStarted": "Sep 22, 2015 3:29:45 AM",
91+ "dateFinished": "Sep 22, 2015 3:29:46 AM",
92 "status": "FINISHED",
93 "progressUpdateIntervalMs": 500
94 },
95 {
96- "title": "Data Processing In Scala",
97- "text": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nval reSystemLog \u003d \"\"\"^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\"\"\".r\ncase class SyslogMessage(timestamp: String, host: Option[String], process: String, pid: Int, message: String)\n\nval lines \u003d sc.textFile(\"/user/flume/flume-syslog/*/*/*\")\nval events \u003d lines.flatMap {\n case reSystemLog(timestamp,hostname, proc, pidS, msg) \u003d\u003e\n for {pid \u003c- Try(pidS.toInt).toOption} yield SyslogMessage(timestamp,Some(hostname), proc, pid, msg)\n case _ \u003d\u003e None\n }.toDF()\n\nevents.registerTempTable(\"syslog\")\n",
98+ "title": "Data processing to enable future queries",
99+ "text": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nval reSystemLog \u003d \"\"\"^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\"\"\".r\ncase class SyslogMessage(timestamp: String, host: Option[String], process: String, pid: Int, message: String)\n\nval lines \u003d sc.textFile(\"/user/flume/flume-syslog/*/*\")\nval events \u003d lines.flatMap {\n case reSystemLog(timestamp,hostname, proc, pidS, msg) \u003d\u003e\n for {pid \u003c- Try(pidS.toInt).toOption} yield SyslogMessage(timestamp,Some(hostname), proc, pid, msg)\n case _ \u003d\u003e None\n }.toDF()\n\nevents.registerTempTable(\"syslog\")\n",
100 "config": {
101 "colWidth": 12.0,
102 "graph": {
103@@ -124,11 +129,11 @@
104 "result": {
105 "code": "SUCCESS",
106 "type": "TEXT",
107- "msg": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nreSystemLog: scala.util.matching.Regex \u003d ^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\ndefined class SyslogMessage\nlines: org.apache.spark.rdd.RDD[String] \u003d /user/flume/flume-syslog/*/*/* MapPartitionsRDD[509] at textFile at \u003cconsole\u003e:73\nevents: org.apache.spark.sql.DataFrame \u003d [timestamp: string, host: string, process: string, pid: int, message: string]\n"
108+ "msg": "import org.joda.time.DateTime\nimport org.joda.time.format.{DateTimeFormatterBuilder, DateTimeFormat}\nimport scala.util.Try\nreSystemLog: scala.util.matching.Regex \u003d ^\\\u003c\\d+\\\u003e([A-Za-z0-9, ]+\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{3})?)\\s+(\\S+)\\s+([^\\[]+)\\[(\\d+)\\]\\s*:?\\s*(.*)\ndefined class SyslogMessage\nlines: org.apache.spark.rdd.RDD[String] \u003d /user/flume/flume-syslog/*/* MapPartitionsRDD[50] at textFile at \u003cconsole\u003e:31\nevents: org.apache.spark.sql.DataFrame \u003d [timestamp: string, host: string, process: string, pid: int, message: string]\n"
109 },
110 "dateCreated": "Aug 21, 2015 12:03:17 AM",
111- "dateStarted": "Aug 24, 2015 10:54:28 PM",
112- "dateFinished": "Aug 24, 2015 10:54:29 PM",
113+ "dateStarted": "Sep 22, 2015 3:23:23 AM",
114+ "dateFinished": "Sep 22, 2015 3:23:26 AM",
115 "status": "FINISHED",
116 "progressUpdateIntervalMs": 500
117 },
118@@ -169,7 +174,9 @@
119 }
120 }
121 },
122- "title": true
123+ "title": true,
124+ "tableHide": false,
125+ "editorHide": false
126 },
127 "settings": {
128 "params": {},
129@@ -180,26 +187,26 @@
130 "result": {
131 "code": "SUCCESS",
132 "type": "TABLE",
133- "msg": "process\tvalue\nCRON\t180\nntpdate\t1\nsshd\t6\nsu\t1\nsystemd-logind\t1\n"
134+ "msg": "process\tvalue\nCRON\t3\nsshd\t20\n"
135 },
136 "dateCreated": "Aug 24, 2015 10:31:38 PM",
137- "dateStarted": "Aug 24, 2015 10:54:37 PM",
138- "dateFinished": "Aug 24, 2015 10:54:41 PM",
139+ "dateStarted": "Sep 22, 2015 3:29:54 AM",
140+ "dateFinished": "Sep 22, 2015 3:29:57 AM",
141 "status": "FINISHED",
142 "progressUpdateIntervalMs": 500
143 },
144 {
145 "title": "Data Visualization",
146- "text": "%sql \nselect pid, count(1) value\nfrom syslog\nwhere pid \u003e 5000 and pid \u003c 20000 and timestamp \u003e ${maxDate\u003d\"Aug 24\"}\ngroup by pid \norder by pid\n",
147+ "text": "%sql \nselect host, count(1) value\nfrom syslog\nwhere timestamp \u003e ${maxDate\u003d\"Sep 15\"}\ngroup by host\n",
148 "config": {
149 "colWidth": 4.0,
150 "graph": {
151- "mode": "pieChart",
152+ "mode": "table",
153 "height": 300.0,
154 "optionOpen": false,
155 "keys": [
156 {
157- "name": "pid",
158+ "name": "host",
159 "index": 0.0,
160 "aggr": "sum"
161 }
162@@ -213,11 +220,6 @@
163 ],
164 "groups": [],
165 "scatter": {
166- "xAxis": {
167- "name": "pid",
168- "index": 0.0,
169- "aggr": "sum"
170- },
171 "yAxis": {
172 "name": "value",
173 "index": 1.0,
174@@ -225,14 +227,17 @@
175 }
176 }
177 },
178- "title": true
179+ "title": true,
180+ "tableHide": false
181 },
182 "settings": {
183- "params": {},
184+ "params": {
185+ "maxDate": "\"Sep 15\""
186+ },
187 "forms": {
188 "maxDate": {
189 "name": "maxDate",
190- "defaultValue": "\"Aug 24\"",
191+ "defaultValue": "\"Sep 15\"",
192 "hidden": false
193 }
194 }
195@@ -242,33 +247,33 @@
196 "result": {
197 "code": "SUCCESS",
198 "type": "TABLE",
199- "msg": "pid\tvalue\n5073\t2\n5074\t1\n5218\t2\n5219\t1\n5374\t2\n5375\t1\n5485\t2\n5881\t2\n5882\t1\n"
200+ "msg": "host\tvalue\nhdfs-master-0\t23\n"
201 },
202 "dateCreated": "Aug 21, 2015 1:11:17 AM",
203- "dateStarted": "Aug 24, 2015 10:54:43 PM",
204- "dateFinished": "Aug 24, 2015 10:54:45 PM",
205+ "dateStarted": "Sep 22, 2015 3:30:03 AM",
206+ "dateFinished": "Sep 22, 2015 3:30:05 AM",
207 "status": "FINISHED",
208 "progressUpdateIntervalMs": 500
209 },
210 {
211 "title": "Data Visualization",
212- "text": "%sql \nselect timestamp, count(1) value\nfrom syslog\nwhere timestamp \u003e ${maxDate\u003d\"Aug 24\"} and process \u003d\u003d \"sshd\"\ngroup by timestamp\norder by timestamp",
213+ "text": "%sql \nselect process, timestamp, message\nfrom syslog\nwhere timestamp \u003e ${maxDate\u003d\"Sep 15\"}\n",
214 "config": {
215 "colWidth": 4.0,
216 "graph": {
217- "mode": "pieChart",
218+ "mode": "table",
219 "height": 300.0,
220 "optionOpen": false,
221 "keys": [
222 {
223- "name": "timestamp",
224+ "name": "process",
225 "index": 0.0,
226 "aggr": "sum"
227 }
228 ],
229 "values": [
230 {
231- "name": "value",
232+ "name": "timestamp",
233 "index": 1.0,
234 "aggr": "sum"
235 }
236@@ -276,27 +281,23 @@
237 "groups": [],
238 "scatter": {
239 "xAxis": {
240- "name": "timestamp",
241+ "name": "process",
242 "index": 0.0,
243 "aggr": "sum"
244- },
245- "yAxis": {
246- "name": "value",
247- "index": 1.0,
248- "aggr": "sum"
249 }
250 }
251 },
252- "title": true
253+ "title": true,
254+ "tableHide": false
255 },
256 "settings": {
257 "params": {
258- "maxDate": "\"Aug 20\""
259+ "maxDate": "\"Sep 15\""
260 },
261 "forms": {
262 "maxDate": {
263 "name": "maxDate",
264- "defaultValue": "\"Aug 24\"",
265+ "defaultValue": "\"Sep 15\"",
266 "hidden": false
267 }
268 }
269@@ -306,32 +307,51 @@
270 "result": {
271 "code": "SUCCESS",
272 "type": "TABLE",
273- "msg": "timestamp\tvalue\nAug 21 11:20:45\t2\nAug 21 19:58:30\t2\nAug 24 21:59:47\t2\n"
274+ "msg": "process\ttimestamp\tmessage\nsshd\tSep 22 03:14:23\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:23\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:24\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:24\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:25\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:25\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:26\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:26\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:27\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:27\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:28\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:28\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:29\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:29\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:30\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:30\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:31\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:32\tConnection closed by 172.31.13.239 [preauth]\nsshd\tSep 22 03:14:33\terror: Could not load host key: /etc/ssh/ssh_host_ed25519_key\nsshd\tSep 22 03:14:33\tConnection closed by 172.31.13.239 [preauth]\nCRON\tSep 22 03:17:01\tpam_unix(cron:session): session opened for user root by (uid\u003d0)\nCRON\tSep 22 03:17:01\t(root) CMD ( cd / \u0026\u0026 run-parts --report /etc/cron.hourly)\nCRON\tSep 22 03:17:01\tpam_unix(cron:session): session closed for user root\n"
275 },
276 "dateCreated": "Aug 21, 2015 8:29:46 AM",
277- "dateStarted": "Aug 24, 2015 10:54:54 PM",
278- "dateFinished": "Aug 24, 2015 10:54:55 PM",
279+ "dateStarted": "Sep 22, 2015 3:30:26 AM",
280+ "dateFinished": "Sep 22, 2015 3:30:26 AM",
281 "status": "FINISHED",
282 "progressUpdateIntervalMs": 500
283 },
284 {
285- "config": {},
286+ "text": "",
287+ "config": {
288+ "colWidth": 12.0,
289+ "graph": {
290+ "mode": "table",
291+ "height": 300.0,
292+ "optionOpen": false,
293+ "keys": [],
294+ "values": [],
295+ "groups": [],
296+ "scatter": {}
297+ },
298+ "tableHide": false
299+ },
300 "settings": {
301 "params": {},
302 "forms": {}
303 },
304 "jobName": "paragraph_1440473909272_653880463",
305 "id": "20150824-223829_186145308",
306+ "result": {
307+ "code": "SUCCESS",
308+ "type": "TEXT"
309+ },
310 "dateCreated": "Aug 24, 2015 10:38:29 PM",
311- "status": "READY",
312+ "dateStarted": "Sep 18, 2015 5:59:44 PM",
313+ "dateFinished": "Sep 18, 2015 6:03:23 PM",
314+ "status": "FINISHED",
315 "progressUpdateIntervalMs": 500
316 }
317 ],
318- "name": "Real-time Analytic Tutorial",
319+ "name": "Zeppelin Flume/HDFS Tutorial",
320 "id": "flume-tutorial",
321 "angularObjects": {},
322 "config": {
323 "looknfeel": "default"
324 },
325 "info": {}
326-}
327+}
328\ No newline at end of file
329
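
For readability: the data-processing paragraph changed above is stored as an escaped JSON string. Unescaped, the Scala it now runs is roughly the sketch below, with the two updated %sql queries expressed through sqlContext.sql and the ${maxDate="Sep 15"} dynamic form replaced by its default value. This assumes Zeppelin's Spark interpreter, which predefines sc and sqlContext and (as the SUCCESS output in the diff shows) already provides the implicits needed for toDF(); the paragraph's joda-time imports are omitted here since the parse does not use them.

import scala.util.Try

// Parse raw syslog lines, e.g. "<13>Sep 22 03:14:23 hdfs-master-0 sshd[1234]: ..."
val reSystemLog = """^\<\d+\>([A-Za-z0-9, ]+\d{2}:\d{2}:\d{2}(?:\.\d{3})?)\s+(\S+)\s+([^\[]+)\[(\d+)\]\s*:?\s*(.*)""".r
case class SyslogMessage(timestamp: String, host: Option[String], process: String, pid: Int, message: String)

// The glob is now two levels deep (/user/flume/flume-syslog/<date>/FlumeData.*).
val lines = sc.textFile("/user/flume/flume-syslog/*/*")
val events = lines.flatMap {
  case reSystemLog(timestamp, hostname, proc, pidS, msg) =>
    for (pid <- Try(pidS.toInt).toOption) yield SyslogMessage(timestamp, Some(hostname), proc, pid, msg)
  case _ => None
}.toDF()
events.registerTempTable("syslog")

// The two updated visualization paragraphs, with the default maxDate inlined.
sqlContext.sql("select host, count(1) value from syslog where timestamp > 'Sep 15' group by host").show()
sqlContext.sql("select process, timestamp, message from syslog where timestamp > 'Sep 15'").show()
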
330=== modified file 'tests/100-deploy-spark-hdfs-yarn'
331--- tests/100-deploy-spark-hdfs-yarn 2015-09-16 21:28:31 +0000
332+++ tests/100-deploy-spark-hdfs-yarn 2015-09-22 03:37:16 +0000
333@@ -34,7 +34,7 @@
334
335 cls.d.setup(timeout=3600)
336 cls.d.sentry.wait(timeout=3600)
337- cls.unit = cls.d.sentry.unit['zeppelin/0']
338+ cls.unit = cls.d.sentry.unit['spark/0']
339
340 ###########################################################################
341 # Validate that the Spark HistoryServer is running
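
The ingestion-check paragraph in note.json above shells out to "hadoop fs -ls -R /user/flume/flume-syslog | tail" and notes that files only appear once the flume-hdfs roll_interval has elapsed. The same check could also be done from the Scala interpreter via the Hadoop FileSystem API rather than the CLI; the snippet below is only a sketch, assuming Zeppelin's sc carries the cluster's Hadoop configuration.

import org.apache.hadoop.fs.{FileSystem, Path}

// Recursively list whatever Flume has rolled into HDFS so far. A
// FileNotFoundException or an empty listing usually just means the
// roll_interval has not elapsed yet; wait a few minutes and re-run.
val fs = FileSystem.get(sc.hadoopConfiguration)
val files = fs.listFiles(new Path("/user/flume/flume-syslog"), true) // true = recursive
while (files.hasNext) {
  val status = files.next()
  println(f"${status.getLen}%10d  ${status.getPath}")
}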
