Big Data Development Kit (Hadoop / Spark / Zeppelin / IntelliJ) in Amazon AWS
Usage:
Launch an instance on Amazon WebServices
Specify in the initialization of the vm
#!/bin/bash
wget -O- http://bit.ly/1PjnNB3 |\
PASSWORD="changeme"\
TOKEN="duck-dns-token"\
HOST="duckdnshost"\
EMAIL="your@email"\
bash
NOTE!
- You need to specify Amazon Linux 64 bit
- You need at least an image small (2g) better 4g for running all the services
- You need at least 20 GB space
- Create a secondary block on sdb and the /app folder with all your data will be mounted there, thus preserved from termination
- change the password (change the string within the quotes with your password) to the one you want
- register an hostname in www.duckdns.org and get the token, and replace them in the TOKEN and HOST variables
- specify your email (user for let's encrypt service)
- if you have a backup of let's encrypt (for example in Dropbox) specify LETGZ="...." to the url of your backup
- before you launch the instance add a rule to open the HTTPS ports to the world
Docker kit for Hadoop, Spark and Zeppelin
Devenv with IntelliJ, SBT and Ammonite accessible via web
First, get a docker machine and configure your docker to access it. Refer to docker documentation to learn how to do it.
The script sh build.sh <password>
builds the enviroment.
Start it with docker-compose up -d
.
That is all.
Access the shell with http://youserver:3000 and the desktop with http://yourserver:6080
In the kit there is Intellij free edition, a terminal with sbt and ammonite
Inside the kit you have also Zeppelin, internally accessible as
http://zeppelin.loc:8000, Hadoop accessible as hdfs://hadoop.loc:8020 and Spark on http://spark.loc
(to fix) You can also ssh (without password) on hadoop.loc, spark.loc and zeppelin.loc