felixroz/energy-study-case

Modern batch Big Data pipeline using Kubernetes as our backbone infrastructure.


Solution Architecture

[Solution architecture diagram]

Deployment

  • Every component is deployed on Kubernetes
  • We use ArgoCD as our CD tool and also to monitor our applications [ArgoCD dashboard screenshot]

Tools used in this project

  • Terraform (read 'GKE/AKS deployment')
  • MinIO
  • ArgoCD
  • Docker (registry and containerization)
  • Kubernetes
  • Apache Spark
  • Apache Airflow
  • Delta Lake
  • Great Expectations (coming soon)
  • PostgreSQL (coming soon)

Requirements

Once you have set up your environment and your cluster, make sure you have installed:

  • kubectl (the Kubernetes CLI)
  • kubens & kubectx
  • an available cluster to interact with (in our case AKS; you can deploy your own using the instructions in /iac/aks/akd_dev/readme.md)
  • helm
  • ArgoCD CLI
  • Git Bash (if you are using Windows)

Creating Namespaces on your cluster

Namespaces let you divide your cluster resources logically. If you already know how Azure works, a namespace gives you conveniences similar to a resource group.

kubectl create namespace orchestrator
kubectl create namespace processing
kubectl create namespace deepstorage
kubectl create namespace cicd
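
Verify that the namespaces were created:

kubectl get namespaces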

Adding HELM repos

Here we add the Helm repositories used in this solution:

helm repo add apache-airflow https://airflow.apache.org/
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo add argo https://argoproj.github.io/argo-helm
helm repo add minio https://operator.min.io/ 
helm repo update 
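
To see which chart versions are available (useful because the ArgoCD installation below pins a specific version), you can query the repo:

helm search repo argo/argo-cd --versions | head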

Argo CD

We use ArgoCD as our CD tool, which lets us:

  • Guarantee the desired state of the cluster according to the properties defined in the files in our Git repository
  • Work with GitOps

Argo CD - Installation

helm install argocd argo/argo-cd --namespace cicd --version 3.26.8
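
Before logging in, wait for the server to come up; a quick check, assuming the chart's default resource naming (the same name the port-forward commands below rely on):

kubectl -n cicd rollout status deployment/argocd-server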

You can log in either through a port-forward or by exposing the service through a load balancer; both options are described below.

Argo CD - Login - Port-Forward

    1. Run
kubectl port-forward service/argocd-server -n cicd 8080:443

then open the browser at https://localhost:8080 and accept the certificate warning.

    2. When you reach the UI for the first time, log in with the username admin and the random password generated during the installation. You can find the password by running:
kubectl -n cicd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
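
If you prefer the CLI, you can log in over the same port-forward; a minimal sketch (--insecure skips verification of the self-signed certificate):

PASS=$(kubectl -n cicd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
argocd login localhost:8080 --username admin --password "$PASS" --insecure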

Argo CD - Login - Load Balancer

kubectl patch svc argocd-server -n cicd -p '{"spec": {"type": "LoadBalancer"}}'

Then retrieve your load balancer IP:

kubens cicd && kubectl get services -l app.kubernetes.io/name=argocd-server,app.kubernetes.io/instance=argocd -o jsonpath="{.items[0].status.loadBalancer.ingress[0].ip}"

Now fetch the initial admin password and use it to log in to ArgoCD:

ARGOCD_LB=<your-loadbalancer-ip>
kubens cicd && kubectl get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d | xargs -t -I {} argocd login $ARGOCD_LB --username admin --password {} --insecure

Hints ArgoCD - Windows user

If you are using Windows, the instructions above will probably not work for you. Instead, try the following:

  • Apply the manifest argo-load-balancer.yaml and then get the IP using kubectl get services:
kubectl apply -f argo-load-balancer.yaml
kubectl get services -n cicd

Once you have access to the ArgoCD UI, you will need the password for the admin account:

  • Open Git Bash and run:
kubectl -n cicd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

If you want to access ArgoCD without the LoadBalancer, you can use a port-forward instead:

kubectl port-forward service/argocd-server -n cicd 8080:443

Setting Up ArgoCD

Log in to your ArgoCD (use kubectl get services to find your load balancer IP):

kubectl get services
argocd login <your-loadbalancer-ip>

Now create a cluster role binding for the ArgoCD application controller service account:

kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=system:serviceaccount:cicd:argocd-application-controller -n cicd
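
As a quick sanity check, you can ask the API server whether the service account now has cluster-admin rights:

kubectl auth can-i '*' '*' --as=system:serviceaccount:cicd:argocd-application-controller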

Register your cluster in ArgoCD

CLUSTER=<your-cluster-name>
argocd cluster add $CLUSTER --in-cluster

Register your repository in ArgoCD

argocd repo add https://github.com/ntc-Felix/energy-study-case --username <username> --password <password>
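
You can verify both registrations from the CLI:

argocd cluster list
argocd repo list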

Installing MinIO Operator with ArgoCD

kubectl apply -f ./repository/app-manifests/deepstorage/minio-operator.yaml

Installing MinIO app With ArgoCD

kubectl apply -f ./repository/app-manifests/deepstorage/minio.yaml
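
ArgoCD creates the operator and tenant resources asynchronously; you can watch the pods come up (assuming the manifests target the deepstorage namespace):

kubectl get pods -n deepstorage -w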

Installing Spark Operator

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator

helm install spark-operator spark-operator/spark-operator --namespace processing --set image.tag=v1beta2-1.3.3-3.1.1,enableWebhook=true,logLevel=3
  • Apply the cluster role binding to ensure permissions:
kubectl apply -f ./repository/yamls/spark-operator/crb-spark-operator-processing.yaml
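
Confirm the operator pod is running before submitting applications:

kubectl get pods -n processing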

Add Spark Applications to your cluster

  • staging area
kubectl apply -f ./energy-spark/staging/diesel/load_to_staging_diesel.yaml
kubectl apply -f ./energy-spark/staging/oil/load_to_staging_oil.yaml
  • bronze area
kubectl apply -f ./energy-spark/bronze/diesel/load_to_bronze_diesel.yaml
kubectl apply -f ./energy-spark/bronze/oil/load_to_bronze_oil.yaml
  • silver area
kubectl apply -f ./energy-spark/silver/load_to_silver.yaml
  • gold area
kubectl apply -f ./energy-spark/gold/load_to_gold.yaml
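
Each manifest creates a SparkApplication custom resource, and the operator launches driver and executor pods from it. A sketch for checking run status, assuming the applications land in the processing namespace (substitute a real name from the first command into the second):

kubectl get sparkapplications -n processing
kubectl describe sparkapplication <your-application-name> -n processing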

Installing AIRFLOW

  • Install Airflow via its Helm chart, managed by ArgoCD:
kubectl apply -f ./repository/app-manifests/orchestrator/airflow-helm.yaml
  • Use port-forward to access the UI
kubectl port-forward services/airflow-webserver 8000:8080 -n orchestrator
  • Grant Airflow access to Spark:
kubectl apply -f ./repository/yamls/airflow/crb-spark-operator-airflow-orchestrator.yaml
kubectl apply -f ./repository/yamls/airflow/crb-spark-operator-airflow-processing.yaml
  • Create your Kubernetes connection by accessing the tab "Admin" > "Connections" (or use the CLI sketch below):
    • name it "kubeconnect"
    • select the connection type Kubernetes
    • mark the box "in-cluster"
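
If you prefer to script it, the same connection can be created with the Airflow CLI inside the webserver pod; a sketch, noting that the deployment name and the extra-field key are assumptions that vary with chart and provider versions:

kubectl -n orchestrator exec deploy/airflow-webserver -- airflow connections add kubeconnect --conn-type kubernetes --conn-extra '{"extra__kubernetes__in_cluster": true}'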

Great Expectations (coming soon)

Planned data-quality checks:

  • Duplication checks
  • Not-null checks
  • Format checks
  • Consistency checks
  • Distribution/Anomaly checks
  • Freshness checks
