Docker and Kubernetes

In order to practice, I rephrase the post from

Kubernetes is an open-source system for managing clusters of containers, such as Docker containers.

As a developer, you can run Kubernetes on your cloud clusters to get faster and more reliable deployments. Kubernetes also supports auto-scaling and load balancing, both of which matter in production.

What will be covered

  • Use Docker to build and run images.
  • Create a cluster on AWS or GCP.
  • Deploy Docker images to AWS or GCP.
  • Kubernetes configuration examples.

Build a Docker image

Before an image can be deployed to the cloud cluster, it has to be built first. You cannot deploy directly from a Dockerfile.

Here I use the ubuntu image as the base image; you can use your own custom image instead.

Pull the ubuntu image from the Docker Hub registry and tag it as my-img

docker pull ubuntu
docker tag ubuntu my-img
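If you want a custom image rather than plain ubuntu, it is built from a Dockerfile before anything is pushed. The following is a minimal sketch; the directory name my-img-src and the placeholder CMD are assumptions for illustration, not part of the original post.

```shell
# Hypothetical build directory; any empty directory works.
build_dir="my-img-src"
mkdir -p "$build_dir"

# Write a minimal Dockerfile that uses ubuntu as the base image.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu
# Placeholder command so the container has something to run.
CMD ["echo", "hello from my-img"]
EOF

# Build the image and tag it as my-img (needs a running Docker daemon):
#   docker build -t my-img "$build_dir"
# Run it once to check that it works:
#   docker run --rm my-img
```

The build step is left as a comment so the snippet can be run safely on a machine without Docker; uncomment it once the daemon is available.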

Deploy the image to a cloud cluster

  • Set up your account.
  • Get credentials for the cloud container service.
  • Create a cluster.
  • Tag the image in the registry's format.
  • Push the image to the container registry: AWS (ECR) or GCP (GCR).
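The "registry format" in the steps above is just a naming convention for the image path. A small sketch of how the two paths are composed; the helper names ecr_image_uri and gcr_image_uri, and the account, region, and project values, are made up for illustration.

```shell
# ECR expects: [account-id].dkr.ecr.[region].amazonaws.com/[repository]
ecr_image_uri() {
  # $1 = AWS account id, $2 = AWS region, $3 = repository name
  printf '%s.dkr.ecr.%s.amazonaws.com/%s\n' "$1" "$2" "$3"
}

# Google Container Registry expects: gcr.io/[project-id]/[image]
gcr_image_uri() {
  # $1 = GCP project id, $2 = image name
  printf 'gcr.io/%s/%s\n' "$1" "$2"
}

# Example values below are invented.
ecr_image_uri 123456789012 us-east-1 my-img
gcr_image_uri my-project my-img
```

The printed path is what goes after the local image name in docker tag, for example docker tag my-img "$(ecr_image_uri 123456789012 us-east-1 my-img)".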

For ECR:


Push image to registry (i.e. online Docker image hosting site)

# aws
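# Attach an ECR access policy to your IAM user in the AWS console
# You can get the account ID with
aws sts get-caller-identity --output text --query 'Account'
aws ecr create-repository --repository-name my-img

# AWS CLI v1: get-login prints a docker login command, and $() runs it
$(aws ecr get-login --no-include-email)
# AWS CLI v2 removed get-login; use instead:
# aws ecr get-login-password | docker login --username AWS --password-stdin [account-id].dkr.ecr.[region].amazonaws.com
docker tag my-img [account-id].dkr.ecr.[region].amazonaws.com/my-img
docker push [account-id].dkr.ecr.[region].amazonaws.com/my-img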

# gcp
docker tag my-img gcr.io/[project-id]/my-img
gcloud docker -- push gcr.io/[project-id]/my-img

Install Kubernetes

# aws
export KUBERNETES_PROVIDER=aws; wget -q -O - | bash

# gcp
gcloud components install kubectl

Create cluster

# aws
aws ecs create-cluster --cluster-name my-cluster

# gcp
gcloud container clusters create my-cluster
gcloud container clusters get-credentials my-cluster
gcloud container clusters list
gcloud config set container/cluster my-cluster # set as default cluster

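Use Kubernetes to deploy a pod/container on the cluster

# Newer kubectl versions (1.18+) make `kubectl run` create a single pod; there, use `kubectl create deployment` instead
kubectl run my-node --image=[image] --port=8080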
kubectl get deployment my-node
#my-node    3         3         3            0           1s

Alternatively, you can deploy a container on the cluster from a file

Create a file and name it pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: my-node
spec:
  containers:
    - name: my-app
      image: [image]
      ports:
        - containerPort: 8080

Then deploy the pod from the file (similar to running a Docker container from a file)

kubectl create -f pod.yaml
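Note that the scale and expose commands in this post act on a Deployment, not on a bare Pod. A file-based equivalent could look like the sketch below; the file name deployment.yaml, the app: my-node label, the replica count, and the IMAGE_PLACEHOLDER value are all assumptions for illustration.

```shell
# Write a Deployment manifest that wraps the same container as pod.yaml.
cat > deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-node
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-node
  template:
    metadata:
      labels:
        app: my-node
    spec:
      containers:
        - name: my-app
          image: IMAGE_PLACEHOLDER   # replace with your pushed image path
          ports:
            - containerPort: 8080
EOF

# Create (or update) the Deployment on the cluster kubectl currently points at:
#   kubectl apply -f deployment.yaml
```

The selector has to match the labels in the pod template; that is how the Deployment knows which pods it owns.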

Expose the deployment to the Internet (make a service)

kubectl expose deployment my-node --type=LoadBalancer
kubectl get service
kubectl get service my-node
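A cloud load balancer usually takes a little while to hand out an address, so the external IP may show as pending at first. A possible way to wait for it is sketched below, assuming the service is of type LoadBalancer; the function name wait_for_external_ip and its defaults are hypothetical. On AWS the load balancer typically reports a hostname rather than an IP, in which case the jsonpath field would be .hostname instead of .ip.

```shell
# Poll until the service reports an external IP, then print it.
# $1 = service name, $2 = max attempts (default 30), $3 = seconds between attempts (default 5)
wait_for_external_ip() {
  svc="$1"; attempts="${2:-30}"; delay="${3:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    ip="$(kubectl get service "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no external IP for $svc after $attempts attempts" >&2
  return 1
}
```

Usage: wait_for_external_ip my-node, then open the printed address on the service port in a browser.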

Scale the number of replicas

kubectl scale deployment my-node --replicas=5
kubectl get deployment my-node
#my-node    5         5         5            0           20m

Auto scaling

# keep the number of pods between 2 and 10, targeting 80% average CPU utilization
kubectl autoscale deployment [DEPLOYMENT] --min=2 --max=10 --cpu-percent=80
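To get a feel for what the autoscaler does with these numbers: the rule documented for the Horizontal Pod Autoscaler is, roughly, desired replicas = ceil(current replicas × current metric / target metric), clamped to the min and max. The sketch below only reproduces that arithmetic; the function name desired_replicas is hypothetical, and the real controller additionally applies a tolerance and stabilization behaviour that is ignored here.

```shell
# desired = ceil(current_replicas * current_cpu / target_cpu), clamped to [min, max]
desired_replicas() {
  # $1 = current replicas, $2 = current average CPU %, $3 = target CPU %, $4 = min, $5 = max
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling division
  [ "$desired" -lt "$4" ] && desired="$4"
  [ "$desired" -gt "$5" ] && desired="$5"
  echo "$desired"
}

# 3 pods running at 120% average CPU against an 80% target: ceil(4.5) = 5
desired_replicas 3 120 80 2 10
```

With the same settings, 5 pods idling at 20% would shrink to 2 (the minimum), and 10 pods at 200% would stay capped at 10.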


Clean up

kubectl delete service my-node
kubectl delete deployment my-node

# aws
aws ecr delete-repository --repository-name my-img --force
aws ecr describe-repositories
aws ecs delete-cluster --cluster my-cluster
aws ecs list-clusters

# gcp
gcloud container images delete [image]
gcloud container images list
gcloud container clusters delete my-cluster
gcloud container clusters list

Run Kubernetes locally

Normally, Kubernetes only starts working once you have set up a cluster. However, you can also test Kubernetes locally (e.g. on your laptop).

You can install minikube, which runs a single-node cluster on your machine, and point the same kubectl commands at it.
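A local run might look like the following sketch. It assumes minikube and a reasonably recent kubectl are installed; the function name run_locally and the choice of a NodePort service are illustrative, not taken from the original post.

```shell
# Start a local cluster, deploy an image, and print a URL for the service.
# $1 = image to run (use any image your machine can pull)
run_locally() {
  minikube start &&
  kubectl create deployment my-node --image="$1" --port=8080 &&
  kubectl expose deployment my-node --type=NodePort --port=8080 &&
  minikube service my-node --url
}
```

Usage: run_locally [image], then minikube stop when you are done. NodePort is used here because a local cluster has no cloud load balancer to hand out an external IP.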

From Docker-compose to Kubernetes

You may know Docker-compose, which allows you to configure multi-container services in a single docker-compose.yaml file. However, its support for cluster monitoring and management is limited.

You could switch to Kubernetes, which separates the configuration into Pods, Services, etc. Fortunately, you can install kompose to do the conversion for you.

For example

# convert docker-compose.yaml to Kubernetes objects and deploy them
kompose up

# Alternatively, you can write the converted files into the current folder
kompose convert
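To try the conversion without touching a real project, a throwaway compose file is enough. The sketch below writes one; the directory name kompose-demo and the two services (web, cache) are invented for illustration.

```shell
# Hypothetical two-service compose file in a scratch directory.
mkdir -p kompose-demo
cat > kompose-demo/docker-compose.yaml <<'EOF'
version: "3"
services:
  web:
    image: nginx
    ports:
      - "8080:80"
  cache:
    image: redis
    ports:
      - "6379"
EOF

# Convert it into Kubernetes manifests next to the compose file (needs kompose installed):
#   cd kompose-demo && kompose convert
```

kompose typically writes separate Deployment and Service files for each compose service, which you can then review and apply with kubectl.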

Spark + Kubernetes

To see how to set up a Spark cluster through Kubernetes configuration, check the example provided by Kubernetes.

git clone
cd kubernetes/examples/spark
# create namespace
kubectl create -f namespace-spark-cluster.yaml

# Launch Spark master service
kubectl create -f spark-master-controller.yaml # replication controller for pod
kubectl create -f spark-master-service.yaml # expose as a service

# Launch Spark workers (they only communicate with the master)
kubectl create -f spark-worker-controller.yaml

# You may use zeppelin to submit Spark jobs
kubectl create -f zeppelin-controller.yaml

# Check whether Spark is working
kubectl get pods -l component=zeppelin
kubectl exec -it [zeppelin-xxxxx] -- pyspark

Other references