GKE Cluster configuration for deploying Mach5 Search
This document contains the GKE cluster configuration requirements for deploying Mach5 Search.
Kubernetes Version
Verified Kubernetes version for the GKE cluster to deploy Mach5 Search:
- 1.32.6 or above
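To confirm the control-plane version of an existing cluster, you can query it with gcloud (the cluster name and zone below are placeholders):
gcloud container clusters describe mach5-cluster \
  --zone us-central1-c \
  --format="value(currentMasterVersion)"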
Prerequisites
- A GCS bucket that will be used by Mach5 Search for data and OTLP log storage (a creation sketch follows this list)
- A service account with the following permissions:
- Compute Instance Admin (v1)
- Service Account User
- Storage Object Admin
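A minimal gcloud sketch for creating these prerequisites; the bucket name, service account ID, location, and projectID are placeholder values to adapt:
gcloud storage buckets create gs://mach5-data-bucket \
  --project projectID \
  --location us-central1

gcloud iam service-accounts create mach5-gsa --project projectID

for role in roles/compute.instanceAdmin.v1 roles/iam.serviceAccountUser roles/storage.objectAdmin; do
  gcloud projects add-iam-policy-binding projectID \
    --member "serviceAccount:mach5-gsa@projectID.iam.gserviceaccount.com" \
    --role "$role"
done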
GKE cluster
Cluster configuration for Mach5:
- Make sure to provision the cluster with the Ephemeral Storage Local SSD (--ephemeral-storage-local-ssd) option. This places emptyDir volumes and container images on node ephemeral storage backed by Local SSD.
- The following properties must be set for the cluster:
- Workload Identity: Enabled (--workload-pool "projectID.svc.id.goog")
- Metadata: --metadata disable-legacy-endpoints=false
- Enable cluster autoscaling for all node pools described in the section below (--enable-autoscaling)
- Location policy: Any (--location-policy "ANY")
- Size limits type: Total limits (--total-min-nodes "0" --total-max-nodes "n")
Sample gcloud GKE cluster creation command for reference (adjust values, including the cluster version, to your environment):
gcloud beta container --project "projectID" clusters create "mach5-cluster" \
--zone "us-central1-c" \
--no-enable-basic-auth \
--cluster-version "1.31.4-gke.1372000" \
--release-channel "regular" \
--machine-type "n2-standard-2" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-balanced" \
--disk-size "200" \
--node-labels mach5-main-role=true \
--scopes "https://www.googleapis.com/auth/cloud-platform" \
--metadata disable-legacy-endpoints=false \
--service-account "serviceAccount@projectID.iam.gserviceaccount.com" \
--num-nodes "1" \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--enable-ip-alias \
--network "projects/projectID/global/networks/default" \
--subnetwork "projects/projectID/regions/us-central1/subnetworks/default" \
--no-enable-intra-node-visibility \
--default-max-pods-per-node "110" \
--enable-autoscaling \
--total-min-nodes "1" \
--total-max-nodes "5" \
--location-policy "ANY" \
--no-enable-master-authorized-networks \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
--enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0 \
--workload-pool "mach5-dev.svc.id.goog" \
--enable-shielded-nodes \
--node-locations "us-central1-c"
GKE Node Pools
Mach5 Search uses node pools in GKE for scalability, efficient resource utilization, and better performance of the different parts of the system. To support this, cluster autoscaling must be enabled for all of the node pools.
Node Pool configuration in Mach5:
Node group name | Desired, min size | Max size | Instance type | Labels | Tags |
---|---|---|---|---|---|
mach5-nodes | 1, 1 | 1 | n2-standard-8 | mach5-main-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-main-role" = "true" |
mach5-ccs-nodes | 1, 1 | 1 | n2-highcpu-2 | mach5-ccs-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-ccs-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-ccs-role" = "true" |
mach5-ingestor-nodes | 0, 0 | 10 | n2d-standard-8 | mach5-ingestor-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-ingestor-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-ingestor-role" = "true" |
mach5-compactor-nodes | 0, 0 | 10 | n2d-standard-8 | mach5-compactor-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-compactor-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-compactor-role" = "true" |
mach5-warehouse-nodes | 0, 0 | 10 | n2-highmem-8 | mach5-warehouse-worker-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-warehouse-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-warehouse-worker-role" = "true" |
mach5-warehouse-head-nodes | 0, 0 | 10 | e2-standard-8 | mach5-warehouse-head-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-warehouse-head-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-warehouse-head-role" = "true" |
Notes:
- Make sure to propagate all the node group tags to the corresponding node group's autoscaling group as well. A sample node-pool creation command is shown below.
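For reference, a node pool matching the mach5-ingestor-nodes row above could be created roughly as follows; the flags mirror the cluster requirements from the earlier section (note --ephemeral-storage-local-ssd), and all values are illustrative:
gcloud beta container node-pools create "mach5-ingestor-nodes" \
  --cluster "mach5-cluster" \
  --zone "us-central1-c" \
  --machine-type "n2d-standard-8" \
  --node-labels mach5-ingestor-role=true \
  --ephemeral-storage-local-ssd count=1 \
  --enable-autoscaling \
  --total-min-nodes "0" \
  --total-max-nodes "10" \
  --location-policy "ANY" \
  --num-nodes "0" \
  --service-account "serviceAccount@projectID.iam.gserviceaccount.com"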
Mach5 Search Helm charts
The following Helm charts need to be installed in the GKE cluster to deploy Mach5 Search (a sample install command follows the table):
Name | Repository | Version |
---|---|---|
Mach5 Search | https://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-search | 5.2.0-snapshot-9d22d05 (contact the Mach5 Search administrator for the access key) |
Cluster Autoscaler | https://kubernetes.github.io/autoscaler/cluster-autoscaler | latest |
Mach5 Cache Proxy | https://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-cache-proxy | 1.13.1 |
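Since the Mach5 charts are published to an Artifact Registry, they can typically be installed as OCI references once you have authenticated with the access key; the release name, namespace, and exact chart path below are illustrative, so confirm them with your Mach5 Search administrator:
helm install mach5-search \
  oci://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-search \
  --version 5.2.0-snapshot-9d22d05 \
  --namespace <namespace> \
  --create-namespace

The Cluster Autoscaler chart can be added from its public Helm repository in the usual way (install values depend on your environment):
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update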
Once Mach5 Search charts are installed:
(Replace <serviceAccount>, <projectID> and <namespace> values in the commands below)
- Bind the Google Service Account (GSA) to the Kubernetes Service Account (KSA) that Mach5 creates (mach5-sa) in the deployed namespace:
gcloud iam service-accounts add-iam-policy-binding \
<serviceAccount>@<projectID>.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:<projectID>.svc.id.goog[<namespace>/mach5-sa]"
- Complete the link between KSA and GSA by annotating the Kubernetes Service account:
kubectl annotate serviceaccount mach5-sa \
--namespace <namespace> \
iam.gke.io/gcp-service-account=<serviceAccount>@<projectID>.iam.gserviceaccount.com
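You can confirm the annotation is in place before scheduling workloads:
kubectl get serviceaccount mach5-sa --namespace <namespace> -o yaml

The output should show the iam.gke.io/gcp-service-account annotation pointing at the GSA.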
Troubleshooting Cluster Autoscaler scaledown issues
At times, GKE’s built-in cluster autoscaler may fail to scale down nodes because certain system pods are running on them.
In such cases, it’s necessary to identify these pods and define a Pod Disruption Budget (PDB) for them.
For example, if a node isn’t scaling down due to a kube-dns pod running on it, you can apply the following Pod Disruption Budget using:
kubectl apply -f kube-dns-pod-disruption-budget.yaml -n kube-system
Here’s the content of kube-dns-pod-disruption-budget.yaml:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
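To find out which pods are keeping a node from being removed, and to confirm the PDB was created, the following commands can help (<node-name> is a placeholder):
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>
kubectl get pdb kube-dns-pdb -n kube-system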