GKE Cluster configuration for deploying Mach5 Search

This document contains the GKE cluster configuration requirements for deploying Mach5 Search.

Kubernetes Version

Verified Kubernetes version for the GKE cluster to deploy Mach5 Search:

  • 1.32.6 or above
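
You can list the versions currently available in your zone and release channel with the following command (the zone is illustrative):

gcloud container get-server-config --zone "us-central1-c" --format "yaml(channels)"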

Prerequisites

  • A GCS bucket that will be used by Mach5 Search for data and OTLP log storage
  • A service account with the following permissions (a sample provisioning sketch follows this list):
    • Compute Instance Admin (v1)
    • Service Account User
    • Storage Object Admin
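
A minimal provisioning sketch for these prerequisites using gcloud; the bucket location is illustrative, the <bucketName>, <serviceAccount>, and <projectID> placeholders follow the conventions used later in this document, and the role IDs map to the permissions listed above:

# GCS bucket for Mach5 Search data and OTLP logs
gcloud storage buckets create "gs://<bucketName>" --project "<projectID>" --location "us-central1"

# Service account for the cluster nodes
gcloud iam service-accounts create "<serviceAccount>" --project "<projectID>"

# Grant the required roles to the service account
for role in roles/compute.instanceAdmin.v1 roles/iam.serviceAccountUser roles/storage.objectAdmin; do
  gcloud projects add-iam-policy-binding "<projectID>" \
    --member "serviceAccount:<serviceAccount>@<projectID>.iam.gserviceaccount.com" \
    --role "$role"
done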

GKE cluster

Cluster configuration for Mach5:

  • Make sure to provision the cluster with the Ephemeral Storage Local SSD option (--ephemeral-storage-local-ssd). This places emptyDir volumes and container images on node ephemeral storage backed by Local SSD.
  • The following properties must be set for the cluster:
    • Workload Identity: Enabled (--workload-pool "projectID.svc.id.goog")
    • Metadata: --metadata disable-legacy-endpoints=false
  • Enable cluster autoscaling for all node pools described in the section below (--enable-autoscaling):
    • Location policy - Any (--location-policy "ANY")
    • Size limits type - Total limits (--total-min-nodes "0" --total-max-nodes "n")

Sample gcloud GKE cluster creation command for reference (adjust to your needs; use a cluster version that meets the requirement above, and note that the Local SSD count shown is illustrative):

gcloud beta container --project "projectID" clusters create "mach5-cluster" \
  --zone "us-central1-c" \
  --no-enable-basic-auth \
  --cluster-version "1.31.4-gke.1372000" \
  --release-channel "regular" \
  --machine-type "n2-standard-2" \
  --image-type "COS_CONTAINERD" \
  --disk-type "pd-balanced" \
  --disk-size "200" \
  --ephemeral-storage-local-ssd count="1" \
  --node-labels mach5-main-role=true \
  --scopes "https://www.googleapis.com/auth/cloud-platform" \
  --metadata disable-legacy-endpoints=false \
  --service-account "serviceAccount@projectID.iam.gserviceaccount.com" \
  --num-nodes "1" \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-ip-alias \
  --network "projects/projectID/global/networks/default" \
  --subnetwork "projects/projectID/regions/us-central1/subnetworks/default" \
  --no-enable-intra-node-visibility \
  --default-max-pods-per-node "110" \
  --enable-autoscaling \
  --total-min-nodes "1" \
  --total-max-nodes "5" \
  --location-policy "ANY" \
  --no-enable-master-authorized-networks \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --enable-autoupgrade \
  --enable-autorepair \
  --max-surge-upgrade 1 \
  --max-unavailable-upgrade 0 \
  --workload-pool "projectID.svc.id.goog" \
  --enable-shielded-nodes \
  --node-locations "us-central1-c"
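
Once the cluster is up, fetch credentials so that the kubectl and helm commands later in this document can reach it:

gcloud container clusters get-credentials "mach5-cluster" --zone "us-central1-c" --project "projectID"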

GKE Node Pools

Mach5 Search uses node pools in GKE for scalability, efficient resource utilization, and better performance across the different parts of the system. Cluster autoscaling must therefore be enabled for all node pools.

Node Pool configuration in Mach5 (a sample node pool creation command follows the table and notes):

Node pool name             | Desired, Min size | Max size | Instance type  | Labels
---------------------------|-------------------|----------|----------------|--------------------------------------
mach5-nodes                | 1, 1              | 1        | n2-standard-8  | mach5-main-role = "true"
mach5-ccs-nodes            | 1, 1              | 1        | n2-highcpu-2   | mach5-ccs-role = "true"
mach5-ingestor-nodes       | 0, 0              | 10       | n2d-standard-8 | mach5-ingestor-role = "true"
mach5-compactor-nodes      | 0, 0              | 10       | n2d-standard-8 | mach5-compactor-role = "true"
mach5-warehouse-nodes      | 0, 0              | 10       | n2-highmem-8   | mach5-warehouse-worker-role = "true"
mach5-warehouse-head-nodes | 0, 0              | 10       | e2-standard-8  | mach5-warehouse-head-role = "true"

Every node pool carries the same set of cluster-autoscaler tags, with the pool name and role label substituted per pool:

  • "k8s.io/cluster-autoscaler/cluster-name" = "owned"
  • "k8s.io/cluster-autoscaler/enabled" = "true"
  • "k8s.io/cluster-autoscaler/node-template/label/group" = "<node pool name>" (for example, "mach5-ingestor-nodes")
  • "k8s.io/cluster-autoscaler/node-template/label/<role label>" = "true" (for example, "k8s.io/cluster-autoscaler/node-template/label/mach5-ingestor-role" = "true")

Notes:

  • Make sure to propagate all the node group tags to the corresponding node group's autoscaling group as well.
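
A sketch of creating one of the node pools above with gcloud; the flags mirror the table and the autoscaling requirements, while the cluster name, zone, and Local SSD count are illustrative assumptions:

gcloud container node-pools create "mach5-ingestor-nodes" \
  --cluster "mach5-cluster" \
  --zone "us-central1-c" \
  --machine-type "n2d-standard-8" \
  --node-labels mach5-ingestor-role=true \
  --num-nodes "0" \
  --enable-autoscaling \
  --total-min-nodes "0" \
  --total-max-nodes "10" \
  --location-policy "ANY" \
  --ephemeral-storage-local-ssd count="1" \
  --enable-autoupgrade \
  --enable-autorepair

Repeat for the other node pools, substituting the machine type, label, and size limits from the table.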

Mach5 Search Helm charts

The following Helm charts need to be installed in the GKE cluster to deploy Mach5 Search:

Name               | Repository                                                                            | Version
-------------------|---------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------
Mach5 Search       | https://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-search      | 5.2.0-snapshot-9d22d05 (contact the Mach5 Search administrator for the access key)
Cluster Autoscaler | https://kubernetes.github.io/autoscaler/cluster-autoscaler                           | latest
Mach5 Cache Proxy  | https://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-cache-proxy | 1.13.1
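
A minimal installation sketch, assuming the Mach5 charts are served as OCI artifacts from the Artifact Registry paths above and that you have the access key; release names and namespaces are placeholders, so consult each chart's documentation for the values your deployment needs:

# Authenticate to the Mach5 registry (requires the access key)
helm registry login us-central1-docker.pkg.dev

# Mach5 Search
helm install mach5-search \
  oci://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-search \
  --version "5.2.0-snapshot-9d22d05" \
  --namespace <namespace> --create-namespace

# Cluster Autoscaler
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler --namespace kube-system

# Mach5 Cache Proxy
helm install mach5-cache-proxy \
  oci://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-cache-proxy \
  --version "1.13.1" \
  --namespace <namespace>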

Once the Mach5 Search charts are installed, complete the Workload Identity setup as follows.
(Replace the <serviceAccount>, <projectID>, and <namespace> values in the commands below.)

  • Bind the Google Service Account (GSA) to the Kubernetes Service Account (KSA) that Mach5 creates (mach5-sa) in the deployed namespace:

gcloud iam service-accounts add-iam-policy-binding \
  <serviceAccount>@<projectID>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<projectID>.svc.id.goog[<namespace>/mach5-sa]"
  • Complete the link between the KSA and GSA by annotating the Kubernetes service account (a verification command follows):

kubectl annotate serviceaccount mach5-sa \
  --namespace <namespace> \
  iam.gke.io/gcp-service-account=<serviceAccount>@<projectID>.iam.gserviceaccount.com
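
To verify the link, check that the annotation is present on the KSA:

kubectl describe serviceaccount mach5-sa --namespace <namespace>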

Troubleshooting Cluster Autoscaler scaledown issues

At times, GKE’s built-in cluster autoscaler may fail to scale down nodes because certain system pods are running on them.

In such cases, it’s necessary to identify these pods and define a Pod Disruption Budget (PDB) for them.
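
To identify the pods keeping a node alive, you can list everything running on it (<nodeName> is a placeholder):

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<nodeName>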

For example, if a node isn’t scaling down because a kube-dns pod is running on it, you can apply the following Pod Disruption Budget:

kubectl apply -f kube-dns-pod-disruption-budget.yaml -n kube-system

Here’s the content of kube-dns-pod-disruption-budget.yaml:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
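
After applying the manifest, you can confirm that the PDB is in place:

kubectl get poddisruptionbudget kube-dns-pdb -n kube-system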