GKE Cluster configuration for deploying Mach5 Search
This document contains the GKE cluster configuration requirements for deploying Mach5 Search.
Kubernetes Version
Verified Kubernetes version for the GKE cluster to deploy Mach5 Search:
- 1.32.6 or above
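To confirm the control-plane version of an existing cluster, you can query it with gcloud (the cluster name and zone below are placeholders):
gcloud container clusters describe mach5-cluster \
  --zone us-central1-c \
  --format="value(currentMasterVersion)"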
Prerequisites
- A GCS bucket that will be used by Mach5 Search for data and OTLP log storage (a creation sketch follows this list)
- A service account with the following permissions:
- Compute Instance Admin (v1)
- Service Account User
- Storage Object Admin
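A minimal gcloud sketch for creating these prerequisites; the bucket name, service account ID, location, and projectID are placeholder values to adapt:
gcloud storage buckets create gs://mach5-data-bucket \
  --project projectID \
  --location us-central1

gcloud iam service-accounts create mach5-gsa --project projectID

for role in roles/compute.instanceAdmin.v1 roles/iam.serviceAccountUser roles/storage.objectAdmin; do
  gcloud projects add-iam-policy-binding projectID \
    --member "serviceAccount:mach5-gsa@projectID.iam.gserviceaccount.com" \
    --role "$role"
done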
GKE cluster
Cluster configuration for Mach5:
- Make sure to provision the cluster with the Ephemeral Storage Local SSD (--ephemeral-storage-local-ssd) option. This places emptyDir volumes and container images on node ephemeral storage backed by Local SSD.
- The following properties must be set for the cluster:
- Workload Identity: Enabled (--workload-pool "projectID.svc.id.goog")
- Metadata: --metadata disable-legacy-endpoints=false
- Enable cluster autoscaling for all node pools described in the section below (--enable-autoscaling)
- Location policy: Any (--location-policy "ANY")
- Size limits type: Total limits (--total-min-nodes "0" --total-max-nodes "n")
Sample gcloud GKE cluster creation command for reference (adjust values, including the cluster version, to your environment):
gcloud beta container --project "projectID" clusters create "mach5-cluster" \
--zone "us-central1-c" \
--no-enable-basic-auth \
--cluster-version "1.31.4-gke.1372000" \
--release-channel "regular" \
--machine-type "n2-standard-2" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-balanced" \
--disk-size "200" \
--node-labels mach5-main-role=true \
--scopes "https://www.googleapis.com/auth/cloud-platform" \
--metadata disable-legacy-endpoints=false \
--service-account "serviceAccount@projectID.iam.gserviceaccount.com" \
--num-nodes "1" \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--enable-ip-alias \
--network "projects/projectID/global/networks/default" \
--subnetwork "projects/projectID/regions/us-central1/subnetworks/default" \
--no-enable-intra-node-visibility \
--default-max-pods-per-node "110" \
--enable-autoscaling \
--total-min-nodes "1" \
--total-max-nodes "5" \
--location-policy "ANY" \
--no-enable-master-authorized-networks \
--addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
--enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0 \
--workload-pool "mach5-dev.svc.id.goog" \
--enable-shielded-nodes \
--node-locations "us-central1-c"
GKE Node Pools
Mach5 Search uses node pools in GKE for scalability, efficient resource utilization, and better performance of the different parts of the system. To support this, cluster autoscaling must be enabled for all of the node pools.
Node Pool configuration in Mach5:
Node group name | Desired, min size | Max size | Instance type | Labels | Tags |
---|---|---|---|---|---|
mach5-nodes | 1, 1 | 1 | n2-standard-8 | mach5-main-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-main-role" = "true" |
mach5-ccs-nodes | 1, 1 | 1 | n2-highcpu-2 | mach5-ccs-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-ccs-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-ccs-role" = "true" |
mach5-ingestor-nodes | 0, 0 | 10 | n2d-standard-8 | mach5-ingestor-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-ingestor-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-ingestor-role" = "true" |
mach5-compactor-nodes | 0, 0 | 10 | n2d-standard-8 | mach5-compactor-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-compactor-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-compactor-role" = "true" |
mach5-warehouse-nodes | 0, 0 | 10 | n2-highmem-8 | mach5-warehouse-worker-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-warehouse-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-warehouse-worker-role" = "true" |
mach5-warehouse-head-nodes | 0, 0 | 10 | e2-standard-8 | mach5-warehouse-head-role = "true" | "k8s.io/cluster-autoscaler/cluster-name" = "owned", "k8s.io/cluster-autoscaler/enabled" = "true", "k8s.io/cluster-autoscaler/node-template/label/group" = "mach5-warehouse-head-nodes", "k8s.io/cluster-autoscaler/node-template/label/mach5-warehouse-head-role" = "true" |
Notes:
- Make sure to propagate all the node group tags to the corresponding node group's autoscaling group as well. A sample node-pool creation command is shown below.
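For reference, a node pool matching the mach5-ingestor-nodes row above could be created roughly as follows; the flags mirror the cluster requirements from the earlier section (note --ephemeral-storage-local-ssd), and all values are illustrative:
gcloud beta container node-pools create "mach5-ingestor-nodes" \
  --cluster "mach5-cluster" \
  --zone "us-central1-c" \
  --machine-type "n2d-standard-8" \
  --node-labels mach5-ingestor-role=true \
  --ephemeral-storage-local-ssd count=1 \
  --enable-autoscaling \
  --total-min-nodes "0" \
  --total-max-nodes "10" \
  --location-policy "ANY" \
  --num-nodes "0" \
  --service-account "serviceAccount@projectID.iam.gserviceaccount.com"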
Mach5 Search Helm charts
The following Helm charts need to be installed in the GKE cluster to deploy Mach5 Search (a sample install command follows the table):
Name | Repository | Version |
---|---|---|
Mach5 Search | https://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-search | 5.2.0-snapshot-9d22d05 (contact the Mach5 Search administrator for the access key) |
Cluster Autoscaler | https://kubernetes.github.io/autoscaler/cluster-autoscaler | latest |
Mach5 Cache Proxy | https://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-cache-proxy | 1.13.1 |
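Since the Mach5 charts are published to an Artifact Registry, they can typically be installed as OCI references once you have authenticated with the access key; the release name, namespace, and exact chart path below are illustrative, so confirm them with your Mach5 Search administrator:
helm install mach5-search \
  oci://us-central1-docker.pkg.dev/mach5-dev/mach5-docker-registry/mach5-search \
  --version 5.2.0-snapshot-9d22d05 \
  --namespace <namespace> \
  --create-namespace

The Cluster Autoscaler chart can be added from its public Helm repository in the usual way (install values depend on your environment):
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update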
Once Mach5 Search charts are installed:
(Replace <serviceAccount>, <projectID> and <namespace> values in the commands below)
- Bind the Google Service Account (GSA) to the Kubernetes Service Account (KSA) that Mach5 creates (mach5-sa) in the deployed namespace:
gcloud iam service-accounts add-iam-policy-binding \
<serviceAccount>@<projectID>.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:<projectID>.svc.id.goog[<namespace>/mach5-sa]"
- Complete the link between KSA and GSA by annotating the Kubernetes Service account:
kubectl annotate serviceaccount mach5-sa \
--namespace <namespace> \
iam.gke.io/gcp-service-account=<serviceAccount>@<projectID>.iam.gserviceaccount.com
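You can confirm the annotation is in place before scheduling workloads:
kubectl get serviceaccount mach5-sa --namespace <namespace> -o yaml

The output should show the iam.gke.io/gcp-service-account annotation pointing at the GSA.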
Troubleshooting Cluster Autoscaler scaledown issues
At times, GKE’s built-in cluster autoscaler may fail to scale down nodes because certain system pods are running on them.
In such cases, it’s necessary to identify these pods and define a Pod Disruption Budget (PDB) for them.
For example, if a node isn’t scaling down due to a kube-dns pod running on it, you can apply the following Pod Disruption Budget using:
kubectl apply -f kube-dns-pod-disruption-budget.yaml -n kube-system
Here’s the content of kube-dns-pod-disruption-budget.yaml:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
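To find out which pods are keeping a node from being removed, and to confirm the PDB was created, the following commands can help (<node-name> is a placeholder):
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name>
kubectl get pdb kube-dns-pdb -n kube-system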