EKS Cluster configuration for deploying Mach5 Search

This document describes the EKS cluster configuration requirements for deploying Mach5 Search.

Kubernetes Version

Mach5 Search has been verified on EKS clusters running the following Kubernetes versions:

  • 1.32
  • 1.31
  • 1.29
  • 1.28

Amazon EBS CSI driver

Mach5 Search requires the Amazon EBS CSI driver add-on to be installed on the cluster. The add-on manages the lifecycle of the Amazon EBS volumes that back the Kubernetes volumes Mach5 Search creates.
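For reference, a minimal Terraform sketch for installing the add-on is shown below. The variable names are placeholders, and the add-on typically needs an IAM role (for example via IRSA) with the AmazonEBSCSIDriverPolicy managed policy attached:

resource "aws_eks_addon" "ebs_csi_driver" {
  cluster_name             = var.cluster_name           # placeholder: your EKS cluster name
  addon_name               = "aws-ebs-csi-driver"
  service_account_role_arn = var.ebs_csi_irsa_role_arn  # placeholder: IRSA role with AmazonEBSCSIDriverPolicy
}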

EKS Node Groups

Mach5 Search uses EKS node groups for scalability, efficient resource utilization, and better performance of the different parts of the system.

Managed node group configuration for Mach5 Search:

Node group name              Desired, Min size   Max size   Instance type   Labels
mach5-nodes                  1, 1                1          m6a.2xlarge     mach5-main-role = "true"
mach5-ccs-nodes              1, 1                1          c5ad.large      mach5-ccs-role = "true"
mach5-ingestor-nodes         0, 0                10         m6id.2xlarge    mach5-ingestor-role = "true"
mach5-compactor-nodes        0, 0                10         m6id.2xlarge    mach5-compactor-role = "true"
mach5-warehouse-nodes        0, 0                10         i4i.2xlarge     mach5-warehouse-worker-role = "true"
mach5-warehouse-head-nodes   0, 0                10         t3a.2xlarge     mach5-warehouse-head-role = "true"

Pre-bootstrap steps: every node group except mach5-warehouse-head-nodes uses the following pre-bootstrap user data to set up the local instance-store disks:

pre_bootstrap_user_data = <<-EOT
  setup-local-disks raid0
EOT

Tags: every node group carries the following cluster-autoscaler tags, where <node-group-name> is the node group name (for example mach5-nodes) and <role-label> is that group's label key (for example mach5-main-role):

"k8s.io/cluster-autoscaler/cluster-name" = "owned"
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/node-template/label/group" = "<node-group-name>"
"k8s.io/cluster-autoscaler/node-template/label/<role-label>" = "true"

Important notes:

  • The pre-bootstrap command in the node group configurations above is required to configure the local instance-store SSDs (setup-local-disks raid0 assembles them into a RAID 0 volume) for node storage.
  • Make sure to propagate all the node group tags to the corresponding node group's Auto Scaling group as well; the Terraform sketch after this list shows one way to cover both points.
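
For illustration, the mach5-nodes group above can be declared roughly as follows. This is a minimal sketch assuming the terraform-aws-modules/eks Terraform module; var.cluster_name and the omitted cluster settings are placeholders for your own configuration, and the cluster-name part of the autoscaler tag is presumably your EKS cluster name:

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # ... cluster_name, cluster_version, VPC settings, and the other node groups ...

  eks_managed_node_groups = {
    "mach5-nodes" = {
      desired_size   = 1
      min_size       = 1
      max_size       = 1
      instance_types = ["m6a.2xlarge"]

      labels = {
        "mach5-main-role" = "true"
      }

      # Configure the local instance-store SSDs before the node joins the cluster.
      pre_bootstrap_user_data = <<-EOT
        setup-local-disks raid0
      EOT

      tags = {
        "k8s.io/cluster-autoscaler/cluster-name"                        = "owned"
        "k8s.io/cluster-autoscaler/enabled"                             = "true"
        "k8s.io/cluster-autoscaler/node-template/label/group"           = "mach5-nodes"
        "k8s.io/cluster-autoscaler/node-template/label/mach5-main-role" = "true"
      }
    }
  }
}

Managed node group tags are not automatically propagated to the underlying Auto Scaling group. One way to copy them is the aws_autoscaling_group_tag resource; the Auto Scaling group name variable below is a placeholder:

resource "aws_autoscaling_group_tag" "mach5_nodes_tags" {
  for_each = {
    "k8s.io/cluster-autoscaler/cluster-name"                        = "owned"
    "k8s.io/cluster-autoscaler/enabled"                             = "true"
    "k8s.io/cluster-autoscaler/node-template/label/group"           = "mach5-nodes"
    "k8s.io/cluster-autoscaler/node-template/label/mach5-main-role" = "true"
  }

  autoscaling_group_name = var.mach5_nodes_asg_name  # placeholder: the node group's Auto Scaling group name

  tag {
    key                 = each.key
    value               = each.value
    propagate_at_launch = true
  }
}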

Log rotation on the nodes

To enable log rotation for Kubernetes pods, you can configure the following post_bootstrap_user_data script for each of the node groups mentioned above:

#!/usr/bin/env bash
# Writes a systemd drop-in that adds kubelet log-rotation flags to KUBELET_EXTRA_ARGS.
# ${var.log_max_size} and ${var.log_max_files} are expected to be interpolated by Terraform
# (post_bootstrap_user_data) before the script runs on the node.
SRC_CONF="/etc/systemd/system/kubelet.service.d/30-kubelet-extra-args.conf"
DST_CONF="/etc/systemd/system/kubelet.service.d/40-kubelet-extra-args.conf"
if [ -f "$SRC_CONF" ]; then
  # An extra-args drop-in already exists: append the log-rotation flags inside its
  # single-quoted KUBELET_EXTRA_ARGS value.
  content=$(cat "$SRC_CONF")
  modified_content=$(echo "$content" | sed "s/'$/ --container-log-max-size=${var.log_max_size} --container-log-max-files=${var.log_max_files}'/")
  echo "$modified_content" | tee "$DST_CONF"
else
  # No existing drop-in: create one that only sets the log-rotation flags.
  echo '[Service]
Environment="KUBELET_EXTRA_ARGS=--container-log-max-size=${var.log_max_size} --container-log-max-files=${var.log_max_files}"' | tee "$DST_CONF"
fi
systemctl daemon-reexec
systemctl daemon-reload
systemctl restart kubelet
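
The ${var.log_max_size} and ${var.log_max_files} references above are Terraform variables. A minimal sketch of their declarations could look like the following; the defaults are assumptions and should be adjusted to your retention needs:

variable "log_max_size" {
  description = "Maximum size a container log file can reach before it is rotated (kubelet --container-log-max-size)"
  type        = string
  default     = "10Mi" # assumed default
}

variable "log_max_files" {
  description = "Maximum number of rotated log files to keep per container (kubelet --container-log-max-files)"
  type        = number
  default     = 5 # assumed default
}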

Node group IAM role

The following policies must be attached to the IAM role associated with each node group:

  • arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
  • arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  • arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
  • arn:aws:iam::aws:policy/AmazonEC2FullAccess
  • arn:aws:iam::aws:policy/AWSMarketplaceMeteringFullAccess
  • arn:aws:iam::aws:policy/AWSMarketplaceMeteringRegisterUsage
  • arn:aws:iam::aws:policy/AmazonS3FullAccess
    • Alternatively, attach a custom policy for limited S3 permissions
    • Terraform code snippet to create the custom policy:
data "aws_iam_policy_document" "mach5_vm_inline_policy_document" {
    statement {
        actions = [
        "s3:GetBucketAcl",
        "s3:GetBucketEncryption",
        "s3:GetBucketLocation",
        "s3:GetBucketPolicy",
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketPublicAccessBlock",
        "s3:GetBucketVersioning",
        "s3:ListBucket"
]
effect = "Allow"
resources = [
    "arn:aws:s3:::${var.mach5_search_s3_bucket}"
]
}

    statement {
      actions = [
"s3:*"
]
    effect = "Allow"
      resources = [
      "arn:aws:s3:::${var.mach5_search_s3_bucket}/*"
    ]
}
}
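
The policy document above can then be turned into an IAM policy and attached to the node group role, for example as sketched below. The policy name and var.node_group_iam_role_name are placeholders, and the managed policies listed earlier can be attached with the same aws_iam_role_policy_attachment resource:

resource "aws_iam_policy" "mach5_vm_s3_policy" {
  name   = "mach5-search-s3-access" # placeholder policy name
  policy = data.aws_iam_policy_document.mach5_vm_inline_policy_document.json
}

resource "aws_iam_role_policy_attachment" "mach5_vm_s3_policy_attachment" {
  role       = var.node_group_iam_role_name # placeholder: the node group IAM role name
  policy_arn = aws_iam_policy.mach5_vm_s3_policy.arn
}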

S3 Bucket

Mach5 Search needs:

  • An S3 bucket in the same AWS region as the EKS cluster
  • The bucket is used by Mach5 Search for data and OTLP log storage
  • Add a lifecycle rule to the bucket that aborts incomplete multipart uploads.
    Terraform snippet to add the lifecycle rule to the bucket:
resource "aws_s3_bucket_lifecycle_configuration" "abort_incomplete_multipart_upload" {
  bucket = aws_s3_bucket.mach5_s3_bucket.id
  rule {
    id     = "abort-incomplete-multipart-upload"
    status = "Enabled"
    abort_incomplete_multipart_upload {
      days_after_initiation = 1
    }
  }

  depends_on = [aws_s3_bucket.mach5_s3_bucket]
}
  • To reduce NAT gateway costs, make sure an S3 gateway VPC endpoint is configured for the cluster VPC (see the sketch below).
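
A minimal sketch of the bucket and the S3 gateway endpoint referenced above; var.vpc_id, var.aws_region, and var.private_route_table_ids are placeholders for your own values:

resource "aws_s3_bucket" "mach5_s3_bucket" {
  bucket = var.mach5_search_s3_bucket # must be in the same AWS region as the EKS cluster
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id # placeholder: the EKS cluster's VPC
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids # placeholder: route tables of the node subnets
}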