

Mach5 revolutionizes search infrastructure with its cutting-edge architecture, establishing a new benchmark in performance and efficiency compared to conventional solutions like OpenSearch and Elasticsearch.
Mach5 introduces an innovative disaggregated storage architecture, which sets it apart from traditional cluster-based implementations with tightly coupled storage and compute. This architectural advancement delivers three transformative benefits:

- Compute capacity that scales dynamically with actual demand rather than anticipated peaks
- Durable, low-cost data storage in cloud object stores, with no replication overhead
- Independent scaling of compute and storage, so each resource is sized only for what the workload needs
In this blog, we will examine how Mach5 delivers substantial cost benefits while ensuring better performance compared to OpenSearch and Elasticsearch.
Object stores have become the de facto solution for businesses and individuals seeking to park large volumes of data. They offer:

- High durability through built-in redundancy
- Very low cost per GB compared to node-attached storage
- Practically unlimited capacity, billed only for what is actually stored
In the following sections, we will examine how each of the major cost drivers, namely capacity provisioning, data replication, and the coupling of compute and storage, impacts the performance and efficiency of a tightly coupled Elasticsearch/OpenSearch architecture compared to Mach5's disaggregated architecture.
Traditional architectures leverage a distributed cluster model, where data is partitioned into shards and distributed across multiple nodes. Each node manages both the storage and processing of its assigned data. However, due to the tight coupling of compute and storage, scaling one resource necessitates scaling the other—even when only one component requires additional capacity.
This architectural rigidity introduces significant challenges for modern workloads, especially when demand patterns are unpredictable or highly variable. To understand the implications of this design, let’s examine its impact on the key cost drivers:
The architecture requires nodes to be allocated based on anticipated peak usage, creating two significant challenges:

- Over-provisioning: capacity sized for peak sits idle during normal operation, inflating infrastructure spend.
- Under-provisioning: trimming the fleet to control costs risks degraded performance, or outright failures, whenever demand spikes.

In environments with variable workloads, particularly when there is a substantial gap between average and peak usage, this becomes a major operational and financial challenge. Organizations end up with complex capacity-planning and deployment strategies that settle on either costly over-provisioning or risky under-provisioning, both of which compromise either the budget or service quality, as the rough sketch below illustrates.
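As a rough illustration, consider a fleet sized for peak demand. The numbers below are hypothetical and only meant to show how the peak-to-average gap turns into idle spend:

```python
# Hypothetical illustration of static provisioning for peak demand in a
# coupled-node cluster. All numbers are made up for the example.
PEAK_NODES_NEEDED = 100      # nodes required to absorb peak traffic
AVERAGE_NODES_NEEDED = 25    # nodes that would cover an average hour
NODE_COST_PER_HOUR = 4.0     # on-demand $/hour for one node

# With tightly coupled compute and storage, the fleet is sized for peak.
provisioned = PEAK_NODES_NEEDED
idle_fraction = 1 - AVERAGE_NODES_NEEDED / provisioned
idle_spend_per_year = (provisioned - AVERAGE_NODES_NEEDED) * NODE_COST_PER_HOUR * 24 * 365

print(f"Idle capacity in an average hour: {idle_fraction:.0%}")
print(f"Annual spend on idle nodes: ${idle_spend_per_year:,.0f}")
```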
Traditional node-coupled storage architectures require data replication across multiple nodes to maintain high availability and fault tolerance. Depending on the replication configuration, this multiplies storage and compute costs by a factor of 2-3x, and the overhead grows with the data: as volumes increase, the replicated footprint and its cost scale right along with them. In large-scale deployments, where storage is often a significant share of the infrastructure budget, this compounding factor becomes a persistent burden on any attempt to scale.
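A back-of-the-envelope sketch of that footprint, using the same sizing formula applied in the cost table later in this post (raw data size × (1 + replicas) × 1.45, where 1.45 is the overhead factor from the AWS OpenSearch sizing guide referenced at the end):

```python
# Storage footprint of a node-coupled cluster, using the sizing formula
# applied in the cost table later in this post:
#   stored = raw_data * (1 + replicas) * 1.45
# where 1.45 is the overhead factor from the AWS OpenSearch sizing guide.
def cluster_footprint_gb(raw_gb: float, replicas: int) -> float:
    return raw_gb * (1 + replicas) * 1.45

raw_gb = 1_000  # 1 TB of raw data (example figure)
for replicas in (1, 2):
    stored = cluster_footprint_gb(raw_gb, replicas)
    print(f"{replicas} replica(s): {stored:,.0f} GB stored "
          f"({stored / raw_gb:.2f}x the raw data)")
```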
The architecture also enforces a fixed ratio between compute and storage. Because the two are coupled, adding compute means adding entire nodes, storage included, even when no additional capacity is needed. This prevents organizations from optimizing each resource independently: to meet a purely computational demand, they must over-provision whole nodes and pay for storage (or compute) that sits unused.
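To make the coupling concrete, here is a small sketch with a hypothetical node shape (32 vCPUs and 7.6 TB of attached storage per node). The fleet must be sized for whichever resource is the bottleneck, and the other is paid for regardless:

```python
import math

# Hypothetical node shape for a coupled architecture: every node added for
# compute also brings, and bills for, its attached storage.
VCPUS_PER_NODE = 32
STORAGE_GB_PER_NODE = 7_600   # example figure for an NVMe-attached node

def nodes_required(vcpus_needed: int, storage_gb_needed: int) -> int:
    # The fleet must satisfy whichever dimension is the bottleneck.
    return max(math.ceil(vcpus_needed / VCPUS_PER_NODE),
               math.ceil(storage_gb_needed / STORAGE_GB_PER_NODE))

# A workload that is compute-heavy but storage-light:
nodes = nodes_required(vcpus_needed=640, storage_gb_needed=20_000)
unused_storage_gb = nodes * STORAGE_GB_PER_NODE - 20_000
print(f"Nodes required: {nodes}")
print(f"Storage paid for but unused: {unused_storage_gb:,} GB")
```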
Is it possible to design a system that addresses the above challenges encountered with traditional search and analytics platforms?
YES!
Mach5 answers this question with an innovative, disaggregated architecture powered by Kubernetes. While Mach5 leverages Kubernetes for autoscaling, its core advantages stem from two key architectural elements working in tandem:

- Disaggregated storage built on cloud object stores, which removes the need for node-attached replicas
- Elastic compute, orchestrated by Kubernetes, that scales up and down with actual demand
This comprehensive architectural approach enables significant cost reductions, roughly 21x in total infrastructure costs and 30-45x lower storage costs per GB, which would be impossible to achieve by simply adding Kubernetes to a traditional search infrastructure. You can learn more about our architecture here.
To better understand how this architecture transforms infrastructure management, let’s evaluate its impact on the same cost drivers:
Unlike traditional systems that rely on static peak estimates, Mach5 uses Kubernetes' native autoscaling capabilities to orchestrate resources dynamically. The system continuously monitors usage patterns in real time and automatically adjusts resources to match demand. This adaptive approach sustains high performance while reducing infrastructure costs by at least a factor of 10: by allocating and deallocating resources in response to actual usage, the platform eliminates the need to maintain excess capacity during off-peak hours. Organizations no longer need to choose between cost and performance.
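The mechanism is easy to see with a toy model. The demand curve below is hypothetical rather than Mach5 telemetry, and the saving scales with the gap between average and peak; the 10x figure above corresponds to workloads whose average demand is around a tenth of their peak:

```python
# Hypothetical 24-hour demand curve (nodes needed per hour); not Mach5
# telemetry, just a spiky workload with a short overnight peak.
hourly_demand = [4] * 8 + [40] * 2 + [8] * 14   # 24 hourly samples

peak_nodes = max(hourly_demand)

# Static provisioning pays for the peak every hour of the day.
static_node_hours = peak_nodes * len(hourly_demand)

# Autoscaling pays only for the nodes actually running each hour.
autoscaled_node_hours = sum(hourly_demand)

print(f"Static provisioning: {static_node_hours} node-hours/day")
print(f"Autoscaled:          {autoscaled_node_hours} node-hours/day")
print(f"Reduction:           {static_node_hours / autoscaled_node_hours:.1f}x")
```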
Mach5 leverages cloud object storage such as Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage, and MinIO instead of conventional node-attached storage. By adopting this cloud-first architecture, the platform eliminates the costly replication overhead typically seen in traditional search solutions like OpenSearch and Elasticsearch, while the built-in redundancy of these storage systems continues to guarantee data durability. The result is a 30-45x reduction in storage cost per gigabyte of raw data, delivering substantial cost efficiency for organizations of all sizes.
The disaggregated architecture enhances resource management by enabling compute and storage to scale independently based on demand. This allows compute resources to be precisely matched to specific workloads, such as data ingestion or querying, without the need for over-provisioning. By isolating workloads and optimizing compute power for each task, the system ensures that resources are used efficiently and only when needed. This level of granularity not only improves performance but also drives significant cost savings, as organizations only pay for the compute capacity they actually use, eliminating the need for costly, underutilized infrastructure.
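As a minimal sketch of what independent scaling looks like in practice (pool names and sizes are illustrative, loosely based on the deployment example later in this post):

```python
from dataclasses import dataclass

# Illustrative compute pools; names and sizes are hypothetical, loosely
# based on the deployment example later in this post.
@dataclass
class ComputePool:
    name: str
    nodes: int
    hourly_rate: float

    def hourly_cost(self) -> float:
        return self.nodes * self.hourly_rate

search = ComputePool("search", nodes=4, hourly_rate=2.746)
ingestion = ComputePool("ingestion", nodes=2, hourly_rate=0.4746)

# A query spike is handled by scaling only the search pool; ingestion
# compute and object-store capacity are untouched.
search.nodes += 2

# Storage is billed per GB-month by the object store and grows with data,
# not with the number of compute nodes.
stored_gb = 80 * 1024            # 80 TB, as in the deployment example
storage_monthly = stored_gb * 0.023

print(f"Compute after scaling search: "
      f"${search.hourly_cost() + ingestion.hourly_cost():.4f}/hour")
print(f"Object storage (unchanged):   ${storage_monthly:,.0f}/month")
```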
To understand the impact of these architectural choices, let’s compare the infrastructure costs of Mach5 and OpenSearch in a large-scale deployment scenario.
Here’s a breakdown of the average storage cost per GB of raw data, taking into account various replication scenarios:
| | OpenSearch | Mach5 |
|---|---|---|
| Space used | Raw data size × (1 + number of replicas) × 1.45 | Raw data size / 3 |
| Cost / GB-month | $0.08 | $0.023 |
| Total cost / GB-month (primary + 1 replica) | $0.232 | $0.0076 |
| Total cost / GB-month (primary + 2 replicas) | $0.348 | $0.0076 |
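For readers who want to verify the arithmetic, here is a short Python sketch that reproduces the per-GB figures above; the rates and formulas come directly from the table, and the table's Mach5 value is simply rounded:

```python
# Quick check of the per-GB figures in the table above.
OPENSEARCH_RATE = 0.08   # $/GB-month, rate used in the table
S3_RATE = 0.023          # $/GB-month, rate used in the table

def opensearch_per_raw_gb(replicas: int) -> float:
    # Stored footprint per the table: raw * (1 + replicas) * 1.45
    return (1 + replicas) * 1.45 * OPENSEARCH_RATE

def mach5_per_raw_gb() -> float:
    # Mach5 stores roughly a third of the raw data size in the object store.
    return (1 / 3) * S3_RATE

for replicas in (1, 2):
    os_cost = opensearch_per_raw_gb(replicas)
    m5_cost = mach5_per_raw_gb()
    print(f"primary + {replicas} replica(s): "
          f"${os_cost:.3f} vs ${m5_cost:.5f} per GB-month "
          f"(~{os_cost / m5_cost:.0f}x)")
```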
The table reveals stark differences in storage efficiency and cost between OpenSearch deployments and Mach5's architecture:

- OpenSearch stores 2.9x to 4.35x the raw data size once replicas and sizing overhead are accounted for, while Mach5 stores roughly one third of the raw data size in object storage.
- At the storage rates above, that works out to $0.232-$0.348 per GB of raw data per month for OpenSearch versus about $0.0076 for Mach5, a 30-45x difference.
Let’s take a closer look at the costs involved in a real-world deployment of OpenSearch vs Mach5:
| | OpenSearch (Primary + 1 Replica) | Mach5 |
|---|---|---|
| Stored data size | 696 TB | 80 TB |
| Machine type | i3.8xlarge.search ($3.994 / hour) | Search: i4i.8xlarge ($2.746 / hour); Ingestion: m6id.2xlarge ($0.4746 / hour) |
| Number of machines | ~100 | 4 (search) + 2 (ingestion) |
| Cost / hour (on-demand) | $399.40 | $11.9332 |
| Storage cost | Included | $2.616 / hour (S3 storage) |
| Mach5 licensing cost / year | — | $36,864 |
| Total cost / year | $3,498,744 | $127,451 + $36,864 = $164,315 |
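A similar sketch reproduces the annual totals from the deployment table; all rates, machine counts, and the license figure are taken from the table above:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

# OpenSearch fleet from the table: ~100 i3.8xlarge.search nodes.
opensearch_yearly = 100 * 3.994 * HOURS_PER_YEAR

# Mach5 fleet from the table: 4 search nodes + 2 ingestion nodes, plus S3.
mach5_compute_hourly = 4 * 2.746 + 2 * 0.4746      # $11.9332 / hour
mach5_s3_hourly = 2.616                             # S3 storage, per the table
mach5_infra_yearly = (mach5_compute_hourly + mach5_s3_hourly) * HOURS_PER_YEAR
mach5_yearly = mach5_infra_yearly + 36_864          # plus annual license

print(f"OpenSearch: ${opensearch_yearly:,.0f}/year")
print(f"Mach5:      ${mach5_yearly:,.0f}/year")
print(f"Savings:    {opensearch_yearly / mach5_yearly:.0f}x")
```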
The analysis confirms that Mach5 offers exceptional cost efficiency compared to traditional search infrastructures like OpenSearch and Elasticsearch. With its disaggregated storage architecture and dynamic resource scaling, Mach5 delivers:

- Roughly 21x lower total infrastructure costs in the deployment analyzed above
- 30-45x lower storage costs per GB of raw data
- Better performance, without paying for over-provisioned, idle capacity
These benefits stem from Mach5’s fundamental architectural advantages, including:

- Disaggregated storage on cloud object stores, which removes replication overhead while preserving durability
- Kubernetes-driven autoscaling that matches compute to real-time demand
- Independent scaling of compute and storage, so neither resource is over-provisioned to satisfy the other
For organizations seeking to optimize search infrastructure costs while maintaining high performance, Mach5 provides a transformative solution that redefines cost efficiency at scale.
OpenSearch sizing: AWS OpenSearch Sizing Guide