
Why Mach5 Search?

10/31/2022
Zachary Heilbron

OpenSearch and Elasticsearch have become de facto standards for search-based applications. Their simple APIs make it easy to get started and to build complex, interactive applications on top, as witnessed by the mass adoption of the ELK stack by businesses in all sectors, with uses ranging from observability applications processing logs, traces, and metrics to SIEMs and security analytics solutions.

Data volumes have been growing exponentially year-over-year due to growth in users, devices, micro-services, applications, and IT and infrastructure environments. Historical data has shown incredible value, especially when used in conjunction with AI and ML. The growing rate of new data, and the need to store that data for longer timeframes, presents new challenges for businesses using the ELK stack in the form of increased costs and more complex operations and management. Often, businesses end up sacrificing insights and value as they are forced to reduce logging and search capabilities due to high costs. For some use cases, such as forensic investigations, incident response, or regulatory compliance, businesses may have no choice but to increase their spend to accommodate the growing data volume.


A common workaround that we see businesses adopt to combat these challenges is to maintain separate data tiers. For example, hot data might be served by the ELK stack, while cold or archival data is moved off to another system, or to a separate indexing tier using Index State/Lifecycle Management (IS/LM) and/or snapshotting. Both approaches to data tiering introduce enormous complexity and/or cost. Moving archival data to a separate system means an entirely separate data system must be built or purchased, operated, and maintained, in addition to developing, testing, and maintaining an entirely separate code path. Meanwhile, IS/LM and snapshotting require configuring a complex data pipeline to orchestrate when and how data moves between tiers.
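To make the configuration burden concrete: an Elasticsearch ILM policy that shuttles indices through hot, cold, and delete phases looks something like the sketch below. The thresholds and the repository name (`my-s3-repo`) are illustrative placeholders, and this policy is only one piece; it must still be wired up with index templates, node tier roles, and a registered snapshot repository.

```json
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "7d" }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": { "snapshot_repository": "my-s3-repo" }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Each knob here interacts with cluster sizing and node roles, which is precisely the orchestration overhead described above.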

The Problem with Legacy Architectures

We have spent a decade building and using technologies that manage massive amounts of security and operational data, and we have witnessed first-hand the downsides of a legacy architecture (like the ELK stack) that uses aggregated storage and a fixed set of resources. The architecture forces storage and compute resources to be provisioned upfront, which requires the non-trivial task of forecasting workloads ahead of time. Demand on a system is rarely constant, varying due to natural phenomena like world events or the hours of a daily work schedule. Aggregated storage architectures operate in an “always-on” mode regardless of workload, costing money even when resources go unused. Moreover, the system must be provisioned to handle peak usage or risk being unavailable at the most disruptive moment, resulting in wasted resources during workload troughs. To further exacerbate the problem, this provisioning exercise must be repeated every time the workload scales up or down significantly.

Today’s versions of OpenSearch/Elasticsearch have been retrofitted with “auto-scaling” functionality, but as anyone who has tried setting it up knows, it requires deep product expertise to configure node roles and IS/LM correctly, and it introduces costly, ongoing maintenance and complexity.

A natural question arises: Is there a better way? Can we reduce costs and operational complexity by an order of magnitude, without having to rewrite all existing applications against a new API?


A Modern Architecture

Mach5 Search is built on a disaggregated storage architecture, where storage and compute resources scale independently of one another. Disaggregated storage keeps quiescent costs low: compute resources scale down while not in use, while cheap cloud object storage combined with efficient caching keeps all data searchable, not just the hot tier. This enables a pay-per-usage pricing model (pay only for what you use, when you use it), offering significant cost savings. Moreover, users and operators can focus their attention on core business logic and data configuration while completely eschewing IS/LM, snapshotting, partition management, and data-tiering configuration, all of which are handled automatically.
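The caching idea behind disaggregated storage can be sketched in a few lines: the full dataset lives in cheap object storage, and only recently used segments occupy local capacity. This is a minimal illustration of the read-through pattern, not Mach5’s actual implementation; all names here are hypothetical.

```python
from collections import OrderedDict

class ReadThroughCache:
    """Read-through LRU cache in front of a (simulated) object store.

    The object store holds every segment; the local cache holds only the
    most recently used ones, up to a fixed capacity.
    """
    def __init__(self, object_store, capacity=2):
        self.object_store = object_store   # stands in for cloud object storage
        self.capacity = capacity           # max segments held locally
        self.local = OrderedDict()         # LRU order: oldest first
        self.remote_reads = 0              # number of cache misses

    def get(self, segment_id):
        if segment_id in self.local:
            self.local.move_to_end(segment_id)   # hit: mark as recently used
            return self.local[segment_id]
        self.remote_reads += 1                   # miss: fetch from object store
        data = self.object_store[segment_id]
        self.local[segment_id] = data
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)       # evict least recently used
        return data

# Usage: all three segments stay searchable, but only two occupy local capacity.
store = {"seg-1": b"old logs", "seg-2": b"recent logs", "seg-3": b"hot logs"}
cache = ReadThroughCache(store, capacity=2)
cache.get("seg-3"); cache.get("seg-3")   # second read is served locally
cache.get("seg-2"); cache.get("seg-1")   # seg-3 is evicted once capacity is exceeded
```

The point of the sketch is the cost model: cold segments cost only cheap object storage until they are actually queried, with no tiering pipeline to configure.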

We believe this is a giant step forward for those who need insights from their log data without the headache and without breaking the bank, while future-proofing their analytics stack.

Don’t take our word for it. Talk to us about running Mach5 in your cloud today.

© 2021-2024 Mach5 Software, Inc. All rights reserved
