v6.1.0 Release Notes

Migration Guide

General Updates

  • Upgrading from v5.9.x or earlier: follow the v5.10.0 migration guide before upgrading to v6.1.0. See the migration guides covering v4.7.0 through v6.1.0.
  • Rename orphanedFileGracePeriodDurationseconds to orphanedFileGracePeriodDurationSeconds in any custom config before upgrading.
  • Upgrade to the latest mach5-sdk-ts to use the new gRPC query API.

API Updates

  • Update the ingest pipeline API payloads for S3, Iceberg, and Kafka as follows:

S3 pipeline

Move the scan-tuning fields inside the connector config:

Previous
{
  "source_config": {
    "config": {
      "type": "s3"
    }
  },
  "scan_filter_mpl": 1,
  "scan_filter_batch_size": 8192,
  "segment_bin_capacity_bytes": 104857600,
  "max_ingest_workflows_limit": 4,
  "segment_cache_size": 134217728,
  "append_batch_size": 104857600,
  "ignore_mapping_errors": false,
  "max_files_per_ingestor": 100,
  "workflow_timeout_seconds": 7200
}
v6.1.0 snippet
{
  "source_config": {
    "config": {
      "type": "s3",
      "scan_mode": "enumerated",
      "scan_filter_mpl": 1,
      "scan_filter_batch_size": 8192,
      "segment_bin_capacity_bytes": 104857600
    }
  },
  "max_ingest_workflows_limit": 4,
  "segment_cache_size": 134217728,
  "append_batch_size": 104857600,
  "ignore_mapping_errors": false,
  "workflow_timeout_seconds": 7200
}

Iceberg pipeline

Drop the scan-tuning keys and max_files_per_ingestor entirely:

Previous
{
  "source_config": {
    "config": {
      "type": "iceberg"
    }
  },
  "scan_filter_mpl": 1,
  "scan_filter_batch_size": 8192,
  "segment_bin_capacity_bytes": 104857600,
  "max_ingest_workflows_limit": 4,
  "segment_cache_size": 134217728,
  "append_batch_size": 104857600,
  "ignore_mapping_errors": false,
  "workflow_timeout_seconds": 7200,
  "max_files_per_ingestor": 100
}
v6.1.0 snippet
{
  "source_config": {
    "config": {
      "type": "iceberg"
    }
  },
  "max_ingest_workflows_limit": 4,
  "segment_cache_size": 134217728,
  "append_batch_size": 104857600,
  "ignore_mapping_errors": false,
  "workflow_timeout_seconds": 7200
}

Kafka pipeline

Remove the scan-filter fields, segment_bin_capacity_bytes, and max_files_per_ingestor:

Previous
{
  "source_config": {
    "config": {
      "type": "kafka"
    }
  },
  "scan_filter_mpl": 1,
  "scan_filter_batch_size": 8192,
  "segment_bin_capacity_bytes": 104857600,
  "max_ingest_workflows_limit": 4,
  "segment_cache_size": 134217728,
  "append_batch_size": 104857600,
  "ignore_mapping_errors": false,
  "workflow_timeout_seconds": 7200,
  "max_files_per_ingestor": 100
}
v6.1.0 snippet
{
  "source_config": {
    "config": {
      "type": "kafka"
    }
  },
  "max_ingest_workflows_limit": 4,
  "segment_cache_size": 134217728,
  "append_batch_size": 104857600,
  "ignore_mapping_errors": false,
  "workflow_timeout_seconds": 7200
}
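The three migrations above follow one pattern: S3 moves the scan-tuning fields into the connector config, while Iceberg and Kafka drop them (and max_files_per_ingestor) outright. A minimal sketch of a migration helper, with field names taken from the snippets above; treating "enumerated" as the scan_mode default is an assumption based on the S3 snippet:

```python
import copy

# Top-level keys that move into the connector config for S3
# and are dropped for Iceberg and Kafka.
SCAN_KEYS = ("scan_filter_mpl", "scan_filter_batch_size", "segment_bin_capacity_bytes")


def migrate_pipeline_payload(payload: dict) -> dict:
    """Rewrite a pre-6.1 ingest pipeline payload into the v6.1.0 shape."""
    payload = copy.deepcopy(payload)
    moved = {k: payload.pop(k) for k in SCAN_KEYS if k in payload}
    payload.pop("max_files_per_ingestor", None)  # removed for all three connectors

    config = payload["source_config"]["config"]
    if config["type"] == "s3":
        # S3 keeps the scan-tuning fields, nested inside the connector config.
        config.update(moved)
        # scan_mode is new in v6.1.0; "enumerated" matches the snippet above.
        config.setdefault("scan_mode", "enumerated")
    # Iceberg and Kafka simply drop the keys popped above.
    return payload
```

Run this over stored pipeline definitions before recreating them through the v6.1.0 API; it leaves the remaining top-level fields (segment_cache_size, append_batch_size, and so on) untouched.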

What’s Changed

Query Language & MDX Enhancements

  • typeof() and to-typed() add runtime type inspection and explicit casting for finer control over value interpretation across heterogeneous sources.
  • Composite aggregations surface missing-value buckets via missing_bucket; Terms aggregations now produce correct results for boolean and IP fields.
  • mdx-cli now accepts a query as a command-line argument, simplifying scripting without intermediate files.
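For illustration, a composite aggregation that surfaces documents lacking a value as their own bucket. This assumes Elasticsearch-compatible aggregation syntax; the aggregation and field names (by_team, team) are hypothetical:

{
  "aggs": {
    "by_team": {
      "composite": {
        "sources": [
          { "team": { "terms": { "field": "team", "missing_bucket": true } } }
        ]
      }
    }
  }
}

With "missing_bucket": true, documents without a team value land in an explicit null-keyed bucket instead of being silently dropped from the results.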

SQL Connectivity

  • Mach5 now offers PostgreSQL compatibility, so you can connect using any PostgreSQL-compatible client or BI tool.
  • Trino connectivity reaches production readiness with stability and correctness fixes for distributed query workloads.
  • SQL jobs now carry owner identity and track lifecycle state end-to-end, enabling resource governance in multi-tenant environments.

Garbage Collection

  • Orphaned file cleanup is more robust: several edge cases that left files uncollected are fixed, and the deletion grace period is increased to 7 days.
  • A task leak in the fdb-reconciler that caused unbounded memory growth on retry loops has been resolved.

Performance, Reliability & Infrastructure

  • Full-text indexing throughput is improved via batching and parallelism at the segment and term level.
  • Memory estimates now include projected columns, and both FSM and ART structures have a reduced baseline footprint.
  • Hash joins can spill to disk under memory pressure, preventing OOM failures on large join workloads.
  • Several correctness fixes land in this release: disk cache hot path, cstore snapshot consistency, Dremel null handling across all read modes, and three concurrency issues (PostOffice race, MutationGapError, job cancellation deadlock).
  • nginx worker_connections is now configurable via values.yaml for high-concurrency deployments.
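In a Helm-based deployment the override would look roughly like the sketch below. The exact key path is an assumption; consult the chart's values.yaml for the actual location of the nginx settings:

# Hypothetical values.yaml override; the key path may differ in your chart.
nginx:
  workerConnections: 16384

Raising worker_connections lifts the per-worker cap on simultaneous client connections, which matters for high-concurrency deployments fronted by a small number of nginx workers.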

External Integrations

  • OpenSearch and Elasticsearch can be connected as external data sources with query pushdown support.
  • Azure Blob Storage is now a supported ingest target with full Iceberg table support. GCS users can ingest Iceberg tables via native BigLake bucket integration.
  • BigQuery SQL pushdown is corrected for several patterns that previously fell back to client-side execution.

UI (Dex) & API changes

  • Cell outputs are updated in-place as data arrives rather than being fully recreated, reducing flicker.
  • Browser sessions refresh more reliably, reducing stale-session errors in long-running notebooks.
  • Editor fixes: Ctrl/Cmd+A selects the full active cell, connection resolution works in the ingest pipeline edit view, and dynamic value rendering is consistent.
  • Ingest pipeline source configs must be updated to the new per-type proto format when created through the API.
  • When creating a warehouse, you can choose how memory is managed during query execution. This helps balance performance, reliability, and cost based on your workload.
  • Route indices to different stores using namespace-based regex patterns.
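The routing above is first-match regex dispatch on the index namespace. A minimal sketch of that matching logic; the patterns and store names here are hypothetical, and the product's actual configuration surface may differ:

```python
import re

# Hypothetical namespace patterns mapped to store names; first match wins.
ROUTES = [
    (r"^logs-.*", "cold-store"),
    (r"^metrics-.*", "hot-store"),
]


def resolve_store(index_name: str, default: str = "default-store") -> str:
    """Return the store whose namespace pattern first matches the index name."""
    for pattern, store in ROUTES:
        if re.match(pattern, index_name):
            return store
    return default
```

Because the first matching pattern wins, order broad catch-all patterns after the specific ones.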