Prefetching Data Structures for Query Optimization
Overview
To optimize query performance on an index, we can prefetch the underlying field-specific or global data structures required by different query types. By understanding which structures are used in which contexts, we can avoid unnecessary I/O and reduce latency.
This document outlines the data structures needed during various phases of query execution.
Data Structures by Query Type
Index Probe Query
An index probe is performed on every segment to identify the documents that match a field term, particularly for queries that behave like a prefix search.
- Required Data Structures (per field):
TermDictionaryFstTree
Postings
Term Query
A term query uses a Bloom filter to determine whether a segment should be included for an index probe.
- Required Data Structures (per field):
BloomFilter
TermDictionaryFstTree
Postings
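The segment-pruning role of the Bloom filter can be illustrated with a minimal sketch. This is not Mach5's implementation or on-disk format: the `BloomFilter` class, its sizing parameters, and the `segments_to_probe` helper are hypothetical names chosen for illustration. The point it shows is that a negative answer from the filter is definitive, so the segment can be skipped, while a positive answer only means the segment still needs an index probe.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical; not Mach5's format)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _bit_positions(self, term):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._bit_positions(term):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, term):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._bit_positions(term)
        )


def segments_to_probe(segment_filters, term):
    """Return the names of segments whose filter cannot rule out the term."""
    return [
        name
        for name, bloom in segment_filters.items()
        if bloom.might_contain(term)
    ]


# Usage: two hypothetical segments, each with its own per-field filter.
seg_a, seg_b = BloomFilter(), BloomFilter()
seg_a.add("error")
seg_b.add("warning")
to_probe = segments_to_probe({"seg_a": seg_a, "seg_b": seg_b}, "error")
```

Only the segments returned by `segments_to_probe` would then need their `TermDictionaryFstTree` and `Postings` loaded for the probe. A false positive costs one unnecessary probe but never a missed match.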
Aggregation and Sort Query
These operations require access to column-stored field data, typically used for sorting or aggregating numeric or keyword data.
- Required Data Structures (per field):
ColumnStoredData
ColumnStoredSkip
Final Hit Collection
Once document hits are determined, their full rows are retrieved for result formation or output.
- Required Data Structures (global — not per field):
RowStoredData
RowStoredSkip
Field-Specific Data Structures
Text Fields
Text fields require special structures for scoring and positional access.
- Required Data Structures (per field):
Norms
Positions
Geo Type Fields
Geo queries (e.g., bounding box, radius search) rely on spatial indexes.
- Required Data Structures (per field):
TermDictionaryBkd
TermDictionaryMetas
TermDictionaryBlocks
Postings
Range Queries
Range-capable fields (typically numeric or date types) use sketch-based approximations.
- Required Data Structures (per field):
Sketch
Special Case: “Must Not” Term Optimization
For must_not queries, an optimization may bypass the index probe entirely. Instead, a column lookup is used based on a metric threshold.
- Required Data Structures (per field):
- From Index Probe:
TermDictionaryFstTree
Postings
- From Aggregation/Sort:
ColumnStoredData
ColumnStoredSkip
Summary Table
Query Type / Operation | Field-Specific | Required Data Structures |
---|---|---|
Index Probe | ✅ | TermDictionaryFstTree, Postings |
Term Query | ✅ | BloomFilter, TermDictionaryFstTree, Postings |
Aggregation / Sort | ✅ | ColumnStoredData, ColumnStoredSkip |
Final Hit Collection | ❌ | RowStoredData, RowStoredSkip |
Text Fields | ✅ | Norms, Positions |
Geo Fields | ✅ | TermDictionaryBkd, TermDictionaryMetas, TermDictionaryBlocks, Postings |
Range Queries | ✅ | Sketch |
Must Not Term Optimization | ✅ | TermDictionaryFstTree, Postings, ColumnStoredData, ColumnStoredSkip |
Notes
- All data structures except RowStoredData and RowStoredSkip are field-specific.
- Prefetching should be context-aware: only load structures relevant to the query type and fields involved.
- This enables fine-grained resource management and query performance improvements.
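The context-aware rule in the notes can be sketched as a small planner. This is an illustration under stated assumptions, not part of the Mach5 API: the operation keys (`"term_query"`, `"aggregation_sort"`, and so on) and the `plan_prefetch` function are hypothetical names, while the structure names are transcribed from the summary table. Given the operations a query performs and the fields they touch, it returns each (field, structure) pair once, with global structures listed under `None`.

```python
# Transcribed from the summary table. The boolean marks per-field structures.
# The operation keys are hypothetical labels, not Mach5 identifiers.
REQUIRED = {
    "index_probe": (True, ["TermDictionaryFstTree", "Postings"]),
    "term_query": (True, ["BloomFilter", "TermDictionaryFstTree", "Postings"]),
    "aggregation_sort": (True, ["ColumnStoredData", "ColumnStoredSkip"]),
    "final_hit_collection": (False, ["RowStoredData", "RowStoredSkip"]),
    "text_field": (True, ["Norms", "Positions"]),
    "geo_field": (True, ["TermDictionaryBkd", "TermDictionaryMetas",
                         "TermDictionaryBlocks", "Postings"]),
    "range_query": (True, ["Sketch"]),
    "must_not_term": (True, ["TermDictionaryFstTree", "Postings",
                             "ColumnStoredData", "ColumnStoredSkip"]),
}


def plan_prefetch(requests):
    """Return ordered, de-duplicated (field, structure) pairs.

    `requests` is an iterable of (operation, field) tuples. Pass None as the
    field for global operations; global structures are keyed under None.
    """
    plan = []
    seen = set()
    for operation, field in requests:
        per_field, structures = REQUIRED[operation]
        if per_field and field is None:
            raise ValueError(f"{operation} needs a field name")
        target = field if per_field else None
        for structure in structures:
            key = (target, structure)
            if key not in seen:  # never schedule the same structure twice
                seen.add(key)
                plan.append(key)
    return plan


# Usage: a term filter on "status", a sort on "latency", then hit collection.
plan = plan_prefetch([
    ("term_query", "status"),
    ("aggregation_sort", "latency"),
    ("index_probe", "status"),
    ("final_hit_collection", None),
])
```

In this example the index probe on `status` adds nothing to the plan because the term query already covers `TermDictionaryFstTree` and `Postings` for that field, which is the kind of redundant I/O a context-aware plan avoids.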
Query to Extract Data Structure Metadata
To inspect the data structure metadata for a field, use the following queries. Run them from a Notebook resource within the Mach5 admin UI.
Field-wise metadata
index_segment_metadata('index_name')
| summarize sum(component_length) by field_name
| render piechart
Summarize metadata
index_segment_metadata('index_name')
| summarize sum(component_length) by component_type
| render piechart
Prefetch metadata
set ldop=16;
index_segment_field_prefetch("index-name", "field-name", "metadata")
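When several fields need prefetching, the statement above can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of Mach5. It assumes only the two-line shape shown above (a `set ldop` line followed by one `index_segment_field_prefetch` call) and keeps the third argument as the literal `"metadata"`, since the source does not list other accepted values. It emits one snippet per field because the source does not say whether several calls can share a notebook cell.

```python
def prefetch_snippets(index_name, field_names, ldop=16):
    """Build one two-line notebook snippet per field.

    Hypothetical helper: it only reproduces the statement shape shown in
    this document and does not talk to Mach5.
    """
    for name in [index_name, *field_names]:
        # Reject double quotes so the generated string literals stay valid.
        if '"' in name:
            raise ValueError(f"name must not contain a double quote: {name!r}")
    return [
        f"set ldop={ldop};\n"
        f'index_segment_field_prefetch("{index_name}", "{field}", "metadata")'
        for field in field_names
    ]


# Usage with placeholder names; paste each snippet into the notebook.
snippets = prefetch_snippets("index-name", ["status", "latency"])
```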
Summarize metadata by segment
index_segment_metadata('index_name')
| summarize sum(component_length) by segment_name
Number of segments
index_segment_metadata('index_name')
| summarize sum(component_length) by segment_name
| summarize count()