Documentation

Trino Opensearch Connector with Mach5 Search

This guide explains how to set up the Trino OpenSearch connector with Mach5 Search, deploy it using Helm, and run queries (including joins) across Mach5 Search indices.

Install helm charts

For ease of usability, you can deploy the Trino connector charts in the same namespace as the Mach5 Search deployment.

 helm repo add trino https://trinodb.github.io/charts/
 helm repo update
 helm upgrade --install my-trino-cluster trino/trino --namespace mach5 -f trino-values.yaml

Trino values file

# trino-values.yaml

# General Trino configuration
server:
  workers: 2 # You can adjust the number of worker nodes here
  nodeScheduler:
    includeCoordinator: false

# Catalogs configuration
additionalCatalogs:
  opensearch: |
    connector.name=opensearch
    opensearch.host=warehouse-os-f2311022-19d1-44b3-9fa5-8f2f6da80b #CHANGE_ME to point to the warehouse-os service
    opensearch.port=9200

Port-forward to access trino

 kubectl --namespace mach5 port-forward svc/my-trino-cluster-trino 8022:8080

Download trino cli

The Trino CLI (Command-Line Interface) is the client application we can use to interact with the above Trino cluster. It can be used to run SQL queries and view query results directly in the console.

curl -L https://repo1.maven.org/maven2/io/trino/trino-cli/476/trino-cli-476-executable.jar -o trino
chmod +x trino

# Command to show all catalogs
./trino --server http://localhost:8022 --catalog opensearch --schema default --debug --execute "SHOW catalogs;"

# Command to list all the tables in the opensearch.default catalog
./trino --server http://localhost:8022 --catalog opensearch --schema default --debug --execute "SHOW TABLES FROM opensearch.default;"


# Command to get the number of records in an index
./trino --server http://localhost:8022 --catalog opensearch --schema default --debug --execute "select count(*) from opensearch.default.table_name;"

Restart all trino pods

You would need to restart the trino pods if the opensearch pod settings are updated, or if the opensearch pod is restarted. In this scenario, use the command below to restart all the trino pods.

kubectl delete pod -l app.kubernetes.io/name=trino -n mach5

Join example with Trino

Suppose we have the following indices with a few documents in Mach5:

# Users index
PUT users
{
  "mappings": {
    "properties": {
      "user_id": { "type": "keyword" },
      "name": { "type": "keyword" },
      "email": { "type": "keyword" }
    }
  }
}

# Orders index
PUT orders
{
  "mappings": {
    "properties": {
      "order_id": { "type": "keyword" },
      "user_id": { "type": "keyword" },
      "amount": { "type": "double" }
    }
  }
}

# Insert users
POST users/_bulk
{ "index": {} }
{ "user_id": "u1", "name": "Alice", "email": "alice@example.com" }
{ "index": {} }
{ "user_id": "u2", "name": "Bob", "email": "bob@example.com" }

# Insert orders
POST orders/_bulk
{ "index": {} }
{ "order_id": "o1", "user_id": "u1", "amount": 120.5 }
{ "index": {} }
{ "order_id": "o2", "user_id": "u2", "amount": 75.0 }
{ "index": {} }
{ "order_id": "o3", "user_id": "u1", "amount": 200.0 }

Run a join query on the 2 indices created above, using Trino CLI as follows:

trino:default> SELECT u.user_id, u.name, o.order_id, o.amount FROM opensearch.default.users u JOIN opensearch.default.orders o ON u.user_id = o.user_id WHERE o.amount > 100;
 user_id | name  | order_id | amount 
---------+-------+----------+--------
 u1      | Alice | o3       |  200.0 
 u1      | Alice | o1       |  120.5 
(2 rows)

We showed how the Trino Opensearch connector allows running SQL queries, including joins across multiple indices, directly on search data, enabling teams to use familiar SQL syntax for performing analytics on Mach5 Search.