m5c data contracts
Data contracts are app-facing data interfaces. They describe the records an app expects, independent of vendor-specific source fields.
Apps should target contracts. Modules and mappings adapt raw sources to those contracts.
Where contract files live
m5c discovers a data contract from any file named contract.yaml.
A typical contracts package looks like this:
packages/contracts/
  identity-authentication/
    contract.yaml
    tests/
      conformance/
        okta-successful-login.normalized.json
  repo-activity/
    contract.yaml
Use one directory per contract so fixtures, notes, and generated reports stay close to the contract definition.
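The discovery rule above (any file named contract.yaml) can be pictured with a short Python sketch. This is an illustration only, not m5c's actual discovery code; the function name find_contracts and the temporary directory layout are assumptions made for the example.

```python
from pathlib import Path
import tempfile

def find_contracts(root: Path) -> list[Path]:
    """Return every file named contract.yaml under root, sorted for stable output."""
    return sorted(p for p in root.rglob("contract.yaml") if p.is_file())

# Build a throwaway tree shaped like the layout above and search it.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "packages" / "contracts"
    for name in ("identity-authentication", "repo-activity"):
        (base / name).mkdir(parents=True)
        (base / name / "contract.yaml").write_text("kind: DataContract\n")
    # Report each contract by the directory that holds it.
    found = [p.parent.name for p in find_contracts(Path(tmp))]
    print(found)
```

Because each contract sits in its own directory, the parent directory name is enough to tell the contracts apart.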
Build a contract
Build a contract in this order:
- Name the app-level concept, not the source vendor.
- Define the grain: what one row means.
- Add the smallest set of required fields.
- Add keys and event-time semantics.
- Add capabilities that apps and detections can require.
- Add conformance fixtures that prove example rows satisfy the contract.
- Run m5c validate and m5c test.
Contract document
kind: DataContract
apiVersion: semantic-catalog.mach5.io/v1alpha1
metadata:
  name: identity.authentication_event.v1
  version: 1.0.0
  display_name: OCSF Authentication Event
  description: Authentication and identity events normalized to an OCSF-inspired shape.
spec:
  lifecycle:
    status: draft
    owner: mach5
    compatibility: semver
  external_standard:
    name: OCSF
    version: 1.4.0
    class_uid: 3002
    class_name: Authentication
  grain:
    kind: event
    unit: authentication_attempt
    cardinality: one_row_per_event
    event_boundary: source_event
  shape:
    openness: open
    fields:
      event_uid:
        type: string
        required: true
        nullable: false
        role: id
      time:
        type: datetime
        required: true
        nullable: false
        semantics: event_time
      actor.user.email_addr:
        type: email
        required: false
        nullable: true
        aliases: [user.email]
      actor.user.uid:
        type: string
        required: false
        nullable: true
        aliases: [user.id]
      src_endpoint.ip:
        type: ip
        required: false
        nullable: true
        aliases: [source.ip]
      status_id:
        type: enum
        required: true
        nullable: false
        enum_ref: status_id
      raw_event_ref:
        type: string
        required: false
        nullable: true
        role: raw_evidence
  keys:
    primary:
      fields: [event_uid]
      uniqueness: required
    dedupe:
      fields: [event_uid]
      window: 30d
      collision_policy: keep_latest_observed
  time:
    event_time:
      field: time
      required: true
  enums:
    status_id:
      type: int
      unknown_policy: allow_unknown
      values:
        "0": { name: Unknown }
        "1": { name: Success }
        "2": { name: Failure }
  capabilities:
    principal_available:
      condition: exists(actor.user.uid) || exists(actor.user.email_addr)
    source_ip_available:
      condition: exists(src_endpoint.ip)
    status_available:
      condition: exists(status_id)
    raw_evidence_reference_available:
      condition: exists(raw_event_ref)
  quality:
    assertions:
      - name: event_uid_present
        expr: exists(event_uid)
        severity: error
      - name: time_present
        expr: exists(time)
        severity: error
  lineage:
    raw_evidence:
      required: true
      fields: [raw_event_ref]
  sensitivity:
    classification: security_operational
    pii_fields:
      - actor.user.email_addr
      - src_endpoint.ip
    handling:
      default: internal
  compatibility:
    breaking_changes:
      - remove_required_field
      - change_field_type
      - change_grain
      - remove_required_capability
  conformance:
    fixtures:
      - name: okta_successful_login
        input_ref: tests/conformance/okta-successful-login.normalized.json
        expect:
          event_uid: okta-evt-001
          status_id: 1
        expect_capabilities:
          principal_available: true
          status_available: true
Top-level fields
| Field | Required | Meaning |
|---|---|---|
| kind | Yes | Must be DataContract. |
| apiVersion | Recommended | Current examples use semantic-catalog.mach5.io/v1alpha1. |
| metadata.name | Yes | Stable contract name. Prefer semantic, versioned names such as identity.authentication_event.v1. |
| metadata.version | Recommended | Semver contract version, such as 1.0.0. |
| metadata.display_name | Optional | Human-readable name. |
| metadata.description | Optional | Short explanation of the contract. |
| spec.grain | Yes | Defines what one row means. |
| spec.shape.fields | Yes | Field definitions. Must contain at least one field. |
| spec.keys | Recommended | Primary, natural, and dedupe keys. |
| spec.time | Recommended | Event, observed, and ingest time semantics. |
| spec.capabilities | Recommended | Named predicates providers and detections can reference. |
| spec.conformance | Recommended | Fixture-backed examples run by m5c test. |
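To make the "Required" column concrete, here is a minimal Python sketch of a structural check over an already-parsed contract document. It is not how m5c validate is implemented; the names REQUIRED_PATHS, lookup, and structural_errors are hypothetical, and the document is passed in as a plain dict so the example needs no YAML library.

```python
# Paths the table above marks as required (dotted paths into the parsed document).
REQUIRED_PATHS = ["kind", "metadata.name", "spec.grain", "spec.shape.fields"]

def lookup(doc: dict, path: str):
    """Walk a dotted path through nested dicts; return None if any step is missing."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def structural_errors(doc: dict) -> list[str]:
    """Collect messages for missing required paths and the two value rules in the table."""
    errors = [f"missing required field: {p}" for p in REQUIRED_PATHS if lookup(doc, p) is None]
    if doc.get("kind") not in (None, "DataContract"):
        errors.append("kind must be DataContract")
    fields = lookup(doc, "spec.shape.fields")
    if fields is not None and len(fields) == 0:
        errors.append("spec.shape.fields must contain at least one field")
    return errors

# A contract with an empty field map fails only the "at least one field" rule.
contract = {
    "kind": "DataContract",
    "metadata": {"name": "identity.authentication_event.v1"},
    "spec": {"grain": {"kind": "event"}, "shape": {"fields": {}}},
}
print(structural_errors(contract))
```

Recommended and optional fields are deliberately left out of the check, since the table does not make them mandatory.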
Naming contracts
Name contracts by app semantics, not by source product.
| Better | Avoid |
|---|---|
| identity.authentication_event.v1 | okta_login_event |
| repo.activity.v1 | github_webhook |
| cloud.audit_activity.v1 | gcp_audit_log |
| security.raw_event.v1 | json_blob |
The contract version in the name, such as .v1, represents the app-facing interface generation. metadata.version is the semver version of this contract document.
Grain
spec.grain should answer one question: what does one row represent?
grain:
  kind: event
  unit: authentication_attempt
  cardinality: one_row_per_event
  event_boundary: source_event
Use precise grain text before adding fields. If the grain is unclear, mappings and detections will become ambiguous.
Shape and fields
Fields live under spec.shape.fields. Field names may be dotted to represent semantic paths.
actor.user.email_addr:
  type: email
  required: false
  nullable: true
  aliases: [user.email]
Supported scalar type names include:
| Type | Expected JSON value |
|---|---|
| string, datetime, date, time, duration, ip, cidr, uri, email, uuid, binary | String |
| enum | String or number |
| bool | Boolean |
| int, long | Integer number |
| float, double, decimal | Number |
| json, any | Any JSON value |
| array<...> | Array |
| map<...>, struct | Object |
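The type table can be read as a mapping from contract type names to JSON value kinds. The Python sketch below encodes that reading; the function matches_type is hypothetical and is not m5c's validator. It checks only the JSON kind of a non-null value and does not validate formats such as email or IP syntax. As a simplification, it treats a JSON number parsed as a float (for example 1.0) as not an integer.

```python
def matches_type(type_name: str, value) -> bool:
    """Check a non-null JSON value against a contract type name from the table above."""
    string_types = {"string", "datetime", "date", "time", "duration", "ip",
                    "cidr", "uri", "email", "uuid", "binary"}
    # In Python, bool is a subclass of int, so exclude it from numeric checks.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name in string_types:
        return isinstance(value, str)
    if type_name == "enum":
        return isinstance(value, str) or is_number
    if type_name == "bool":
        return isinstance(value, bool)
    if type_name in {"int", "long"}:
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name in {"float", "double", "decimal"}:
        return is_number
    if type_name in {"json", "any"}:
        return True
    if type_name.startswith("array<"):
        return isinstance(value, list)
    if type_name.startswith("map<") or type_name == "struct":
        return isinstance(value, dict)
    return False  # unknown type name

print(matches_type("datetime", "2026-05-26T00:00:00Z"))
print(matches_type("int", True))
```

The first call accepts a datetime carried as a string; the second rejects a boolean offered where an integer is expected.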
Common field options:
| Field option | Meaning |
|---|---|
| required | A row is invalid when the field is missing. |
| nullable | A row may contain the field with a JSON null value. |
| repeated | The field value must be an array. |
| deprecated | Field remains accepted but should not be used by new apps. |
| sensitive | Field contains sensitive data. |
| role | Field purpose, such as id, time, or raw_evidence. |
| semantics | Additional semantic label, such as event_time. |
| description | Human-readable field explanation. |
| examples | Example JSON values. |
| enum_ref | Name of an enum in spec.enums. |
| default | Default value documentation. |
| const | Constant value documentation. |
| aliases | Alternate names used by sources, mappings, or users. |
| source_standard_ref | Reference into an external standard such as OCSF. |
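The difference between required and nullable is easy to blur, so a small Python sketch may help. It assumes rows use flat dotted keys, as the example fixture in this document does, and it assumes both options default to false when omitted; that default is a guess for illustration, not documented m5c behavior. The function name presence_errors is hypothetical.

```python
def presence_errors(fields: dict, row: dict) -> list[str]:
    """Apply the required and nullable options to one flat, dotted-key row."""
    errors = []
    for name, spec in fields.items():
        if name not in row:
            # Missing entirely: only a problem when the field is required.
            if spec.get("required", False):
                errors.append(f"{name}: required field is missing")
        elif row[name] is None and not spec.get("nullable", False):
            # Present but null: only a problem when null is not allowed.
            errors.append(f"{name}: null is not allowed")
    return errors

fields = {
    "event_uid": {"type": "string", "required": True, "nullable": False},
    "src_endpoint.ip": {"type": "ip", "required": False, "nullable": True},
}
print(presence_errors(fields, {"src_endpoint.ip": None}))
print(presence_errors(fields, {"event_uid": None}))
```

The first row omits a required field while legally carrying a null IP; the second row supplies the required field but with a null that the field does not permit.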
Keys
Use keys to describe identity and deduplication.
keys:
  primary:
    fields: [event_uid]
    uniqueness: required
  dedupe:
    fields: [event_uid]
    window: 30d
    collision_policy: keep_latest_observed
m5c test verifies that primary and dedupe key fields exist in conformance fixtures.
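A check of this kind could look like the Python sketch below. It is a hedged illustration rather than m5c's code: missing_key_fields is a hypothetical helper, and "exist" is interpreted here as "the key is present in the row", which is an assumption.

```python
def missing_key_fields(keys: dict, row: dict) -> dict[str, list[str]]:
    """For each key (primary, dedupe, ...), list the key fields absent from the row."""
    missing = {}
    for key_name, key_spec in keys.items():
        absent = [f for f in key_spec.get("fields", []) if f not in row]
        if absent:
            missing[key_name] = absent
    return missing

keys = {
    "primary": {"fields": ["event_uid"], "uniqueness": "required"},
    "dedupe": {"fields": ["event_uid"], "window": "30d",
               "collision_policy": "keep_latest_observed"},
}
print(missing_key_fields(keys, {"event_uid": "okta-evt-001"}))
print(missing_key_fields(keys, {"time": "2026-05-26T00:00:00Z"}))
```

A fixture that carries event_uid yields an empty result; one without it is reported under both keys.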
Time semantics
Each entry under spec.time is an object with a field name and an optional required flag.
time:
  event_time:
    field: time
    required: true
  ingest_time:
    field: ingest_time
    required: false
If a time field is marked required, m5c test verifies that conformance fixtures contain it.
Enums
Use enums for stable values that rules, dashboards, or agents will branch on.
enums:
  status_id:
    type: int
    unknown_policy: allow_unknown
    values:
      "0": { name: Unknown }
      "1": { name: Success }
      "2": { name: Failure }
Enum keys are strings in YAML. Values contain at least name and can include description.
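Because enum keys are strings while row values may be integers, a consumer has to convert before looking a value up. The Python sketch below shows one way to do that. The helper enum_label is hypothetical, and the policy handling is an assumption: it treats allow_unknown as "accept the value without a label" and rejects unlisted values under any other policy, which this document does not specify.

```python
from typing import Optional

def enum_label(enum_spec: dict, value) -> Optional[str]:
    """Resolve a row value to its enum name; keys in the spec are strings."""
    entry = enum_spec["values"].get(str(value))
    if entry is not None:
        return entry["name"]
    if enum_spec.get("unknown_policy") == "allow_unknown":
        return None  # assumption: value is accepted but has no label
    # Assumption: any other policy rejects values that are not listed.
    raise ValueError(f"value {value!r} is not in the enum")

status_id = {
    "type": "int",
    "unknown_policy": "allow_unknown",
    "values": {"0": {"name": "Unknown"}, "1": {"name": "Success"}, "2": {"name": "Failure"}},
}
print(enum_label(status_id, 1))
print(enum_label(status_id, 99))
```

With the status_id enum from this page, the value 1 resolves to Success and the unlisted value 99 resolves to no label.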
Capabilities
Capabilities are named predicates over a row. Modules and detections reference them as contract.capability, for example identity.authentication_event.v1.principal_available.
capabilities:
  principal_available:
    condition: exists(actor.user.uid) || exists(actor.user.email_addr)
V1 capability expressions support:
- exists(field.name)
- field.name == value
- &&, ||, and !
- parentheses
Use capabilities for meaningful coverage questions, such as whether principal, source IP, repository, or raw evidence fields are available.
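To show how an expression built from these pieces might be evaluated against a row, here is a small recursive-descent evaluator in Python. It is a sketch under stated assumptions, not m5c's expression engine: exists is taken to mean "present with a non-null value", literals are limited to integers, decimals, quoted strings, and true/false, && binds tighter than ||, and rows use flat dotted keys. All names (tokenize, literal, Evaluator, evaluate) are hypothetical.

```python
import re

# One token: an operator, a quoted string, a number, or a dotted identifier.
TOKEN = re.compile(r"""\s*(&&|\|\||==|[!()]|"[^"]*"|'[^']*'|-?\d+(?:\.\d+)?|[A-Za-z_][\w.]*)""")

def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def literal(tok: str):
    """Convert a literal token to a Python value (assumed literal syntax)."""
    if tok[0] in "\"'":
        return tok[1:-1]
    if tok in ("true", "false"):
        return tok == "true"
    return float(tok) if "." in tok else int(tok)

class Evaluator:
    def __init__(self, tokens: list[str], row: dict):
        self.tokens, self.pos, self.row = tokens, 0, row

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def or_expr(self):
        value = self.and_expr()
        while self.peek() == "||":
            self.take()
            right = self.and_expr()  # always parse the right-hand side
            value = value or right
        return value

    def and_expr(self):
        value = self.unary()
        while self.peek() == "&&":
            self.take()
            right = self.unary()
            value = value and right
        return value

    def unary(self):
        if self.peek() == "!":
            self.take()
            return not self.unary()
        if self.peek() == "(":
            self.take()
            value = self.or_expr()
            self.take(")")
            return value
        name = self.take()
        if name == "exists" and self.peek() == "(":
            self.take("(")
            field = self.take()
            self.take(")")
            return self.row.get(field) is not None
        self.take("==")
        return self.row.get(name) == literal(self.take())

def evaluate(condition: str, row: dict) -> bool:
    parser = Evaluator(tokenize(condition), row)
    result = parser.or_expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return bool(result)

row = {"event_uid": "okta-evt-001", "actor.user.email_addr": "alice@example.com", "status_id": 1}
print(evaluate("exists(actor.user.uid) || exists(actor.user.email_addr)", row))
print(evaluate("exists(src_endpoint.ip)", row))
```

For this row, principal_available holds because the email address is present even though the uid is not, while source_ip_available does not hold.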
Conformance fixtures
Conformance fixtures are normalized rows that should already match the contract. They are not raw source records.
conformance:
  fixtures:
    - name: okta_successful_login
      input_ref: tests/conformance/okta-successful-login.normalized.json
      expect:
        event_uid: okta-evt-001
        status_id: 1
      expect_capabilities:
        principal_available: true
        status_available: true
The fixture file is resolved relative to contract.yaml.
Example fixture:
{
  "event_uid": "okta-evt-001",
  "time": "2026-05-26T00:00:00Z",
  "actor.user.email_addr": "alice@example.com",
  "status_id": 1,
  "raw_event_ref": "raw-okta-system-log/okta-evt-001"
}
m5c test validates required fields, nullability, field types, keys, time requirements, lineage requirements, quality assertions, expected values, and expected capabilities.
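The last two checks in that list, expected values and expected capabilities, can be sketched in a few lines of Python. This is an illustration of the comparison, not m5c test itself: expectation_failures is a hypothetical helper, and the capability results are passed in as already-evaluated booleans rather than computed from the contract.

```python
import json

def expectation_failures(fixture: dict, row: dict, capabilities: dict) -> list[str]:
    """Compare a row against a fixture's expect and expect_capabilities blocks.

    capabilities maps capability name to its already-evaluated boolean for this row.
    """
    failures = []
    for field, expected in fixture.get("expect", {}).items():
        if row.get(field) != expected:
            failures.append(f"{field}: expected {expected!r}, got {row.get(field)!r}")
    for name, expected in fixture.get("expect_capabilities", {}).items():
        if capabilities.get(name) != expected:
            failures.append(f"capability {name}: expected {expected}, got {capabilities.get(name)}")
    return failures

# The example fixture row from this page, loaded as JSON.
row = json.loads("""{
  "event_uid": "okta-evt-001",
  "time": "2026-05-26T00:00:00Z",
  "actor.user.email_addr": "alice@example.com",
  "status_id": 1,
  "raw_event_ref": "raw-okta-system-log/okta-evt-001"
}""")
fixture = {
    "name": "okta_successful_login",
    "expect": {"event_uid": "okta-evt-001", "status_id": 1},
    "expect_capabilities": {"principal_available": True, "status_available": True},
}
capabilities = {"principal_available": True, "status_available": True}
print(expectation_failures(fixture, row, capabilities))
```

An empty list means the fixture row matches every stated expectation; each mismatch adds one message.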
Validate and test
m5c validate apps/security-analytics --workspace --offline
m5c test apps/security-analytics --workspace
validate checks contract structure. test runs conformance fixtures and reports fixture counts and failures.
Common mistakes
| Mistake | Fix |
|---|---|
| Naming a contract after a vendor | Name the contract after app semantics. Use modules and mappings for vendors. |
| Using timestamp when fixtures contain strings | Use datetime; V1 validates datetime-shaped fields as strings. |
| Writing time.event_time: time | Use event_time: { field: time, required: true }. |
| Defining enum values as a list | Use a map of enum keys to { name: ... } entries. |
| Requiring too many fields | Require stable fields needed by apps; express optional coverage as capabilities. |
| Using raw source fixtures for conformance | Use normalized contract-shaped fixtures. Raw fixtures belong in mapping tests. |
Best practices
- Define grain before fields.
- Keep required fields small and durable.
- Use capabilities to represent optional-but-important provider coverage.
- Use enums for fields that drive rules, dashboards, or agent decisions.
- Add conformance fixtures whenever you add a new provider or mapping.
- Treat breaking changes as a new contract generation or semver-major change.