m5c data modules
Data modules describe providers or source modules that can satisfy contracts. A module connects source identity, contract coverage, capabilities, mappings, and tests.
Modules are passive source descriptions. Ingestion, checkpoints, and runtime execution are owned by consumers and deployed resources.
Where module files live
m5c discovers a data module from any file named module.yaml.
A typical module layout is:
packages/modules/okta-system-log/
  module.yaml
  mappings/
    system-log-auth.yaml
  tests/
    fixtures/
      okta-successful-login.raw.json
Keep source-specific mappings and fixtures under the module directory so coverage is easy to review.
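The discovery rule above can be approximated outside m5c. This is a minimal sketch that walks a directory tree and collects every file named module.yaml. The helper name `find_module_files` is hypothetical, and m5c's own discovery may apply extra rules (ignore files, workspace boundaries) that are not shown here.

```python
from pathlib import Path
import tempfile


def find_module_files(root: Path) -> list[Path]:
    """Return every file named module.yaml under root, sorted for stable output."""
    return sorted(p for p in root.rglob("module.yaml") if p.is_file())


# Demo: build the layout from this page in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    module_dir = root / "packages" / "modules" / "okta-system-log"
    (module_dir / "mappings").mkdir(parents=True)
    (module_dir / "module.yaml").write_text("# placeholder\n")
    (module_dir / "mappings" / "system-log-auth.yaml").write_text("# placeholder\n")

    found = find_module_files(root)
    # Only module.yaml is reported; the mapping file is not a module.
    print([p.relative_to(root).as_posix() for p in found])
```

The directory that contains each result is the module directory, which is also the base for the relative paths used in the module document.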
Build a module
Build a module in this order:
- Name the source or provider shape.
- Declare the module kind and source information.
- List the contracts the module can provide, with semver ranges.
- List the capabilities the module satisfies for each contract.
- Reference the mapping files that normalize raw records.
- Add fixture-backed module tests.
- Run m5c validate and m5c test.
Module document
kind: DataModule
metadata:
  name: okta-system-log
  owner: mach5
spec:
  version: 0.1.0
  module_kind: provider
  source:
    kind: connector
    connection_kind: okta
  provides:
    contracts:
      identity.authentication_event.v1: ">=1.0.0 <2.0.0"
      security.raw_event.v1: ">=1.0.0 <2.0.0"
  capabilities:
    identity.authentication_event.v1:
      - principal_available
      - source_ip_available
      - status_available
      - raw_evidence_reference_available
  mappings:
    - ./mappings/system-log-auth.yaml
  tests:
    - name: okta_successful_login
      fixture_ref: ./tests/fixtures/okta-successful-login.raw.json
      contract: identity.authentication_event.v1
Top-level fields
| Field | Required | Meaning |
|---|---|---|
| kind | Yes | Must be DataModule. |
| metadata.name | Yes | Stable module name, usually source-specific. |
| metadata.owner | Optional | Owning team or organization. |
| spec.version | Yes | Semver module version. |
| spec.module_kind | Yes | Module category, such as provider or source. |
| spec.source | Optional | Source identity and connector information. |
| spec.provides.contracts | Recommended | Contract names and supported semver ranges. |
| spec.capabilities | Recommended | Capabilities satisfied per contract. |
| spec.mappings | Recommended | Relative paths to ContractMapping YAML files. |
| spec.tests | Recommended | Module-level fixture references. |
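The required rows of the table can be checked mechanically. The sketch below assumes the module document has already been parsed into a Python dict. The helpers `get_path` and `missing_required` are hypothetical names for illustration and are not part of m5c.

```python
# Fields marked "Yes" in the table above.
REQUIRED_PATHS = ["kind", "metadata.name", "spec.version", "spec.module_kind"]


def get_path(doc: dict, dotted: str):
    """Walk a dotted path through nested dicts; return None if any step is missing."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_required(doc: dict) -> list[str]:
    """List required fields that are absent or empty, plus a wrong kind value."""
    problems = [p for p in REQUIRED_PATHS if get_path(doc, p) in (None, "")]
    if doc.get("kind") not in (None, "DataModule"):
        problems.append("kind must be DataModule")
    return problems


module = {
    "kind": "DataModule",
    "metadata": {"name": "okta-system-log", "owner": "mach5"},
    "spec": {"version": "0.1.0", "module_kind": "provider"},
}
print(missing_required(module))  # prints []
```

Recommended and optional fields are deliberately left out of the check, since a module without them is still structurally valid according to the table.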
Source identity
spec.source describes where the raw records come from.
source:
  kind: connector
  connection_kind: okta
| Field | Meaning |
|---|---|
| source.kind | Broad source category, such as connector, saas, cloud, object_store, or custom. |
| source.connection_kind | Connector or connection family, such as okta, github, s3, or gcs. |
This source metadata is separate from a mapping’s source.kind. The module describes the provider; the mapping describes one raw record shape and how it becomes a contract row.
Provided contracts
spec.provides.contracts is a map from contract name to semver range.
provides:
  contracts:
    identity.authentication_event.v1: ">=1.0.0 <2.0.0"
The key must match a local DataContract.metadata.name. The value must be a valid semver requirement.
Use this to say, “this module can provide rows compatible with this contract version range.”
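To make the range semantics concrete, here is a minimal sketch of how a range such as ">=1.0.0 <2.0.0" selects contract versions. It handles only space-separated comparators on plain MAJOR.MINOR.PATCH versions. That restriction is an assumption of this example; the requirement syntax m5c accepts may be broader (carets, tildes, pre-release tags), and the function `satisfies` is hypothetical.

```python
import re

# One comparator: optional operator followed by MAJOR.MINOR.PATCH.
_COMPARATOR = re.compile(r"^(>=|<=|>|<|=)?(\d+)\.(\d+)\.(\d+)$")


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    major, minor, patch = text.split(".")
    return int(major), int(minor), int(patch)


def satisfies(version: str, requirement: str) -> bool:
    """True if version meets every comparator in the requirement."""
    candidate = parse_version(version)
    for part in requirement.split():
        match = _COMPARATOR.match(part)
        if match is None:
            raise ValueError(f"unsupported comparator: {part!r}")
        op = match.group(1) or "="
        bound = tuple(int(g) for g in match.groups()[1:])
        ok = {
            ">=": candidate >= bound,
            "<=": candidate <= bound,
            ">": candidate > bound,
            "<": candidate < bound,
            "=": candidate == bound,
        }[op]
        if not ok:
            return False
    return True


print(satisfies("1.4.2", ">=1.0.0 <2.0.0"))  # prints True
print(satisfies("2.0.0", ">=1.0.0 <2.0.0"))  # prints False
```

A range that excludes the next major version, as in the example on this page, keeps the module's claim tied to one contract generation.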
Capabilities
Capabilities are listed under each contract name.
capabilities:
  identity.authentication_event.v1:
    - principal_available
    - source_ip_available
    - status_available
Each capability should be defined in the target contract under spec.capabilities. The full capability reference used by detections and feature gates is:
identity.authentication_event.v1.principal_available
Capabilities should describe actual coverage. If the source cannot reliably supply source IP, do not list source_ip_available.
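The full capability reference is the contract name joined to the capability name with a dot. The sketch below builds those references from a module's capabilities map and flags entries that the contract does not define. The `defined` argument is a hypothetical stand-in for the contract's spec.capabilities; its real shape is not shown on this page.

```python
def capability_refs(listed: dict[str, list[str]]) -> list[str]:
    """Expand a module's capabilities map into full '<contract>.<capability>' references."""
    return [f"{contract}.{cap}" for contract, caps in listed.items() for cap in caps]


def undefined_capabilities(
    listed: dict[str, list[str]], defined: dict[str, set[str]]
) -> list[str]:
    """Return full references for capabilities the target contract does not define."""
    return [
        f"{contract}.{cap}"
        for contract, caps in listed.items()
        for cap in caps
        if cap not in defined.get(contract, set())
    ]


listed = {
    "identity.authentication_event.v1": [
        "principal_available",
        "source_ip_available",
        "status_available",
    ]
}
# Hypothetical contract that does not define source_ip_available.
defined = {"identity.authentication_event.v1": {"principal_available", "status_available"}}

print(capability_refs(listed)[0])
# prints identity.authentication_event.v1.principal_available
print(undefined_capabilities(listed, defined))
# prints ['identity.authentication_event.v1.source_ip_available']
```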
Mappings
spec.mappings lists mapping files relative to module.yaml.
mappings:
  - ./mappings/system-log-auth.yaml
m5c validate checks that each referenced mapping file exists. The mapping itself declares the target contract and field-level normalization logic.
A module may provide one contract with one mapping, one contract with multiple source-shape mappings, or multiple contracts with multiple mappings.
Module tests
Module tests identify raw fixtures that demonstrate coverage.
tests:
  - name: okta_successful_login
    fixture_ref: ./tests/fixtures/okta-successful-login.raw.json
    contract: identity.authentication_event.v1
fixture_ref is resolved relative to module.yaml. m5c test checks that referenced fixture files exist. Field-level output assertions belong in the mapping’s own tests block.
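Both spec.mappings entries and fixture_ref values resolve against the directory that holds module.yaml. The sketch below mimics that resolution and reports references that do not point at an existing file. It assumes the module has already been parsed into a dict, and `missing_references` is a hypothetical helper, not an m5c function.

```python
from pathlib import Path
import tempfile


def missing_references(module_file: Path, module: dict) -> list[str]:
    """Return mapping and fixture references that do not exist relative to module.yaml."""
    base = module_file.parent
    spec = module.get("spec", {})
    refs = list(spec.get("mappings", []))
    refs += [t["fixture_ref"] for t in spec.get("tests", []) if "fixture_ref" in t]
    return [ref for ref in refs if not (base / ref).is_file()]


module = {
    "spec": {
        "mappings": ["./mappings/system-log-auth.yaml"],
        "tests": [
            {
                "name": "okta_successful_login",
                "fixture_ref": "./tests/fixtures/okta-successful-login.raw.json",
                "contract": "identity.authentication_event.v1",
            }
        ],
    }
}

# Demo: the mapping exists, but the fixture was never created.
with tempfile.TemporaryDirectory() as tmp:
    module_dir = Path(tmp)
    (module_dir / "mappings").mkdir()
    (module_dir / "mappings" / "system-log-auth.yaml").write_text("# placeholder\n")
    print(missing_references(module_dir / "module.yaml", module))
    # prints ['./tests/fixtures/okta-successful-login.raw.json']
```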
Relationship to mappings
A module tells reviewers what the source can provide:
provides:
  contracts:
    identity.authentication_event.v1: ">=1.0.0 <2.0.0"
capabilities:
  identity.authentication_event.v1:
    - principal_available
mappings:
  - ./mappings/system-log-auth.yaml
The mapping tells m5c how to produce the contract row:
spec:
  source:
    kind: okta.system_log
    format: json
  target:
    contract: identity.authentication_event.v1
Keep both. The module is the coverage manifest; the mapping is executable normalization logic.
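One way to keep the two documents aligned during review is to confirm that every mapping targets a contract the module claims to provide. The sketch below is a hypothetical reviewer-side check, not something this page says m5c performs. It assumes both documents are already parsed into dicts and that the mapping stores its target at spec.target.contract, as in the fragment above.

```python
def unlisted_targets(module: dict, mappings: list[dict]) -> list[str]:
    """Return mapping target contracts that are absent from the module's provides.contracts."""
    provided = set(module.get("spec", {}).get("provides", {}).get("contracts", {}))
    targets = [m.get("spec", {}).get("target", {}).get("contract") for m in mappings]
    return sorted({t for t in targets if t is not None and t not in provided})


module = {
    "spec": {
        "provides": {
            "contracts": {"identity.authentication_event.v1": ">=1.0.0 <2.0.0"}
        }
    }
}
mappings = [
    {"spec": {"target": {"contract": "identity.authentication_event.v1"}}},
    {"spec": {"target": {"contract": "security.raw_event.v1"}}},
]

print(unlisted_targets(module, mappings))  # prints ['security.raw_event.v1']
```

A non-empty result means the manifest and the normalization logic disagree about coverage, so either the module's provides.contracts or the mapping's target needs correcting.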
Validate module coverage
m5c validate apps/security-analytics --workspace --offline
m5c test apps/security-analytics --workspace
Validation checks that module versions are valid semver, that each provided contract exists locally, that semver ranges parse, and that referenced mapping files exist. Tests check that module fixture references resolve and run the mapping and conformance tests defined elsewhere in the workspace.
Common mistakes
| Mistake | Fix |
|---|---|
| Using aliases under provides.contracts | Use contract names as keys and semver ranges as values. |
| Listing a contract that does not exist locally | Add the contract package or correct the contract name. |
| Listing capabilities not backed by the source | Only list capabilities the module can actually satisfy. |
| Confusing module source.connection_kind with mapping source.kind | Use connection_kind for connector family; use mapping source.kind for raw record shape. |
| Omitting ./ or using incorrect relative paths | Resolve mappings and fixtures relative to module.yaml. |
Best practices
- Keep modules provider-specific and contracts provider-neutral.
- Use semver ranges that match the contract generation the module supports.
- Use capabilities to show partial support honestly.
- Reference mappings explicitly so coverage is reviewable.
- Add fixtures for every supported source variant.
- Keep module tests lightweight and mapping tests precise.