m5c data modules

Data modules describe providers or source modules that can satisfy contracts. A module connects source identity, contract coverage, capabilities, mappings, and tests.

Modules are passive source descriptions. Ingestion, checkpoints, and runtime execution are owned by consumers and deployed resources.

Where module files live

m5c discovers a data module from any file named module.yaml.

A typical module layout is:

packages/modules/okta-system-log/
  module.yaml
  mappings/
    system-log-auth.yaml
  tests/
    fixtures/
      okta-successful-login.raw.json

Keep source-specific mappings and fixtures under the module directory so coverage is easy to review.

Build a module

Build a module in this order:

  1. Name the source or provider shape.
  2. Declare the module kind and source information.
  3. List the contracts the module can provide, with semver ranges.
  4. List the capabilities the module satisfies for each contract.
  5. Reference the mapping files that normalize raw records.
  6. Add fixture-backed module tests.
  7. Run m5c validate and m5c test.

Module document

kind: DataModule
metadata:
  name: okta-system-log
  owner: mach5
spec:
  version: 0.1.0
  module_kind: provider
  source:
    kind: connector
    connection_kind: okta
  provides:
    contracts:
      identity.authentication_event.v1: ">=1.0.0 <2.0.0"
      security.raw_event.v1: ">=1.0.0 <2.0.0"
  capabilities:
    identity.authentication_event.v1:
      - principal_available
      - source_ip_available
      - status_available
      - raw_evidence_reference_available
  mappings:
    - ./mappings/system-log-auth.yaml
  tests:
    - name: okta_successful_login
      fixture_ref: ./tests/fixtures/okta-successful-login.raw.json
      contract: identity.authentication_event.v1

Top-level fields

Field                    Required     Meaning
kind                     Yes          Must be DataModule.
metadata.name            Yes          Stable module name, usually source-specific.
metadata.owner           Optional     Owning team or organization.
spec.version             Yes          Semver module version.
spec.module_kind         Yes          Module category, such as provider or source.
spec.source              Optional     Source identity and connector information.
spec.provides.contracts  Recommended  Contract names and supported semver ranges.
spec.capabilities        Recommended  Capabilities satisfied per contract.
spec.mappings            Recommended  Relative paths to ContractMapping YAML files.
spec.tests               Recommended  Module-level fixture references.

Source identity

spec.source describes where the raw records come from.

source:
  kind: connector
  connection_kind: okta

Field                   Meaning
source.kind             Broad source category, such as connector, saas, cloud, object_store, or custom.
source.connection_kind  Connector or connection family, such as okta, github, s3, or gcs.

This source metadata is separate from a mapping’s source.kind. The module describes the provider; the mapping describes one raw record shape and how it becomes a contract row.

Provided contracts

spec.provides.contracts is a map from contract name to semver range.

provides:
  contracts:
    identity.authentication_event.v1: ">=1.0.0 <2.0.0"

The key must match a local DataContract.metadata.name. The value must be a valid semver requirement.

Use this to say, “this module can provide rows compatible with this contract version range.”
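
As a reading aid, the fragment below annotates the range from this guide with versions it would and would not accept. It assumes the usual semver reading in which space-separated comparators must all hold; the specific version numbers in the comments are illustrative only.

```yaml
provides:
  contracts:
    # Assumed reading: ">=1.0.0" AND "<2.0.0".
    # Accepted: 1.0.0, 1.4.2, 1.99.0
    # Rejected: 0.9.0 (below the lower bound), 2.0.0 (upper bound is exclusive)
    identity.authentication_event.v1: ">=1.0.0 <2.0.0"
```

Keeping the upper bound at the next major version ties the module to one contract generation.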

Capabilities

Capabilities are listed under each contract name.

capabilities:
  identity.authentication_event.v1:
    - principal_available
    - source_ip_available
    - status_available

Each capability should be defined in the target contract under spec.capabilities. The full capability reference used by detections and feature gates is:

identity.authentication_event.v1.principal_available

Capabilities should describe actual coverage. If the source cannot reliably supply source IP, do not list source_ip_available.
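
A minimal sketch of partial coverage, assuming a hypothetical source whose records carry a principal and a status but no reliable client IP. The capability names are the ones used above; the scenario itself is an assumption for illustration.

```yaml
# Hypothetical module fragment: this source has no reliable client IP,
# so source_ip_available is deliberately left out.
capabilities:
  identity.authentication_event.v1:
    - principal_available
    - status_available
```

A detection gated on identity.authentication_event.v1.source_ip_available then has no coverage claim from this module to rely on, which is the intended signal.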

Mappings

spec.mappings lists mapping files relative to module.yaml.

mappings:
  - ./mappings/system-log-auth.yaml

m5c validate checks that each referenced mapping file exists. The mapping itself declares the target contract and field-level normalization logic.

A module may provide one contract with one mapping, one contract with multiple source-shape mappings, or multiple contracts with multiple mappings.
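
The last case can be sketched as follows. Only system-log-auth.yaml appears in this guide; the other two mapping file names are hypothetical and stand in for a second raw record shape and a second target contract.

```yaml
# Sketch: two contracts served by three mapping files.
provides:
  contracts:
    identity.authentication_event.v1: ">=1.0.0 <2.0.0"
    security.raw_event.v1: ">=1.0.0 <2.0.0"
mappings:
  - ./mappings/system-log-auth.yaml   # from this guide
  - ./mappings/system-log-mfa.yaml    # hypothetical: second raw shape, same contract
  - ./mappings/system-log-raw.yaml    # hypothetical: targets security.raw_event.v1
```

Because each mapping declares its own target contract, the module's mappings list stays a flat list of paths.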

Module tests

Module tests identify raw fixtures that demonstrate coverage.

tests:
  - name: okta_successful_login
    fixture_ref: ./tests/fixtures/okta-successful-login.raw.json
    contract: identity.authentication_event.v1

fixture_ref is resolved relative to module.yaml. m5c test checks that referenced fixture files exist. Field-level output assertions belong in the mapping’s own tests block.

Relationship to mappings

A module tells reviewers what the source can provide:

provides:
  contracts:
    identity.authentication_event.v1: ">=1.0.0 <2.0.0"
capabilities:
  identity.authentication_event.v1:
    - principal_available
mappings:
  - ./mappings/system-log-auth.yaml

The mapping tells m5c how to produce the contract row:

spec:
  source:
    kind: okta.system_log
    format: json
  target:
    contract: identity.authentication_event.v1

Keep both. The module is the coverage manifest; the mapping is executable normalization logic.

Validate module coverage

m5c validate apps/security-analytics --workspace --offline
m5c test apps/security-analytics --workspace

Validation checks that versions are valid semver, that provided contracts exist locally, that semver ranges parse, and that referenced mapping files exist. Tests check module fixture references and also run the mapping and conformance tests defined elsewhere in the workspace.

Common mistakes

Mistake                                                           Fix
Using aliases under provides.contracts                            Use contract names as keys and semver ranges as values.
Listing a contract that does not exist locally                    Add the contract package or correct the contract name.
Listing capabilities not backed by the source                     Only list capabilities the module can actually satisfy.
Confusing module source.connection_kind with mapping source.kind  Use connection_kind for connector family; use mapping source.kind for raw record shape.
Omitting ./ or using incorrect relative paths                     Resolve mappings and fixtures relative to module.yaml.

Best practices

  • Keep modules provider-specific and contracts provider-neutral.
  • Use semver ranges that match the contract generation the module supports.
  • Use capabilities to show partial support honestly.
  • Reference mappings explicitly so coverage is reviewable.
  • Add fixtures for every supported source variant.
  • Keep module tests lightweight and mapping tests precise.
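
As a sketch of "fixtures for every supported source variant", the tests block below adds a second entry next to the one from this guide. The okta_failed_login name and its fixture file are hypothetical and only illustrate the pattern.

```yaml
tests:
  - name: okta_successful_login
    fixture_ref: ./tests/fixtures/okta-successful-login.raw.json
    contract: identity.authentication_event.v1
  # Hypothetical second variant; this fixture file is not part of the guide.
  - name: okta_failed_login
    fixture_ref: ./tests/fixtures/okta-failed-login.raw.json
    contract: identity.authentication_event.v1
```

These entries only point at fixtures, which keeps module tests lightweight; field-level assertions stay in the mapping's own tests block.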
