Orchestration

How the Datamotive control plane schedules, sequences, and monitors replication and recovery jobs.

Product: Datamotive Platform
Version: v1.0
Documentation status: Published
Last updated: May 25, 2026
Reading time: 2 min

Orchestration is the control layer that decides when jobs run, which node executes them, how they are sequenced, and what happens when they fail.

Job types

The orchestration engine schedules four categories of jobs:

Replication cycles: incremental block transfer from source to target, triggered on the RPO interval
Failover jobs: boot shadow VMs in failover or drill mode, executed on operator demand
Consistency checks: validate recovery point integrity by hashing target blocks against the source index
Maintenance jobs: upgrade appliances, rotate encryption keys, prune expired recovery points

Scheduling model

Each plan carries a schedule definition: an interval (for replication) or a cron expression (for maintenance tasks). The control plane evaluates schedules every 30 seconds and enqueues jobs whose next-run time has passed.
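The evaluation pass described above can be sketched as follows. This is a minimal illustration and not the Datamotive implementation: the names `IntervalSchedule`, `plan_id`, and `evaluate_schedules` are hypothetical, only interval schedules are modeled (cron expressions are omitted), and skipping missed runs after a late tick is an assumed policy.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the text: the control plane evaluates schedules every 30 seconds.
EVALUATION_PERIOD = timedelta(seconds=30)


@dataclass
class IntervalSchedule:
    plan_id: str          # hypothetical plan identifier
    interval: timedelta   # for example, the RPO interval of a replication plan
    next_run: datetime    # next time this plan is due


def evaluate_schedules(schedules, now, queue):
    """Enqueue one job for every schedule whose next-run time has passed."""
    for schedule in schedules:
        if schedule.next_run <= now:
            queue.append(schedule.plan_id)
            # Assumption: advance past `now` so a late evaluation pass
            # enqueues a single job instead of a backlog of missed runs.
            while schedule.next_run <= now:
                schedule.next_run += schedule.interval
```

A caller would invoke `evaluate_schedules` once per `EVALUATION_PERIOD` with the current time and the shared job queue.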

Jobs are dispatched to nodes (Replication Appliances or Cloud Connectors) via a persistent message queue. The node acknowledges receipt, executes the job, and streams status updates back to the control plane in real time. If the node does not acknowledge within 60 seconds, the control plane re-enqueues the job for dispatch to an available node.
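The acknowledgement timeout can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `Dispatcher` class and its methods are hypothetical, a `deque` stands in for the persistent message queue, and time is passed in explicitly as seconds so the behavior is easy to follow.

```python
from collections import deque
from dataclasses import dataclass, field

# From the text: a node must acknowledge a dispatched job within 60 seconds.
ACK_TIMEOUT_SECONDS = 60.0


@dataclass
class Dispatcher:
    queue: deque = field(default_factory=deque)
    # job_id -> (node_id, dispatch time in seconds); hypothetical bookkeeping
    pending: dict = field(default_factory=dict)

    def dispatch(self, node_id, now):
        """Hand the next queued job to a node and start its ack timer."""
        if not self.queue:
            return None
        job_id = self.queue.popleft()
        self.pending[job_id] = (node_id, now)
        return job_id

    def acknowledge(self, job_id):
        """Record an ack; returns False if the job was already re-enqueued."""
        return self.pending.pop(job_id, None) is not None

    def requeue_unacknowledged(self, now):
        """Move jobs whose ack timer has expired back onto the queue."""
        expired = [
            job_id
            for job_id, (_, dispatched_at) in self.pending.items()
            if now - dispatched_at >= ACK_TIMEOUT_SECONDS
        ]
        for job_id in expired:
            del self.pending[job_id]
            self.queue.append(job_id)
        return expired
```

In this model a late acknowledgement is rejected, so the job runs only on the node that receives it after re-enqueueing; how the real control plane handles late acknowledgements is not stated here.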

Multi-node load balancing

When multiple nodes are registered in the same site, the control plane distributes jobs across them based on current load (active job count and recent CPU utilization reported by each node). This allows horizontal scaling: adding a second node to a site automatically shares the replication workload.
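One way to picture load-based selection is shown below. The text names the two inputs (active job count and recent CPU utilization) but not how they are combined, so the ordering used here, fewest active jobs first with CPU utilization as the tie-breaker, is an assumption, as are the names `NodeLoad` and `pick_node`.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    node_id: str            # hypothetical node identifier
    active_jobs: int        # jobs currently running on the node
    cpu_utilization: float  # recent CPU utilization, 0.0 to 1.0


def pick_node(nodes):
    """Return the least-loaded node in a site.

    Assumed policy: compare active job count first, then CPU utilization.
    """
    if not nodes:
        raise ValueError("no nodes registered in this site")
    return min(nodes, key=lambda n: (n.active_jobs, n.cpu_utilization))
```

With this policy, a newly added node with no active jobs is chosen immediately, which matches the horizontal scaling behavior described above.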

Failover sequencing

Failover orchestration is more complex than replication scheduling. The engine must perform the following steps in order:

Validate the selected recovery point and confirm all required snapshot IDs are present at the target
Execute pre-failover hooks (if configured), such as stopping application services or flushing write buffers
Boot VM groups in configured order, waiting for each group to pass health checks before starting the next
Apply network customizations per VM
Execute post-failover hooks, such as re-registering VMs with a load balancer or updating DNS

This sequencing is defined in the plan's group configuration and is idempotent: if a failover job is interrupted, it can be safely retried from the last completed step.
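The retry behavior can be sketched as a step runner that records completed steps. This is an illustration only: `run_failover`, the step names, and the in-memory `completed` set are hypothetical, and a real system would need to persist that record so it survives an interruption.

```python
def run_failover(steps, completed):
    """Run failover steps in order, skipping steps already recorded as done.

    `steps` is an ordered list of (name, action) pairs and `completed` is a
    set of step names. Both are hypothetical structures for illustration.
    """
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier attempt, do not repeat
        action()      # may raise if the job is interrupted
        completed.add(name)  # recorded only after the step succeeds


def noop():
    """Placeholder action used in this sketch."""


steps = [
    ("validate_recovery_point", noop),
    ("pre_failover_hooks", noop),
    ("boot_vm_groups", noop),
    ("apply_network_customizations", noop),
    ("post_failover_hooks", noop),
]
completed = set()
run_failover(steps, completed)
```

Because a step is marked complete only after its action returns, calling `run_failover` again with the same `completed` set continues with the first unfinished step.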

Observability

Every orchestrated job produces a structured log stream accessible in the Datamotive console under Jobs. Log entries include timestamps, the node that executed the step, bytes processed, and any errors with suggested remediation steps.
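As a rough picture of what a structured log entry might carry, the sketch below models the fields listed above and serializes one entry as a JSON line. The field names, the `JobLogEntry` class, and the JSON encoding are all assumptions for illustration; the actual log schema is not specified here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobLogEntry:
    timestamp: str                     # ISO 8601 timestamp (assumed format)
    node_id: str                       # node that executed the step
    step: str                          # hypothetical step name
    bytes_processed: int
    error: Optional[str] = None        # error message, if the step failed
    remediation: Optional[str] = None  # suggested remediation, if any


def to_json_line(entry):
    """Serialize one log entry as a single JSON line with stable key order."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = JobLogEntry(
    timestamp="2026-05-25T10:00:00Z",
    node_id="node-a",
    step="replication_cycle",
    bytes_processed=1048576,
)
line = to_json_line(entry)
```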

The control plane also emits job lifecycle events to the configured webhook or notification channel (Slack, email, PagerDuty) for integration with external monitoring systems.
