Concepts
Orchestration
How the Datamotive control plane schedules, sequences, and monitors replication and recovery jobs.
- Product: Datamotive Platform
- Version: v1.0
- Documentation status: Published
Orchestration is the control layer that decides when jobs run, which node executes them, how they are sequenced, and what happens when they fail.
Job types
The orchestration engine schedules four categories of jobs:
- Replication cycles: incremental block transfer from source to target, triggered on the RPO interval
- Failover jobs: boot shadow VMs in failover or drill mode, executed on operator demand
- Consistency checks: validate recovery point integrity by hashing target blocks and comparing the hashes against the source index
- Maintenance jobs: upgrade appliances, rotate encryption keys, prune expired recovery points
Scheduling model
Each plan carries a schedule definition: an interval (for replication) or a cron expression (for maintenance tasks). The control plane evaluates schedules every 30 seconds and enqueues jobs whose next-run time has passed.
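The evaluation pass described above can be sketched as follows. This is a minimal illustration, not the Datamotive implementation: the `Schedule` class, the `due_jobs` function, and the plan IDs are hypothetical names, only interval schedules are modeled (cron expressions are omitted), and skipping missed intervals rather than replaying them is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

EVALUATION_PERIOD = timedelta(seconds=30)  # evaluation cadence stated in the text


@dataclass
class Schedule:
    plan_id: str          # hypothetical identifier
    interval: timedelta   # e.g. the RPO interval for a replication plan
    next_run: datetime


def due_jobs(schedules, now):
    """Return plan IDs whose next-run time has passed and advance their schedules."""
    enqueued = []
    for s in schedules:
        if s.next_run <= now:
            enqueued.append(s.plan_id)
            # Assumption: missed intervals are skipped, not replayed one by one.
            while s.next_run <= now:
                s.next_run += s.interval
    return enqueued


t0 = datetime(2024, 1, 1, 12, 0, 0)
schedules = [
    Schedule("plan-a", timedelta(minutes=15), t0 - timedelta(seconds=5)),
    Schedule("plan-b", timedelta(minutes=15), t0 + timedelta(minutes=10)),
]
first_tick = due_jobs(schedules, t0)                       # plan-a is due
second_tick = due_jobs(schedules, t0 + EVALUATION_PERIOD)  # nothing is due yet
```

In this sketch a job can start up to one evaluation period after its nominal next-run time, which follows from evaluating schedules on a fixed cadence.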
Jobs are dispatched to nodes (Replication Appliances or Cloud Connectors) via a persistent message queue. The node acknowledges receipt, executes the job, and streams status updates back to the control plane in real time. If the node does not acknowledge within 60 seconds, the control plane re-enqueues the job for dispatch to an available node.
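The acknowledgement timeout can be illustrated with a small in-memory model. The `Dispatcher` class and its method names are hypothetical, and a `deque` stands in for the persistent message queue. Times are plain seconds so the behavior is easy to follow.

```python
from collections import deque

ACK_TIMEOUT_SECONDS = 60  # acknowledgement window stated in the text


class Dispatcher:
    """Hypothetical in-memory stand-in for the control plane's dispatch logic."""

    def __init__(self):
        self.queue = deque()  # jobs waiting for dispatch
        self.pending = {}     # job_id -> (node_id, dispatched_at_seconds)

    def dispatch(self, job_id, node_id, now):
        self.pending[job_id] = (node_id, now)

    def acknowledge(self, job_id):
        # An acknowledged job is no longer subject to the timeout.
        self.pending.pop(job_id, None)

    def requeue_unacknowledged(self, now):
        """Move jobs whose acknowledgement window has elapsed back to the queue."""
        expired = [
            job_id
            for job_id, (_, sent_at) in self.pending.items()
            if now - sent_at >= ACK_TIMEOUT_SECONDS
        ]
        for job_id in expired:
            del self.pending[job_id]
            self.queue.append(job_id)
        return expired


d = Dispatcher()
d.dispatch("job-1", "node-a", now=0.0)
d.dispatch("job-2", "node-b", now=0.0)
d.acknowledge("job-1")                    # node-a confirms receipt
early = d.requeue_unacknowledged(now=59)  # still inside the window
late = d.requeue_unacknowledged(now=61)   # job-2 was never acknowledged
```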
Multi-node load balancing
When multiple nodes are registered in the same site, the control plane distributes jobs across them based on current load (active job count and recent CPU utilization reported by each node). This allows horizontal scaling: adding a second node to a site automatically shares the replication workload.
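One way to combine the two load signals is a single score per node, as in the sketch below. The text does not specify how active job count and CPU utilization are weighted, so the scoring formula, the `cpu_weight` parameter, and the `NodeLoad` and `pick_node` names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    node_id: str
    active_jobs: int
    cpu_utilization: float  # recent CPU utilization, 0.0 to 1.0


def pick_node(nodes, cpu_weight=1.0):
    """Pick the node with the lowest combined load score.

    Assumed score: active_jobs + cpu_weight * cpu_utilization.
    Ties are broken by node_id so the choice is deterministic.
    """
    return min(
        nodes,
        key=lambda n: (n.active_jobs + cpu_weight * n.cpu_utilization, n.node_id),
    )


nodes = [
    NodeLoad("appliance-1", active_jobs=3, cpu_utilization=0.20),
    NodeLoad("appliance-2", active_jobs=1, cpu_utilization=0.90),
]
chosen = pick_node(nodes)  # appliance-2 scores 1.9 against 3.2 for appliance-1
```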
Failover sequencing
Failover orchestration involves more steps than replication scheduling. The engine performs the following in order:
- Validate the selected recovery point and confirm all required snapshot IDs are present at the target
- Execute pre-failover hooks (if configured), such as stopping application services or flushing write buffers
- Boot VM groups in configured order, waiting for each group to pass health checks before starting the next
- Apply network customizations per VM
- Execute post-failover hooks, such as re-registering VMs with a load balancer or updating DNS
This sequence is defined in the plan's group configuration. Each step is idempotent, so an interrupted failover job can be safely retried and resumes from the last completed step.
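Retrying from the last completed step can be sketched by recording each step as it finishes and skipping recorded steps on the next attempt. The step names mirror the list above. The `run_failover` function, the in-memory `completed` set, and the simulated interruption are hypothetical, and a real system would need to persist the checkpoint durably.

```python
def run_failover(steps, completed):
    """Run (name, action) pairs in order, skipping steps already completed."""
    for name, action in steps:
        if name in completed:
            continue         # finished during an earlier attempt
        action()             # if this raises, the step is not recorded
        completed.add(name)  # checkpoint only after the step succeeds


log = []
attempts = {"boot_groups": 0}


def boot_groups():
    # Fails on the first attempt to simulate an interrupted failover job.
    attempts["boot_groups"] += 1
    if attempts["boot_groups"] == 1:
        raise RuntimeError("simulated interruption")
    log.append("boot_groups")


steps = [
    ("validate_recovery_point", lambda: log.append("validate_recovery_point")),
    ("pre_failover_hooks", lambda: log.append("pre_failover_hooks")),
    ("boot_groups", boot_groups),
    ("network_customization", lambda: log.append("network_customization")),
    ("post_failover_hooks", lambda: log.append("post_failover_hooks")),
]

completed = set()
try:
    run_failover(steps, completed)  # first attempt stops at boot_groups
except RuntimeError:
    pass
run_failover(steps, completed)      # retry resumes at boot_groups
```

After the retry, each step appears in `log` exactly once: the validation and pre-failover hooks are not repeated.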
Observability
Every orchestrated job produces a structured log stream accessible in the Datamotive console under Jobs. Log entries include timestamps, the node that executed the step, bytes processed, and any errors with suggested remediation steps.
The control plane also emits job lifecycle events to the configured webhook or notification channel (Slack, email, PagerDuty) for integration with external monitoring systems.
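The shape of a lifecycle event is not documented here, so the following is only a guess at what a webhook payload might carry. Every field name, the state values, and the `build_lifecycle_event` helper are hypothetical. Consult the actual webhook reference before relying on any of them.

```python
import json
from datetime import datetime, timezone


def build_lifecycle_event(job_id, job_type, state, node_id, error=None):
    """Serialize a hypothetical job lifecycle event as JSON."""
    event = {
        "job_id": job_id,
        "job_type": job_type,  # e.g. "replication", "failover"
        "state": state,        # e.g. "started", "succeeded", "failed"
        "node_id": node_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if error is not None:
        event["error"] = error  # included only for failed jobs
    return json.dumps(event)


payload = build_lifecycle_event(
    "job-42", "replication", "failed", "appliance-1", error="target unreachable"
)
decoded = json.loads(payload)
```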
