Scale and performance guide

Architectural, scalability, sizing, performance, cloud quota, and operational guidance for Datamotive workload portability.

Product: Easy Hybrid DR
Version: v2.3.1
Release status: GA
Documentation status: Published
Reading time: 17 min read

The Datamotive Workload Portability Platform is an enterprise-grade disaster recovery, workload mobility, and cross-cloud replication solution designed to provide scalable, predictable, and cloud-native protection for modern virtualized and cloud-hosted workloads.

Executive summary

The platform enables organizations to protect, migrate, and recover workloads across heterogeneous environments, including VMware, Nutanix, Microsoft Azure, and Amazon Web Services (AWS). It uses agentless, block-level replication and a cross-platform, cross-cloud recovery architecture optimized for enterprise environments.

Datamotive is designed to address the operational and scalability challenges associated with modern disaster recovery deployments, including:

  • Large-scale parallel workload replication
  • Highly fragmented block-change workloads
  • WAN-constrained replication environments
  • Cross-cloud recovery orchestration
  • Recovery-time scalability
  • Cost-efficient disaster recovery architectures

This guide provides detailed guidance for:

  • Datamotive component architecture
  • Replication engine scalability characteristics
  • Management and replication node sizing
  • Parallel replication and recovery limits
  • Platform-specific scaling guidance
  • Cloud API and quota considerations

The guidance in this document is intended for:

  • Enterprise infrastructure architects
  • Cloud architects
  • Disaster recovery architects
  • Cloud operations teams
  • Managed service providers
  • Datamotive deployment and support teams

Successful enterprise-scale disaster recovery deployments depend not only on replication throughput, but also on proper workload hygiene, network architecture, storage performance, and recovery orchestration design.

The Datamotive platform is designed to address these requirements through a scalable, cloud-native, horizontally extensible architecture capable of supporting enterprise-scale workload portability, migration, and disaster recovery initiatives across hybrid and multi-cloud environments.

Scope and assumptions

This document provides architectural, scalability, sizing, performance, and operational guidance for the Datamotive Workload Portability Platform across supported on-premises and public cloud environments.

Supported platforms

Datamotive supports replication, recovery, migration, and workload portability across the following infrastructure platforms.

Supported source platform | Supported workloads
VMware vSphere | Virtual Machines
Nutanix AHV | Virtual Machines
Microsoft Azure | Azure Virtual Machines
Amazon Web Services (AWS) | EC2 Instances

Supported target platform | Recovery / replication target
VMware vSphere | Supported
Microsoft Azure | Supported
Amazon Web Services (AWS) | Supported
Nutanix AHV | Supported

Supported use cases

The Datamotive platform is designed to support multiple workload portability and disaster recovery scenarios.

Use case area | Scenarios
Disaster Recovery | On-premises to cloud DR; cloud to cloud DR; cross-region DR; multi-cloud DR; hybrid cloud DR.
Workload Mobility | Cross-cloud migration; data center evacuation; cloud onboarding; infrastructure modernization; workload rebalancing.
Enterprise Server Workloads | Application servers; database servers; file servers; web/application tiers; multi-tier enterprise applications.

Architectural assumptions

The scalability and performance guidance in this document assumes the following baseline conditions.

Area | Assumptions
Infrastructure assumptions | Enterprise-grade networking infrastructure; stable WAN connectivity between protected and recovery environments; proper DNS and routing configuration; proper Active Directory/LDAP configuration; low packet-loss WAN connectivity.
Cloud assumptions | Active enterprise cloud subscriptions/accounts; appropriate cloud IAM permissions; cloud quotas can be increased as required.
Hypervisor and source environment assumptions | CBT is enabled where applicable; snapshot operations are functioning normally; source storage latency is within vendor-recommended limits; supported hypervisor versions are deployed.
Operational assumptions | Replication scheduling is appropriately staggered for large environments; recovery subnet/IP capacity is pre-planned; DR testing is periodically performed; sufficient recovery compute capacity is available in the target cloud/region.

Scalability assumptions

The replication and recovery guidance in this document assumes:

  • Rolling-window replication protocol v2 is enabled
  • Multi-threaded replication is enabled
  • Replication nodes are deployed using recommended sizing guidance
  • Horizontal scaling is used for large environments

Actual scalability depends on:

  • Workload change rates
  • CBT fragmentation characteristics
  • WAN bandwidth and latency
  • Cloud API performance
  • Storage throughput characteristics
  • Recovery concurrency

Scope of this document

This document focuses specifically on:

  • Datamotive component architecture
  • Replication engine scalability
  • Node sizing guidance
  • Replication concurrency
  • Recovery concurrency
  • Cloud quota planning
  • API throttling considerations
  • Capacity planning guidance
  • Platform-specific operational considerations

Architectural overview

The Datamotive Workload Portability Platform is designed as a distributed, horizontally scalable architecture that separates management, replication and recovery orchestration into independent functional components.

The platform is optimized for enterprise-scale disaster recovery and workload mobility use cases across on-premises and public cloud environments, with a strong focus on:

  • Scalability
  • Predictable recovery operations
  • WAN-efficient replication
  • Cloud-native recovery
  • Operational simplicity
  • Horizontal expansion

The architecture supports both small deployments using a minimal number of nodes and large-scale deployments spanning hundreds of workloads across multiple sites and cloud regions.

High-level architecture

The Datamotive platform consists of the following logical layers.

Figure: Datamotive high-level architecture. Datamotive separates the management plane, protected site, replication transport layer, replication engine, recovery site, and cloud-native storage layer.

Architectural design principles

The Datamotive platform is designed around several core architectural principles.

Agentless replication

Datamotive performs replication without requiring guest operating system agents inside protected workloads.

Benefits include:

  • Simplified deployment
  • Reduced operational overhead
  • Lower workload impact
  • Reduced security footprint
  • Simplified lifecycle management

Block-level incremental replication

The platform performs block-level replication using native hypervisor or cloud incremental tracking mechanisms where available.

This approach enables:

  • Efficient WAN utilization
  • Reduced replication overhead
  • Faster replication intervals
  • Scalable incremental recovery

Rolling-window replication architecture

The Datamotive replication engine uses an adaptive rolling-window transport architecture optimized for replicating fragmented enterprise workloads over WAN links while ensuring reliability and data integrity.

Key characteristics include:

  • Sliding-window replication
  • Parallel chunk streaming
  • Descriptor-driven scheduling
  • Worker-controlled flow management
  • Node-level backpressure handling
  • Adaptive concurrency management

This architecture is specifically designed to avoid the inefficiencies associated with traditional stop-and-wait replication models.

Horizontal scalability

The platform is designed to scale horizontally through the addition of Replication Nodes and supporting infrastructure components.

This allows the platform to support:

  • Large-scale replication environments
  • High parallel recovery operations
  • Cross-region DR architectures
  • Multi-cloud disaster recovery

The Datamotive Replication and Recovery Nodes can also be scaled vertically to achieve the optimal deployment model for specific customer needs.

Cloud-native recovery

Datamotive integrates directly with cloud-native storage and compute APIs.

Recovery workflows use:

  • AWS EBS snapshots and volumes
  • Azure Managed Disks and snapshots
  • VMware vSphere SDKs
  • Nutanix Prism & Prism Central SDKs

This approach enables:

  • Reduced recovery infrastructure requirements
  • Elastic recovery scaling
  • Cloud-native recovery orchestration
  • Simplified DR operations

Control plane and data plane separation

The Datamotive architecture separates management and orchestration operations from replication data movement operations and target site / DR site recovery operations.

Control plane responsibilities

The control plane is responsible for:

  • Scheduling
  • Policy management
  • Inventory tracking
  • Recovery orchestration
  • Monitoring
  • API management
  • Metadata operations

Data plane responsibilities

The data plane is responsible for:

  • Block replication
  • Chunk streaming
  • Flow control
  • Cloud storage writes
  • Data integrity validation
  • Recovery reads

This separation provides:

  • Better scalability
  • Improved fault isolation
  • Simplified horizontal expansion
  • Independent scaling of management and replication workloads

Replication workflow overview

The Datamotive replication workflow consists of the following high-level stages.

Figure: Datamotive replication workflow. The workflow identifies changed blocks, creates descriptors, schedules chunks, streams them, writes them, and finalizes recovery checkpoints.

Recovery overview

The recovery architecture is designed for large-scale parallel recovery operations across cloud and on-premises environments.

Recovery workflows include:

  • Disk reconstruction
  • Snapshot finalization
  • System health checks for Windows workloads
  • VM instantiation
  • Network attachment
  • Boot order orchestration
  • Recovery validation

The platform supports:

  • Full site failover
  • Partial failover
  • Test failover
  • Cross-cloud recovery
  • Granular workload recovery

Recovery orchestration is designed to operate in parallel while respecting cloud API limits, quota constraints, and storage concurrency limits.

Scalability model

Datamotive scalability is achieved through a combination of:

  • Parallel replication streams
  • Sliding-window replication
  • Vertical or horizontal replication-node expansion
  • Distributed recovery orchestration

The architecture is designed such that:

  • Small environments can operate using minimal infrastructure
  • Large environments can scale horizontally using additional Replication Nodes
  • Recovery throughput scales independently from management operations
  • Replication throughput scales independently from orchestration operations

This model enables predictable scaling across a wide range of enterprise deployment sizes.

Component architecture

The Datamotive Workload Portability Platform is composed of multiple independently deployable infrastructure components designed to provide scalable, secure, and cloud-native disaster recovery, replication, migration, and workload mobility capabilities across on-premises and public cloud environments.

The platform architecture is designed around a clear separation of:

  • Management and orchestration operations
  • Replication and data movement operations
  • Recovery preparation workflows
  • Storage optimization services

This modular architecture enables:

  • Independent component scaling
  • Operational flexibility
  • Horizontal expansion
  • Fault isolation
  • Cloud-native deployment models
  • Multi-site and multi-cloud deployments

All Datamotive appliances are based on CIS-hardened Ubuntu Server 22.04 LTS images and are delivered as either:

  • Pre-packaged virtual appliances for hypervisor-based deployments
  • Cloud-native machine images for public cloud environments

Figure: Datamotive deployment architecture. Datamotive appliances can be deployed across protected and recovery environments, with supporting Prep and DeDup nodes and object storage in the recovery environment.

Datamotive Management Server

The Datamotive Management Server is the central orchestration and control-plane component of the platform.

It is responsible for:

  • User interface (UI), Command-line interface (CLI), RESTful APIs
  • Policy management
  • Replication orchestration
  • Recovery orchestration
  • Scheduling
  • Monitoring and alerting
  • Reporting
  • Inventory and metadata management
  • Day-0 through Day-N operational workflows

The Management Server coordinates all platform activities while maintaining operational metadata, workflow state, inventory information, and recovery orchestration logic.

The Management Server must be deployed:

  • At the protected site
  • At the recovery site

For smaller deployments, the Management Server can additionally function as a Replication Node.

Management Server deployment formats

Platform | Deployment format
VMware | OVA
Nutanix | OVA
AWS | Cloud-native AMI
Azure | Cloud-native VM Image

Environment scale | vCPU | Memory | Storage
Very Small | 4 vCPU | 8 GB | 120 GB SSD
Small | 4 vCPU | 16 GB | 200 GB SSD
Medium | 8 vCPU | 24 GB | 500 GB SSD
Large | 16 vCPU | 32 GB | 1 TB SSD
Very Large | 32 vCPU | 64 GB | 2 TB SSD

Actual sizing depends on:

  • Number of protected workloads
  • Recovery orchestration concurrency
  • Historical data retention requirements
  • Reporting requirements
  • API activity levels
  • Multi-site deployment scale

Datamotive Replication Node

The Datamotive Replication Node is the primary data-plane component responsible for executing replication data operations.

Replication Nodes are deployed at both the protected site and the recovery site and are responsible for:

  • Block-level replication
  • Incremental change processing
  • Rolling-window transport management
  • Parallel stream handling
  • WAN-optimized data transfer
  • Descriptor-driven chunk scheduling
  • Data integrity validation
  • Cloud-native storage writes
  • Recovery data reads
  • Flow-control management
  • Recovery checkpoint finalization

The replication architecture is optimized for:

  • Highly fragmented CBT workloads
  • Enterprise-scale replication
  • Large numbers of parallel workloads
  • WAN-constrained environments
  • Cross-cloud replication
  • High-churn workloads
  • Cloud-native storage APIs

The Datamotive platform scales horizontally through the deployment of additional Replication Nodes, enabling large-scale parallel replication across thousands of protected workloads.

Replication Engine architecture

The Datamotive replication engine uses a rolling-window replication architecture designed to maximize throughput efficiency across fragmented block-level workloads.

Key architectural characteristics include:

  • Sliding-window parallel replication
  • Target-side Worker-driven flow control
  • Descriptor-based replication scheduling
  • Node-level backpressure handling
  • Parallel chunk streaming
  • Cloud-native storage integration
  • Horizontally scalable replication workers

The replication engine is specifically designed to avoid the inefficiencies associated with traditional stop-and-wait replication models.

Platform-specific replication scaling guidance

Replication scalability depends heavily on:

  • Source platform behaviour
  • Storage performance
  • CBT fragmentation characteristics
  • WAN bandwidth
  • Recovery concurrency
  • Cloud-native storage throttling

The following sections provide recommended operational guidance for each supported platform.

VMware replication scaling

For VMware-based environments, replication scale is primarily influenced by:

  • VMware CBT fragmentation
  • Snapshot performance
  • VDDK read concurrency
  • Datastore latency
  • Snapshot chain depth
  • WAN throughput
  • Cloud storage write throughput

Metric | Recommended scale
Parallel Replication Jobs | ~60 - 80 protected disks
Recommended Parallel VMs | 30 - 40 VMs
Recommended Sustained Throughput | ~1 Gbps WAN replication

These values represent conservative operational guidance for fragmented enterprise VMware workloads.

Actual scale depends on:

  • Disk size distribution
  • Average workload churn
  • CBT fragmentation characteristics
  • Storage subsystem latency
  • WAN quality

AWS replication scaling

For AWS-native replication environments, scalability is primarily influenced by:

  • EBS API throughput
  • Snapshot write concurrency
  • EC2 instance network bandwidth
  • Regional API throttling
  • EBS volume quotas
  • Cloud storage concurrency

Metric | Recommended scale
Parallel Replication Jobs | 200 - 250 protected volumes
Recommended Parallel Instances | 100 - 150 instances
Recommended Throughput | ~5 Gbps depending on instance type

Large-scale AWS environments may additionally require:

  • Regional quota increases
  • API throttling monitoring
  • Multiple Replication Nodes
  • Dedicated recovery subnets
  • Storage concurrency tuning

Azure replication scaling

For Azure-native replication environments, scalability is primarily influenced by:

  • Azure Page Blob API throughput
  • ARM API throttling
  • Managed disk operation concurrency
  • VM network throughput
  • Subscription quotas
  • Regional vCPU quotas

Metric | Recommended scale
Parallel Replication Jobs | 150 - 300 protected disks
Recommended Parallel Instances | 75 - 200 instances
Recommended Throughput | Up to 10 - 25 Gbps depending on VM size

Large Azure environments may additionally require:

  • VM-family quota increases
  • Managed-disk quota increases
  • ARM API throttling optimization
  • Horizontal replication scaling

Datamotive Prep Node

The Datamotive Prep Node is a Windows-based recovery preparation appliance deployed within the recovery environment.

The Prep Node is activated only during:

  • Windows workload recovery
  • Cross-platform migration workflows
  • Recovery preparation operations
  • Boot remediation workflows
  • File System and Registry compatibility checks

Prep Nodes are used primarily during:

  • VMware-to-cloud recoveries
  • Cross-hypervisor migrations
  • Cross-cloud recovery operations
  • Windows boot remediation workflows

Recovery scaling guidance

Metric | Recommended scale
Parallel Windows Recoveries per Prep Node | Up to 20 workloads

Additional Prep Nodes can be deployed horizontally for large-scale recovery events. Prep Nodes can also be scaled vertically using instance types that support a larger number of attached disks.

Datamotive DeDup Node

The Datamotive DeDup Node is an optional optimization component deployed within the recovery environment.

The node maintains:

  • Chunk checksum indexes
  • Previously transferred block metadata
  • Deduplicated chunk references
  • Chunk reuse mappings
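
As an illustration of how a chunk checksum index can avoid re-sending data, the following minimal sketch keeps a checksum-to-reference map and reports whether a chunk needs to be transferred. The class name `ChunkIndex`, the integer reference scheme, and the choice of SHA-256 are assumptions made for this example; the document does not state which checksum or index layout the DeDup Node uses.

```python
import hashlib
from typing import Dict, Tuple


class ChunkIndex:
    """Hypothetical checksum index: maps chunk digests to stored chunk references."""

    def __init__(self) -> None:
        self._ref_by_digest: Dict[str, int] = {}
        self.bytes_transferred = 0  # bytes that actually had to be sent

    def submit(self, chunk: bytes) -> Tuple[int, bool]:
        """Return (reference, transferred). A known chunk is reused, not re-sent."""
        digest = hashlib.sha256(chunk).hexdigest()  # SHA-256 is an assumption
        ref = self._ref_by_digest.get(digest)
        if ref is not None:
            return ref, False  # duplicate: only the reference is recorded
        ref = len(self._ref_by_digest)
        self._ref_by_digest[digest] = ref
        self.bytes_transferred += len(chunk)
        return ref, True
```

Submitting the same 1 KB chunk twice in this sketch transfers it once and returns the same reference both times, which is the behaviour that makes similar workload datasets cheaper to replicate.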

When enabled, the DeDup Node significantly reduces:

  • WAN bandwidth consumption
  • Cross-cloud transfer costs
  • Replication duration
  • Repetitive data transfer overhead

The DeDup Node is particularly effective for:

  • One-time Migration use cases
  • Similar workload datasets
  • Large-scale enterprise environments

DeDup operational considerations

Deduplication effectiveness depends on:

  • Workload similarity
  • Operating system standardization
  • Retention duration
  • Replication interval frequency

The DeDup Node is optional and can be enabled selectively based on workload characteristics and WAN optimization requirements.

Replication Engine architecture

The Datamotive Replication Engine is designed as a high-performance, distributed, block-level replication framework optimized for enterprise-scale disaster recovery and workload mobility operations.

The replication architecture is specifically engineered to address the challenges associated with:

  • Highly fragmented change workloads
  • Large-scale parallel replication
  • WAN-constrained environments
  • Cross-cloud data movement
  • Cloud-native storage APIs
  • Recovery-time scalability
  • High-churn enterprise and VDI environments

At the core of the platform is the Datamotive rolling-window replication architecture, which enables efficient parallel replication while maintaining predictable throughput, scalability, and recovery consistency.

Architectural objectives

The replication engine is designed around the following primary objectives.

Predictable Throughput

Maintain sustained replication throughput across:

  • Fragmented workloads
  • Large parallel replication sets
  • WAN-based deployments
  • High-latency cloud environments

Efficient WAN Utilization

Maximize WAN efficiency by:

  • Eliminating stop-and-wait replication patterns
  • Maintaining continuous data streaming
  • Supporting parallel inflight replication windows
  • Reducing idle replication gaps

Horizontal scalability

Enable enterprise-scale replication by supporting:

  • Large numbers of protected workloads
  • Multiple replication engines
  • Distributed replication streams
  • Parallel recovery operations

Cloud-native integration

Integrate directly with:

  • AWS EBS snapshots and volumes
  • Azure Managed Disks
  • VMware datastores
  • Nutanix storage
  • GCP Persistent Disks
  • Raw Disks

This enables:

  • Elastic recovery operations
  • Reduced recovery infrastructure
  • Cloud-native orchestration

Backpressure-aware replication

The replication engine is designed to adapt dynamically to:

  • Storage write latency
  • Cloud API throttling
  • WAN variability
  • Replication-node pressure
  • Recovery concurrency

This prevents replication instability and uncontrolled resource amplification.

Evolution of the replication architecture

Traditional replication systems frequently use stop-and-wait transport models.

Typical behaviour:

  • Read chunks in batches
  • Send chunks
  • Wait for acknowledgement
  • Send next chunks in batches

While standard, this model introduces significant inefficiencies in modern enterprise environments, particularly when handling:

  • Highly fragmented CBT workloads
  • WAN latency
  • Cloud storage APIs
  • Parallel workload replication

These inefficiencies result in:

  • WAN underutilization
  • Idle pipeline gaps
  • Increased replication latency
  • Poor scaling characteristics
  • Recovery-time bottlenecks

To address these limitations, Datamotive introduced the rolling-window replication architecture.

Rolling-window replication model

The Datamotive replication engine uses a sliding-window architecture designed to maintain continuous parallel replication streams.

Instead of waiting for acknowledgement after every chunk, the replication engine maintains multiple inflight chunks simultaneously.

This enables:

  • Continuous WAN utilization
  • Reduced RTT sensitivity
  • Better cloud write parallelism
  • Improved throughput consistency
  • Better large-scale replication behaviour
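
The effect of keeping multiple chunks in flight can be approximated with a simple throughput model. The sketch below is a simplification for intuition only, not a description of the Datamotive transport: it assumes a fixed round-trip time, ignores protocol overhead and storage latency, and the function names are hypothetical.

```python
def stop_and_wait_mbps(chunk_mb: float, rtt_ms: float, link_mbps: float) -> float:
    """Throughput when each chunk is acknowledged before the next one is sent."""
    chunk_mbit = chunk_mb * 8
    send_s = chunk_mbit / link_mbps        # time to put the chunk on the wire
    cycle_s = send_s + rtt_ms / 1000.0     # plus one round trip waiting for the ack
    return chunk_mbit / cycle_s


def sliding_window_mbps(window_mb: float, rtt_ms: float, link_mbps: float) -> float:
    """Throughput when up to window_mb may be in flight without acknowledgement."""
    window_limited = (window_mb * 8) / (rtt_ms / 1000.0)
    return min(link_mbps, window_limited)  # the link itself is the upper bound
```

With a 1 MB chunk, an assumed 50 ms RTT, and a 500 Mbps link, the stop-and-wait model yields roughly 121 Mbps, while a 32 MB window is limited only by the 500 Mbps link. This is the reduced RTT sensitivity referred to above.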

Rolling-window replication flow

Figure: Datamotive rolling-window replication flow. The flow maintains per-disk replication windows and streams chunks in parallel, from changed block identification through cloud-native storage writes to recovery checkpoint finalization.

Descriptor-driven replication

The replication engine uses descriptor-driven scheduling instead of sequential block streaming.

Replication descriptors contain:

  • Offset
  • Length
  • Chunk metadata
  • Replication state

This model is specifically optimized for fragmented workloads where changed blocks are highly non-sequential.

Benefits include:

  • Efficient fragmented-block handling
  • Sparse-write optimization
  • Reduced unnecessary reads
  • Better cloud-write scheduling
  • Parallel chunk distribution
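
A descriptor can be pictured as a small record that identifies one chunk of changed data. The following sketch splits a list of changed extents into chunk-sized descriptors. The `Descriptor` fields mirror the list above, but the state names, the `build_descriptors` helper, and the extent format are illustrative assumptions rather than the engine's actual data structures.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB chunk size


class State(Enum):  # hypothetical replication states
    PENDING = "pending"
    INFLIGHT = "inflight"
    WRITTEN = "written"


@dataclass
class Descriptor:
    offset: int                    # byte offset of the chunk on the source disk
    length: int                    # chunk length in bytes (at most CHUNK_SIZE)
    state: State = State.PENDING   # replication state of this chunk


def build_descriptors(extents: Iterable[Tuple[int, int]],
                      chunk_size: int = CHUNK_SIZE) -> Iterator[Descriptor]:
    """Split changed extents given as (offset, length) into chunk-sized descriptors."""
    for offset, length in extents:
        end = offset + length
        while offset < end:
            size = min(chunk_size, end - offset)
            yield Descriptor(offset=offset, length=size)
            offset += size
```

Because each descriptor is independent, non-adjacent changed regions can be scheduled and written in any order, and only the changed ranges are ever read from the source.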

Parallel chunk streaming

Replication data is transmitted using parallel chunk streams.

The replication engine maintains multiple inflight chunks simultaneously across:

  • Multiple disks
  • Multiple workloads
  • Multiple replication workers

This architecture avoids:

  • Idle WAN periods
  • Sequential replication bottlenecks
  • Storage-write serialization
  • Pipeline starvation

Default operational parameters

Parameter | Type | Required | Description
Chunk Size | size | required | Default chunk size. Default: 1 MB
Per-Disk Window | size | required | Default per-disk replication window. Default: 32 MB
Intermediate Checkpoint Window | size | required | Default intermediate checkpoint window. Default: 500 MB

These values represent balanced defaults optimized for:

  • WAN efficiency
  • Cloud write behaviour
  • Memory utilization
  • Retransmission overhead
  • Fragmented CBT handling
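
The relationship between these defaults can be worked through with simple arithmetic. The sketch below derives the number of chunks a single disk can have in flight and an upper bound on in-flight data when many disks replicate in parallel. It assumes every disk holds a completely full window at the same time, which is a worst case rather than expected behaviour, and the function names are hypothetical.

```python
CHUNK_MB = 1             # default chunk size
PER_DISK_WINDOW_MB = 32  # default per-disk replication window


def inflight_chunks_per_disk(window_mb: int = PER_DISK_WINDOW_MB,
                             chunk_mb: int = CHUNK_MB) -> int:
    """Number of chunks one disk can have outstanding within its window."""
    return window_mb // chunk_mb


def max_inflight_mb(parallel_disks: int,
                    window_mb: int = PER_DISK_WINDOW_MB) -> int:
    """Upper bound on in-flight data if every disk fills its window at once."""
    return parallel_disks * window_mb
```

With the defaults, each disk can keep 32 chunks in flight, and 80 parallel disks could hold at most 2,560 MB in flight if nothing else limited them.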

Worker-driven flow control

The Datamotive replication engine uses a worker-controlled pull model.

In this architecture:

  • Workers request replication windows
  • Clients stream requested chunks
  • Workers control inflight concurrency
  • Replication pressure is centrally managed

The worker-driven architecture provides:

  • Better replication fairness
  • Improved backpressure management
  • Better cloud-write coordination
  • Improved large-scale stability
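
The pull model described above can be sketched as a small simulation in which the worker decides how much the client may send. The `Client`, `Worker`, and `replicate` names, and the notion of a fixed number of writes completing per tick, are assumptions chosen to keep the example short; they are not Datamotive APIs.

```python
from collections import deque
from typing import Iterable, List


class Client:
    """Source side: streams only the chunks the worker asks for."""

    def __init__(self, chunk_ids: Iterable[int]) -> None:
        self.pending = deque(chunk_ids)

    def stream(self, max_chunks: int) -> List[int]:
        count = min(max_chunks, len(self.pending))
        return [self.pending.popleft() for _ in range(count)]


class Worker:
    """Target side: owns the window and therefore the in-flight concurrency."""

    def __init__(self, window_chunks: int) -> None:
        self.window_chunks = window_chunks
        self.inflight: List[int] = []
        self.written: List[int] = []

    def request_window(self, client: Client) -> None:
        free = self.window_chunks - len(self.inflight)
        if free > 0:  # pull only as much as the window still allows
            self.inflight.extend(client.stream(free))

    def complete_writes(self, count: int) -> None:
        done, self.inflight = self.inflight[:count], self.inflight[count:]
        self.written.extend(done)


def replicate(client: Client, worker: Worker, writes_per_tick: int) -> int:
    """Run until everything is written; writes_per_tick must be positive."""
    ticks = 0
    while client.pending or worker.inflight:
        worker.request_window(client)
        worker.complete_writes(writes_per_tick)
        ticks += 1
    return ticks
```

In this sketch the client never sends unrequested data, so a slow target (a small `writes_per_tick`) automatically slows the source instead of building an unbounded queue.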

Node-level backpressure management

The replication engine maintains both:

  • Per-disk replication windows
  • Node-level inflight replication windows

This architecture prevents:

  • Excessive memory amplification
  • Cloud-write overload
  • Uncontrolled replication bursts
  • Storage queue saturation

Backpressure decisions consider:

  • Cloud storage latency
  • Write queue depth
  • Chunk arrival behaviour
  • Replication-node pressure
  • Error and retry behaviour
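
One way to picture the two window levels is as a pair of budgets that must both have room before a chunk is admitted. The following sketch is a simplified illustration: the `NodeBudget` class, its megabyte-based accounting, and the 48 MB node limit used in the usage note are hypothetical, and real backpressure decisions would also weigh the latency and retry signals listed above.

```python
from typing import Dict


class NodeBudget:
    """Admit a chunk only if both the per-disk and the node-level window have room."""

    def __init__(self, node_limit_mb: int, per_disk_limit_mb: int = 32) -> None:
        self.node_limit_mb = node_limit_mb
        self.per_disk_limit_mb = per_disk_limit_mb
        self.inflight_by_disk: Dict[str, int] = {}

    @property
    def node_inflight_mb(self) -> int:
        return sum(self.inflight_by_disk.values())

    def try_admit(self, disk_id: str, chunk_mb: int = 1) -> bool:
        disk_inflight = self.inflight_by_disk.get(disk_id, 0)
        if disk_inflight + chunk_mb > self.per_disk_limit_mb:
            return False  # this disk's window is full
        if self.node_inflight_mb + chunk_mb > self.node_limit_mb:
            return False  # the node as a whole is under pressure
        self.inflight_by_disk[disk_id] = disk_inflight + chunk_mb
        return True

    def release(self, disk_id: str, chunk_mb: int = 1) -> None:
        """Call when a chunk has been written, freeing room in both budgets."""
        current = self.inflight_by_disk.get(disk_id, 0)
        self.inflight_by_disk[disk_id] = max(0, current - chunk_mb)
```

For example, with a 48 MB node limit, one disk can fill its 32 MB window, but a second disk is held to 16 MB until writes complete and capacity is released.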

Adaptive window scaling

The replication engine dynamically adjusts replication-window sizes based on operational conditions.

Window scaling decisions are influenced by:

  • Storage-write pressure
  • WAN throughput stability
  • Cloud API responsiveness
  • Chunk retry rates
  • Replication concurrency

Typical per-disk operational window levels are 16 MB, 24 MB, 32 MB, 48 MB, and 64 MB.

This approach provides:

  • Predictable behaviour
  • Operational simplicity
  • Stable scaling characteristics
  • Controlled resource utilization
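
Stepping between a fixed set of window levels can be sketched as follows. The levels come from this section; the thresholds for retry rate and write latency, and the rule of moving one level at a time, are assumptions made for illustration and do not describe the engine's actual tuning logic.

```python
WINDOW_LEVELS_MB = [16, 24, 32, 48, 64]  # per-disk window levels from this section


def next_window_mb(current_mb: int,
                   retry_rate: float,
                   write_latency_ms: float,
                   max_retry_rate: float = 0.05,     # hypothetical threshold
                   max_latency_ms: float = 200.0) -> int:  # hypothetical threshold
    """Step down one level under pressure, otherwise step up one level."""
    index = WINDOW_LEVELS_MB.index(current_mb)
    under_pressure = retry_rate > max_retry_rate or write_latency_ms > max_latency_ms
    if under_pressure:
        index = max(index - 1, 0)
    else:
        index = min(index + 1, len(WINDOW_LEVELS_MB) - 1)
    return WINDOW_LEVELS_MB[index]
```

Using discrete levels rather than a continuously varying window keeps behaviour easy to reason about: at any moment a disk is at one of five known sizes, and each adjustment moves it by a single step.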

Recovery checkpoint finalization

Replication operations are finalized into recovery checkpoints after:

  • Chunk validation
  • Storage-write completion
  • Metadata synchronization
  • Intermediate validation
  • Consistency validation

Recovery checkpoints form the basis for:

  • Recovery operations
  • Failover workflows
  • Test recoveries
  • Migration operations

Architectural advantages

The Datamotive rolling-window replication architecture provides several operational advantages over traditional replication models.

Area | Advantages
Improved WAN Efficiency | Reduced idle transmission gaps; better bandwidth utilization; reduced RTT sensitivity.
Better Fragmented Workload Handling | Optimized for fragmented CBT; efficient sparse-write handling; parallel chunk scheduling.
Improved Cloud Integration | Better cloud-write concurrency; improved API utilization; better recovery scalability.
Better Large-Scale Stability | Controlled backpressure handling; adaptive concurrency management; predictable scaling behaviour.
Horizontal Scalability | Independent replication-node scaling; distributed replication workloads; better operational flexibility.

Scalability and performance guidelines

The Datamotive Replication Engine is designed to dynamically adapt replication throughput based on available infrastructure capacity across the complete replication pipeline.

Unlike traditional replication architectures that operate using static or sequential transfer models, the Datamotive rolling-window replication engine continuously balances replication flow across multiple stages of the replication lifecycle.

The effective replication throughput is determined by the combined effect of:

  • Source-side read performance
  • Changed block processing time
  • Network transfer throughput
  • Chunk validation time
  • Cloud-native storage write throughput
  • Recovery checkpoint finalization

The replication engine continuously adapts its inflight replication windows and parallel replication streams based on these operational conditions.

Replication flow control model

Figure: Replication flow control model showing read time, transfer time, validation time, write time, and dynamic flow control. End-to-end replication throughput is determined by the combined performance of the read, transfer, validation, and write stages.

The Datamotive rolling-window replication engine maintains multiple inflight replication windows across parallel disks and workloads simultaneously.

The replication engine automatically adapts to:

  • Available WAN bandwidth
  • Source-side storage latency
  • CBT fragmentation characteristics
  • Cloud storage write latency
  • Parallel replication concurrency
  • Cloud API responsiveness

This allows the replication engine to maintain stable throughput while preventing:

  • Network underutilization
  • Storage-write overload
  • Replication stalls
  • Excessive inflight memory growth
  • Cloud API saturation

Performance test scenario

The following results represent observed replication performance during internal validation testing.

Parameter | Value
Source Platform | Azure
Target Platform | AWS
Replication Topology | Azure to AWS
Replication Nodes | 1
Protected VMs | 40
Protected Disks | 80
Replication Interval | 20 minutes
Average Changed Data per Disk | 700 - 800 MB
WAN Bandwidth | 500 Mbps (burstable to 700 Mbps)
Observed Sustained Throughput | ~420 Mbps

Observations

The replication engine maintained sustained throughput of ~70% of practical WAN limits while handling:

  • Parallel disk replication
  • Fragmented CBT workloads
  • Multiple simultaneous VM replication streams
  • Continuous parallel cloud-native storage writes

The observed throughput remained stable across the replication intervals without requiring sequential or serialized replication behaviour.
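
To put the test scenario in context, the sustained throughput needed to keep up with a given change rate can be estimated from the number of disks, the changed data per disk, and the replication interval. The helper below is a back-of-the-envelope sketch: it assumes decimal megabytes (1 MB = 8 megabits), an even spread of transfer across the interval, and no protocol overhead, and the function name is hypothetical.

```python
def required_mbps(disks: int, changed_mb_per_disk: float, interval_min: float) -> float:
    """Average throughput (Mbps) needed to move all changed data within one interval."""
    total_megabits = disks * changed_mb_per_disk * 8
    return total_megabits / (interval_min * 60)
```

For the scenario above (80 disks, 700 to 800 MB changed per disk, 20-minute interval) this gives roughly 373 to 427 Mbps, so the observed ~420 Mbps falls within the range needed to complete each cycle inside its interval.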

Deployment scale | vCPU | Memory | Network | Recommended workloads | Recommended parallel disks
Very Small | 4 | 8 GB | Up to 500 Mbps | Up to 30 | 60 - 80
Small | 4 | 16 GB | 1 - 5 Gbps | 30 - 60 | 80 - 200
Medium | 8 | 32 GB | 10 Gbps | 60 - 150 | 200 - 500
Large | 16 | 64 GB | 10 - 30 Gbps | 150 - 300 | 500 - 1000
Very Large | 32 | 128 GB | 50 Gbps+ | 300+ | 1000+

The recommendations above assume ~500 MB of changed data per disk per 30-minute replication interval.

Compute-optimized instance families are recommended for large-scale deployments due to the highly parallel nature of replication operations.

For large enterprise environments, Datamotive recommends horizontal scaling using multiple Replication Nodes.

Environment size | Recommended Replication Nodes
< 50 workloads | 1
50 - 300 workloads | 2 - 4
300 - 1000 workloads | 4 - 8
> 1000 workloads | 8+
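
For capacity-planning scripts, the table above can be expressed as a simple lookup. This is a sketch only: the function name is hypothetical, and because the table's ranges share boundary values (300 and 1000), the example assigns a boundary value to the smaller tier, which is an assumption rather than documented behaviour.

```python
from typing import Optional, Tuple


def recommended_replication_nodes(workloads: int) -> Tuple[int, Optional[int]]:
    """Return (min_nodes, max_nodes); max_nodes is None when the range is open-ended."""
    if workloads < 50:
        return (1, 1)
    if workloads <= 300:   # boundary value placed in the smaller tier (assumption)
        return (2, 4)
    if workloads <= 1000:  # boundary value placed in the smaller tier (assumption)
        return (4, 8)
    return (8, None)
```

The returned range is a starting point; actual node counts depend on change rates, disk counts, and the network capacity available to each node.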

Horizontal scaling provides:

  • Increased aggregate throughput
  • Better API distribution
  • Improved recovery concurrency
  • Reduced node-level contention
  • Better operational isolation

Parallel recovery guidance

Recovery concurrency depends primarily on:

  • Attached-disk capacity of the Prep Node
  • Cloud API throttling
  • Compute Instance quotas
  • Network provisioning concurrency

To scale recoveries, deploy the Prep Node on a larger configuration so that more than 22 disks can be attached for recovery. Plan recoveries in batches aligned with Protection Plans, and confirm compute instance quotas on the target cloud in advance.
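
Batch planning against a disk-attachment limit can be sketched as a simple greedy grouping. The example assumes that all disks of the workloads in a batch are attached to the Prep Node at the same time and that the only constraint is the total number of attached disks. The function name, the workload names, and the limit of 22 used in the usage note are illustrative; quota and API limits would need to be checked separately.

```python
from typing import Dict, List


def plan_recovery_batches(disks_per_workload: Dict[str, int],
                          max_attached_disks: int) -> List[List[str]]:
    """Group workloads, in the given order, so each batch fits the attachment limit."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, disks in disks_per_workload.items():
        if disks > max_attached_disks:
            raise ValueError(f"{name} needs {disks} disks, above the limit")
        if used + disks > max_attached_disks:
            batches.append(current)  # close the full batch and start a new one
            current, used = [], 0
        current.append(name)
        used += disks
    if current:
        batches.append(current)
    return batches
```

For example, workloads with 2, 10, 8, and 6 disks and a limit of 22 attached disks are split into two batches: the first three workloads (20 disks) and then the fourth.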

Cloud quotas and API limits

Cloud-native disaster recovery and replication scalability is heavily influenced by cloud-provider quotas, API throttling behaviour, and storage provisioning limits.

For large-scale replication and recovery operations, cloud APIs frequently become one of the primary operational scaling factors, particularly during:

  • Large-scale failovers
  • Parallel recovery operations
  • Snapshot creation bursts
  • Volume provisioning operations
  • Cross-region copy operations
  • Mass VM power-on orchestration

Datamotive is designed to operate within these limits using:

  • Parallel orchestration
  • Retry-aware workflows
  • Backpressure-aware replication
  • Horizontal scaling
  • Adaptive concurrency management

However, enterprise-scale deployments should proactively review and increase cloud quotas prior to production onboarding and DR testing.
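
One common way to implement adaptive concurrency management is additive-increase, multiplicative-decrease (AIMD). The sketch below is a generic illustration of that technique with hypothetical names and limits; it does not describe Datamotive's internal implementation.

```python
class AdaptiveConcurrency:
    """AIMD limiter: add one slot per success, halve the limit when throttled."""

    def __init__(self, initial: int = 8, minimum: int = 1, maximum: int = 64):
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_success(self) -> None:
        # Probe gently for more capacity.
        self.limit = min(self.maximum, self.limit + 1)

    def on_throttle(self) -> None:
        # Back off quickly when the provider signals throttling.
        self.limit = max(self.minimum, self.limit // 2)


limiter = AdaptiveConcurrency(initial=8)
for _ in range(4):
    limiter.on_success()   # limit grows 8 -> 12
limiter.on_throttle()      # limit drops 12 -> 6
print(limiter.limit)       # 6
```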

AWS quotas and API limits

Datamotive primarily interacts with the following AWS services:

| Service | Usage |
| --- | --- |
| EC2 APIs | VM creation, power operations, orchestration |
| EBS APIs | Snapshot and volume operations |
| EBS Direct APIs | Block-level snapshot writes/reads |
| IAM APIs | Access and orchestration |
| VPC APIs | Network provisioning |

Important AWS scalability limits

| AWS Resource / API | Typical default limit | Notes |
| --- | --- | --- |
| EBS Snapshots per Region | 100,000 | Adjustable |
| Concurrent OpenSnapshot for Write operations | 100 per account | Adjustable (with AWS support request) |
| EBS Volume Attachments per Instance | Instance dependent (typically 28 on Nitro) | Recovery consideration |
| EBS Volume Size | 64 TiB maximum | Per volume |
| Regional EC2 vCPU Quotas | Varies by VM family | Frequently requires increase |
| EBS Direct API Throughput | Account and API limited | Important for parallel recovery |

AWS API throttling behaviour

AWS uses token-bucket throttling for EC2 and EBS APIs.

Important characteristics:

  • Throttling occurs per API operation
  • Limits vary by API type
  • Non-mutating APIs typically have higher limits
  • Mutating APIs are throttled more aggressively
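
To make the token-bucket model concrete, here is a minimal sketch. The capacity and refill rate are illustrative values, not actual AWS limits, and the injectable clock exists only so the example is deterministic.

```python
import time


class TokenBucket:
    """Each request spends a token; tokens refill at a steady rate up to a burst capacity."""

    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the caller would be throttled


fake_now = [0.0]  # controllable clock for the demonstration
bucket = TokenBucket(capacity=5, refill_per_sec=2, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(6)]       # five succeed, the sixth is throttled
fake_now[0] += 1.0                                     # one second later: two tokens refilled
after_wait = [bucket.try_acquire() for _ in range(3)]  # two succeed, the third is throttled
print(burst, after_wait)
```

The model explains why short bursts succeed while sustained high request rates are throttled: the burst drains the bucket, after which throughput is bounded by the refill rate.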

Examples of APIs commonly used during Datamotive operations:

| API | Operation Type |
| --- | --- |
| CreateSnapshot | Storage mutation |
| CreateVolume | Storage mutation |
| AttachVolume | Compute/storage mutation |
| RunInstances | Compute mutation |
| StartInstances | Compute mutation |
| DescribeVolumes | Read operation |
| DescribeSnapshots | Read operation |

AWS returns throttling responses using:

  • HTTP 429
  • RequestLimitExceeded
  • ThrottlingException
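
A retry-aware caller typically treats these responses as transient and retries with exponential backoff and jitter. The sketch below assumes a hypothetical `ApiError` exception that exposes the error code and HTTP status; real SDK exception types differ, and the delay values are illustrative.

```python
import random
import time

THROTTLE_CODES = {"RequestLimitExceeded", "ThrottlingException"}


class ApiError(Exception):
    """Hypothetical stand-in for an SDK error carrying an error code and HTTP status."""

    def __init__(self, code, http_status=None):
        super().__init__(code)
        self.code = code
        self.http_status = http_status


def is_throttled(err: ApiError) -> bool:
    return err.code in THROTTLE_CODES or err.http_status == 429


def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=20.0, sleep=time.sleep):
    """Retry throttled calls; re-raise other errors and the final throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ApiError as err:
            if not is_throttled(err) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


attempts = {"count": 0}


def flaky_create_snapshot():
    """Simulated call that is throttled twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ApiError("RequestLimitExceeded")
    return "snap-example"


result = call_with_backoff(flaky_create_snapshot, sleep=lambda seconds: None)
print(result, attempts["count"])  # snap-example 3
```

AWS SDKs generally include their own configurable retry handling, so a wrapper like this is mainly useful for understanding the behaviour or for orchestration layers above the SDK.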

AWS EBS Direct API considerations

For block-level snapshot operations, EBS Direct APIs introduce additional scalability considerations.

Important operational characteristics include:

| Parameter | Typical behaviour |
| --- | --- |
| Block Size | 512 KiB |
| API Throughput Scaling | Based on concurrency |
| Parallel API Calls | Strongly recommended |
| Snapshot Write Throughput | Limited by account/API quotas |

According to AWS documentation, approximately 1,000 PutSnapshotBlock operations per second correspond to roughly 500 MB/s of write throughput per snapshot workflow under optimal conditions, because each operation writes one 512 KiB block.
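
The relationship between operation rate and throughput is simple arithmetic over the 512 KiB block size. The sketch below uses hypothetical helper names and assumes every written block is fully used; fragmented changes that touch many partially changed blocks need more calls for the same amount of data.

```python
BLOCK_SIZE_BYTES = 512 * 1024  # EBS Direct API block size


def snapshot_write_throughput_mib_s(put_ops_per_sec: float) -> float:
    """Throughput in MiB/s for a given PutSnapshotBlock rate."""
    return put_ops_per_sec * BLOCK_SIZE_BYTES / (1024 * 1024)


def seconds_to_write(changed_bytes: int, put_ops_per_sec: float) -> float:
    """Time to write the changed data, assuming densely packed blocks."""
    blocks = -(-changed_bytes // BLOCK_SIZE_BYTES)  # ceiling division
    return blocks / put_ops_per_sec


print(snapshot_write_throughput_mib_s(1000))   # 500.0
print(seconds_to_write(500_000_000, 1000))     # 954 blocks at 1000 ops/s
```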

AWS quota increase guidance

For enterprise DR environments, Datamotive strongly recommends proactive quota increases for:

| Quota | Recommendation |
| --- | --- |
| Regional vCPU quotas | Increase before DR onboarding |
| EBS snapshot quotas | Increase for long retention periods |
| EBS volume quotas | Increase for large recoveries |
| Snapshot Write limits | Increase for cross-region DR |
| ENI/IP quotas | Validate during recovery planning |

Quota increases can be requested using:

  • AWS Service Quotas
  • AWS Enterprise Support
  • AWS Support Center

Azure quotas and API limits

Datamotive primarily interacts with:

| Azure Service | Usage |
| --- | --- |
| Azure Resource Manager (ARM) | Resource orchestration |
| Azure Compute APIs | VM provisioning and operations |
| Azure Managed Disk APIs | Disk and snapshot operations |
| Azure Networking APIs | NIC and network provisioning |
| Azure Subscription Quotas | Regional compute allocation |

Important Azure scalability limits

| Azure Resource / API | Typical default limit | Notes |
| --- | --- | --- |
| Managed Disks per Subscription/Region | 50,000 per disk type | Adjustable |
| Incremental Snapshots per Disk | 500 | Important for retention planning |
| Regional vCPU Quotas | Subscription dependent | Frequently requires increase |
| VM Family Quotas | Region and family specific | Common DR bottleneck |
| ARM API Requests | Subscription and region throttled | Critical for parallel recovery |
| Concurrent Disk Operations | Compute/provider throttled | Recovery consideration |

Azure ARM API throttling

Azure Resource Manager (ARM) applies throttling at multiple levels:

  • Subscription
  • Region
  • Resource provider
  • Service principal
  • API operation type

Operations commonly impacted during large-scale recoveries include:

| API Category | Examples |
| --- | --- |
| VM APIs | VM creation, power operations |
| Disk APIs | Managed disk creation |
| Snapshot APIs | Snapshot creation and deletion |
| Network APIs | NIC and IP provisioning |
| Page Blob APIs | Read and write content |

Typical throttling symptoms include:

  • HTTP 429 responses
  • Retry-After headers
  • Delayed provisioning
  • Intermittent API failures
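
When a throttled response carries a Retry-After header, a client can wait for the indicated time instead of guessing a delay. The sketch below parses both forms the HTTP specification allows (a number of seconds or an HTTP date); the function name and the fallback default are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0) -> float:
    """Return how long to wait, based on a Retry-After header value."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return float(value)                      # delay given in seconds
    try:
        when = parsedate_to_datetime(value)      # delay given as an HTTP date
    except (TypeError, ValueError):
        return default                           # unparseable value: fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


reference = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("30"))                                        # 30.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", reference))  # 60.0
```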

Azure quota increase guidance

Azure quotas are frequently the primary limiting factor for large-scale DR operations.

Datamotive recommends proactively increasing:

| Quota | Recommendation |
| --- | --- |
| Regional vCPU quotas | Mandatory for enterprise DR |
| VM-family quotas | Required for target recovery VM sizes |
| Managed disk quotas | Increase for large environments |
| NIC quotas | Validate during DR planning |
| Public IP quotas | Validate if public recovery required |

Quota increases can be requested using:

  • Azure Portal
  • Azure Quota APIs
  • Microsoft Support

Practical operational considerations

In large-scale recovery environments, practical scalability is often constrained more by API orchestration behaviour than raw infrastructure capacity.

Common operational bottlenecks include:

  • API throttling
  • Subscription quota exhaustion
  • Disk provisioning concurrency
  • Snapshot finalization delays
  • VM-family quota exhaustion
  • Network provisioning limits

These conditions become particularly important during:

  • Mass failover events
  • Parallel recovery testing
  • Regional evacuations
  • Large VDI recovery operations

Datamotive operational recommendations

| Area | Recommendation |
| --- | --- |
| Quotas | Increase quotas before production onboarding |
| DR Testing | Perform periodic scale testing |
| Recovery Design | Use staged recovery orchestration for massive failovers |
| API Handling | Use retry-aware orchestration |
| Scaling | Use multiple Replication Nodes for large environments |
| Monitoring | Continuously monitor throttling behaviour |
| Networking | Pre-plan subnet and IP capacity |
