
Data Migration

Data migration is one of the most underestimated workstreams in any Salesforce implementation. It touches data modeling, security, integration, and governance at once. Getting it wrong means launch delays, data quality problems, and eroded stakeholder confidence.

Every migration follows a lifecycle regardless of scale. Skipping phases is the primary cause of migration failures.

Linear flow of the seven migration phases from planning through validation, with feedback loops from validation back to test and from test back to design when issues are found.
Figure 1. Every migration follows this seven-phase lifecycle regardless of scale. The feedback loops are intentional: post-execution issues send the team back to testing, and structural problems discovered in testing can require a return to design. Skipping phases is the primary cause of migration failures.

Define scope, timeline, and success criteria before touching any data.

Planning activities include:

  • Inventory all source systems and data stores
  • Identify which data migrates (not everything should)
  • Define data ownership and migration team roles
  • Establish cutover window and downtime tolerance
  • Set success criteria: record counts, field completeness, relationship integrity
  • Plan rollback strategy

Analyze source data to understand quality, completeness, and structure before designing mappings.

Profiling checklist:

  • Record counts per entity and per source system
  • Field completeness rates (% populated)
  • Data type mismatches between source and target
  • Duplicate detection rates
  • Referential integrity (orphan records, broken relationships)
  • Character encoding issues (UTF-8, special characters)
  • Date format inconsistencies
  • Picklist value mapping requirements
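
Several of the profiling checks above can be scripted against a source extract. The following is a minimal sketch, assuming the extract is a CSV with hypothetical columns `id`, `email`, and `account_id`; it reports record count, per-field completeness, duplicate keys, and orphaned parent references.

```python
import csv
import io
from collections import Counter


def profile_rows(rows, key_field, parent_field=None, parent_keys=None):
    """Return basic profiling metrics for a list of dict rows."""
    total = len(rows)
    fields = list(rows[0].keys()) if rows else []
    # Completeness: share of rows where the field holds a non-blank value.
    completeness = {
        f: sum(1 for r in rows if (r.get(f) or "").strip()) / total
        for f in fields
    }
    # Duplicates: key values that occur more than once.
    counts = Counter(r[key_field] for r in rows)
    duplicate_keys = sorted(k for k, n in counts.items() if n > 1)
    # Orphans: rows whose parent reference is not among the known parent keys.
    orphan_keys = []
    if parent_field and parent_keys is not None:
        orphan_keys = [
            r[key_field]
            for r in rows
            if r.get(parent_field) and r[parent_field] not in parent_keys
        ]
    return {
        "record_count": total,
        "completeness": completeness,
        "duplicate_keys": duplicate_keys,
        "orphan_keys": orphan_keys,
    }


# Hypothetical contact extract; account A9 does not exist in the parent set.
SAMPLE = """id,email,account_id
C1,a@example.com,A1
C2,,A1
C2,b@example.com,A9
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile_rows(rows, "id", "account_id", parent_keys={"A1"})
print(report)
```

Running the same function per entity and per source system gives a baseline that the success criteria from planning can be checked against.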

Create the migration architecture: field mappings, transformation rules, load sequence, and error handling.

Design deliverables:

  • Field mapping documents (source field to target field)
  • Transformation rules (data cleansing, format conversion, value mapping)
  • Load sequence diagram (parent objects before child objects)
  • External ID strategy
  • Error handling and retry logic
  • Validation rules to temporarily disable during load
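
A field mapping document and its transformation rules can be expressed directly as data. The sketch below is illustrative only: the source columns (`cust_name`, `created`, `status`), the target fields (`Legacy_Created_Date__c`, `Status__c`), and the status codes are hypothetical names chosen for the example. It shows format conversion, value mapping, and per-row error capture rather than failing the whole batch.

```python
from datetime import datetime


def to_iso_date(value):
    """Convert MM/DD/YYYY to ISO YYYY-MM-DD; blank stays blank."""
    if not value:
        return ""
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")


# Hypothetical picklist value mapping (source code -> target value).
STATUS_MAP = {"A": "Active", "I": "Inactive"}

# Hypothetical mapping: source column -> (target field, transformation).
FIELD_MAP = {
    "cust_name": ("Name", str.strip),
    "created": ("Legacy_Created_Date__c", to_iso_date),
    "status": ("Status__c", lambda v: STATUS_MAP[v]),
}


def transform(row):
    """Apply the mapping to one source row; return (target_row, errors)."""
    out, errors = {}, []
    for src, (target, fn) in FIELD_MAP.items():
        try:
            out[target] = fn(row.get(src, ""))
        except (KeyError, ValueError) as exc:
            # Collect the failure so the row can be routed to an error file.
            errors.append(f"{src}: {exc!r}")
    return out, errors


good, good_errors = transform(
    {"cust_name": " Acme ", "created": "03/15/2021", "status": "A"}
)
bad, bad_errors = transform(
    {"cust_name": "Globex", "created": "03/15/2021", "status": "X"}
)
print(good, good_errors)
print(bad, bad_errors)
```

Keeping the mapping in one table-like structure makes it easier to review against the mapping document and to unit test each rule in isolation.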

Develop migration scripts, ETL jobs, and automation.

Build considerations:

  • Configure ETL tool connections and credentials
  • Build and unit test transformation logic
  • Create External ID fields where needed
  • Prepare pre-migration scripts (disable triggers, workflows, validation rules)
  • Build post-migration scripts (re-enable automations, run sharing recalculation)

Run the migration in a sandbox, ideally multiple times.

Trial migration objectives:

  • Validate record counts match expectations
  • Verify relationship integrity (lookups resolve correctly)
  • Test data transformations produce correct results
  • Measure timing (will it fit in the cutover window?)
  • Identify and fix errors before production run
  • Train the migration team on the execution procedure

Run the production migration with the full team mobilized.

Cutover activities:

  • Freeze source systems (or capture delta changes)
  • Disable automations (triggers, workflows, flows, validation rules)
  • Run migration in defined sequence
  • Monitor progress and error logs in real-time
  • Execute delta migration for changes during cutover window
  • Re-enable automations
  • Run sharing recalculation if needed

Post-migration validation confirms data integrity and completeness.

Validation checks:

  • Record count reconciliation (source vs target)
  • Spot-check random records for field accuracy
  • Verify all relationships resolve (no orphan records)
  • Run standard reports and compare to source system reports
  • Test business processes with migrated data (create records, run automation)
  • User acceptance testing with business stakeholders

Tool | Best For | Volume | Complexity | Cost
Data Loader | Simple loads, ad-hoc | Up to millions | Low | Free
Bulk API 2.0 | High-volume programmatic | Millions+ | Medium | Platform included
Informatica Cloud | Complex ETL, multiple sources | Unlimited | High | Licensed
MuleSoft | API-led integration + migration | Unlimited | High | Licensed
Jitterbit | Mid-complexity ETL | Millions | Medium | Licensed
Talend | Open-source ETL | Millions | Medium | Free/Licensed
Import Wizard | Small volumes, simple | < 50K | Very Low | Free

Salesforce Data Loader is the standard tool for most migrations:

  • Insert: Creates new records
  • Update: Updates existing records (matched on Salesforce ID)
  • Upsert: Insert or update based on External ID match
  • Delete: Soft delete records
  • Hard Delete: Permanently delete, bypassing the Recycle Bin (requires Bulk API enabled and the Bulk API Hard Delete permission)
  • Export: Extract data
  • Export All: Extract data including soft-deleted records

Command-line mode supports scripting and scheduling:

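# Example: command-line Data Loader for automated loads
# <config-directory> holds the configuration files (including process-conf.xml);
# <process-name> is the name of a process defined in process-conf.xml
process.bat <config-directory> <process-name>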

For high-volume migrations, Bulk API 2.0 does the heavy lifting:

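Feature | Bulk API 2.0
Record limit | 100 million records per 24-hour period
File format | CSV
Processing | Asynchronous
PK Chunking | Applied automatically to queries (no request header needed, unlike Bulk API 1.0)
Serial mode | Not available; jobs always run in parallel (serial mode is a Bulk API 1.0 option for avoiding lock contention)
Parallelism | Automatic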
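
A Bulk API 2.0 ingest job follows three REST calls: create the job, upload the CSV, then mark the upload complete. The sketch below only builds the request descriptions and does not send anything. Authentication, the instance URL, and result polling are omitted. The API version `v60.0`, the field name `Legacy_Id__c`, and the job ID are assumptions for illustration.

```python
import json

API_VERSION = "v60.0"  # assumption: replace with the API version your org uses
BASE_PATH = f"/services/data/{API_VERSION}/jobs/ingest"


def create_job_request(sobject, external_id_field):
    """Step 1: create an upsert ingest job keyed on an External ID field."""
    body = {
        "object": sobject,
        "operation": "upsert",
        "externalIdFieldName": external_id_field,
        "contentType": "CSV",
        "lineEnding": "LF",
    }
    return {
        "method": "POST",
        "path": BASE_PATH,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def upload_data_request(job_id, csv_text):
    """Step 2: upload the CSV payload to the job created in step 1."""
    return {
        "method": "PUT",
        "path": f"{BASE_PATH}/{job_id}/batches",
        "headers": {"Content-Type": "text/csv"},
        "body": csv_text,
    }


def close_job_request(job_id):
    """Step 3: signal that the upload is complete so processing can start."""
    return {
        "method": "PATCH",
        "path": f"{BASE_PATH}/{job_id}",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"state": "UploadComplete"}),
    }


job_id = "750xx0000000001AAA"  # hypothetical ID returned by the create call
steps = [
    create_job_request("Account", "Legacy_Id__c"),
    upload_data_request(job_id, "Legacy_Id__c,Name\nSAP-12345,Acme\n"),
    close_job_request(job_id),
]
for step in steps:
    print(step["method"], step["path"])
```

Separating request construction from transport keeps the sequence easy to unit test; any HTTP client can then send the three requests with an OAuth bearer token.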

Load sequence matters. Parent records must exist before child records can reference them. External IDs enable upserts that handle this gracefully.

Ordered sequence of ten load phases for a standard CRM migration, starting with reference data and progressing through parent objects before children, with files and activities loaded last.
Figure 2. Parent objects must be loaded before their children so that lookup and master-detail fields can resolve correctly. Activities and files load last because their polymorphic fields (WhoId, WhatId) reference records across multiple objects that must all exist first.
  1. Users and roles first: OwnerId and sharing depend on users existing
  2. Reference data: Picklist values, record types, products, price books
  3. Parent objects before children: Account before Contact, Contact before Case
  4. Master-detail parents before children: Cannot create detail without master
  5. Junction objects last: Both parents must exist first
  6. Files and attachments last: ContentVersion records reference parent IDs
  7. Activities (Tasks/Events) last: WhoId and WhatId reference multiple object types

For complex migrations, map the full dependency chain to visualize load order constraints. The diagram below shows a typical CRM migration with dependencies.

Dependency graph grouping CRM objects into five load phases, showing which parent objects must exist before each child object can be loaded during a Salesforce data migration.
Figure 3. The dependency map reveals why load sequencing errors are so common. Objects like Opportunity Line Items have multiple upstream dependencies (Opportunity and Price Book Entry) that must both be resolved before the line item can be inserted. Build this map before writing a single migration script.
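
A dependency map like this can be turned into load phases with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The dependency set is an illustrative assumption based on the relationships named in this section, not a complete model of any particular org.

```python
from graphlib import TopologicalSorter

# Each object maps to the set of objects that must be loaded before it.
# Illustrative subset only; extend it with your own objects and lookups.
DEPENDENCIES = {
    "User": set(),
    "Product2": set(),
    "Pricebook2": set(),
    "Account": {"User"},
    "Contact": {"Account"},
    "PricebookEntry": {"Product2", "Pricebook2"},
    "Opportunity": {"Account"},
    "OpportunityLineItem": {"Opportunity", "PricebookEntry"},
    "Case": {"Account", "Contact"},
    "Task": {"Contact", "Opportunity", "Case"},
}


def load_phases(dependencies):
    """Group objects into phases; each phase depends only on earlier phases."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        phases.append(ready)
        sorter.done(*ready)
    return phases


phases = load_phases(DEPENDENCIES)
for number, objects in enumerate(phases, start=1):
    print(f"Phase {number}: {', '.join(objects)}")
```

Objects within the same phase have no dependencies on each other, so they are candidates for loading in parallel. A `CycleError` points to a circular reference (for example, a self-lookup) that needs a two-pass load.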

External IDs are the foundation of clean, repeatable migrations. They enable upserts and relationship resolution without Salesforce IDs.

Benefit | Explanation
Upsert capability | Insert new records, update existing ones in a single operation
Relationship resolution | Reference parent records by External ID instead of Salesforce ID
Idempotent loads | Re-running a load does not create duplicates
Source system traceability | Map back to the original system’s record ID
Delta migration | Easily identify records that changed since last load

  • Create an External ID field on every object that will be migrated
  • Use the source system’s primary key as the External ID value
  • Mark the field as unique and external ID (indexed automatically)
  • For multi-source migrations, prefix with source system identifier (e.g., SAP-12345, LEGACY-67890)
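
The prefixing convention and relationship resolution can be sketched as follows. This is a minimal example, assuming a hypothetical External ID field named `Legacy_Id__c` on both Account and Contact. The `Account.Legacy_Id__c` column header uses the relationship-name-dot-field form, which lets the load resolve the parent Account by External ID rather than by Salesforce ID.

```python
import csv
import io


def external_id(source_system, source_key):
    """Prefix the source primary key so IDs stay unique across systems."""
    return f"{source_system}-{source_key}"


def contact_upsert_csv(contacts, source_system):
    """Build a Contact CSV that references parent Accounts by External ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Legacy_Id__c is a hypothetical External ID field on both objects.
    writer.writerow(["Legacy_Id__c", "LastName", "Account.Legacy_Id__c"])
    for contact in contacts:
        writer.writerow([
            external_id(source_system, contact["id"]),
            contact["last_name"],
            external_id(source_system, contact["account_id"]),
        ])
    return buffer.getvalue()


# Hypothetical source rows keyed by the source system's primary keys.
contacts = [{"id": "12345", "last_name": "Nguyen", "account_id": "900"}]
output = contact_upsert_csv(contacts, "SAP")
print(output)
```

Because both the record and its parent are identified by source-derived keys, the same file can be re-run as an upsert without creating duplicates.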

The cutover approach is a major architectural decision that affects risk, downtime, and complexity.

Decision flowchart selecting between Big Bang, Phased Migration, and Parallel Run cutover strategies based on downtime tolerance, data complexity, and risk appetite.
Figure 4. Cutover strategy is primarily a risk-vs-cost trade-off. Big Bang concentrates risk into a single window. Phased Migration spreads risk across multiple smaller events. Parallel Run minimizes data risk but doubles operating cost and requires coordinated dual data entry for the overlap period.

Strategy | Description | Pros | Cons
Big Bang | All data migrated in a single cutover event | Simple to plan, clean cut, no dual maintenance | High risk, requires downtime, no fallback to old data
Phased | Migrate in stages (by object, by business unit, by geography) | Lower risk per phase, progressive learning | Longer timeline, data split across systems temporarily
Parallel Run | Both old and new systems run simultaneously | Lowest risk, side-by-side validation | Highest cost, dual data entry, reconciliation burden

Gantt chart comparing Big Bang, Phased Migration, and Parallel Run cutover timelines showing relative duration, critical migration windows, and dual-system overlap periods.
Figure 5. The timeline comparison makes the cost of risk mitigation visible. Big Bang completes in 4 days with maximum risk concentration. Phased Migration spans 13 days with incremental risk. Parallel Run extends 3+ weeks and demands reconciliation work throughout, which is the price of having a live fallback system at all times.

Trial migrations are rehearsals that validate every aspect of the production cutover.

Metric | Why It Matters
Total load time | Must fit within the cutover window
Records per hour | Throughput rate for capacity planning
Error rate | Target < 1%; investigate any errors
Relationship success | % of lookups that resolved correctly
Data accuracy | Spot-check sample vs source system

  • Use a full-copy sandbox for realistic volume testing
  • Run trials with production-equivalent data volumes (not subsets)
  • Time every phase: accurate estimates feed directly into the cutover plan
  • Document all manual steps - they become the cutover runbook
  • Run at least three full trials before production cutover
  • Include rollback testing - verify you can restore if needed
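
Trial timings feed a simple capacity calculation. The sketch below assumes throughput measured in a trial carries over linearly to production, which only holds when the trial used production-equivalent volumes. The sample figures and the 1.2 safety factor are hypothetical.

```python
def records_per_hour(trial_records, trial_hours):
    """Throughput observed during a trial migration."""
    return trial_records / trial_hours


def estimate_load_hours(production_records, throughput, safety_factor=1.2):
    """Projected production load time, padded by a safety factor."""
    return production_records / throughput * safety_factor


def fits_window(estimated_hours, window_hours):
    """True when the padded estimate fits inside the cutover window."""
    return estimated_hours <= window_hours


# Hypothetical trial: 2,000,000 records loaded in 5 hours.
throughput = records_per_hour(2_000_000, 5)
estimate = estimate_load_hours(10_000_000, throughput)
print(f"Throughput: {throughput:,.0f} records/hour")
print(f"Estimated load time: {estimate:.1f} hours")
print("Fits 8-hour window:", fits_window(estimate, 8))
```

When the estimate does not fit, the options are to raise throughput (parallel loads, fewer active automations), reduce scope, or pre-load static data before the cutover window opens.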

  • Record count reconciliation: Source system count vs Salesforce count per object
  • Checksum validation: Hash comparison on critical fields
  • Relationship integrity: Query for orphan records (child records with null lookup to expected parent)
  • Automation verification: Test that re-enabled triggers, flows, and validation rules fire correctly
  • Business stakeholder spot-checks: Have business users verify their own data
  • Report comparison: Run key business reports and compare to source system reports
  • Process walkthroughs: Execute end-to-end business processes using migrated data
  • Edge case verification: Check records with special characters, large text fields, attachments
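
Count reconciliation and checksum validation can be combined in one comparison. The following is a minimal sketch, assuming both the source extract and the Salesforce export are available as lists of dicts sharing a key (here a hypothetical `Legacy_Id__c`) and that values are compared after trimming whitespace.

```python
import hashlib


def row_checksum(row, fields):
    """SHA-256 over the trimmed values of the critical fields."""
    normalized = "|".join((row.get(f) or "").strip() for f in fields)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key, fields):
    """Compare source and target by key: counts, gaps, and field mismatches."""
    source = {r[key]: row_checksum(r, fields) for r in source_rows}
    target = {r[key]: row_checksum(r, fields) for r in target_rows}
    shared = source.keys() & target.keys()
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": sorted(source.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical extracts keyed by the External ID.
source_rows = [
    {"Legacy_Id__c": "SAP-1", "Name": "Acme", "Phone": "555-0100"},
    {"Legacy_Id__c": "SAP-2", "Name": "Globex", "Phone": "555-0101"},
    {"Legacy_Id__c": "SAP-3", "Name": "Initech", "Phone": "555-0102"},
]
target_rows = [
    {"Legacy_Id__c": "SAP-1", "Name": "Acme ", "Phone": "555-0100"},
    {"Legacy_Id__c": "SAP-2", "Name": "Globex", "Phone": "555-9999"},
]

result = reconcile(source_rows, target_rows, "Legacy_Id__c", ["Name", "Phone"])
print(result)
```

Any normalization the transformation rules apply (case, date formats, picklist mapping) needs to be applied to the source side before hashing, otherwise legitimate transformations show up as mismatches.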

Item | Why Disable | How to Re-enable
Validation rules | Migrated data may not meet current rules | Re-enable after load, backfill violations
Triggers | Avoid unintended automation during load | Re-enable, consider running trigger logic post-load
Workflows | Prevent email alerts and field updates | Re-enable after validation
Process Builder (deprecated) / Flows | Prevent automation side effects | Re-enable, test with sample records
Duplicate rules | Legacy data may have intended duplicates | Re-enable, run dedup after migration
Assignment rules | Prevent reassignment of migrated records | Re-enable for new records
Sharing recalculation | Defer until all data is loaded | Trigger manually after migration

Going directly to production cutover without rehearsal. Every migration has surprises; discover them in sandbox.

Loading dirty data and planning to “clean it up later.” Later never comes. Profile and cleanse before migration.

Loading child records before parents, then trying to fix relationships afterward. Use External IDs and correct sequencing.

Assuming the migration will succeed. Always plan for how to restore the system if migration fails.

Testing with 10K records, then discovering that the production load of 10M records takes 40 hours instead of fitting the planned 8-hour window. Run trials at production-equivalent volumes.


  • Data Modeling: migration sequence depends on relationship types and parent-child order
  • Integration Patterns: migration tools (Bulk API, middleware) are integration tools
  • Sharing Model: migrated data must respect the sharing model; defer and resume sharing recalculation carefully
  • Data Quality & Governance: pre-migration profiling and cleansing are data quality exercises
  • Development Lifecycle: migration is a deployable workstream requiring environment strategy and sandbox planning
  • Large Data Volumes: data volume drives tool selection, batch sizing, and cutover window estimation

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.