
Data Cloud Architecture

Salesforce Data Cloud (rebranded to Data 360 in October 2025) is a native lakehouse platform that ingests, harmonizes, unifies, and activates customer data at scale. For CTAs, Data Cloud represents the platform answer to Customer 360, replacing traditional ETL/MDM approaches with a metadata-driven, consumption-based data platform. This page covers the architecture in depth; for basic comparisons with external data options, see External Data.

Data Cloud processes data through six stages. Each stage transforms raw source data into unified, actionable customer insights.

Six-stage Data Cloud pipeline from data stream ingestion through preparation, modeling, identity resolution, analysis with segments and calculated insights, and activation via flows and data actions.
Figure 1. Data Cloud’s six-stage pipeline expands on the four-stage summary: raw Data Stream Objects pass through cleaning and typing in the Prepare stage before mapping to canonical Data Model Objects. Identity resolution then runs on modeled data, ensuring unified profiles are built from clean, harmonized inputs rather than raw source records.

Reactive processing: Data Cloud does not poll for changes. Storage Native Change Events (SNCE) detect every write operation via atomic metadata pointer swaps, and Change Data Feed (CDF) identifies exactly which records changed, enabling incremental downstream processing.
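To make the incremental idea concrete, here is a minimal Python sketch of a downstream total that is updated from a change feed instead of being recomputed from scratch. The feed layout, the `apply_change` helper, and the sample records are assumptions made for illustration; they are not Data Cloud APIs or the actual CDF format.

```python
# Hypothetical change feed entries: (operation, record_id, amount).
change_feed = [
    ("insert", "o1", 100.0),
    ("insert", "o2", 50.0),
    ("update", "o1", 120.0),
    ("delete", "o2", None),
]

orders = {}             # current value per record, as seen downstream
state = {"total": 0.0}  # running aggregate maintained incrementally

def apply_change(op, record_id, amount):
    """Adjust the running total using only the record that changed."""
    previous = orders.pop(record_id, None)
    if previous is not None:
        state["total"] -= previous  # back out the old contribution
    if op in ("insert", "update"):
        orders[record_id] = amount
        state["total"] += amount    # add the new contribution

for change in change_feed:
    apply_change(*change)
```

Each change touches one record, so the work scales with the number of changes rather than with the size of the table.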


Understanding the DSO-DLO-DMO progression is foundational. Each layer serves a different purpose.

Layer | Object | Storage | Purpose
Raw | Data Stream Object (DSO) | Materialized (Parquet/Iceberg) | Raw ingested data, schema as-is from source
Prepared | Data Lake Object (DLO) | Materialized (Parquet/Iceberg) | Cleaned, transformed data in the lakehouse
Modeled | Data Model Object (DMO) | Physical and virtual views | Harmonized canonical model mapped to Customer 360 schema

Diagram showing how data from four source types flows through Data Stream Objects and Data Lake Objects before converging into a single canonical Data Model Object for unified profiling.
Figure 2. Each source system produces a Data Stream Object preserving the raw schema as-is. Transforms in the Prepare stage clean and type the data into Data Lake Objects. Multiple DLOs then map to a single canonical DMO (the Unified Individual), which becomes the basis for identity resolution and segmentation.
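
The progression can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Data Cloud is configured: the sample rows, the field names, the `MAPPINGS` dictionary, and the `prepare` and `to_dmo` functions are all hypothetical.

```python
from datetime import date

# Hypothetical raw rows as they might land in two Data Stream Objects (schema as-is).
pos_dso = [{"CUST_EMAIL": "  Ana@Example.COM ", "PURCHASE_DT": "2025-03-01"}]
web_dso = [{"email_addr": "ana@example.com", "signup": "2024-11-15"}]

def prepare(row):
    """Prepare stage (DSO -> DLO): trim strings and type ISO dates."""
    cleaned = {}
    for key, value in row.items():
        value = value.strip()
        try:
            cleaned[key] = date.fromisoformat(value)
        except ValueError:
            cleaned[key] = value  # not a date, keep the trimmed string
    return cleaned

# Hypothetical field mappings from each source's DLO to one canonical DMO.
MAPPINGS = {
    "pos": {"CUST_EMAIL": "email", "PURCHASE_DT": "last_activity_date"},
    "web": {"email_addr": "email", "signup": "last_activity_date"},
}

def to_dmo(source, dlo_row):
    """Model stage (DLO -> DMO): rename source fields to the canonical schema."""
    record = {target: dlo_row[src] for src, target in MAPPINGS[source].items()}
    record["email"] = record["email"].lower()  # harmonize casing across sources
    record["source"] = source
    return record

dmo_rows = [to_dmo("pos", prepare(r)) for r in pos_dso]
dmo_rows += [to_dmo("web", prepare(r)) for r in web_dso]
```

Both sources end up with the same field names and types, which is what lets identity resolution compare them.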

Data Spaces provide logical partitions within a single Data Cloud instance, separating data by brand, region, department, or SDLC stage without provisioning separate orgs.

  • Data Sources, Data Streams, and DLOs can be shared across Data Spaces
  • DMOs and platform features (segments, activations) are isolated per Data Space
  • Permission Sets control read/write/admin access per Data Space
  • All operations are audit-logged for compliance

Identity resolution links records about the same entity across sources into a single Unified Profile. It uses a ruleset containing match rules and reconciliation rules.

Pipeline showing source profiles moving through matching with blocking keys and fuzzy rules, transitive clustering, and field-level reconciliation to produce a Unified Profile and ID Graph.
Figure 3. Identity resolution runs in three stages. Matching uses blocking keys to narrow candidates before scoring with exact and fuzzy rules. Clustering applies transitive matching: if A matches B and B matches C, all three merge into one cluster. Reconciliation then selects the winning field value per source priority or recency rules.

Stage | What Happens | Details
Matching | Candidate pairs identified | Blocking keys narrow the search space; Locality Sensitive Hashing (LSH) finds candidates; exact and fuzzy (probabilistic) match rules score similarity
Clustering | Related matches grouped | Transitive matching: if A=B and B=C, then A, B, and C form one cluster
Reconciliation | Winner selected per field | When multiple sources provide the same field (e.g., email), reconciliation rules pick the winner based on source priority, recency, or completeness

Match Rule Type | How It Works | Example
Exact | Field values must be identical | Email = Email
Fuzzy | Probabilistic similarity scoring | “Jon Smith” matches “John Smith” using phonetic/semantic algorithms
Normalized | Pre-processing before comparison | Strip whitespace, lowercase, remove special characters
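
The three stages can be sketched in plain Python. This is a simplified illustration, not Data Cloud's implementation: it uses normalized exact matching only (no fuzzy scoring or LSH), a union-find structure for transitive clustering, and a recency rule for reconciliation. The sample profiles, field names, and helper functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical source profiles; "updated" is an ISO date string.
profiles = [
    {"id": "loyalty-1", "email": "Ana@Example.com ", "phone": None, "name": "Ana Diaz", "updated": "2025-01-10"},
    {"id": "pos-7", "email": "ana@example.com", "phone": "555-0100", "name": "A. Diaz", "updated": "2025-03-02"},
    {"id": "web-3", "email": None, "phone": "5550100", "name": "Ana Diaz", "updated": "2024-12-01"},
    {"id": "pos-9", "email": "bo@example.com", "phone": "555-0199", "name": "Bo Chen", "updated": "2025-02-14"},
]

def normalize(field, value):
    """Normalized rule: clean a value before comparison."""
    if value is None:
        return None
    if field == "email":
        return value.strip().lower()
    if field == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    return value

# Matching: blocking keys limit comparisons to records sharing a normalized value.
blocks = defaultdict(list)
for p in profiles:
    for field in ("email", "phone"):
        key = normalize(field, p[field])
        if key:
            blocks[(field, key)].append(p["id"])

# Clustering: union-find merges matches transitively (A=B and B=C puts A, B, C together).
parent = {p["id"]: p["id"] for p in profiles}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

for ids in blocks.values():
    for a, b in combinations(ids, 2):
        parent[find(a)] = find(b)

clusters = defaultdict(list)
for p in profiles:
    clusters[find(p["id"])].append(p)

def reconcile(members):
    """Reconciliation by recency: the newest non-empty value wins per field."""
    newest_first = sorted(members, key=lambda m: m["updated"], reverse=True)
    unified = {"source_ids": sorted(m["id"] for m in members)}
    for field in ("email", "phone", "name"):
        unified[field] = next(
            (normalize(field, m[field]) for m in newest_first if m[field]), None
        )
    return unified

unified_profiles = [reconcile(members) for members in clusters.values()]
```

Here `loyalty-1` and `web-3` share no identifier directly, but both match `pos-7`, so all three land in one cluster. That transitive behavior is also why overly loose match rules can over-merge profiles.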

The ID graph connects all known identifiers for a person: email addresses, phone numbers, device IDs, loyalty IDs, and account usernames. Each Unified Profile has its own subgraph.

Identity resolution supports multiple entity types beyond individuals:

  • Individual: B2C customer profiles
  • Account: B2B company profiles
  • Household: Grouped individuals sharing an address or relationship
  • Cross-entity: Linking individuals to accounts and households

Calculated insights are derived metrics computed via ANSI SQL or a visual declarative builder. They surface aggregated intelligence on unified data.

Aspect | Detail
Language | ANSI SQL or visual builder (no-code)
Input | DMOs, other calculated insights
Output | Metrics attached to profiles (e.g., Lifetime Value, RFM score, Engagement Score)
Materialization | Batch (periodic refresh) or streaming (continuous)
Surfacing | Available on CRM records, in segments, in flows, and via API

Common calculated insight patterns:

  • Customer Lifetime Value (CLV) - sum of historical purchases
  • Recency-Frequency-Monetary (RFM) scoring
  • Engagement score - weighted sum of interactions across channels
  • Product affinity - category preferences from purchase/browse history
  • Churn risk - days since last interaction thresholds
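
As a rough illustration of the first pattern, the sketch below runs an aggregate SQL query through Python's built-in `sqlite3` module. It only shows the shape of such a query: the `sales_order` table, its columns, and the sample rows are hypothetical stand-ins, not real DMO names, and SQLite is used here purely as a convenient local SQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_order (individual_id TEXT, order_date TEXT, amount REAL);
INSERT INTO sales_order VALUES
  ('u1', '2025-01-05', 120.0),
  ('u1', '2025-03-20', 80.0),
  ('u2', '2024-11-02', 40.0);
""")

# One row per individual: the dimension is individual_id, the measures are aggregates.
rows = conn.execute("""
SELECT individual_id,
       SUM(amount)     AS lifetime_value,
       COUNT(*)        AS order_count,
       MAX(order_date) AS last_order_date
FROM sales_order
GROUP BY individual_id
ORDER BY individual_id
""").fetchall()

conn.close()
```

The same grouping also yields the frequency and recency inputs (`order_count`, `last_order_date`) that an RFM score or a churn threshold would build on.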

Segments are audiences built from unified profiles and calculated insights. They define “who” to target.

  • Built using a drag-and-drop segment builder or SQL
  • Can reference DMO attributes, calculated insight values, and engagement data
  • Support nested logic (AND/OR), exclusions, and time-based filters
  • Recalculated on a schedule or near-real-time
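
The nested logic described above can be pictured as a boolean predicate over each profile. The sketch below is a plain Python illustration with hypothetical attributes and thresholds; it does not represent the segment builder's internal format.

```python
from datetime import date, timedelta

TODAY = date(2025, 6, 1)  # fixed so the example is reproducible

# Hypothetical unified profiles with one calculated insight value (lifetime_value).
profiles = [
    {"id": "u1", "country": "US", "lifetime_value": 200.0, "last_purchase": date(2025, 5, 20), "opted_out": False},
    {"id": "u2", "country": "US", "lifetime_value": 40.0, "last_purchase": date(2024, 11, 2), "opted_out": False},
    {"id": "u3", "country": "DE", "lifetime_value": 900.0, "last_purchase": date(2025, 5, 15), "opted_out": True},
    {"id": "u4", "country": "DE", "lifetime_value": 50.0, "last_purchase": date(2025, 5, 28), "opted_out": False},
]

def in_segment(p):
    high_value = p["lifetime_value"] >= 150                     # calculated insight filter
    recent = TODAY - p["last_purchase"] <= timedelta(days=30)   # time-based filter
    include = (p["country"] == "US" and high_value) or recent   # nested AND/OR
    exclude = p["opted_out"]                                    # exclusion rule
    return include and not exclude

segment = [p["id"] for p in profiles if in_segment(p)]
```

Profile `u3` satisfies the inclusion logic but is dropped by the exclusion, which is the case worth testing first when a segment count looks too high.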

Activation pushes segments to downstream systems for action.

Target Category | Examples
Marketing | Marketing Cloud Engagement, Google Ads, Meta Ads, Amazon Ads
CRM | Data actions to Salesforce flows, platform events, record updates
Commerce | B2C Commerce Cloud for personalized storefronts
Analytics | Tableau, CRM Analytics for segment analysis
External | Any system via webhook, API, or data action

Data actions bridge Data Cloud and CRM. They fire when conditions on a DMO or calculated insight are met.

  • Same-org: Data Cloud-Triggered Flows respond to DMO changes directly
  • Cross-org: Data actions generate Platform Events consumed by flows or Apex in another org
  • Related lists: Data Cloud Related Lists surface DMO data on Contact, Account, and Lead records without replicating data into CRM objects
  • Field enrichment: Calculated insight values can be written back to CRM fields

Zero-copy eliminates data duplication by querying external data warehouses in place, using Apache Iceberg table format and Parquet files.

Architecture diagram showing Data Cloud's query engine federating SQL queries to Snowflake, Databricks, BigQuery, and Redshift via Iceberg REST Catalog without copying data into Data Cloud.
Figure 4. Zero-copy federation queries data directly in external warehouses via the Apache Iceberg REST Catalog protocol, eliminating the need to ingest and store that data in Data Cloud. For a scenario like B2B account intelligence with billions of product usage rows in Snowflake, this avoids batch ingestion costs of 2,000 credits per million rows; federation costs 70 credits per million rows.

Direction | Mechanism
Data Cloud queries external | SQL federation with intelligent pushdown to external engines
External queries Data Cloud | JDBC driver or Data-as-a-Service (DaaS) API for file-based sharing

Use Zero-Copy When | Avoid Zero-Copy When
Enterprise data lake is actively managed and governed | Source data is poorly structured or undocumented
Data is already curated and complete in the warehouse | Frequent complex transformations are needed
Avoiding duplicate pipelines across business units | Identity resolution is needed (lakes lack this)
Cost optimization: 70 credits per million rows vs 2,000 for external batch ingestion | Compliance requires full data lineage within Salesforce

Data Cloud runs on a lakehouse architecture built on Apache Iceberg (table format) and Apache Parquet (file format), deployed on Hyperforce (AWS).

Tier | Latency | Use Case
Main memory | Milliseconds | Real-time event processing, in-session personalization
Low Latency Store (LLS) | Sub-second | NVMe-backed durable cache for hot data
Lakehouse (S3) | Seconds | Long-term storage for DLOs, historical data, bulk queries

The real-time layer enables sub-second personalization:

  • Real-time data graphs: denormalized Customer 360 profiles with pre-joined objects
  • Real-time ingest: millisecond-level event capture from Web/Mobile SDKs
  • Real-time identity resolution: exact-match only, instant unification
  • Real-time calculated insights: metrics computed in milliseconds
  • Real-time segmentation: on-the-fly audience evaluation
  • Real-time actions: immediate flow triggers or external channel activation

Data Cloud uses consumption-based pricing. All actions consume credits from a unified credit pool.

Action | Credits per Million Rows | Notes
Data ingestion (batch, external) | 2,000 | Salesforce org-to-org ingestion is free as of August 2025
Data ingestion (streaming) | 5,000 | 2.5x batch cost; use only when latency demands it
Identity resolution | 100,000 | Most expensive operation by far
Calculated insights (batch) | 15 | Very efficient for periodic metrics
Calculated insights (streaming) | 800 | 53x batch cost
Data queries | 2 | Cheapest operation
Segmentation | 20 | Per million rows evaluated
Activation (batch) | 10 | Pushing segments to targets
Activation (streaming DMO) | 1,600 | Real-time activation is expensive
Zero-copy federation | 70 | About 29x cheaper than batch ingest (2,000 / 70)

Pricing reference: 100,000 credits cost $500. Sandbox environments get a 20% discount on credit multipliers.
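
A small Python sketch can turn the rate table into a rough estimator. The rates and the $500 per 100,000 credits figure are taken from this page; the `estimate` function and the dictionary keys are hypothetical names, and the sketch ignores the sandbox discount and any contract-specific pricing.

```python
# Credits per million rows, copied from the table above.
CREDITS_PER_MILLION_ROWS = {
    "batch_ingest": 2_000,
    "streaming_ingest": 5_000,
    "identity_resolution": 100_000,
    "calculated_insights_batch": 15,
    "calculated_insights_streaming": 800,
    "data_queries": 2,
    "segmentation": 20,
    "activation_batch": 10,
    "activation_streaming": 1_600,
    "zero_copy_federation": 70,
}

USD_PER_100K_CREDITS = 500

def estimate(action, rows):
    """Return (credits, usd) for processing `rows` rows with the given action."""
    credits = CREDITS_PER_MILLION_ROWS[action] * rows / 1_000_000
    usd = credits * USD_PER_100K_CREDITS / 100_000
    return credits, usd

# Example: 1 billion rows, ingested in batch vs queried in place.
ingest_credits, ingest_usd = estimate("batch_ingest", 1_000_000_000)
federate_credits, federate_usd = estimate("zero_copy_federation", 1_000_000_000)
ratio = ingest_credits / federate_credits
```

For 1 billion rows this gives 2,000,000 credits ($10,000) for batch ingestion against 70,000 credits ($350) for federation, a ratio of roughly 28.6.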


Scenario 1: Unified Customer 360 for Omnichannel Retail

Situation: Retailer with separate systems for e-commerce (Commerce Cloud), in-store POS, loyalty program, and customer service (Service Cloud). No unified view of the customer.

Data Cloud solution: Ingest from all four sources via data streams. Map to standard Individual DMO. Run identity resolution to merge the same customer across channels (email from loyalty, phone from POS, cookie ID from web). Create calculated insights for CLV, channel preference, and churn risk. Activate segments to Marketing Cloud for personalized campaigns and to Service Cloud for proactive case routing.

Why not traditional MDM: Traditional MDM would require an external platform (Informatica, Reltio), ETL pipelines to Salesforce, and ongoing sync maintenance. Data Cloud provides native identity resolution and activation without middleware.

Scenario 2: B2B Account Intelligence with Zero-Copy

Situation: Enterprise SaaS company with product usage data in Snowflake (billions of rows), CRM data in Sales Cloud, and support data in Service Cloud. Leadership wants account health scores visible to sales reps.

Data Cloud solution: Zero-copy federation to Snowflake for product usage data (avoids ingesting billions of rows). Ingest CRM and Service Cloud data natively (free Salesforce-to-Salesforce ingestion). Create calculated insights for account health score combining usage, support ticket trends, and renewal dates. Surface scores on Account records via Data Cloud Related Lists. Trigger flows when health score drops below threshold.

Why not ETL: Ingesting 1B usage rows at the batch rate would consume 2M credits (2,000 credits per million rows × 1,000 million rows). Zero-copy queries the data in place for 70 credits per million rows, or 70,000 credits for the same 1B rows.

Scenario 3: Real-Time Personalization for Financial Services

Situation: Bank wants to personalize digital experiences in real time, showing relevant product offers when a customer logs into the mobile app, based on transaction history, life events, and segment membership.

Data Cloud solution: Ingest transaction data via streaming. Real-time identity resolution matches the authenticated user to their unified profile. Real-time calculated insights compute product eligibility scores. Real-time segmentation evaluates membership in offer audiences. Data actions push a personalized offer payload to the mobile app in milliseconds.

Why Data Cloud over custom build: Building real-time identity resolution, segmentation, and activation from scratch requires significant engineering. Data Cloud provides this declaratively with sub-second latency.


Decision Guide: Data Cloud vs Alternatives

Factor | Data Cloud | Traditional ETL/MDM | Salesforce Connect
Best for | Unified customer view, analytics, AI | Transactional data sync | Read-only external data access
Data volume | Billions of records | Millions of records | Any (query-time)
Identity resolution | Native (match + reconcile) | Requires separate MDM tool | Not available
Latency | Near-real-time to real-time | Real-time to batch | Real-time (per query)
Query model | SQL on lakehouse | SOQL on platform objects | Limited SOQL subset
Cost model | Credits (consumption) | Integration tool license | Salesforce Connect license
AI/ML readiness | Native (Einstein, Agentforce) | Requires separate data pipeline | Not applicable
Activation | Native to Marketing, CRM, Commerce | Custom integration needed | Not applicable

Industry | Data Cloud Pattern | Key Data Sources
Healthcare | Patient 360: unified patient journey across EMR, appointments, wearables | EMR systems, IoT health trackers, scheduling platforms
Financial Services | Client 360: transaction history, credit data, relationship insights | Core banking, credit bureaus, wealth platforms
Retail | Customer 360: omnichannel purchase, loyalty, browsing behavior | POS, e-commerce, loyalty, clickstream
Manufacturing | Asset 360: IoT sensor data, service history, predictive maintenance | IoT platforms, ERP, field service

Each industry uses pre-built Data Cloud templates with industry-specific DMOs, reducing implementation time.



  • Data Modeling: DMO design follows CRM data modeling principles with a canonical schema approach
  • External Data: Data Cloud is an alternative to Salesforce Connect; zero-copy extends external data access further
  • Large Data Volumes: Data Cloud handles billions of records without LDV governor limit concerns
  • Data Quality & Governance: identity resolution and data spaces are governance mechanisms
  • Integration Patterns: Data Cloud complements integration middleware; data streams are an ingestion pattern
  • Sharing Model: data spaces and field-level masking enforce data governance and access control
  • Licensing: Data Cloud requires separate licensing with consumption-based credit pricing

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.