Data Cloud Architecture
Salesforce Data Cloud (rebranded to Data 360 in October 2025) is a native lakehouse platform that ingests, harmonizes, unifies, and activates customer data at scale. For CTAs, Data Cloud represents the platform answer to Customer 360, replacing traditional ETL/MDM approaches with a metadata-driven, consumption-based data platform. This page covers the architecture in depth; for basic comparisons with external data options, see External Data.
Architecture Pipeline
Data Cloud processes data through six stages. Together, the stages turn raw source data into unified, actionable customer insights.
Reactive processing: Data Cloud does not poll for changes. Storage Native Change Events (SNCE) detect each committed write. In Iceberg, a commit is an atomic swap of the table's metadata pointer. Change Data Feed (CDF) then identifies exactly which records changed, so downstream stages process only the delta.
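The sketch below uses a toy snapshot table to show why change-driven processing avoids polling. It assumes a simplified model. Each commit writes a new immutable snapshot and then moves a single "current" pointer. In a real lakehouse that move is an atomic catalog update; here it is one Python assignment. A hypothetical `changes_since` function diffs two snapshots the way a change data feed would. None of this is the SNCE or CDF API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy snapshot table: each commit writes a new immutable snapshot, then
    moves the `current` pointer in a single assignment (the "pointer swap")."""
    snapshots: list = field(default_factory=lambda: [{}])  # snapshot 0 is empty
    current: int = 0

    def commit(self, upserts: dict, deletes: tuple = ()) -> int:
        rows = dict(self.snapshots[self.current])  # copy-on-write: old snapshot untouched
        rows.update(upserts)
        for key in deletes:
            rows.pop(key, None)
        self.snapshots.append(rows)
        self.current = len(self.snapshots) - 1  # readers now see the new snapshot
        return self.current


def changes_since(table: ToyTable, old: int) -> dict:
    """Change-feed style diff between snapshot `old` and the current snapshot."""
    before, after = table.snapshots[old], table.snapshots[table.current]
    return {
        "insert": {k: v for k, v in after.items() if k not in before},
        "update": {k: v for k, v in after.items() if k in before and before[k] != v},
        "delete": sorted(k for k in before if k not in after),
    }


table = ToyTable()
first = table.commit({"c1": {"email": "a@x.com"}, "c2": {"email": "b@x.com"}})
table.commit({"c2": {"email": "b2@x.com"}, "c3": {"email": "c@x.com"}}, deletes=("c1",))
print(changes_since(table, first))  # only the changed rows are handed downstream
```

A downstream consumer that remembers the last snapshot it processed only needs the diff, not a full rescan of the table.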
Data Object Hierarchy
Understanding the DSO-DLO-DMO progression is foundational. Each layer serves a different purpose.
| Layer | Object | Storage | Purpose |
|---|---|---|---|
| Raw | Data Stream Object (DSO) | Materialized (Parquet/Iceberg) | Raw ingested data, schema as-is from source |
| Prepared | Data Lake Object (DLO) | Materialized (Parquet/Iceberg) | Cleaned, transformed data in the lakehouse |
| Modeled | Data Model Object (DMO) | Physical and virtual views | Harmonized canonical model mapped to Customer 360 schema |
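The sketch below walks a hypothetical POS source through the three layers. The DSO keeps the raw column names as-is. The cleaning step and the field mapping are illustrative only. In Data Cloud these are configured through data streams, transforms, and the mapping canvas rather than written as code. The DMO field names shown are simplified stand-ins, not exact API names. In the standard model, email also lives on a separate Contact Point Email DMO; it is flattened here for brevity.

```python
# Raw (DSO): rows exactly as the source sent them, schema unchanged.
dso_rows = [
    {"CUST_NO": " 1001 ", "EMAIL_ADDR": "Ana@Example.com ", "FNAME": "ana"},
    {"CUST_NO": "1002", "EMAIL_ADDR": "", "FNAME": "Raj"},
]


def to_dlo(row: dict) -> dict:
    """Prepared (DLO): trimmed, typed, and normalized, still in source vocabulary."""
    email = row["EMAIL_ADDR"].strip().lower()
    return {
        "cust_no": int(row["CUST_NO"].strip()),
        "email_addr": email or None,  # empty string becomes a real null
        "fname": row["FNAME"].strip().title(),
    }


# Modeled (DMO): hypothetical mapping from DLO fields to canonical model fields.
DLO_TO_DMO = {"cust_no": "IndividualId", "fname": "FirstName", "email_addr": "Email"}


def to_dmo(dlo_row: dict) -> dict:
    return {DLO_TO_DMO[src]: value for src, value in dlo_row.items()}


dlo_rows = [to_dlo(r) for r in dso_rows]
dmo_rows = [to_dmo(r) for r in dlo_rows]
print(dmo_rows)
```

The source vocabulary survives until the final mapping step. Because of that, several differently shaped sources can land on the same canonical DMO.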
Data Spaces
Data Spaces provide logical partitions within a single Data Cloud instance, separating data by brand, region, department, or SDLC stage without provisioning separate orgs.
- Data Sources, Data Streams, and DLOs can be shared across Data Spaces
- DMOs and platform features (segments, activations) are isolated per Data Space
- Permission Sets control read/write/admin access per Data Space
- All operations are audit-logged for compliance
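The sharing rules above can be modeled as a small access check. This is a hypothetical sketch in which two data spaces share one DLO while each keeps its own DMOs and segments. The class names, permission levels, grants, and object names are illustrative. They do not correspond to Data Cloud's permission set API.

```python
from dataclasses import dataclass, field

LEVELS = {"read": 1, "write": 2, "admin": 3}


@dataclass
class DataSpace:
    name: str
    dlos: set = field(default_factory=set)      # shareable across spaces
    dmos: set = field(default_factory=set)      # isolated to this space
    segments: set = field(default_factory=set)  # isolated to this space


# Hypothetical grants: user -> {data space name -> access level}
GRANTS = {"eu_marketer": {"EU": "write"}, "auditor": {"EU": "read", "US": "read"}}
AUDIT_LOG: list = []


def can_access(user: str, space: DataSpace, obj: str, needed: str) -> bool:
    level = GRANTS.get(user, {}).get(space.name)
    visible = obj in space.dlos | space.dmos | space.segments
    allowed = visible and level is not None and LEVELS[level] >= LEVELS[needed]
    AUDIT_LOG.append((user, space.name, obj, needed, allowed))  # every check is logged
    return allowed


shared_dlo = "Orders_DLO"
eu = DataSpace("EU", dlos={shared_dlo}, dmos={"Individual_EU"}, segments={"EU_VIPs"})
us = DataSpace("US", dlos={shared_dlo}, dmos={"Individual_US"}, segments={"US_VIPs"})
```

For example, `can_access("auditor", us, shared_dlo, "read")` succeeds because the DLO is shared. `can_access("auditor", eu, "Individual_US", "read")` fails because that DMO exists only in the US space.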
Identity Resolution
Identity resolution links records about the same entity across sources into a single Unified Profile. It uses a ruleset containing match rules and reconciliation rules.
Three-Stage Process
| Stage | What Happens | Details |
|---|---|---|
| Matching | Candidate pairs identified | Blocking keys narrow the search space; Locality Sensitive Hashing (LSH) finds candidates; exact and fuzzy (probabilistic) match rules score similarity |
| Clustering | Related matches grouped | Transitive matching: if A matches B and B matches C, then A, B, and C form one cluster even if A and C never match directly |
| Reconciliation | Winner selected per field | When multiple sources provide the same field (e.g., email), reconciliation rules pick the winner based on source priority, recency, or completeness |
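The sketch below runs the three stages end to end on a handful of records. It is a simplified illustration, not Data Cloud's implementation:

- Blocking uses a plain key (normalized last name) instead of LSH.
- The only match rule is an exact match on normalized email or phone.
- Clustering uses union-find to compute the transitive closure.
- Reconciliation applies a hypothetical source-priority rule, with recency as the tiebreak.

```python
from collections import defaultdict
from itertools import combinations

records = [
    {"id": "pos-1", "source": "POS", "last": "Smith", "email": None, "phone": "555-0101", "updated": 3},
    {"id": "web-7", "source": "Web", "last": "smith", "email": "js@x.com", "phone": "5550101", "updated": 5},
    {"id": "loy-2", "source": "Loyalty", "last": "SMITH", "email": "JS@x.com", "phone": None, "updated": 4},
    {"id": "pos-9", "source": "POS", "last": "Lee", "email": "lee@x.com", "phone": None, "updated": 1},
]
SOURCE_PRIORITY = {"Loyalty": 0, "Web": 1, "POS": 2}  # hypothetical: lower = more trusted


def norm(value):
    """Lowercase and strip spaces/dashes; None stays None."""
    return value.strip().lower().replace("-", "").replace(" ", "") if value else None


def is_match(a, b):
    """Hypothetical rule: exact match on normalized email OR normalized phone."""
    return any(norm(a[f]) is not None and norm(a[f]) == norm(b[f]) for f in ("email", "phone"))


# 1. Matching: compare only records that share a blocking key.
blocks = defaultdict(list)
for r in records:
    blocks[norm(r["last"])].append(r)
pairs = [(a["id"], b["id"])
         for block in blocks.values()
         for a, b in combinations(block, 2) if is_match(a, b)]

# 2. Clustering: union-find gives the transitive closure of the matches.
parent = {r["id"]: r["id"] for r in records}


def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps trees shallow
        x = parent[x]
    return x


for a, b in pairs:
    parent[find(a)] = find(b)

clusters = defaultdict(list)
for r in records:
    clusters[find(r["id"])].append(r)


# 3. Reconciliation: per field, the most trusted source wins; recency breaks ties.
def reconcile(members, fld):
    candidates = [m for m in members if m[fld]]
    if not candidates:
        return None
    best = min(candidates, key=lambda m: (SOURCE_PRIORITY[m["source"]], -m["updated"]))
    return best[fld]


profiles = [
    {"members": sorted(m["id"] for m in ms), **{f: reconcile(ms, f) for f in ("email", "phone")}}
    for ms in clusters.values()
]
print(profiles)
```

`pos-1` and `loy-2` share no identifier. They still end up in one profile because each matches `web-7`, which is the transitive behavior described in the Clustering row.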
Match Rule Types
| Type | How It Works | Example |
|---|---|---|
| Exact | Field values must be identical | Email = Email |
| Fuzzy | Probabilistic similarity scoring | “Jon Smith” matches “John Smith” using phonetic/semantic algorithms |
| Normalized | Pre-processing before comparison | Strip whitespace, lowercase, remove special characters |
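The three rule types differ mainly in what happens before and during comparison. The sketch below is illustrative only. Data Cloud's fuzzy matching is configured rather than coded, and its algorithms are not exposed like this. Here, a classic American Soundex code stands in for "phonetic" matching, and `difflib.SequenceMatcher` stands in for a similarity score. The 0.85 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# American Soundex digit codes; vowels, y, h, and w have no code.
CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
         **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out, prev = letters[0].upper(), CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (out + "000")[:4]


def normalize(value: str) -> str:
    """Normalized-rule preprocessing: lowercase, drop whitespace and special characters."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def exact_match(a: str, b: str) -> bool:
    return a == b


def normalized_match(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Tokens must sound alike AND the strings must be similar overall."""
    ta, tb = a.lower().split(), b.lower().split()
    sounds_alike = len(ta) == len(tb) and all(soundex(x) == soundex(y) for x, y in zip(ta, tb))
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return sounds_alike and score >= threshold


print(exact_match("Jon Smith", "John Smith"))  # exact rule rejects
print(fuzzy_match("Jon Smith", "John Smith"))  # fuzzy rule accepts
print(normalized_match(" JS@Example.com", "js@example.com"))
```

Requiring both signals in `fuzzy_match` is one way to limit false positives. "Jon Smith" and "Jane Doe" share a first-name Soundex code but fail on the surname.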
Individual ID Graph
The ID graph connects all known identifiers for a person: email addresses, phone numbers, device IDs, loyalty IDs, and account usernames. Each Unified Profile has its own subgraph.
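One way to picture the ID graph is as a bipartite graph that links source records to the identifiers they carry. A unified profile's subgraph is then everything reachable from one of its records. This is a hypothetical sketch of that data structure. It assumes identifiers are already normalized and prefixed with their type, and it does not reflect how Data Cloud stores the graph internally.

```python
from collections import defaultdict, deque

# Source record -> identifiers it carries (type-prefixed, already normalized).
record_ids = {
    "loyalty:L-88": ["email:ana@x.com", "loyalty:88"],
    "web:cookie-1f": ["device:1f", "email:ana@x.com"],
    "pos:T-42": ["phone:5550101", "loyalty:88"],
    "web:cookie-9c": ["device:9c"],
}

graph = defaultdict(set)  # undirected edges: record <-> identifier
for rec, idents in record_ids.items():
    for ident in idents:
        graph[rec].add(ident)
        graph[ident].add(rec)


def subgraph(start: str) -> set:
    """Breadth-first search: every record and identifier connected to `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


ana = subgraph("pos:T-42")
print(sorted(n for n in ana if n not in record_ids))  # Ana's known identifiers
```

Starting from the POS transaction, the search reaches the loyalty record through `loyalty:88`, then the web cookie through the shared email. It never reaches the unrelated cookie `9c`.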
Entity Types
Identity resolution supports multiple entity types beyond individuals:
- Individual: B2C customer profiles
- Account: B2B company profiles
- Household: Grouped individuals sharing an address or relationship
- Cross-entity: Linking individuals to accounts and households
Calculated Insights
Calculated insights are derived metrics computed via ANSI SQL or a visual declarative builder. They surface aggregated intelligence on unified data.
| Aspect | Detail |
|---|---|
| Language | ANSI SQL or visual builder (no-code) |
| Input | DMOs, other calculated insights |
| Output | Metrics attached to profiles (e.g., Lifetime Value, RFM score, Engagement Score) |
| Materialization | Batch (periodic refresh) or streaming (continuous) |
| Surfacing | Available on CRM records, in segments, in flows, and via API |
Common calculated insight patterns:
- Customer Lifetime Value (CLV) - sum of historical purchases
- Recency-Frequency-Monetary (RFM) scoring
- Engagement score - weighted sum of interactions across channels
- Product affinity - category preferences from purchase/browse history
- Churn risk - days since last interaction thresholds
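Calculated insights are written in SQL, so the patterns above can be prototyped against any SQL engine. The sketch below uses Python's built-in `sqlite3` with a hypothetical `sales_order` table. It computes the monetary (CLV), frequency, and recency inputs for RFM, plus a days-since churn flag. Real calculated insights query DMOs by their own API names and follow Data Cloud's rules for declaring dimensions and measures. The date arithmetic here uses SQLite's `julianday`, which is not ANSI. Treat the SQL as a shape, not paste-ready Data Cloud syntax. The 90-day churn threshold is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_order (individual_id TEXT, order_date TEXT, amount REAL);
INSERT INTO sales_order VALUES
  ('ind-1', '2025-01-05', 120.0), ('ind-1', '2025-06-01', 80.0),
  ('ind-1', '2025-06-20', 50.0),  ('ind-2', '2024-11-30', 300.0);
""")

AS_OF = "2025-07-01"  # evaluation date for recency and churn
rows = conn.execute("""
SELECT individual_id,
       SUM(amount) AS lifetime_value,   -- monetary / CLV
       COUNT(*)    AS order_count,      -- frequency
       CAST(julianday(?) - julianday(MAX(order_date)) AS INTEGER) AS days_since_last,  -- recency
       CASE WHEN julianday(?) - julianday(MAX(order_date)) > 90
            THEN 'high' ELSE 'low' END AS churn_risk
FROM sales_order
GROUP BY individual_id
ORDER BY individual_id
""", (AS_OF, AS_OF)).fetchall()

for row in rows:
    print(row)
```

Each output row is one profile with its metrics attached. That per-profile shape is what lets insights feed segments and CRM records.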
Segmentation and Activation
Segments
Segments are audiences built from unified profiles and calculated insights. They define “who” to target.
- Built using a drag-and-drop segment builder or SQL
- Can reference DMO attributes, calculated insight values, and engagement data
- Support nested logic (AND/OR), exclusions, and time-based filters
- Recalculated on a schedule or in near real time
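Segment logic ultimately reduces to a boolean predicate over profile attributes and calculated-insight values. The sketch below expresses nested AND/OR logic, an exclusion, and a time-based filter as composable Python predicates. The attribute names, thresholds, and helper names (`all_of`, `any_of`, `exclude`, `within_days`) are hypothetical. In Data Cloud this logic is built in the segment builder or SQL, not in Python.

```python
from datetime import date, timedelta
from typing import Callable

Rule = Callable[[dict], bool]


def all_of(*rules: Rule) -> Rule:
    return lambda p: all(r(p) for r in rules)


def any_of(*rules: Rule) -> Rule:
    return lambda p: any(r(p) for r in rules)


def exclude(rule: Rule) -> Rule:
    return lambda p: not rule(p)


def within_days(field_name: str, days: int, today: date) -> Rule:
    """Time-based filter: the date field falls within the last `days` days."""
    return lambda p: p.get(field_name) is not None and today - p[field_name] <= timedelta(days=days)


TODAY = date(2025, 7, 1)
# (high value OR highly engaged) AND purchased in last 30 days AND NOT opted out
vip = all_of(
    any_of(lambda p: p["lifetime_value"] >= 1000, lambda p: p["engagement_score"] >= 80),
    within_days("last_purchase", 30, TODAY),
    exclude(lambda p: p["email_opt_out"]),
)

candidates = [
    {"id": "a", "lifetime_value": 1500, "engagement_score": 10, "last_purchase": date(2025, 6, 20), "email_opt_out": False},
    {"id": "b", "lifetime_value": 200, "engagement_score": 90, "last_purchase": date(2025, 6, 28), "email_opt_out": True},
    {"id": "c", "lifetime_value": 200, "engagement_score": 85, "last_purchase": date(2025, 5, 1), "email_opt_out": False},
    {"id": "d", "lifetime_value": 50, "engagement_score": 95, "last_purchase": date(2025, 6, 30), "email_opt_out": False},
]
members = [p["id"] for p in candidates if vip(p)]
print(members)
```

Profile `b` qualifies on engagement but is removed by the exclusion. Profile `c` fails the 30-day window.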
Activation Targets
Activation pushes segments to downstream systems for action.
| Target Category | Examples |
|---|---|
| Marketing | Marketing Cloud Engagement, Google Ads, Meta Ads, Amazon Ads |
| CRM | Data actions to Salesforce flows, platform events, record updates |
| Commerce | B2C Commerce Cloud for personalized storefronts |
| Analytics | Tableau, CRM Analytics for segment analysis |
| External | Any system via webhook, API, or data action |
Data Actions and CRM Integration
Data actions bridge Data Cloud and CRM. They fire when conditions on a DMO or calculated insight are met.
- Same-org: Data Cloud-Triggered Flows respond to DMO changes directly
- Cross-org: Data actions generate Platform Events consumed by flows or Apex in another org
- Related lists: Data Cloud Related Lists surface DMO data on Contact, Account, and Lead records without replicating data into CRM objects
- Field enrichment: Calculated insight values can be written back to CRM fields
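A recurring design concern with data actions is to fire once when a value crosses a threshold, not on every recomputation while it stays past the threshold. The sketch below shows that edge-triggered check for a hypothetical health-score calculated insight. The threshold and event shape are assumptions. A real payload is defined by the data action target (for example, a Platform Event's fields). Whether the platform deduplicates for you depends on how the action's conditions are configured.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 60  # hypothetical "at risk" cut-off for a health-score insight


@dataclass
class ScoreChange:
    account_id: str
    old_score: Optional[float]  # None when the profile is new
    new_score: float


def evaluate(change: ScoreChange) -> Optional[dict]:
    """Fire only when the score crosses from healthy to at-risk."""
    was_healthy = change.old_score is None or change.old_score >= THRESHOLD
    now_at_risk = change.new_score < THRESHOLD
    if was_healthy and now_at_risk:
        # Hypothetical payload; a real target defines its own fields.
        return {"AccountId": change.account_id, "HealthScore": change.new_score}
    return None


changes = [
    ScoreChange("001A", 72, 55),  # crosses below 60: fires
    ScoreChange("001A", 55, 50),  # already at risk: no duplicate event
    ScoreChange("001B", 80, 75),  # still healthy: nothing
]
events = [evt for evt in map(evaluate, changes) if evt is not None]
print(events)
```

Comparing old and new values keeps a downstream flow from creating a new task every time the insight refreshes.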
Zero-Copy Partner Network
Zero-copy eliminates data duplication by querying external data warehouses in place, using Apache Iceberg table format and Parquet files.
How It Works
Bidirectional Access
| Direction | Mechanism |
|---|---|
| Data Cloud queries external | SQL federation with intelligent pushdown to external engines |
| External queries Data Cloud | JDBC driver or Data-as-a-Service (DaaS) API for file-based sharing |
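Pushdown matters because it decides how much data crosses the wire. The sketch below uses an in-memory SQLite database as a stand-in for an external warehouse. It compares pulling every row and filtering locally against pushing the filter and aggregate to the remote engine. The table, the query, and "rows transferred" as a cost proxy are illustrative assumptions. This is not how Data Cloud's federation engine is implemented.

```python
import sqlite3

# Stand-in for an external warehouse table of product usage events.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE usage_events (account_id TEXT, feature TEXT, minutes INTEGER)")
remote.executemany(
    "INSERT INTO usage_events VALUES (?, ?, ?)",
    [(f"acct-{i % 50}", "reports" if i % 3 else "api", i % 7) for i in range(10_000)],
)

# Without pushdown: fetch everything, then filter and aggregate locally.
all_rows = remote.execute("SELECT account_id, feature, minutes FROM usage_events").fetchall()
local = {}
for account_id, feature, minutes in all_rows:
    if feature == "api":
        local[account_id] = local.get(account_id, 0) + minutes

# With pushdown: the remote engine filters and aggregates; only results come back.
pushed = dict(remote.execute(
    "SELECT account_id, SUM(minutes) FROM usage_events WHERE feature = ? GROUP BY account_id",
    ("api",),
).fetchall())

print(f"rows transferred without pushdown: {len(all_rows)}, with pushdown: {len(pushed)}")
```

Both paths return the same per-account totals. Only the transfer volume differs (10,000 rows versus 50 in this toy setup).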
When to Use vs Avoid Zero-Copy
| Use Zero-Copy When | Avoid Zero-Copy When |
|---|---|
| Enterprise data lake is actively managed and governed | Source data is poorly structured or undocumented |
| Data is already curated and complete in the warehouse | Frequent complex transformations are needed |
| Avoiding duplicate pipelines across business units | Identity resolution is needed (external lakes have no native equivalent) |
| Cost optimization: 70 credits per million rows federated vs 2,000 per million for batch ingestion of external data | Compliance requires full data lineage within Salesforce |
Storage and Compute Architecture
Data Cloud runs on a lakehouse architecture built on Apache Iceberg (table format) and Apache Parquet (file format), deployed on Hyperforce (AWS).
Tiered Storage
| Tier | Latency | Use Case |
|---|---|---|
| Main memory | Milliseconds | Real-time event processing, in-session personalization |
| Low Latency Store (LLS) | Sub-second | NVMe-backed durable cache for hot data |
| Lakehouse (S3) | Seconds | Long-term storage for DLOs, historical data, bulk queries |
Real-Time Layer
The real-time layer enables sub-second personalization:
- Real-time data graphs: denormalized Customer 360 profiles with pre-joined objects
- Real-time ingest: millisecond-level event capture from Web/Mobile SDKs
- Real-time identity resolution: exact-match only, instant unification
- Real-time calculated insights: metrics computed in milliseconds
- Real-time segmentation: on-the-fly audience evaluation
- Real-time actions: immediate flow triggers or external channel activation
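A real-time data graph trades storage for lookup speed. Related objects are joined ahead of time into one document per profile, so serving a request becomes a key lookup instead of a multi-table join. The sketch below shows that shape, with a hash index for the exact-match identity lookup noted above. The object names, the index, and the document layout are hypothetical, not the actual data graph format.

```python
import json

# Normalized source objects (what a query-time join would otherwise combine).
individuals = {"ind-1": {"name": "Ana Ruiz", "tier": "Gold"}}
orders = [
    {"individual_id": "ind-1", "order_id": "o-1", "total": 80.0},
    {"individual_id": "ind-1", "order_id": "o-2", "total": 50.0},
]
insights = {"ind-1": {"lifetime_value": 130.0, "churn_risk": "low"}}
identity_index = {"email:ana@x.com": "ind-1", "device:1f": "ind-1"}  # exact-match keys


def build_graph() -> dict:
    """Pre-join everything per profile once, ahead of any request."""
    graph = {}
    for ind_id, attrs in individuals.items():
        graph[ind_id] = {
            **attrs,
            "orders": [o for o in orders if o["individual_id"] == ind_id],
            "insights": insights.get(ind_id, {}),
        }
    return graph


DATA_GRAPH = build_graph()


def lookup(identifier: str):
    """Serve-time path: exact-match identity lookup, then one dictionary read."""
    ind_id = identity_index.get(identifier)
    return DATA_GRAPH.get(ind_id) if ind_id else None


print(json.dumps(lookup("device:1f"), indent=2))
```

The expensive work happens in `build_graph`, off the request path. `lookup` does only two hash reads, which is what makes in-session personalization feasible.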
Credit Consumption Model
Data Cloud uses consumption-based pricing. All actions consume credits from a unified credit pool.
| Action | Credits per Million Rows | Notes |
|---|---|---|
| Data ingestion (batch, external) | 2,000 | Salesforce org-to-org ingestion is free as of August 2025 |
| Data ingestion (streaming) | 5,000 | 2.5x batch cost; use only when latency demands it |
| Identity resolution | 100,000 | Most expensive operation by far |
| Calculated insights (batch) | 15 | Very efficient for periodic metrics |
| Calculated insights (streaming) | 800 | 53x batch cost |
| Data queries | 2 | Cheapest operation |
| Segmentation | 20 | Per million rows evaluated |
| Activation (batch) | 10 | Pushing segments to targets |
| Activation (streaming DMO) | 1,600 | Real-time activation is expensive |
| Zero-copy federation | 70 | About 29x cheaper than batch ingest (2,000 / 70 ≈ 28.6) |
Pricing reference: 100,000 credits cost $500. Sandbox environments get a 20% discount on credit multipliers.
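The rate table and the price reference combine into a simple estimator. The sketch below hard-codes the rates listed above and the $500 per 100,000 credits reference. The rates change over time, so check the current rate card. The sketch ignores contract discounts, sandbox multipliers, and the free Salesforce org-to-org ingestion noted in the table. The operation keys are hypothetical labels for the table rows.

```python
# Credits per million rows, copied from the table above; rates change over time.
RATES = {
    "batch_ingest": 2_000,
    "streaming_ingest": 5_000,
    "identity_resolution": 100_000,
    "ci_batch": 15,
    "ci_streaming": 800,
    "query": 2,
    "segmentation": 20,
    "activation_batch": 10,
    "activation_streaming_dmo": 1_600,
    "zero_copy": 70,
}


def credits(operation: str, rows: int) -> float:
    """Credits consumed for `rows` rows of one operation."""
    return RATES[operation] * rows / 1_000_000


def cost_usd(operation: str, rows: int) -> float:
    """List-price dollars at $500 per 100,000 credits, before any discount."""
    return credits(operation, rows) * 500 / 100_000


rows = 1_000_000_000  # one billion rows, as in Scenario 2 below
for op in ("batch_ingest", "zero_copy"):
    print(f"{op:>13}: {credits(op, rows):>11,.0f} credits  ${cost_usd(op, rows):>9,.2f}")
```

For one billion rows this gives 2,000,000 credits ($10,000.00) for batch ingestion and 70,000 credits ($350.00) for a single federated pass.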
CTA Scenario Patterns
Scenario 1: Unified Customer 360 for Omnichannel Retail
Situation: Retailer with separate systems for e-commerce (Commerce Cloud), in-store POS, loyalty program, and customer service (Service Cloud). No unified view of the customer.
Data Cloud solution: Ingest from all four sources via data streams and map them to the standard Individual DMO. Run identity resolution to merge the same customer across channels (email from loyalty, phone from POS, cookie ID from web). Create calculated insights for CLV, channel preference, and churn risk. Activate segments to Marketing Cloud for personalized campaigns and to Service Cloud for proactive case routing.
Why not traditional MDM: Traditional MDM would require an external platform (Informatica, Reltio), ETL pipelines to Salesforce, and ongoing sync maintenance. Data Cloud provides native identity resolution and activation without middleware.
Scenario 2: B2B Account Intelligence with Zero-Copy
Situation: Enterprise SaaS company with product usage data in Snowflake (billions of rows), CRM data in Sales Cloud, and support data in Service Cloud. Leadership wants account health scores visible to sales reps.
Data Cloud solution: Zero-copy federation to Snowflake for product usage data (avoids ingesting billions of rows). Ingest CRM and Service Cloud data natively (free Salesforce-to-Salesforce ingestion). Create calculated insights for account health score combining usage, support ticket trends, and renewal dates. Surface scores on Account records via Data Cloud Related Lists. Trigger flows when health score drops below threshold.
Why not ETL: Ingesting billions of usage rows would consume massive credits (2,000 credits per million rows = 2M credits for 1B rows). Zero-copy queries the data in place at 70 credits per million rows, about 70,000 credits for a full pass over 1B rows.
Scenario 3: Real-Time Personalization for Financial Services
Situation: Bank wants to personalize digital experiences in real time, showing relevant product offers when a customer logs into the mobile app based on transaction history, life events, and segment membership.
Data Cloud solution: Ingest transaction data via streaming. Real-time identity resolution matches the authenticated user to their unified profile. Real-time calculated insights compute product eligibility scores. Real-time segmentation evaluates membership in offer audiences. Data actions push a personalized offer payload to the mobile app in milliseconds.
Why Data Cloud over custom build: Building real-time identity resolution, segmentation, and activation from scratch requires significant engineering. Data Cloud provides this declaratively with sub-second latency.
Decision Guide: Data Cloud vs Alternatives
| Factor | Data Cloud | Traditional ETL/MDM | Salesforce Connect |
|---|---|---|---|
| Best for | Unified customer view, analytics, AI | Transactional data sync | Read-only external data access |
| Data volume | Billions of records | Millions of records | Any (query-time) |
| Identity resolution | Native (match + reconcile) | Requires separate MDM tool | Not available |
| Latency | Near-real-time to real-time | Real-time to batch | Real-time (per query) |
| Query model | SQL on lakehouse | SOQL on platform objects | Limited SOQL subset |
| Cost model | Credits (consumption) | Integration tool license | Salesforce Connect license |
| AI/ML readiness | Native (Einstein, Agentforce) | Requires separate data pipeline | Not applicable |
| Activation | Native to Marketing, CRM, Commerce | Custom integration needed | Not applicable |
Industry-Specific Patterns
| Industry | Data Cloud Pattern | Key Data Sources |
|---|---|---|
| Healthcare | Patient 360 - unified patient journey across EMR, appointments, wearables | EMR systems, IoT health trackers, scheduling platforms |
| Financial Services | Client 360 - transaction history, credit data, relationship insights | Core banking, credit bureaus, wealth platforms |
| Retail | Customer 360 - omnichannel purchase, loyalty, browsing behavior | POS, e-commerce, loyalty, clickstream |
| Manufacturing | Asset 360 - IoT sensor data, service history, predictive maintenance | IoT platforms, ERP, field service |
Pre-built industry templates with industry-specific DMOs are available for these patterns and can reduce implementation time.
Gotchas and Anti-Patterns
Related Topics
- Data Modeling: DMO design follows CRM data modeling principles with a canonical schema approach
- External Data: Data Cloud is an alternative to Salesforce Connect; zero-copy extends external data access further
- Large Data Volumes: Data Cloud handles billions of records without LDV governor limit concerns
- Data Quality & Governance: identity resolution and data spaces are governance mechanisms
- Integration Patterns: Data Cloud complements integration middleware; data streams are an ingestion pattern
- Sharing Model: data spaces and field-level masking enforce data governance and access control
- Licensing: Data Cloud requires separate licensing with consumption-based credit pricing
Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.