Data Cloud Architecture

Salesforce Data Cloud (rebranded to Data 360 in October 2025) is a native lakehouse platform that ingests, harmonizes, unifies, and activates customer data at scale. For CTAs, Data Cloud represents the platform answer to Customer 360, replacing traditional ETL/MDM approaches with a metadata-driven, consumption-based data platform. This page covers the architecture in depth; for basic comparisons with external data options, see External Data.

Architecture Pipeline

Data Cloud processes data through six stages that together turn raw source data into unified, actionable customer insights.

Figure 1. Data Cloud’s six-stage pipeline expands on the four-stage summary: raw Data Stream Objects pass through cleaning and typing in the Prepare stage before mapping to canonical Data Model Objects. Identity resolution then runs on modeled data, ensuring unified profiles are built from clean, harmonized inputs rather than raw source records.

Reactive processing: Data Cloud does not poll for changes. Every write to a lakehouse table commits through an atomic swap of a metadata pointer, and Storage Native Change Events (SNCE) fire on each such commit. Change Data Feed (CDF) then identifies exactly which records changed, so downstream processing can run incrementally instead of rescanning whole tables.
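
To make the commit-and-notify idea concrete, here is a minimal Python sketch of a copy-on-write table. It assumes a simplified Iceberg-like model where each commit writes a new immutable snapshot and then swaps a single metadata pointer; `ToyTable`, `commit`, and `changes_since` are hypothetical names, not Data Cloud or Iceberg APIs, and deletes are ignored for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    rows: dict  # primary key -> row; never modified after the commit


class ToyTable:
    """Hypothetical Iceberg-like table: every commit swaps one pointer to a new snapshot."""

    def __init__(self) -> None:
        self.snapshots = [Snapshot(0, {})]
        self.current = 0  # the metadata pointer that readers follow
        self.listeners: list[Callable[[int, int], None]] = []

    def commit(self, upserts: dict) -> int:
        base = self.snapshots[self.current].rows
        snap = Snapshot(len(self.snapshots), {**base, **upserts})  # copy-on-write
        self.snapshots.append(snap)
        # A single assignment stands in for the atomic metadata pointer swap.
        previous, self.current = self.current, snap.snapshot_id
        for notify in self.listeners:  # a change event on every commit, no polling
            notify(previous, self.current)
        return self.current

    def changes_since(self, old_id: int, new_id: int) -> dict:
        """Change-feed style diff: only keys whose row differs between two snapshots."""
        old = self.snapshots[old_id].rows
        new = self.snapshots[new_id].rows
        return {key: row for key, row in new.items() if old.get(key) != row}


# Usage: a downstream consumer that only ever sees the changed records.
table = ToyTable()
processed = []
table.listeners.append(lambda old, new: processed.append(table.changes_since(old, new)))
table.commit({"c1": {"email": "a@x.com"}, "c2": {"email": "b@x.com"}})
table.commit({"c2": {"email": "b@new.com"}})
print(processed[1])  # {'c2': {'email': 'b@new.com'}}
```

The listener runs only when a commit happens, which is the opposite of polling, and it handles only the changed keys rather than the whole table.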

Data Object Hierarchy

Understanding the DSO-DLO-DMO progression is foundational. Each layer serves a different purpose.

Layer	Object	Storage	Purpose
Raw	Data Stream Object (DSO)	Materialized (Parquet/Iceberg)	Raw ingested data, schema as-is from source
Prepared	Data Lake Object (DLO)	Materialized (Parquet/Iceberg)	Cleaned, transformed data in the lakehouse
Modeled	Data Model Object (DMO)	Physical and virtual views	Harmonized canonical model mapped to Customer 360 schema

Figure 2. Each source system produces a Data Stream Object preserving the raw schema as-is. Transforms in the Prepare stage clean and type the data into Data Lake Objects. Multiple DLOs then map to a single canonical DMO (such as Individual), which feeds identity resolution; the resulting Unified Individual is the basis for segmentation.
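
The DSO-DLO-DMO progression can be sketched with two hypothetical sources that name the same fields differently. The field names, the `prepare` transform, and the simplified Individual shape below are illustrative assumptions, not Data Cloud's mapping API or the exact standard DMO fields.

```python
from datetime import date

# DSO layer: raw rows exactly as each (hypothetical) source delivers them, all strings.
loyalty_dso = [{"MemberEmail": " Ana@Example.com ", "JoinDate": "2024-03-01", "Name": "Ana Diaz"}]
pos_dso = [{"email_addr": "ana@example.com", "first_purchase": "2023-11-20", "cust_name": "ANA DIAZ"}]


def prepare(row: dict, field_map: dict) -> dict:
    """Prepare stage (DSO -> DLO): rename to a common shape, then clean and type values."""
    out = {target: row[source] for source, target in field_map.items()}
    out["email"] = out["email"].strip().lower()
    out["since"] = date.fromisoformat(out["since"])
    out["name"] = out["name"].title()
    return out


def to_individual(dlo_row: dict, source: str) -> dict:
    """Modeled stage (DLO -> DMO): map onto one simplified, hypothetical Individual shape."""
    return {
        "IndividualId": f"{source}:{dlo_row['email']}",
        "Email": dlo_row["email"],
        "PersonName": dlo_row["name"],
        "CustomerSince": dlo_row["since"],
        "SourceSystem": source,
    }


loyalty_dlo = [prepare(r, {"MemberEmail": "email", "JoinDate": "since", "Name": "name"}) for r in loyalty_dso]
pos_dlo = [prepare(r, {"email_addr": "email", "first_purchase": "since", "cust_name": "name"}) for r in pos_dso]

individual_dmo = [to_individual(r, "loyalty") for r in loyalty_dlo] + [
    to_individual(r, "pos") for r in pos_dlo
]
```

Both DMO rows carry the same email, but they remain two Individual records; merging them is the job of identity resolution, not of mapping.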

Data Spaces

Data Spaces provide logical partitions within a single Data Cloud instance, separating data by brand, region, department, or SDLC stage without provisioning separate orgs.

Data Sources, Data Streams, and DLOs can be shared across Data Spaces
DMOs and platform features (segments, activations) are isolated per Data Space
Permission Sets control read/write/admin access per Data Space
All operations are audit-logged for compliance
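
The sharing rules above can be modeled as a small access check. This is a hypothetical sketch: `GRANTS` stands in for permission set assignments, and the layer names, space names, and access levels are assumptions chosen to mirror the list, not Data Cloud's actual permission model.

```python
from dataclasses import dataclass

SHARED_LAYERS = {"data_source", "data_stream", "dlo"}  # may be shared across data spaces
ISOLATED_LAYERS = {"dmo", "segment", "activation"}     # scoped to exactly one data space
LEVELS = {"read": 1, "write": 2, "admin": 3}

# Hypothetical permission set assignments: user -> {data space -> access level}.
GRANTS = {
    "eu_marketer": {"brand_eu": "read"},
    "admin": {"brand_eu": "admin", "brand_us": "admin"},
}
audit_log = []  # every access decision is recorded


@dataclass(frozen=True)
class SpaceObject:
    name: str
    layer: str
    spaces: frozenset  # data spaces this object is visible in


def can_access(user, obj, space, needed="read"):
    if obj.layer not in SHARED_LAYERS | ISOLATED_LAYERS:
        raise ValueError(f"unknown layer {obj.layer!r}")
    if obj.layer in ISOLATED_LAYERS and len(obj.spaces) != 1:
        raise ValueError(f"{obj.name} must belong to exactly one data space")
    granted = GRANTS.get(user, {}).get(space)
    allowed = (
        space in obj.spaces
        and granted is not None
        and LEVELS[granted] >= LEVELS[needed]
    )
    audit_log.append((user, obj.name, space, needed, allowed))
    return allowed


orders_dlo = SpaceObject("Orders_DLO", "dlo", frozenset({"brand_eu", "brand_us"}))
eu_orders_dmo = SpaceObject("SalesOrder_EU", "dmo", frozenset({"brand_eu"}))

print(can_access("eu_marketer", orders_dlo, "brand_eu"))              # True
print(can_access("eu_marketer", eu_orders_dmo, "brand_us"))           # False
print(can_access("eu_marketer", eu_orders_dmo, "brand_eu", "write"))  # False
```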

Identity Resolution

Identity resolution links records about the same entity across sources into a single Unified Profile. It uses a ruleset containing match rules and reconciliation rules.

Three-Stage Process

Figure 3. Identity resolution runs in three stages. Matching uses blocking keys to narrow candidates before scoring with exact and fuzzy rules. Clustering applies transitive matching: if A matches B and B matches C, all three merge into one cluster. Reconciliation then selects the winning field value per source priority or recency rules.

Stage	What Happens	Details
Matching	Candidate pairs identified	Blocking keys narrow the search space; Locality Sensitive Hashing (LSH) finds candidates; exact and fuzzy (probabilistic) match rules score similarity
Clustering	Related matches grouped	Transitive matching: if A=B and B=C, then A=B=C form one cluster
Reconciliation	Winner selected per field	When multiple sources provide the same field (e.g., email), reconciliation rules pick the winner based on source priority, recency, or completeness
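
The three stages can be sketched end to end in Python under stated assumptions: blocking uses last name, the match rules are exact matches on email or phone, union-find is used here as a stand-in for the clustering step, and reconciliation picks the highest-priority source with recency as the tie-breaker. The record fields and the `PRIORITY` ranking are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical DMO records from four sources; "updated" is a simple recency counter.
records = [
    {"id": "web-1", "source": "web", "email": "jon@x.com", "phone": None, "last": "smith", "updated": 3},
    {"id": "pos-7", "source": "pos", "email": None, "phone": "555-0100", "last": "smith", "updated": 5},
    {"id": "loy-2", "source": "loyalty", "email": "jon@x.com", "phone": "555-0100", "last": "smith", "updated": 1},
    {"id": "crm-9", "source": "crm", "email": "amy@y.com", "phone": None, "last": "lee", "updated": 2},
]
PRIORITY = {"loyalty": 3, "crm": 2, "pos": 1, "web": 0}  # assumed source ranking


def is_match(a, b):
    """Exact match rules: same non-empty email OR same non-empty phone."""
    return any(a[f] is not None and a[f] == b[f] for f in ("email", "phone"))


# Stage 1, matching: only pairs inside the same block (same last name) are compared.
blocks = defaultdict(list)
for rec in records:
    blocks[rec["last"]].append(rec)
pairs = [(a["id"], b["id"]) for block in blocks.values()
         for a, b in combinations(block, 2) if is_match(a, b)]

# Stage 2, clustering: union-find merges matched pairs, which makes matching transitive.
parent = {rec["id"]: rec["id"] for rec in records}


def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps chains short
        x = parent[x]
    return x


for a, b in pairs:
    parent[find(a)] = find(b)
clusters = defaultdict(list)
for rec in records:
    clusters[find(rec["id"])].append(rec)


# Stage 3, reconciliation: per field, the top-priority source wins; recency breaks ties.
def reconcile(members, field):
    candidates = [m for m in members if m[field] is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda m: (PRIORITY[m["source"]], m["updated"]))[field]


profiles = [
    {"email": reconcile(ms, "email"), "phone": reconcile(ms, "phone"),
     "members": sorted(m["id"] for m in ms)}
    for ms in clusters.values()
]
```

`web-1` and `pos-7` share no identifier, yet they land in the same profile because each matches `loy-2`; that is transitive matching. The trade-off of blocking is also visible: records with different last names are never compared, so a poorly chosen blocking key silently misses matches.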

Match Rule Types

Type	How It Works	Example
Exact	Field values must be identical	Email = Email
Fuzzy	Probabilistic similarity scoring	“Jon Smith” matches “John Smith” using phonetic/semantic algorithms
Normalized	Pre-processing before comparison	Strip whitespace, lowercase, remove special characters
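
A sketch of the three rule types follows, with normalization applied before both exact and fuzzy comparison. For the fuzzy rule it swaps in Python's `difflib.SequenceMatcher` ratio as a generic similarity score; Data Cloud's actual phonetic and semantic algorithms are not reproduced here, and the 0.85 threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Normalized rule pre-processing: lowercase, drop special characters, collapse whitespace."""
    value = re.sub(r"[^a-z0-9@.\s]", "", value.lower())
    return " ".join(value.split())


def exact_match(a: str, b: str) -> bool:
    """Exact rule: identical after normalization."""
    return normalize(a) == normalize(b)


def fuzzy_score(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; a generic stand-in for Data Cloud's fuzzy algorithms."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    return fuzzy_score(a, b) >= threshold


print(exact_match(" Jon.Smith@Example.COM", "jon.smith@example.com"))  # True
print(exact_match("Jon Smith", "John Smith"))                          # False
print(round(fuzzy_score("Jon Smith", "John Smith"), 3))                # 0.947
```

The exact rule rejects "Jon" vs "John" while the fuzzy rule accepts it, which is why fuzzy rules suit names and exact rules suit identifiers such as email.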

Individual ID Graph

The ID graph connects all known identifiers for a person: email addresses, phone numbers, device IDs, loyalty IDs, and account usernames. Each Unified Profile has its own subgraph.
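
An ID graph can be represented as a bipartite graph of source records and identifiers, where each profile's subgraph is one connected component. This minimal sketch assumes that sharing an identifier is enough to link records (in Data Cloud the links come from match rules); the record IDs and identifier prefixes are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical source records: record id -> identifiers observed on that record.
observations = {
    "web-1": ["email:jon@x.com", "device:abc123"],
    "pos-7": ["phone:5550100", "loyalty:L-42"],
    "loy-2": ["email:jon@x.com", "phone:5550100"],
    "crm-9": ["email:amy@y.com"],
}

# Records link to identifiers and identifiers link back to records.
graph = defaultdict(set)
for record, identifiers in observations.items():
    for ident in identifiers:
        graph[record].add(ident)
        graph[ident].add(record)


def subgraph(start):
    """Breadth-first search from any identifier or record to its whole connected subgraph."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


def identifiers_for(identifier):
    """All identifiers in the same profile as `identifier` (record nodes filtered out)."""
    return sorted(n for n in subgraph(identifier) if n not in observations)


print(identifiers_for("device:abc123"))
# ['device:abc123', 'email:jon@x.com', 'loyalty:L-42', 'phone:5550100']
```

A lookup by a device ID reaches the loyalty ID even though no single record carries both, which is what makes the graph useful for activation on any known identifier.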

Entity Types

Identity resolution supports multiple entity types beyond individuals:

Individual: B2C customer profiles
Account: B2B company profiles
Household: Grouped individuals sharing an address or relationship
Cross-entity: Linking individuals to accounts and households

Calculated Insights

Calculated insights are derived metrics computed via ANSI SQL or a visual declarative builder. They surface aggregated intelligence on unified data.

Aspect	Detail
Language	ANSI SQL or visual builder (no-code)
Input	DMOs, other calculated insights
Output	Metrics attached to profiles (e.g., Lifetime Value, RFM score, Engagement Score)
Materialization	Batch (periodic refresh) or streaming (continuous)
Surfacing	Available on CRM records, in segments, in flows, and via API
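
Because calculated insights are written in ANSI SQL, a CLV-style insight can be tried locally. The sketch below runs the query with Python's built-in `sqlite3` purely as a local SQL engine; the table and column names (`unified_individual`, `sales_order`, `order_total`) are hypothetical stand-ins for DMOs, and Data Cloud's own object naming and dimension/measure conventions are not modeled.

```python
import sqlite3

# Hypothetical DMO stand-ins; real Data Cloud object and field names differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unified_individual (unified_id TEXT PRIMARY KEY);
CREATE TABLE sales_order (order_id TEXT, unified_id TEXT, order_total REAL, order_date TEXT);
INSERT INTO unified_individual VALUES ('U1'), ('U2'), ('U3');
INSERT INTO sales_order VALUES
  ('o1', 'U1', 120.0, '2025-01-05'),
  ('o2', 'U1',  80.0, '2025-03-10'),
  ('o3', 'U2',  45.5, '2025-02-01');
""")

# A CLV-style insight: one row per profile, a dimension (unified_id) plus two measures.
CLV_SQL = """
SELECT u.unified_id,
       COALESCE(SUM(o.order_total), 0) AS lifetime_value,
       COUNT(o.order_id)               AS order_count
FROM unified_individual u
LEFT JOIN sales_order o ON o.unified_id = u.unified_id
GROUP BY u.unified_id
ORDER BY u.unified_id
"""
clv = conn.execute(CLV_SQL).fetchall()
print(clv)  # [('U1', 200.0, 2), ('U2', 45.5, 1), ('U3', 0, 0)]
```

The `LEFT JOIN` keeps profiles with no orders (here `U3`), so every profile gets a value instead of silently dropping out.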

Common calculated insight patterns:

Customer Lifetime Value (CLV) - sum of historical purchases
Recency-Frequency-Monetary (RFM) scoring
Engagement score - weighted sum of interactions across channels
Product affinity - category preferences from purchase/browse history
Churn risk - days since last interaction thresholds
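
RFM scoring and a churn-risk flag can be sketched as follows. The rank-based 1-3 buckets, the fixed as-of date, the purchase data, and the 90/180-day churn thresholds are illustrative assumptions; production RFM models often use quintiles and business-specific cutoffs.

```python
from datetime import date

AS_OF = date(2025, 6, 30)  # fixed "today" so the example is deterministic

# Hypothetical purchase history per unified profile: (order date, amount).
orders = {
    "U1": [(date(2025, 6, 20), 120.0), (date(2025, 5, 2), 80.0), (date(2025, 1, 9), 60.0)],
    "U2": [(date(2025, 2, 1), 45.5)],
    "U3": [(date(2024, 8, 15), 300.0), (date(2024, 9, 1), 150.0)],
}


def rank_scores(values, smaller_is_better=False, buckets=3):
    """Score each key 1..buckets by rank; the best raw value gets the top score."""
    worst_first = sorted(values, key=values.get, reverse=smaller_is_better)
    n = len(worst_first)
    return {key: 1 + (i * buckets) // n for i, key in enumerate(worst_first)}


recency_days = {u: (AS_OF - max(d for d, _ in hist)).days for u, hist in orders.items()}
frequency = {u: len(hist) for u, hist in orders.items()}
monetary = {u: sum(amount for _, amount in hist) for u, hist in orders.items()}

r = rank_scores(recency_days, smaller_is_better=True)
f = rank_scores(frequency)
m = rank_scores(monetary)
rfm = {u: f"{r[u]}{f[u]}{m[u]}" for u in orders}


def churn_risk(days_since_last):
    # Illustrative thresholds on days since the last interaction.
    if days_since_last > 180:
        return "high"
    if days_since_last > 90:
        return "medium"
    return "low"


churn = {u: churn_risk(days) for u, days in recency_days.items()}
print(rfm)  # {'U1': '332', 'U2': '211', 'U3': '123'}
```

`U3` spent the most but has not bought in 302 days, so it scores high on monetary value and also lands in the high churn-risk band; combining both signals is what makes RFM useful for targeting.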

Segmentation and Activation

Segments

Segments are audiences built from unified profiles and calculated insights. They define “who” to target.

Built using a drag-and-drop segment builder or SQL
Can reference DMO attributes, calculated insight values, and engagement data
Support nested logic (AND/OR), exclusions, and time-based filters
Recalculated on a schedule or in near real time
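
Nested AND/OR logic, exclusions, and time-based filters compose naturally as predicates. This hypothetical sketch evaluates an include rule and an exclusion against a few profiles; the attribute names, thresholds, and as-of date are invented for illustration and do not reflect the segment builder's internals.

```python
from datetime import date, timedelta

AS_OF = date(2025, 6, 30)

# Hypothetical unified profiles with a calculated insight (clv) attached.
profiles = [
    {"id": "U1", "country": "US", "clv": 260.0, "last_purchase": date(2025, 6, 20), "opted_out": False},
    {"id": "U2", "country": "CA", "clv": 45.5, "last_purchase": date(2025, 2, 1), "opted_out": False},
    {"id": "U3", "country": "US", "clv": 450.0, "last_purchase": date(2024, 9, 1), "opted_out": True},
    {"id": "U4", "country": "CA", "clv": 900.0, "last_purchase": date(2025, 6, 1), "opted_out": False},
]


def all_of(*preds):
    return lambda p: all(pred(p) for pred in preds)


def any_of(*preds):
    return lambda p: any(pred(p) for pred in preds)


def equals(name, value):
    return lambda p: p[name] == value


def at_least(name, minimum):
    return lambda p: p[name] >= minimum


def within_days(name, days):
    """Time-based filter relative to the as-of date."""
    return lambda p: AS_OF - p[name] <= timedelta(days=days)


# (US AND CLV >= 200) OR CLV >= 800, AND purchased in the last 90 days.
include = all_of(
    any_of(all_of(equals("country", "US"), at_least("clv", 200)), at_least("clv", 800)),
    within_days("last_purchase", 90),
)
exclude = equals("opted_out", True)

segment = [p["id"] for p in profiles if include(p) and not exclude(p)]
print(segment)  # ['U1', 'U4']
```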

Activation Targets

Activation pushes segments to downstream systems for action.

Target Category	Examples
Marketing	Marketing Cloud Engagement, Google Ads, Meta Ads, Amazon Ads
CRM	Data actions to Salesforce flows, platform events, record updates
Commerce	B2C Commerce Cloud for personalized storefronts
Analytics	Tableau, CRM Analytics for segment analysis
External	Any system via webhook, API, or data action

Data Actions and CRM Integration

Data actions bridge Data Cloud and CRM. They fire when conditions on a DMO or calculated insight are met.

Same-org: Data Cloud-Triggered Flows respond to DMO changes directly
Cross-org: Data actions generate Platform Events consumed by flows or Apex in another org
Related lists: Data Cloud Related Lists surface DMO data on Contact, Account, and Lead records without replicating data into CRM objects
Field enrichment: Calculated insight values can be written back to CRM fields
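
A data action is essentially a condition watched over changing values. The minimal sketch below assumes the action should fire when a metric crosses below a threshold, not on every update while it stays below; `DataActionRule` and the event payload fields are hypothetical and are not the Platform Event schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataActionRule:
    """Hypothetical data action: fire when a metric crosses below a threshold."""
    metric: str
    threshold: float
    last_values: dict = field(default_factory=dict)
    emitted: list = field(default_factory=list)

    def on_change(self, record_id: str, new_value: float) -> None:
        old = self.last_values.get(record_id)
        self.last_values[record_id] = new_value
        # No prior value counts as "not yet below" so a first low value still fires.
        was_ok = old is None or old >= self.threshold
        if new_value < self.threshold and was_ok:
            # Stand-in for publishing a Platform Event or launching a flow.
            self.emitted.append({"recordId": record_id, "metric": self.metric, "value": new_value})


rule = DataActionRule("health_score", 50.0)
for value in [72, 48, 41, 65, 30]:
    rule.on_change("ACC-1", value)
print([e["value"] for e in rule.emitted])  # [48, 30]
```

Firing on the transition avoids flooding a flow with duplicate events while the score stays low. Whether a real data action behaves this way depends on how its conditions are configured.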

Zero-Copy Partner Network

Zero-copy eliminates data duplication by querying external data warehouses in place, using Apache Iceberg table format and Parquet files.

How It Works

Figure 4. Zero-copy federation queries data directly in external warehouses via the Apache Iceberg REST Catalog protocol, eliminating the need to ingest and store that data in Data Cloud. For a scenario like B2B account intelligence with billions of product usage rows in Snowflake, this avoids ingestion costs of 2,000 credits per million rows; federation costs only 70 credits per million rows.

Bidirectional Access

Direction	Mechanism
Data Cloud queries external	SQL federation with intelligent pushdown to external engines
External queries Data Cloud	JDBC driver or Data-as-a-Service (DaaS) API for file-based sharing
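
The value of pushdown is that the external engine filters and aggregates before anything crosses the network. This sketch uses an in-memory `sqlite3` database as a stand-in for an external warehouse such as Snowflake; the table, the data, and the two query strategies are hypothetical, and Data Cloud's planner decides for itself what it can push down.

```python
import sqlite3

# In-memory sqlite stands in for an external warehouse table (hypothetical data).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE usage_events (account_id TEXT, feature TEXT, events INTEGER)")
remote.executemany(
    "INSERT INTO usage_events VALUES (?, ?, ?)",
    [(f"A{i % 50}", "export" if i % 3 == 0 else "login", 1) for i in range(3000)],
)


def without_pushdown():
    """Pull every row across the wire, then filter and aggregate locally."""
    rows = remote.execute("SELECT account_id, feature, events FROM usage_events").fetchall()
    totals = {}
    for account, feature, events in rows:
        if feature == "export":
            totals[account] = totals.get(account, 0) + events
    return totals, len(rows)


def with_pushdown():
    """Let the remote engine filter and aggregate; only the result rows come back."""
    rows = remote.execute(
        "SELECT account_id, SUM(events) FROM usage_events "
        "WHERE feature = 'export' GROUP BY account_id"
    ).fetchall()
    return dict(rows), len(rows)


pulled_totals, pulled_rows = without_pushdown()
pushed_totals, pushed_rows = with_pushdown()
print(pulled_rows, pushed_rows)  # 3000 50
```

Both strategies return the same totals, but the pushed-down query transfers 50 rows instead of 3,000.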

When to Use vs Avoid Zero-Copy

Use Zero-Copy When	Avoid Zero-Copy When
Enterprise data lake is actively managed and governed	Source data is poorly structured or undocumented
Data is already curated and complete in the warehouse	Frequent complex transformations are needed
Avoiding duplicate pipelines across business units	Identity resolution is needed (lakes lack this)
Cost optimization - 70 credits per million rows federated vs 2,000 per million for batch ingestion	Compliance requires full data lineage within Salesforce

Storage and Compute Architecture

Data Cloud runs on a lakehouse architecture built on Apache Iceberg (table format) and Apache Parquet (file format), deployed on Hyperforce (AWS).

Tiered Storage

Tier	Latency	Use Case
Main memory	Milliseconds	Real-time event processing, in-session personalization
Low Latency Store (LLS)	Sub-second	NVMe-backed durable cache for hot data
Lakehouse (S3)	Seconds	Long-term storage for DLOs, historical data, bulk queries
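
The tiers behave like a read-through cache hierarchy. In this hypothetical sketch a small LRU dictionary plays main memory, a plain dictionary plays the Low Latency Store, and another dictionary plays the lakehouse; the promotion and eviction policy is an assumption, not documented Data Cloud behavior, and latencies are not simulated.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical read-through hierarchy: memory -> Low Latency Store -> lakehouse."""

    def __init__(self, lakehouse: dict, memory_capacity: int = 2) -> None:
        self.memory = OrderedDict()  # small LRU tier standing in for main memory
        self.lls = {}                # durable cache of data that has been read before
        self.lakehouse = lakehouse   # long-term system of record
        self.capacity = memory_capacity
        self.hits = []               # which tier served each read, for inspection

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as most recently used
            self.hits.append("memory")
            return self.memory[key]
        if key in self.lls:
            value = self.lls[key]
            self.hits.append("lls")
        else:
            value = self.lakehouse[key]
            self.lls[key] = value  # promote hot data out of the lakehouse
            self.hits.append("lakehouse")
        self.memory[key] = value
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # evict the least recently used key
        return value


store = TieredStore({"U1": {"clv": 260}, "U2": {"clv": 45}, "U3": {"clv": 450}})
for key in ["U1", "U1", "U2", "U3", "U1"]:
    store.get(key)
print(store.hits)  # ['lakehouse', 'memory', 'lakehouse', 'lakehouse', 'lls']
```

The last read of `U1` misses memory (it was evicted) but is served from the LLS instead of going back to the lakehouse, which is the role the middle tier plays.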

Real-Time Layer

The real-time layer enables sub-second personalization:

Real-time data graphs: denormalized Customer 360 profiles with pre-joined objects
Real-time ingest: millisecond-level event capture from Web/Mobile SDKs
Real-time identity resolution: exact-match only, instant unification
Real-time calculated insights: metrics computed in milliseconds
Real-time segmentation: on-the-fly audience evaluation
Real-time actions: immediate flow triggers or external channel activation
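
The real-time steps can be chained into a single event handler. This is a minimal sketch under stated assumptions: the exact-match `identity_index` is assumed to be prebuilt, the streaming insight is a per-profile counter, and the segment rule, threshold, and action name are hypothetical.

```python
from collections import defaultdict

# Hypothetical exact-match index (identifier -> unified profile), assumed prebuilt.
identity_index = {"email:jon@x.com": "P1", "device:abc123": "P1", "email:amy@y.com": "P2"}
HOT_VIEWS = 3                  # hypothetical segment rule: 3+ product views
page_views = defaultdict(int)  # streaming insight, updated per event
in_segment = set()
actions = []


def on_event(event):
    # Real-time identity resolution: exact lookup only, no fuzzy scoring.
    profile = identity_index.get(event["identifier"])
    if profile is None:
        return  # unknown identifier: nothing to unify with in this sketch
    if event["type"] == "product_view":
        page_views[profile] += 1  # incremental update, no recomputation from history
    # Real-time segmentation: re-evaluate membership right after the update.
    if page_views[profile] >= HOT_VIEWS and profile not in in_segment:
        in_segment.add(profile)
        actions.append(("show_offer", profile))  # stand-in for a flow or channel activation


for ident in ["device:abc123", "email:jon@x.com", "cookie:zzz", "device:abc123"]:
    on_event({"identifier": ident, "type": "product_view"})
print(actions)  # [('show_offer', 'P1')]
```

The `cookie:zzz` event is dropped because exact-match resolution has nothing to look it up against, and the action fires only when the profile first enters the segment.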

Credit Consumption Model

Data Cloud uses consumption-based pricing. All actions consume credits from a unified credit pool.

Action	Credits per Million Rows	Notes
Data ingestion (batch, external)	2,000	Salesforce org-to-org ingestion is free as of August 2025
Data ingestion (streaming)	5,000	2.5x batch cost; use only when latency demands it
Identity resolution	100,000	Most expensive operation by far
Calculated insights (batch)	15	Very efficient for periodic metrics
Calculated insights (streaming)	800	53x batch cost
Data queries	2	Cheapest operation
Segmentation	20	Per million rows evaluated
Activation (batch)	10	Pushing segments to targets
Activation (streaming DMO)	1,600	Real-time activation is expensive
Zero-copy federation	70	About 28x cheaper than batch ingest (2,000 / 70)

Pricing reference: 100,000 credits cost $500. Sandbox environments get a 20% discount on credit multipliers.
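
The rate card turns into a simple estimator. This sketch copies the credits-per-million-rows figures from the table above and the $500 per 100,000 credits reference; the action keys are hypothetical labels, and treating the sandbox discount as a flat 20% off total credits is an assumption.

```python
CREDITS_PER_MILLION_ROWS = {  # copied from the rate table above
    "ingest_batch": 2_000,
    "ingest_streaming": 5_000,
    "identity_resolution": 100_000,
    "ci_batch": 15,
    "ci_streaming": 800,
    "query": 2,
    "segmentation": 20,
    "activation_batch": 10,
    "activation_streaming_dmo": 1_600,
    "zero_copy_federation": 70,
}


def estimate(rows_by_action: dict, sandbox: bool = False) -> tuple:
    """Return (credits, USD) for a mapping of action -> rows processed."""
    credits = sum(
        CREDITS_PER_MILLION_ROWS[action] * rows / 1_000_000
        for action, rows in rows_by_action.items()
    )
    if sandbox:
        credits = credits * 80 / 100  # assumption: flat 20% off total credits
    usd = credits * 500 / 100_000  # $500 per 100,000 credits
    return credits, usd


print(estimate({"ingest_batch": 1_000_000_000}))          # (2000000.0, 10000.0)
print(estimate({"zero_copy_federation": 1_000_000_000}))  # (70000.0, 350.0)
```

For Scenario 2's 1B usage rows, batch ingestion comes to 2,000,000 credits ($10,000) versus 70,000 credits ($350) for federation. The sketch counts each row once; if the same federated data is queried repeatedly, those accesses would add to the federation side.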

CTA Scenario Patterns

Scenario 1: Unified Customer 360 for Omnichannel Retail

Situation: Retailer with separate systems for e-commerce (Commerce Cloud), in-store POS, loyalty program, and customer service (Service Cloud). No unified view of the customer.

Data Cloud solution: Ingest from all four sources via data streams. Map to standard Individual DMO. Run identity resolution to merge the same customer across channels (email from loyalty, phone from POS, cookie ID from web). Create calculated insights for CLV, channel preference, and churn risk. Activate segments to Marketing Cloud for personalized campaigns and to Service Cloud for proactive case routing.

Why not traditional MDM: Traditional MDM would require an external platform (Informatica, Reltio), ETL pipelines to Salesforce, and ongoing sync maintenance. Data Cloud provides native identity resolution and activation without middleware.

Scenario 2: B2B Account Intelligence with Zero-Copy

Situation: Enterprise SaaS company with product usage data in Snowflake (billions of rows), CRM data in Sales Cloud, and support data in Service Cloud. Leadership wants account health scores visible to sales reps.

Data Cloud solution: Zero-copy federation to Snowflake for product usage data (avoids ingesting billions of rows). Ingest CRM and Service Cloud data natively (free Salesforce-to-Salesforce ingestion). Create calculated insights for account health score combining usage, support ticket trends, and renewal dates. Surface scores on Account records via Data Cloud Related Lists. Trigger flows when the health score drops below a threshold.

Why not ETL: Ingesting billions of usage rows would consume massive credits (2,000 per million = 2M credits for 1B rows). Zero-copy queries the data in place for 70 credits per million (70,000 credits for the same 1B rows).

Scenario 3: Real-Time Personalization for Financial Services

Situation: Bank wants to personalize digital experiences in real-time, showing relevant product offers when a customer logs into the mobile app based on transaction history, life events, and segment membership.

Data Cloud solution: Ingest transaction data via streaming. Real-time identity resolution matches the authenticated user to their unified profile. Real-time calculated insights compute product eligibility scores. Real-time segmentation evaluates membership in offer audiences. Data actions push personalized offer payload to the mobile app in milliseconds.

Why Data Cloud over custom build: Building real-time identity resolution, segmentation, and activation from scratch requires significant engineering. Data Cloud provides this declaratively with sub-second latency.

Decision Guide: Data Cloud vs Alternatives

Factor	Data Cloud	Traditional ETL/MDM	Salesforce Connect
Best for	Unified customer view, analytics, AI	Transactional data sync	Read-only external data access
Data volume	Billions of records	Millions of records	Any (query-time)
Identity resolution	Native (match + reconcile)	Requires separate MDM tool	Not available
Latency	Near-real-time to real-time	Real-time to batch	Real-time (per query)
Query model	SQL on lakehouse	SOQL on platform objects	Limited SOQL subset
Cost model	Credits (consumption)	Integration tool license	Salesforce Connect license
AI/ML readiness	Native (Einstein, Agentforce)	Requires separate data pipeline	Not applicable
Activation	Native to Marketing, CRM, Commerce	Custom integration needed	Not applicable

Industry-Specific Patterns

Industry	Data Cloud Pattern	Key Data Sources
Healthcare	Patient 360 - unified patient journey across EMR, appointments, wearables	EMR systems, IoT health trackers, scheduling platforms
Financial Services	Client 360 - transaction history, credit data, relationship insights	Core banking, credit bureaus, wealth platforms
Retail	Customer 360 - omnichannel purchase, loyalty, browsing behavior	POS, e-commerce, loyalty, clickstream
Manufacturing	Asset 360 - IoT sensor data, service history, predictive maintenance	IoT platforms, ERP, field service

Pre-built Data Cloud templates with industry-specific DMOs are available for many of these industries and can reduce implementation time.

Related Topics

Data Modeling: DMO design follows CRM data modeling principles with a canonical schema approach
External Data: Data Cloud is an alternative to Salesforce Connect; zero-copy extends external data access further
Large Data Volumes: Data Cloud handles billions of records without LDV governor limit concerns
Data Quality & Governance: identity resolution and data spaces are governance mechanisms
Integration Patterns: Data Cloud complements integration middleware; data streams are an ingestion pattern
Sharing Model: data spaces and field-level masking enforce data governance and access control
Licensing: Data Cloud requires separate licensing with consumption-based credit pricing

Sources

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.