Data Quality & Governance
Data quality and governance are ongoing disciplines, not one-time activities. Poor data quality undermines every downstream system: reports lie, integrations fail, and users lose trust in the platform.
Data Profiling
Profiling is the first step in understanding the data. It applies to both migration scenarios and ongoing data health monitoring.
Profiling Dimensions
| Dimension | What to Measure | Red Flags |
|---|---|---|
| Completeness | % of required fields populated | Key fields < 80% populated |
| Accuracy | Values match real-world reality | Stale addresses, wrong phone formats |
| Consistency | Same data represented the same way | “US” vs “USA” vs “United States” |
| Uniqueness | No unintended duplicates | Duplicate accounts, contacts |
| Timeliness | Data is current enough for its use | Last modified > 2 years ago |
| Validity | Values conform to expected formats | Dates in text fields, invalid emails |
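
As a minimal sketch of how the completeness and validity dimensions could be measured on exported data, the snippet below profiles a small in-memory CSV. The column names, the sample rows, and the simple email pattern are illustrative assumptions; none of this is a Salesforce API, and the pattern is not a full email validator.

```python
import csv
import io
import re

# Hypothetical sample standing in for a CSV export; column names are assumptions.
SAMPLE_CSV = """Id,Name,Email,Phone
001,Acme,info@acme.example,555-0100
002,Globex,,555-0101
003,Initech,not-an-email,
004,Umbrella,sales@umbrella.example,
"""

# Deliberately simple pattern for illustration; not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile(rows, fields):
    """Return {field: percent of rows with a non-blank value}."""
    total = len(rows)
    result = {}
    for field in fields:
        populated = sum(1 for row in rows if (row.get(field) or "").strip())
        result[field] = round(100.0 * populated / total, 1) if total else 0.0
    return result


def invalid_emails(rows, field="Email"):
    """Return Ids of rows whose populated email fails the pattern."""
    return [
        row["Id"]
        for row in rows
        if (row.get(field) or "").strip()
        and not EMAIL_PATTERN.match(row[field].strip())
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
completeness = profile(rows, ["Name", "Email", "Phone"])
bad_emails = invalid_emails(rows)
print(completeness)
print(bad_emails)
```

Measured against the red-flag threshold in the table, the Email and Phone columns in this sample would both fall below 80% populated.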
Profiling Tools
- Salesforce Reports: Record counts, field completeness via formula fields
- Data Loader exports: Export and analyze in Excel/Python for pattern detection
- Third-party tools: Informatica Data Quality, DemandTools (Validity), RingLead
- CRM Analytics (formerly Tableau CRM and Einstein Analytics): Dashboard-based data quality monitoring
- Apex scripts: Custom profiling for complex business rules
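
For the export-and-analyze route, a frequency count of distinct values is often enough to surface consistency problems such as country-name variants. The sketch below assumes a hypothetical list of exported values and a hypothetical `CANONICAL` mapping; a real mapping would come from the agreed data standards (for example, "CA" might mean California rather than Canada in some datasets).

```python
from collections import Counter

# Hypothetical values from an exported country column.
countries = ["US", "USA", "United States", "us", "Canada", "USA", "CA", "Canada "]

# Assumed canonical mapping, keyed by trimmed lowercase value.
CANONICAL = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "ca": "Canada",
    "canada": "Canada",
}


def normalize(value):
    """Map a raw value to its canonical form, or None if it is unknown."""
    return CANONICAL.get(value.strip().lower())


# Many distinct raw spellings collapsing to few canonical values signals inconsistency.
raw_counts = Counter(countries)
normalized_counts = Counter(normalize(v) for v in countries)

print(raw_counts.most_common())
print(normalized_counts)
```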
Deduplication
Duplicates are the most visible data quality problem. Salesforce provides native deduplication tools, but a CTA must design a broader strategy.
Native Salesforce Dedup
Matching Rules
Matching rules define when two records are considered potential duplicates.
| Component | Description |
|---|---|
| Matching method | Exact or Fuzzy |
| Matching criteria | Fields to compare (Name, Email, Phone, Address) |
| Match key | Combination of fields that trigger comparison |
| Blank fields | How to handle nulls (match or skip) |
Standard matching rules exist for Account, Contact, and Lead. Custom matching rules can be created for any object.
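
To make the exact versus fuzzy distinction and the blank-field option concrete, here is a small sketch using Python's `difflib`. It is not Salesforce's matching algorithm; the field names, the 0.85 similarity threshold, the `match_blanks` flag, and the "fuzzy Name AND exact Email" rule are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def field_matches(a, b, method="exact", threshold=0.85, match_blanks=False):
    """Compare two field values using an exact or fuzzy method."""
    a = (a or "").strip().lower()
    b = (b or "").strip().lower()
    if not a or not b:
        # Blank handling: two blanks count as a match only when requested.
        return match_blanks and not a and not b
    if method == "exact":
        return a == b
    # Fuzzy: similarity ratio in [0, 1] compared against the threshold.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_potential_duplicate(rec1, rec2):
    """Assumed rule: fuzzy match on Name AND exact match on Email."""
    return (
        field_matches(rec1.get("Name"), rec2.get("Name"), method="fuzzy")
        and field_matches(rec1.get("Email"), rec2.get("Email"), method="exact")
    )


a = {"Name": "Jon Smith", "Email": "jon.smith@example.com"}
b = {"Name": "John Smith", "Email": "Jon.Smith@example.com"}
c = {"Name": "Jane Doe", "Email": "jon.smith@example.com"}

print(is_potential_duplicate(a, b))  # similar names, same email
print(is_potential_duplicate(a, c))  # same email, dissimilar names
```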
Duplicate Rules
Duplicate rules define what happens when a match is found:
| Action | Effect |
|---|---|
| Alert | Warn the user but allow save |
| Block | Prevent the record from being saved |
| Report | Log the duplicate for later review |
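
The difference between the three actions can be sketched as a small decision function. This is a conceptual model only, not how the platform implements duplicate rules; the `Action` enum, the messages, and the `report_log` list are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ALERT = "alert"
    BLOCK = "block"
    REPORT = "report"


def apply_duplicate_rule(action, has_match, report_log):
    """Return (saved, warning) for a save attempt under the given action."""
    if not has_match:
        return True, None
    if action is Action.BLOCK:
        return False, "Duplicate detected: save blocked"
    if action is Action.ALERT:
        return True, "Possible duplicate: review before continuing"
    # REPORT: save without interrupting the user and log the match for review.
    report_log.append("duplicate logged")
    return True, None


log = []
print(apply_duplicate_rule(Action.ALERT, True, log))
print(apply_duplicate_rule(Action.BLOCK, True, log))
print(apply_duplicate_rule(Action.REPORT, True, log), log)
```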
Third-Party Deduplication
For enterprise-scale deduplication, native tools may fall short:
| Tool | Capability |
|---|---|
| DemandTools (Validity) | Mass dedup, merge, standardization |
| Cloudingo | Automated dedup with scheduling |
| RingLead | Real-time and batch dedup |
| Informatica | Enterprise MDM with fuzzy matching |
| DupeCatcher | Free AppExchange duplicate prevention |
Dedup Strategy Layers
Master Data Management
MDM ensures that critical business entities (customers, products, employees) have a single, authoritative source of truth across all systems.
MDM Approaches
| Approach | Description | When to Use |
|---|---|---|
| Registry | Each system maintains its own copy; a central registry maps IDs | Low integration maturity, many legacy systems |
| Consolidation | Data is copied to a master hub for reporting, not written back | Read-only analytics, data warehouse model |
| Coexistence | Multiple systems share and synchronize master data | Multiple systems of record per entity |
| Centralized | One system is the master; others are consumers | Clear system of record exists (e.g., Salesforce for customers) |
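
The Registry approach is the easiest of the four to picture in code: each system keeps its own record, and a central cross-reference maps local IDs to a shared master ID. The following is a minimal in-memory sketch; the class name, system names, and ID values are hypothetical, and a real registry would be a persistent service.

```python
from collections import defaultdict


class IdRegistry:
    """Maps a master ID to the local record IDs held by each system."""

    def __init__(self):
        self._by_master = defaultdict(dict)  # master_id -> {system: local_id}
        self._by_local = {}                  # (system, local_id) -> master_id

    def link(self, master_id, system, local_id):
        """Register that a system's local record represents the master entity."""
        self._by_master[master_id][system] = local_id
        self._by_local[(system, local_id)] = master_id

    def master_for(self, system, local_id):
        """Return the master ID for a local record, or None if unlinked."""
        return self._by_local.get((system, local_id))

    def translate(self, from_system, local_id, to_system):
        """Find the matching record ID in another system, if linked."""
        master_id = self.master_for(from_system, local_id)
        if master_id is None:
            return None
        return self._by_master[master_id].get(to_system)


registry = IdRegistry()
registry.link("CUST-1", "salesforce", "001xx0000001")
registry.link("CUST-1", "erp", "C-9001")
print(registry.translate("salesforce", "001xx0000001", "erp"))
```

Note that the registry stores only identifiers, not attribute data, which is why this approach suits landscapes where the source systems cannot easily be changed.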
Salesforce as MDM Hub
Salesforce can serve as the master for customer data (Account, Contact) but is rarely the right choice for all entity types:
| Entity | Salesforce as Master? | Notes |
|---|---|---|
| Customer (B2B) | Often yes | Account/Contact is natural fit |
| Customer (B2C) | Sometimes | Person Accounts or Data Cloud |
| Product | Sometimes | CPQ scenarios; otherwise ERP |
| Employee | Rarely | HR systems (Workday, SAP HCM) are better fit |
| Financial data | No | ERP is the master |
| Inventory | No | ERP/WMS is the master |
Data Lifecycle Management
Every record has a lifecycle. Design for the full journey, not just creation.
Lifecycle Stages
Create
- Define data entry standards (required fields, validation rules, dependent picklists)
- Integration-created records need quality controls (field mapping validation, dedup)
- Bulk imports need pre-load quality checks
Maintain
- Ongoing enrichment (address verification, firmographic data)
- Periodic deduplication scans
- Data steward reviews and corrections
- Automation to flag stale records (e.g., Account not modified in 12 months)
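
The stale-record check in the last bullet reduces to a date comparison. A minimal sketch follows, assuming "12 months" is approximated as 365 days and that records arrive as dictionaries with a `LastModifiedDate` value; the sample rows are hypothetical.

```python
from datetime import date, timedelta

STALE_AFTER_DAYS = 365  # Assumption: "12 months" approximated as 365 days.


def find_stale(records, today, max_age_days=STALE_AFTER_DAYS):
    """Return Ids of records last modified before the cutoff date."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["Id"] for r in records if r["LastModifiedDate"] < cutoff]


# Hypothetical Account rows; in practice these would come from a query or export.
accounts = [
    {"Id": "A1", "LastModifiedDate": date(2023, 1, 15)},
    {"Id": "A2", "LastModifiedDate": date(2024, 11, 1)},
    {"Id": "A3", "LastModifiedDate": date(2023, 12, 31)},
]

# Passing "today" explicitly keeps the check deterministic and testable.
stale_ids = find_stale(accounts, today=date(2025, 1, 1))
print(stale_ids)
```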
Archive
- Move aged data to Big Objects, external storage, or Data Cloud
- Maintain reference access for compliance
- See Large Data Volumes for archival strategies
Delete
- Soft delete: Records go to the Recycle Bin (recoverable for 15 days)
- Hard delete: Permanent removal that bypasses the Recycle Bin (Bulk API hardDelete operation)
- GDPR right to erasure: Must be able to permanently delete all personal data for a data subject
- Document deletion policies and audit trails
Data Retention Policies
Retention policies define how long different data types must be kept. Business requirements, legal obligations, and compliance mandates drive these decisions.
Designing Retention Policies
| Data Category | Typical Retention | Driving Factor |
|---|---|---|
| Active customer records | Indefinite while customer active | Business need |
| Closed opportunities (won) | 5-7 years | Financial audit |
| Closed opportunities (lost) | 1-2 years | Sales analytics |
| Support cases | 3-5 years | Service quality, legal |
| Email messages | 1-3 years | Communication audit |
| Audit trail (field history) | 18-24 months on-platform | Compliance |
| Task/Event activities | 1-2 years | Business need |
| Debug/error logs | 30-90 days | Operational |
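
A retention table like this one can be expressed as configuration plus a single eligibility check. In the sketch below, the category keys are hypothetical names and the year values take the upper bound of each range in the table as an assumption; actual periods must come from legal and compliance review.

```python
from datetime import date

# Assumed policy values in years (upper bound of each range in the table).
RETENTION_YEARS = {
    "opportunity_won": 7,
    "opportunity_lost": 2,
    "support_case": 5,
    "email_message": 3,
}


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_past_retention(category, closed_on, today):
    """True when the record's retention period has fully elapsed."""
    expires_on = add_years(closed_on, RETENTION_YEARS[category])
    return today >= expires_on


today = date(2025, 1, 1)
print(is_past_retention("opportunity_lost", date(2022, 3, 1), today))
print(is_past_retention("opportunity_won", date(2022, 3, 1), today))
```

A scheduled job could use a check of this shape to select records for the Archive or Delete stage.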
Data Classification Framework
Data classification drives encryption, access control, retention, and compliance decisions. Establish classification tiers during data model design, not as a retrofit.
| Tier | Examples | Security Controls |
|---|---|---|
| Restricted | SSN, credit card, health records | Shield Encryption, FLS, audit trail, right-to-erasure |
| Confidential | Salary, revenue, pricing strategy | FLS, sharing rules, data masking in sandboxes |
| Internal | Employee IDs, internal notes | Role-based access, standard sharing model |
| Public | Product names, company address | Portal/community visible, no restrictions |
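
One way to operationalize the tiers is to keep a field inventory and check each field's applied controls against what its tier requires. This is a hedged sketch: the control names are a condensed subset of the table, and the field API names and inventory contents are hypothetical.

```python
# Assumed tier-to-control mapping, condensed from the table above.
REQUIRED_CONTROLS = {
    "Restricted": {"encryption", "fls", "audit_trail"},
    "Confidential": {"fls", "sandbox_masking"},
    "Internal": {"role_based_access"},
    "Public": set(),
}

# Hypothetical field inventory: field -> (tier, controls currently applied).
FIELD_INVENTORY = {
    "Contact.SSN__c": ("Restricted", {"encryption", "fls"}),
    "Employee__c.Salary__c": ("Confidential", {"fls", "sandbox_masking"}),
    "Product2.Name": ("Public", set()),
}


def control_gaps(inventory):
    """Return {field: sorted list of required controls not yet applied}."""
    gaps = {}
    for field, (tier, applied) in inventory.items():
        missing = REQUIRED_CONTROLS[tier] - applied
        if missing:
            gaps[field] = sorted(missing)
    return gaps


gaps = control_gaps(FIELD_INVENTORY)
print(gaps)
```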
Data Governance Process Flow
Governance is not a one-time setup. It is an ongoing operational process with defined roles, cadences, and escalation paths.
Data Stewardship Model
Data stewardship assigns accountability for data quality to specific people or roles.
Stewardship Roles
| Role | Responsibility |
|---|---|
| Data Owner | Business executive accountable for data quality decisions |
| Data Steward | Hands-on responsibility for monitoring and correcting data |
| Data Custodian | Technical team managing data storage, security, and access |
| Data Consumer | End users who rely on data quality for their work |
Stewardship Processes
- Regular data quality reviews: Monthly or quarterly steward reviews of quality dashboards
- Issue resolution workflow: Process for reporting and fixing data quality issues
- Change management: Stewards approve changes to data standards, picklist values, record types
- Training: Ongoing user training on data entry standards
Compliance
GDPR and Data Privacy
The General Data Protection Regulation (and similar privacy laws) imposes specific requirements on Salesforce data architecture:
| GDPR Right | Salesforce Implementation |
|---|---|
| Right to access | Data export, reports, customer portals |
| Right to rectification | Standard edit capabilities, community self-service |
| Right to erasure | Hard delete, field-level encryption with key destruction |
| Right to portability | Data export in machine-readable format (CSV, JSON) |
| Right to restriction | Record-level flags, process exclusion logic |
| Consent management | Custom objects or Salesforce Privacy Center |
Data Residency
Some regulations require data to remain within specific geographic boundaries:
- Salesforce data residency: Data stored in the instance region (NA, EU, AP)
- Hyperforce: Enables deployment in specific public cloud regions
- Encryption: Salesforce Shield Platform Encryption with Bring Your Own Key (BYOK)
- Cross-border transfers: Documented in Salesforce’s Data Processing Addendum
Salesforce Shield
Shield provides compliance-focused features:
| Feature | Purpose |
|---|---|
| Platform Encryption | Encrypt data at rest for sensitive fields |
| Event Monitoring | Track user behavior and API activity |
| Field Audit Trail | Retain field history beyond standard 18-month limit (up to 10 years) |
Data Quality Metrics Dashboard
Design a data quality dashboard that stewards review regularly:
| Metric | Measurement | Target |
|---|---|---|
| Duplicate rate | % of records flagged as duplicates | < 5% |
| Completeness score | Avg % of required fields populated | > 90% |
| Stale records | Records not modified in 12+ months | < 20% of active records |
| Orphan records | Child records with broken lookups | 0% |
| Invalid values | Records failing validation logic | < 2% |
| Integration errors | Failed integration record creates/updates | < 1% |
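
The targets in this table can be encoded once and evaluated against each period's measurements, which gives stewards a simple pass/fail view. In this sketch the metric keys and the measured values are hypothetical; only the thresholds come from the table.

```python
import operator

OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}

# Targets from the table above: (comparison, threshold in percent).
TARGETS = {
    "duplicate_rate": ("<", 5.0),
    "completeness_score": (">", 90.0),
    "stale_records": ("<", 20.0),
    "orphan_records": ("==", 0.0),
    "invalid_values": ("<", 2.0),
    "integration_errors": ("<", 1.0),
}


def evaluate(metrics):
    """Return {metric: True if the measured value meets its target}."""
    status = {}
    for name, value in metrics.items():
        op, threshold = TARGETS[name]
        status[name] = OPS[op](value, threshold)
    return status


# Hypothetical measurements for one review period, in percent.
measured = {
    "duplicate_rate": 3.2,
    "completeness_score": 87.5,
    "stale_records": 24.0,
    "orphan_records": 0.0,
    "invalid_values": 1.1,
    "integration_errors": 0.4,
}

status = evaluate(measured)
failing = sorted(name for name, ok in status.items() if not ok)
print(failing)
```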
Related Topics
- Shield Encryption: data classification drives encryption decisions and field-level security
- Sharing Model: data sensitivity tiers influence OWD settings and sharing rules
- Integration Patterns: data quality at integration boundaries affects reliability and error rates
- Data Migration: pre-migration profiling is a data quality exercise
- Large Data Volumes: archival is both an LDV strategy and a governance activity
- Development Lifecycle: data governance is part of organizational change management
- Declarative vs Programmatic: validation rules and duplicate rules are declarative quality controls
Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.