
What is Data Quality?


Data quality is a measure of how accurately and completely your analytics data reflects real user behaviour. High data quality means events fire when and only when they should, properties contain the right values and types, and coverage is consistent across platforms. Data quality degrades gradually and silently: a missing required property, a type mismatch between platforms, or a deprecated event that keeps firing can go unnoticed until dashboards stop reflecting reality.

Why data quality matters

Bad data does not announce itself. A dashboard built on events with missing properties, inconsistent types, or silent gaps looks exactly like a dashboard built on clean data. Teams make product decisions, allocate resources, and report metrics to leadership based on numbers they assume are correct. When data quality is low, those decisions are based on fiction.

The damage compounds because analytics data is rarely consumed raw. It flows through funnels, cohorts, retention curves, and A/B test calculations. A single event with a broken property can distort every metric that touches it. Fixing the event after the fact does not fix the historical data already in the warehouse.

Data quality is not a one-time project. It is a continuous discipline. Every code deploy, every new feature, and every platform update can introduce regressions. Teams that treat data quality as a launch checklist item discover problems months later. Teams that monitor it continuously catch issues within hours.

How it works in practice

Data quality has four dimensions that matter most for product analytics. Accuracy means events fire when and only when the intended action occurs. Completeness means every required property is present and every platform sends the event. Consistency means the same event has the same name, properties, and types across web, iOS, and Android. Timeliness means events arrive in the analytics platform without excessive delay.

Measuring these dimensions requires active monitoring. Track event volume over time to detect sudden drops (broken implementation) or spikes (event firing in a loop). Monitor property fill rates to catch required properties that stop arriving. Compare event counts across platforms to find implementation gaps. Use event validation to reject or flag events that do not conform to their schema.
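As a minimal sketch of the volume check described above, the function below compares each day's event count with the median of the preceding days and flags large drops or spikes. The function name, the 7-day window, and the 0.5x and 2x thresholds are illustrative assumptions, not recommended values.

```python
from statistics import median


def flag_volume_anomalies(daily_counts, window=7, drop_ratio=0.5, spike_ratio=2.0):
    """Flag days whose count is far from the trailing median.

    daily_counts: daily event counts, oldest first.
    Returns a list of (day_index, count, kind) tuples, kind is "drop" or "spike".
    Thresholds are illustrative assumptions.
    """
    anomalies = []
    for i in range(window, len(daily_counts)):
        baseline = median(daily_counts[i - window:i])
        if baseline == 0:
            continue  # no meaningful baseline to compare against
        ratio = daily_counts[i] / baseline
        if ratio <= drop_ratio:
            anomalies.append((i, daily_counts[i], "drop"))
        elif ratio >= spike_ratio:
            anomalies.append((i, daily_counts[i], "spike"))
    return anomalies


counts = [1000, 1020, 980, 1010, 990, 1005, 995, 400, 1000, 2600]
print(flag_volume_anomalies(counts))
# prints [(7, 400, 'drop'), (9, 2600, 'spike')]
```

A median baseline is used here so that one anomalous day does not drag the baseline for the days that follow it; a mean or a seasonality-aware model may suit other traffic patterns better.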

Improving data quality starts at the source. Define every event in a structured schema. Use validation to enforce schemas at collection time. Run a regular analytics audit to identify event drift and coverage gaps. Use a codebase scanner to compare what is documented in the tracking plan against what is actually implemented in code.
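To illustrate validation at collection time, here is a small sketch that checks an event payload against a schema and returns a list of violations. The schema format, the event name, and the property names are all hypothetical; a real tracking plan or validation tool will define its own.

```python
# Hypothetical tracking-plan entry: property name -> (expected type, required?)
CHECKOUT_COMPLETED_SCHEMA = {
    "order_id": (str, True),
    "total_cents": (int, True),
    "currency": (str, True),
    "coupon_code": (str, False),
}


def validate_event(properties, schema):
    """Return a list of violations; an empty list means the payload conforms."""
    violations = []
    for name, (expected_type, required) in schema.items():
        if name not in properties:
            if required:
                violations.append(f"missing required property: {name}")
            continue
        value = properties[name]
        # bool is a subclass of int in Python, so reject it for non-bool fields
        is_stray_bool = isinstance(value, bool) and expected_type is not bool
        if is_stray_bool or not isinstance(value, expected_type):
            violations.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    for name in properties:
        if name not in schema:
            violations.append(f"unexpected property: {name}")
    return violations


payload = {"order_id": "A-1", "total_cents": 1999, "currncy": "USD"}
print(validate_event(payload, CHECKOUT_COMPLETED_SCHEMA))
# prints ['missing required property: currency', 'unexpected property: currncy']
```

Whether a non-conforming event is rejected or only flagged is a policy decision; returning the violations leaves that choice to the caller.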

Common mistakes

  • Assuming data is correct because dashboards show numbers. A dashboard will render any data you feed it. The presence of data does not mean the data is accurate. Validate events against their schemas and monitor volume trends to catch silent regressions.
  • Checking quality only at launch. Data quality degrades continuously as code changes, features evolve, and new platforms ship. Build ongoing monitoring into your analytics workflow, not a one-time QA pass.
  • Ignoring cross-platform inconsistencies. Web, iOS, and Android teams often implement the same event independently. Without schema enforcement, property names, types, and enum values diverge across platforms, making cross-platform analysis unreliable.
  • Treating missing data as zero. When a required property is absent from an event payload, it does not mean the value is zero or null. It usually means the implementation is broken. Set up alerts for property fill rate drops instead of silently defaulting missing values.
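The last point can be sketched as a fill rate calculation that keeps a genuine zero distinct from a missing value. The event shape and property names are hypothetical, and counting an explicit None as unfilled is a choice made for this sketch.

```python
def fill_rate(events, prop):
    """Share of events whose payload carries `prop` with a non-None value.

    Returns None when there are no events: the rate is undefined, not 0%.
    """
    if not events:
        return None
    present = sum(1 for event in events if event.get(prop) is not None)
    return present / len(events)


events = [
    {"plan": "pro", "seats": 5},
    {"plan": "free", "seats": 0},    # a genuine zero: counts as filled
    {"plan": "pro"},                 # property absent: counts as unfilled
    {"plan": "pro", "seats": None},  # explicit None: treated as unfilled here
]

print(fill_rate(events, "seats"))  # prints 0.5
print(fill_rate(events, "plan"))   # prints 1.0
```

Defaulting the two unfilled payloads to zero would have reported a 100% fill rate and hidden the gap, which is why the check looks at presence rather than at the value.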

Frequently asked questions

How do you measure data quality?

Track four key signals: volume trends (are event counts stable day over day?), property fill rates (are required properties present in 100% of payloads?), schema conformance (do property values match their defined types?), and platform parity (are event counts proportional across web, iOS, and Android?). Sudden changes in any of these signals indicate a quality regression.
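Platform parity, for example, could be checked by comparing each platform's share of total events between two periods. This is a rough sketch: the function name, the use of the previous period as the baseline, and the 10-point tolerance are assumptions.

```python
def parity_shifts(baseline, current, tolerance=0.10):
    """Return platforms whose share of total events moved by more than `tolerance`.

    baseline, current: dicts mapping platform name -> event count for a period.
    Values in the result are the change in share (current minus baseline).
    """
    base_total = sum(baseline.values())
    cur_total = sum(current.values())
    if base_total == 0 or cur_total == 0:
        raise ValueError("both periods need at least one event")
    shifts = {}
    for platform in sorted(set(baseline) | set(current)):
        base_share = baseline.get(platform, 0) / base_total
        cur_share = current.get(platform, 0) / cur_total
        if abs(cur_share - base_share) > tolerance:
            shifts[platform] = round(cur_share - base_share, 3)
    return shifts


last_week = {"web": 5000, "ios": 3000, "android": 2000}
this_week = {"web": 5100, "ios": 2900, "android": 300}
print(parity_shifts(last_week, this_week))
```

With this sample data both android and web are flagged. Because shares are relative, a collapse on one platform inflates the share of the others, so the platform with the negative shift is the one to investigate first.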

What causes bad data quality?

The most common causes are manual instrumentation errors (typos in event names or property keys), missing schema enforcement (no validation between code and the tracking plan), uncoordinated platform changes (updating an event on web but not mobile), and deprecated events that keep firing. All of these are preventable with structured schemas, code generation, and continuous monitoring.

How do you prevent data quality problems?

Prevention has three layers. First, define every event in a structured schema with explicit types and required fields. Second, enforce schemas through event validation at collection time or compile-time type checking via code generation. Third, monitor continuously with volume alerts, fill rate tracking, and periodic audits. Each layer catches problems the others miss.

What is the relationship between data quality and event drift?

Event drift is one of the primary causes of data quality degradation. Drift happens when the actual implementation of an event diverges from its documented schema over time. Properties get renamed, types change, new values appear in enums, and platforms fall out of sync. Monitoring for drift is a core part of maintaining data quality.
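One simple way to surface drift is to compare the documented schema with the property names and types actually seen in sampled payloads. The sketch below assumes a documented schema expressed as property name to Python type name; that format and the sample data are hypothetical.

```python
def detect_drift(documented, observed_events):
    """Compare documented property types with types seen in sampled payloads.

    documented: dict of property name -> type name, e.g. {"price": "float"}.
    observed_events: list of property dicts sampled from production.
    """
    seen = {}
    for props in observed_events:
        for name, value in props.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return {
        # sent by the implementation but absent from the documentation
        "undocumented": sorted(set(seen) - set(documented)),
        # documented but not present in any sampled payload
        "never_seen": sorted(set(documented) - set(seen)),
        # present in both, but the observed types differ from the documented one
        "type_changed": sorted(
            name for name in set(seen) & set(documented)
            if seen[name] != {documented[name]}
        ),
    }


documented = {"item_id": "str", "price": "float", "quantity": "int"}
sample = [
    {"item_id": "sku-1", "price": 9.99, "qty": 2},
    {"item_id": "sku-2", "price": "12.50", "qty": 1},
]
print(detect_drift(documented, sample))
# prints {'undocumented': ['qty'], 'never_seen': ['quantity'], 'type_changed': ['price']}
```

Here a renamed property shows up as one undocumented name plus one documented name that is never seen, and a platform sending price as a string shows up as a type change.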
