Data quality dimensions describe measurable characteristics of data and help you define data quality requirements. Use data quality dimensions to determine the expected results of a data quality assessment, whether it is an initial assessment or ongoing monitoring.
The state that you want your data to be in can usually be defined as fit for use, free of defects, conforming to specification, or meeting expectations and requirements. When you measure data quality, you compare the actual state of your data to this desired state. The standards, expectations, and requirements that are important to your business processes are expressed as characteristics or dimensions of the data.
The Data Management Association (DAMA) International published a paper that describes six core dimensions of data quality:
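As an illustration of comparing the actual state to the desired state, the following minimal sketch expresses a requirement as a predicate and reports the share of values that meet it. The function name `quality_score`, the sample data, and the requirement itself are hypothetical and are not part of IBM Knowledge Catalog.

```python
from typing import Callable, Iterable


def quality_score(values: Iterable[object],
                  meets_expectation: Callable[[object], bool]) -> float:
    """Return the fraction of values (0.0 to 1.0) that satisfy the expectation."""
    items = list(values)
    if not items:
        # Assumption for this sketch: an empty data set has no violations.
        return 1.0
    passed = sum(1 for value in items if meets_expectation(value))
    return passed / len(items)


# Hypothetical requirement: every order quantity is a positive integer.
quantities = [3, 1, 0, 7, -2]
score = quality_score(quantities, lambda q: isinstance(q, int) and q > 0)
print(f"{score:.0%} of values meet the requirement")
```

In this sample, three of the five values meet the requirement, so the gap between the actual and the desired state is two defective values.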
Dimension | Description | Predefined data quality checks that identify issues associated with this dimension |
---|---|---|
Accuracy | Data values are as close as possible to real values. | None. |
Completeness | All required data values are present. | Unexpected missing values |
Consistency | Data values within a column comply with a rule. | Inconsistent capitalization; Inconsistent representation of missing values; Suspect values |
Timeliness | Data represents the reality from a required point in time. | None. |
Uniqueness | Distinct values appear only once. | Unexpected duplicated values |
Validity | Data conforms to the format, type, or range of its definition. | Data class violations; Data type violations; Format violations; Values out of range |
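To make the completeness, uniqueness, and consistency dimensions concrete, the following sketch shows how such issues might be detected on a single column of values. It is plain Python written for illustration only; the function names and the definitions of "missing" and "capitalization style" are assumptions and do not reflect how the predefined checks are implemented.

```python
from collections import Counter


def missing_ratio(values):
    """Completeness: share of values that are None or blank strings."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, str) and not v.strip())
    )
    return missing / len(values)


def duplicated_values(values):
    """Uniqueness: non-missing values that occur more than once."""
    counts = Counter(v for v in values if v is not None)
    return [v for v, n in counts.items() if n > 1]


def capitalization_styles(values):
    """Consistency: count how many string values use each capitalization style."""
    def style(text):
        if text.isupper():
            return "upper"
        if text.islower():
            return "lower"
        if text.istitle():
            return "title"
        return "mixed"
    return Counter(
        style(v) for v in values if isinstance(v, str) and v.strip()
    )


# Hypothetical sample columns.
emails = ["a@example.com", None, "b@example.com", "a@example.com", ""]
cities = ["Berlin", "PARIS", "Rome", "madrid"]

print(missing_ratio(emails))
print(duplicated_values(emails))
print(capitalization_styles(cities))
```

A column in which more than one capitalization style appears, as in the `cities` sample, is a candidate for an inconsistent capitalization issue.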
You can create your own data quality dimensions by using the IBM Knowledge Catalog API: Create a data quality dimension.
The six DAMA core dimensions are Accuracy, Completeness, Consistency, Timeliness, Uniqueness, and Validity.
In addition, IBM Knowledge Catalog provides the Homogeneity dimension.
All of these dimensions can be evaluated by running data quality checks as part of metadata enrichment or by running individual data quality rules.
The following table describes the data quality dimensions and lists the data quality checks in metadata enrichment that can identify issues associated with a specific dimension:
Dimension | Description | Types of data quality checks |
---|---|---|
Accuracy | Data values are as close as possible to real values. | None. |
Completeness | All required data values are present. | Completeness check |
Consistency | Data values within a column comply with a rule. | Capitalization style check; Missing values representation check; Referential integrity check (IBM Knowledge Catalog Premium); Suspect values check |
Homogeneity | Data is similar and consistent over time. | Historical stability (IBM Knowledge Catalog Premium) |
Timeliness | Data represents the reality from a required point in time. | None. |
Uniqueness | Distinct values appear only once. | Uniqueness check |
Validity | Data conforms to the format, type, or range of its definition. | Data class check; Data type check; Format check; Length check; Possible values check; Range check; Regex check |
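The Validity row groups several kinds of checks: format, length, range, and possible values. The following sketch shows one way such checks could be expressed for a single value. The function `validity_violations` and its parameters are hypothetical names chosen for this example; they only mirror the check types in the table and are not the product's implementation or API.

```python
import re


def validity_violations(value, *, pattern=None, max_length=None,
                        value_range=None, allowed=None):
    """Return the names of the validity checks that the value fails."""
    violations = []
    text = str(value)
    # Format (regex) check: the whole value must match the pattern.
    if pattern is not None and re.fullmatch(pattern, text) is None:
        violations.append("format")
    # Length check: the textual form must not exceed max_length characters.
    if max_length is not None and len(text) > max_length:
        violations.append("length")
    # Range check: the value must lie within the inclusive bounds.
    if value_range is not None:
        low, high = value_range
        if not (low <= value <= high):
            violations.append("range")
    # Possible values check: the value must be one of the allowed values.
    if allowed is not None and value not in allowed:
        violations.append("possible values")
    return violations


# Hypothetical definitions for three columns.
postal_pattern = r"[A-Z]{2}-\d{5}"
print(validity_violations("DE-10115", pattern=postal_pattern, max_length=8))
print(validity_violations("de-101", pattern=postal_pattern, max_length=8))
print(validity_violations(130, value_range=(0, 120)))
print(validity_violations("XL", allowed={"S", "M", "L"}))
```

Returning the list of failed checks, rather than a single pass or fail flag, keeps it visible which part of the definition a value violates.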
Learn more
- Data quality analysis results
- Predefined data quality checks
- Configuring master data workflows
- IBM Knowledge Catalog API: List all data quality dimensions
- IBM Knowledge Catalog API: Create a data quality dimension