Data observability
Data observability is a sub-discipline of observability that focuses on the state of your data and the health of your data infrastructure.
Data observability enables engineers to identify problems with their data and aids in the troubleshooting process.
Key features of data observability
Some key features of data observability include:
- Logging
- Monitoring / Alerting
- Anomaly detection
- SLA tracking
These features apply to data both at rest and in transit. For example, it is critical to detect whether data is sent and received in a timely manner. The frequency and volume of updates may also be monitored.
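As a rough illustration of the timeliness and volume checks described above, the following Python sketch flags stale data and unusual row counts. The function names, the one-hour freshness limit, and the three-standard-deviation threshold are assumptions chosen for the example, not part of any particular tool.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def is_stale(last_update: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the newest data is older than the allowed age."""
    return now - last_update > max_age


def volume_is_anomalous(history: list[int], latest: int, threshold: float = 3.0) -> bool:
    """Flag `latest` when it lies more than `threshold` sample standard
    deviations away from the mean of the historical row counts."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Illustrative values only: the last update arrived three hours ago,
# while the (assumed) freshness limit is one hour.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_update = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
stale = is_stale(last_update, now, max_age=timedelta(hours=1))

# Illustrative daily row counts followed by a sharp drop in volume.
daily_rows = [1000, 1020, 980, 1010, 990]
volume_alert = volume_is_anomalous(daily_rows, latest=400)
```

In practice, the thresholds would be tuned per dataset, and a result of True from either check would typically feed an alerting or SLA-tracking system.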
For most organizations, data observability plays a critical role in their data pipelines. It can provide insight into the performance of data ingestion, transformation, and delivery.
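One simple way to gain insight into stage performance is to time each stage of the pipeline. The sketch below assumes a hypothetical three-stage, in-memory pipeline; the `timed_stage` helper and the `stage_durations` dictionary are names invented for this example.

```python
import time
from contextlib import contextmanager

# Collected durations in seconds, keyed by stage name.
stage_durations: dict[str, float] = {}


@contextmanager
def timed_stage(name: str):
    """Record the wall-clock duration of a pipeline stage, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_durations[name] = time.perf_counter() - start


# Hypothetical pipeline operating on in-memory records.
with timed_stage("ingestion"):
    raw = [{"id": i, "amount": str(i * 10)} for i in range(1000)]

with timed_stage("transformation"):
    cleaned = [{"id": r["id"], "amount": int(r["amount"])} for r in raw]

with timed_stage("delivery"):
    delivered = len(cleaned)
```

A real pipeline would likely export these durations to a metrics backend rather than keep them in a dictionary, but the measurement idea is the same.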
Data observability vs. data quality vs. data governance
Successful engineering organizations are concerned with data observability, data quality, and data governance. While these are complementary ideas, they play distinct roles.
Data quality reflects the accuracy, consistency, and validity of the data to be used. Data observability helps achieve better data quality by surfacing problems with the pipeline, or even by exposing row- or column-level profiling and validation information.
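To make column-level profiling and row-level validation more concrete, here is a minimal Python sketch. The sample rows, the `profile_column` and `invalid_rows` helpers, and the rule that amounts must be non-negative are all assumptions made for illustration.

```python
from typing import Any, Callable


def profile_column(rows: list[dict], column: str) -> dict:
    """Compute simple profiling metrics for one column."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_rate": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_count": len(set(non_null)),
    }


def invalid_rows(rows: list[dict], column: str, is_valid: Callable[[Any], bool]) -> list[int]:
    """Return the indices of rows whose value is missing or fails validation."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is None or not is_valid(row[column])
    ]


# Illustrative data: one missing amount and one negative amount.
rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": -5.0},
    {"order_id": 4, "amount": 25.0},
]
profile = profile_column(rows, "amount")
bad = invalid_rows(rows, "amount", lambda v: v >= 0)
```

Here `profile` summarizes the column as a whole, while `bad` points at the specific rows that need attention.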
Data governance refers to the standards of how data is collected, processed, and stored for compliance or security purposes. Data observability aids in monitoring for violations of data governance and ensuring that internal policies are being followed.
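Monitoring for governance violations can be as simple as comparing dataset metadata against a policy. The sketch below assumes a hypothetical policy (certain columns must never appear in datasets marked public) and a hypothetical in-memory catalog; neither reflects a specific governance product.

```python
# Hypothetical policy: these columns must never appear in public datasets.
RESTRICTED_COLUMNS = {"ssn", "email", "date_of_birth"}


def policy_violations(datasets: dict[str, dict]) -> dict[str, list[str]]:
    """Map each public dataset name to the restricted columns it exposes."""
    violations = {}
    for name, meta in datasets.items():
        if meta["visibility"] != "public":
            continue  # the assumed policy only covers public datasets
        exposed = sorted(RESTRICTED_COLUMNS & set(meta["columns"]))
        if exposed:
            violations[name] = exposed
    return violations


# Illustrative catalog metadata.
catalog = {
    "orders_summary": {"visibility": "public", "columns": ["order_id", "total"]},
    "customer_export": {"visibility": "public", "columns": ["customer_id", "email", "ssn"]},
    "customers_raw": {"visibility": "internal", "columns": ["customer_id", "email"]},
}
violations = policy_violations(catalog)
```

Running a check like this on a schedule, and alerting on any non-empty result, is one way observability can support governance.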
DataOps and data observability tools
Given the benefits and the importance of data observability, DataOps as a discipline is emerging as a popular framework to improve projects involving data. DataOps consists of three phases:
- Detection: focuses on validating the data for quality.
- Awareness: surfaces insights about the data for governance.
- Iteration: codifies data observability to create repeatable processes and frameworks.
Modern observability tools provide varying levels of support for data observability. At a minimum, they provide a way to collect, process, and organize data centrally for analysis and visualization. More mature tools also automate security and auditing measures.