Integrity, by definition, is the state of being whole and undivided. Data integrity, by extension, is the ability of data to provide a complete and consistent view of the things it describes.

Companies collect a lot of data through their transactional IT systems. They also have access to vast amounts of data created on the internet by people and machines. But when companies try to use this data for business applications, they often find that it is fragmented and incomplete.

Data is nothing but a reflection of the things and events that happen in the real world. To get a complete view of these real-world events, companies often need to integrate data from various standalone applications and external data sources.

Data integration is an important aspect of enterprise data management. It is inherently a complex process that involves several activities.

Data consolidation

Data consolidation involves combining data from various standalone applications into a single, homogeneous view. It also involves standardizing and deduplicating data across applications so that all users across the organization see a consistent view of the data.
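A minimal sketch of consolidation in Python, assuming two hypothetical applications ("crm" and "billing") that both hold customer records. The `Customer`, `standardize` and `consolidate` names are illustrative only, and the rules used here (trimming and lower-casing emails, normalising name spacing and case, treating the normalised email as the customer's identity, keeping the first record seen) are assumptions chosen for the example, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    email: str
    source: str  # application the record came from (hypothetical labels)


def standardize(record: Customer) -> Customer:
    """Normalise formatting so that equivalent records compare equal."""
    return Customer(
        name=" ".join(record.name.split()).title(),  # collapse spaces, title-case
        email=record.email.strip().lower(),           # trim and lower-case
        source=record.source,
    )


def consolidate(*sources: list[Customer]) -> list[Customer]:
    """Merge records from several applications, deduplicating by email.

    Assumption: the normalised email uniquely identifies a customer, and the
    first record seen (in argument order) is the one that is kept.
    """
    seen: dict[str, Customer] = {}
    for records in sources:
        for record in records:
            clean = standardize(record)
            seen.setdefault(clean.email, clean)
    return list(seen.values())


crm = [Customer("jane  doe", "Jane.Doe@Example.com ", "crm")]
billing = [
    Customer("Jane Doe", "jane.doe@example.com", "billing"),
    Customer("john smith", "JOHN@example.com", "billing"),
]

merged = consolidate(crm, billing)
for customer in merged:
    print(customer)
```

The two "Jane Doe" records collapse into one after standardization, leaving two customers. Real consolidation usually needs fuzzier matching and explicit survivorship rules; the "first record wins" rule here is only the simplest possible choice.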

Reconciliation checks

Data integration involves moving data across applications, and there is always a chance that data is lost or altered along the way (data leakage). Hence reconciliation checks, such as comparing record counts and checksums between source and target, are performed to ensure that no data has leaked.
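One way to sketch a reconciliation check in Python, assuming rows are represented as hashable tuples: compare a row count plus an order-independent checksum between source and target, and only when they differ, work out which rows are missing or unexpected. The `fingerprint` and `reconcile` functions and the sample rows are hypothetical; SHA-256 over `repr(row)` is just one convenient checksum choice.

```python
import hashlib
from collections import Counter


def fingerprint(rows):
    """Return (row count, order-independent checksum) for a collection of rows."""
    # Hash each row, sort the digests so row order does not matter,
    # then hash the concatenation into a single checksum.
    digests = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


def reconcile(source_rows, target_rows):
    """Compare source and target; list missing and unexpected rows on mismatch."""
    if fingerprint(source_rows) == fingerprint(target_rows):
        return {"ok": True, "missing": [], "unexpected": []}
    src, tgt = Counter(source_rows), Counter(target_rows)
    # Counter subtraction keeps only positive counts, so duplicates are handled.
    return {
        "ok": False,
        "missing": sorted((src - tgt).elements()),
        "unexpected": sorted((tgt - src).elements()),
    }


source = [(1, "Jane Doe", 120.0), (2, "John Smith", 75.5), (3, "Ana Li", 40.0)]
target = [(1, "Jane Doe", 120.0), (2, "John Smith", 75.5)]

result = reconcile(source, target)
print(result)
```

Comparing cheap aggregates first and only diffing row by row on a mismatch mirrors how such checks are often staged when tables are large; here row 3 is reported as missing from the target.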

Audit entries

Audit entries are created as companies collect, store and integrate data across applications, so that there is a record of every update made to the data during integration. This record is needed to troubleshoot issues when reconciliation checks fail.
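A minimal, in-memory sketch of an audit trail in Python. The `AuditLog` class, its fields (`job`, `table`, `key`, `action`, `before`, `after`) and the job name `nightly_sync` are hypothetical; a production audit trail would normally be written to durable, append-only storage rather than kept in a Python list.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only record of changes made during an integration run (in memory)."""

    def __init__(self):
        self._entries = []

    def record(self, job, table, key, action, before=None, after=None):
        """Append one entry describing a change to a single record."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "job": job,
            "table": table,
            "key": key,
            "action": action,
            "before": before,
            "after": after,
        }
        self._entries.append(entry)
        return entry

    def history(self, table, key):
        """Return every change to one record, oldest first, for troubleshooting."""
        return [e for e in self._entries if e["table"] == table and e["key"] == key]

    def to_json_lines(self):
        """Serialise all entries as JSON Lines, e.g. for shipping to storage."""
        return "\n".join(json.dumps(e) for e in self._entries)


log = AuditLog()
log.record("nightly_sync", "customers", 42, "insert",
           after={"email": "jane.doe@example.com"})
log.record("nightly_sync", "customers", 42, "update",
           before={"email": "jane.doe@example.com"},
           after={"email": "jane@example.com"})

for e in log.history("customers", 42):
    print(e["action"], e["before"], "->", e["after"])
```

When a reconciliation check fails for a record, `history` replays what the integration job did to it, which is usually the first step in finding where the data went wrong.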

Referential checks

When data is stored in relational databases, it is often normalised to reduce data redundancy. Because related data elements then live in different tables, referential checks (typically foreign key constraints) are enforced across the tables to ensure the integrity of the data.
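A small illustration using Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables are hypothetical. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection; other databases enforce declared constraints by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Jane Doe')")
conn.execute("INSERT INTO orders VALUES (100, 1, 120.0)")  # valid: customer 1 exists

# An order pointing at a non-existent customer violates the referential check.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 50.0)")
    orphan_rejected = False
except sqlite3.IntegrityError as exc:
    orphan_rejected = True
    print("Rejected orphan order:", exc)

# Deleting a customer that still has orders is also rejected.
try:
    conn.execute("DELETE FROM customers WHERE customer_id = 1")
    delete_rejected = False
except sqlite3.IntegrityError as exc:
    delete_rejected = True
    print("Rejected delete:", exc)
```

The database, not the application code, refuses both the orphaned order and the deletion of a referenced customer, so the two tables cannot drift out of step.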

Source traceability

When data from multiple applications is integrated to provide a consolidated, holistic view, it is necessary to provide a way to trace each piece of data back to the system or application in which it originated. This helps in understanding the provenance of the data and judging its authenticity. It also helps in assessing whether that data is appropriate for specific use cases.
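A minimal sketch of field-level source traceability in Python, assuming each integrated value carries a pointer back to the system and record it came from. The `Provenance` and `TracedRecord` types, the `extract` and `merge` helpers, the system names and the keys `C-17` and `B-903` are all hypothetical, and the "primary source wins on conflict" rule is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    source_system: str  # application the value came from
    source_key: str     # the record's identifier in that application
    extracted_at: str   # extraction time, ISO 8601 in UTC


@dataclass
class TracedRecord:
    values: dict[str, object]
    origins: dict[str, Provenance]  # field name -> where that value came from


def extract(system: str, key: str, values: dict[str, object]) -> TracedRecord:
    """Wrap a source record so every field remembers its origin."""
    prov = Provenance(system, key, datetime.now(timezone.utc).isoformat())
    return TracedRecord(dict(values), {name: prov for name in values})


def merge(primary: TracedRecord, secondary: TracedRecord) -> TracedRecord:
    """Combine two records; primary wins on conflicts, origins follow the values."""
    values, origins = dict(secondary.values), dict(secondary.origins)
    values.update(primary.values)
    origins.update(primary.origins)
    return TracedRecord(values, origins)


crm = extract("crm", "C-17", {"name": "Jane Doe", "email": "jane.doe@example.com"})
billing = extract("billing", "B-903",
                  {"email": "jane@billing.example.com", "credit_limit": 5000})

golden = merge(crm, billing)
for field_name, value in golden.values.items():
    origin = golden.origins[field_name]
    print(field_name, value, "from", origin.source_system, origin.source_key)
```

Because each field keeps its own origin, a user of the consolidated record can see that the email came from the CRM while the credit limit came from billing, and judge each value accordingly.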