Data loading techniques are integral to data management. In a data-driven organization, enterprise data must be available to every user who needs it, when they need it. To make this happen, a data load strategy is designed for every application that uses enterprise data.

The data loading techniques can be broadly classified into the following buckets, based on how data loads happen:

Bulk loads

Here, the entire data set is loaded every time a load runs. Bulk loads are optimised to load huge volumes of data efficiently. This technique is deployed when new data is created in large volumes and at high speed.
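A bulk load can be sketched as a full replace of the target table. The example below is only an illustration: it uses Python's built-in sqlite3 module, and the `customers` table and the `bulk_load` helper are hypothetical names chosen for this sketch.

```python
import sqlite3

def bulk_load(conn, rows):
    """Replace the whole target table with a fresh copy of the source rows."""
    # One transaction: if the load fails midway, the old contents are kept.
    with conn:
        conn.execute("DELETE FROM customers")
        # executemany sends all rows in one batched call.
        conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

bulk_load(conn, [(1, "Ada"), (2, "Grace")])
# The second run reloads everything, including rows that did not change.
bulk_load(conn, [(1, "Ada"), (2, "Grace"), (3, "Edsger")])
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Every run rewrites all rows, which is simple and needs no change detection, at the cost of reprocessing data that has not changed.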

Incremental loads

Here, only new data or changes to previously loaded data are loaded. Incremental loads include logic to detect new or changed data, which keeps the data processing volume to a minimum. This technique is deployed when new data or changes to old data arrive infrequently.
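One common way to detect new or changed data is a "watermark": the load remembers the latest modification time it has already copied and asks the source only for rows newer than that. The sketch below assumes a hypothetical `orders` table with an integer `updated_at` column that the source maintains; neither is prescribed by the text above.

```python
import sqlite3

def incremental_load(source, target):
    """Copy only rows changed since the last load, tracked by a watermark."""
    # The watermark is the newest updated_at value already in the target.
    watermark = target.execute(
        "SELECT COALESCE(MAX(updated_at), 0) FROM orders"
    ).fetchone()[0]
    changed = source.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (watermark,),
    ).fetchall()
    with target:
        # Upsert: insert new ids, overwrite rows whose id already exists.
        target.executemany(
            "INSERT OR REPLACE INTO orders (id, amount, updated_at) VALUES (?, ?, ?)",
            changed,
        )
    return len(changed)

schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(schema)
target.execute(schema)

source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 110)])
first = incremental_load(source, target)   # both rows are new

source.execute("UPDATE orders SET amount = 25.0, updated_at = 120 WHERE id = 2")
source.execute("INSERT INTO orders VALUES (3, 30.0, 130)")
second = incremental_load(source, target)  # only the changed row and the new row
```

A watermark of this kind does not notice rows deleted at the source; handling deletes would need extra logic that this sketch leaves out.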

The data loading techniques can also be broadly classified into the following buckets, based on when data loads happen:

Push techniques

These are scheduled batch jobs that load data into a data store at a specific time. The application is generally not used during the load window, as queries may return inconsistent results while the load is in progress. A push strategy can be scheduled daily so that the data store reflects the latest data.
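The scheduling side of a daily push can be sketched as a small calculation of the next run time. The 02:00 slot and the `next_run` helper below are assumptions made for illustration; in practice this job is usually handed to a scheduler such as cron or a workflow orchestrator.

```python
from datetime import datetime, timedelta

def next_run(now, hour=2, minute=0):
    """Return the next daily load slot strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate

before_window = next_run(datetime(2024, 1, 1, 1, 0))  # 01:00, slot is still ahead
after_window = next_run(datetime(2024, 1, 1, 3, 0))   # 03:00, slot moves to next day
```

Picking a quiet hour for the slot matches the idea of a load window during which the application is not in use.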

Pull techniques

These are batch jobs that run on demand, whenever users need to refresh the data within the application. Pull loads are usually used when data does not need to be refreshed frequently. Common instances of the pull strategy are data migrations and complex data analysis activities. Pull processes also appear in the form of lazy evaluation in big data technologies like Apache Spark.
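The idea behind lazy evaluation can be shown with plain Python generators, used here as an analogy rather than as Spark code: defining the pipeline reads nothing, and data is pulled from the source only when a result is requested. The `read_source` function and the sample records are hypothetical.

```python
reads = []  # records every row actually fetched from the source

def read_source():
    """Stand-in for a source system; logs each record as it is read."""
    for record in [3, 8, 15, 4]:
        reads.append(record)
        yield record

# Defining the pipeline does not touch the source yet.
pipeline = (r * 2 for r in read_source() if r > 3)
reads_before_pull = list(reads)

# Asking for the result is what triggers the load.
result = list(pipeline)
```

Here `reads_before_pull` is still empty, and the source is read only when `list(pipeline)` demands the values.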

Streaming techniques

These are real-time data loads that happen continuously, as the data is generated at the source. Data streaming is used in applications where data is continuously monitored for purposes such as instant gratification or fraud detection, or for data integration requirements across disparate enterprise applications.
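A streaming load handles each record as it arrives instead of waiting for a batch. The minimal sketch below uses a Python generator as a stand-in for a real event source such as a message queue; the transaction fields and the fixed amount threshold are assumptions for illustration, not a real fraud rule.

```python
def transaction_stream():
    """Stand-in for an unbounded source; a real stream would not end."""
    yield {"account": "A", "amount": 40.0}
    yield {"account": "B", "amount": 5200.0}
    yield {"account": "A", "amount": 15.5}
    yield {"account": "C", "amount": 1800.0}

def flag_suspicious(events, limit=1000.0):
    """Inspect each event as it arrives and emit those above the limit."""
    for event in events:
        if event["amount"] > limit:
            yield event

# Each event is checked the moment it is produced, one at a time.
alerts = list(flag_suspicious(transaction_stream()))
flagged_accounts = [event["account"] for event in alerts]
```

Because `flag_suspicious` is itself a generator, an alert can be raised as soon as the offending event appears, without waiting for the rest of the data.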