Building a data warehouse is a complex process, and involves many distinct activities that can be broadly categorised into planning, designing and engineering. The key to success, however, is synchronised efforts across various activities to manage dependencies and optimize resource utilization.
Planning & execution
Planning involves ensuring optimized and smooth implementation
- Identify the tasks that are needed to be performed.
- Identify the business, management and IT stakeholders necessary to build the data warehouse.
- Position the application and associated services to business users and ensure favourable responses, feedback and continuous engagement.
- Define budget, timelines and critical success factors.
- Define the execution strategy, roadmap, phases and milestones.
- Define workstreams like strategy, architecture, governance and engineering. Appoint workstream leads.
- Identify the teams from within the organization or external teams who will be performing the tasks. Recruit people and onboard them as needed.
- Define the roles and responsibilities, especially between the IT and Business stakeholders.
- Define projects within the program, including project scope, budget estimations and timelines, project success criteria, project team composition, project deliverables, project dependencies and execution priorities.
- Identify and fix issues and bottlenecks.
- Monitor progress of the overall execution against the defined milestones.
- Communicate execution status to all stakeholders against milestones.
- Ensure resource optimization.
- Ensure business engagement and continuous feedback from business users.
- Track budget and ensure funding. Optimize execution to fit the budget constraints.
- Manage expectations of all stakeholders.
- Ensure team motivation.
- Ensure that overall things are moving in the right direction.
A good design ensures that the data within the data warehouse is useful, secure, understandable, accessible and trustworthy. A design is built for the business users, so that it is fast, easy and convenient for them to use the data warehousing application.
- Understand the business case and identify various data requirements that will enable the business case.
- Define the data access layer which will be used by business users to access data from the data warehouse. Ensure that the definitions in the access layer are business friendly and consistent across the organization.
- Prioritize the data requirements as per the roadmap and milestones defined in the plan.
- Identify various data sources that will be used to build the data warehouse.
- Make this data available to data analysts so that they can analyze source data and define the data transformation rules for integrating and standardizing data into the data warehouse.
- Define the solution architecture for data warehouse. Ensure that it is aligned with the overall enterprise data architecture strategy.
- Define the infrastructure requirements for storing and processing data within the data warehouse. Also, define the infrastructure setup for hosting the data warehouse.
- Define the data security requirements and security implementation frameworks.
- Define data quality and data integrity frameworks to be used.
- Define data cataloging techniques to be used to make data within the data warehouse searchable, understandable and accessible. Ensure that the search results are customized for each business, based on their nomenclatures.
- Define the strategy to implement methods so that business users can trace data within the data warehouse back to the source system.
- Define data governance frameworks to be used for managing data within the data warehouse.
- Define the strategy for building master data
- Define the data modelling techniques used to store data within the data warehouse.
- Define the technologies, data staging architecture and data flow patterns to extract, store and process data.
- Define the environments to be used for building, testing and deploying the data warehousing application.
The technical implementation of the data warehouse is done as per the defined design and according to the defined plan.
- Deploy the servers, databases, ETL tools, reporting tools etc for various environments (dev/test/prod etc) on premise or in the cloud as defined in the design and according to the plan.
- Build the logical and physical data model as per the design and based on the data requirements defined in the data access layer.
- Create data warehouse tables in the databases based on the physical data model.
- Build the staging database which is the replica of the source system tables.
- Map the source data to the data warehouse.
- Define the data transformation rules, data standardization rules, data integration rules, data consolidation rules as per the design templates.
- Build ETL from source system to staging area, from staging to data warehouse, and from the data warehouse to the data access layer, and include the data transformation rules.
- Build the views in data access layers (and reports) as per the prioritization defined in the design of the access layer.
- Load bulk data (historic data) into the data warehouse and validate the data in the access layer with business users and deploy the data warehouse through incrementals as defined in the plan.
- Switch on incremental loads for each deployments to ensure regular refresh of data.
- Deploy data management tools according to the design and manage the data access, quality, integrity, security etc as per the defined data governance processes.