Saturday, August 25, 2018

Data warehouse design methods



In addition to Inmon's top-down approach to data warehouses and Kimball's bottom-up method, some organizations have also adopted hybrid options.
  • Top-down approach: Inmon's method calls for building the data warehouse first. Data is extracted from operational and possibly third-party external systems and may be validated in a staging area before being integrated into a normalized data model. Data marts are created from the data stored in the data warehouse.


  • Bottom-up method: Kimball's data warehousing architecture calls for dimensional data marts to be created first. Data is extracted from operational systems, moved to a staging area and modeled into a star schema design, with one or more fact tables connected to one or more dimensional tables. The data is then processed and loaded into data marts, each of which focuses on a specific business process. Data marts are integrated using a data warehouse bus architecture to form an enterprise data warehouse.



  • Hybrid method: Hybrid approaches to data warehouse design include aspects from both the top-down and bottom-up methods. Organizations often seek to combine the speed of the bottom-up approach with the integration achieved in a top-down design.

Data warehouses vs. databases vs. data lakes

Databases and data lakes are often confused with data warehouses, but there are important differences.
While data warehouses typically store data from multiple sources and utilize predefined schemas designed for data analytics, a database is generally used to capture and store data from a single source, such as a transactional system, and its schema is normalized. Databases aren't designed to run across very large data sets.

2 comments: