Monday, August 13, 2007

Dataware House Basic Concepts

What is a Data Warehouse?
A Data Warehouse is the "corporate memory". Academics will say it is a subject oriented, point-in-time, inquiry only collection of operational data.
Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.

What is ETL?
ETL is the Data Warehouse acquisition processes of Extracting, Transforming (or Transporting) and Loading (ETL) data from source systems into the data warehouse.
What is the difference between a data warehouse and a data mart?
There are inherent similarities between the basic constructs used to design a data warehouse and a data mart. In general a Data Warehouse is used on an enterprise level, while Data Marts is used on a business division/department level. A data mart only contains the required subject specific data for local analysis.

What is the difference between a W/H and an OLTP application?
Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.
Warehouses are Time Referenced, Subject-Oriented, Non-volatile (read only) and Integrated.
OLTP databases are designed to maintain atomicity, consistency and integrity (the "ACID" tests). Since a data warehouse is not updated, these constraints are relaxed.

What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
ROLAP, MOLAP and HOLAP are specialized OLAP (Online Analytical Analysis) applications.
ROLAP stands for Relational OLAP. Users see their data organized in cubes with dimensions, but the data is really stored in a Relational Database (RDBMS) like Oracle. The RDBMS will store data at a fine grain level, response times are usually slow.
MOLAP stands for Multidimensional OLAP. Users see their data organized in cubes with dimensions, but the data is store in a Multi-dimensional database (MDBMS) like Oracle Express Server. In a MOLAP system lot of queries have a finite answer and performance is usually critical and fast.

HOLAP stands for Hybrid OLAP, it is a combination of both worlds.

What is the difference between an ODS and a W/H?
An ODS (Operational Data Store) is an integrated database of operational data. Its sources include legacy systems and it contains current or near term data. An ODS may contain 30 to 90 days of information.

A warehouse typically contains years of data (Time Referenced). Data warehouses group data by subject rather than by activity (subject-oriented). Other properties are: Non-volatile (read only) and Integrated.

When should one use an MD-database (multi-dimensional database) and not a relational one?
Data in a multi-dimensional database is stored as business people views it, allowing them to slice and dice the data to answer business questions. When designed correctly, an OLAP database will provide must faster response times for analytical queries.

Normal relational databases store data in two-dimensional tables and analytical queries against them are normally very slow.

How can Oracle Materialized Views be used to speed up data warehouse queries?
With "Query Rewrite" (QUERY_REWRITE_ENABLED=TRUE in INIT.ORA) Oracle can direct queries to use pre-aggregated tables instead of scanning large tables to answer complex queries.
Materialized views in a W/H environments is typically referred to as summaries, because they store summarized data.

No comments: