Zoom Outline PDF View  or  Email Back to Schedule for Ohio


Data Warehousing Fundamentals




SUMMARY:   This dimensional modelling techniques course is designed to answer questions such as the following:
  • What is data warehousing?
  • What is a data mart?
  • What are the data modelling options?
  • What is Extract, Transform and Load (ETL)?
  • What are the terms and concepts specific to data warehousing and OLAP design?
  • How to plan and implement a data warehouse with high availability, simplified manageability and optimal performance
  • What are common statistics, analytic and OLAP SQL queries?
The course is suitable for all popular DBMS platforms, including DB2, Oracle, SQL Server, Sybase, etc.

AUDIENCE:   Would-be data warehouse architects, IT developers, database administrators or anyone responsible for a data warehouse or related discipline.

DURATION:   3 days.

OBJECTIVES:   Upon completion of this course, the participant should be able to design a data warehouse using both star and snowflake schemas. And the delegate should understand the implication of such terms as cubes, dimensions, attributes, joins, hierarchies, measures, etc.

COURSE CONTENT:  
  1. DATA WAREHOUSING OVERVIEW
    • Overview
    • Typical uses
  2. DEFINITION, ARCHITECTURE AND CONCEPTS
    • Enterprise Data Model
    • Operational vs. historical data
    • Extract Transform Load (ETL)
    • Metadata
    • Data warehouse vs. data mart
    • Data mining
    • OLAP vs. OLTP
    • Massive size implementation
    • Logical design vs. physical design
    • Normalization vs. denormalization
    • Referential constraints
  3. DATA MODELLING OPTIONS
    • Entity model
    • Star schema
    • Snowflake schema
  4. DIMENSIONAL MODELLING DEVELOPMENT LIFE CYCLE
    • Requirements analysis
    • Requirements gathering
    • Requirements validation
    • Requirements modelling
    • Schema design
    • Project definition
    • Warehouse design
    • Implementation
    • Follow-up and review
  5. DIMENSIONAL MODELLING DESIGN
    • Overview
    • Metadata properties
    • Star schema
    • Snowflake schema
    • Cubes
    • Measures and facts
    • Attributes and relationships
    • Dimension
    • Hierarchies
    • Role-playing dimensions
    • Joins
    • Summary tables and aggregation
    • Exercises
  6. CASE STUDY
    • Project definition and scoping
    • Specify the requirements
    • Specify the grain (e.g., fact table types)
    • Specify the dimensions (e.g., handling slowly changing dimensions)
    • Specify the facts (e.g., conformed facts)
  7. IMPLEMENTATION OPTIONS
    • Overview
    • Top down
    • Bottom up
    • Sizing
    • Cleaning
    • Populating the data warehouse
  8. EXTRACT, TRANSFORM, LOAD (ETL) TERMS AND CONCEPTS
    • Options
    • Extraction options
    • Transformation options
    • Loading options
    • Change Data Capture and publishing
    • Staging areas
  9. EXTRACTING
    • Logical-to-physical data mapping
    • Disparate (heterogeneous) data sources
    • Extracting changes data – delta or other
  10. DATA CLEANING AND CONFORMING
    • Data quality criteria
    • Design methods and alternatives
    • Cleaning deliverables
    • Conforming dimension tables
    • Conforming fact tables
  11. DIMENSION TABLE DELIVERY
    • Dimension table structure
    • Surrogate key generation
    • Dimension table grain
    • Flat (denormalized) or snowflake?
    • Data and time dimensions
    • ‘Big’ vs. ‘small’ dimensions
    • Dimensional roles
    • Dimensions as subdimensions
    • Degenerate dimensions
  12. SLOWLY CHANGING DIMENSIONS
    • Type 1
    • Type 2
    • Type 3
    • Hybrid
    • Late arrivals
  13. MULTIVALUED DIMENSIONS
    • Definition
    • Bridge tables
  14. FACT TABLE DELIVERY
    • Fact table structure
    • Referential integrity (RI)
    • Surrogate key derivation and flow
    • Fundamental grain
    • Transaction fact tables
    • Factless fact tables
    • Periodic snapshots
    • Accumulating snapshots
  15. FACT TABLE LOAD CONSIDERATIONS
    • Index management
    • Partition management
    • Updates, deletes and inserts
    • Recovery
    • Summary tables
    • Parallelism
  16. DATA WAREHOUSE PERFORMANCE DESIGN
    • Materialized views
    • Large concurrent reports
    • Short running queries
    • Long running queries
    • Random queries
    • Occasional updates
    • On-line utilities
    • Index options
    • Partitioning and parallelism (e.g., LOADs)
  17. INTRODUCTION TO STATISTICS, ANALYTIC AND OLAP SQL QUERIES
    • AVG
    • CORRELATION
    • COUNT
    • COUNT_BIG
    • CONVARIANCE
    • MAX
    • MIN
    • RAND
    • STDDEV
    • SUM
    • VARIANCE
    • Regression function
    • GROUPING, ROLLUP AND CUBE
  18. PHYSICAL DESIGN CONSIDERATIONS
    • Denormalization
    • Index choices
    • Data placement
    • Free space
    • Summary tables
    • Data compression

GOW/08

© 2007 Verhoef Training, Inc.

Schedule Dates

Course offered as
Inhouse or Public

Date Location State



Copyright © 2007 - Verhoef Group of Companies - All Rights Reserved