|
Data Warehousing Fundamentals |
|
| SUMMARY: |
This dimensional modelling techniques course is designed to answer questions
such as the following:
- What is data warehousing?
- What is a data mart?
- What are the data modelling options?
- What is Extract, Transform and Load (ETL)?
- What are the terms and concepts specific to data warehousing and OLAP design?
- How to plan and implement a data warehouse with high availability, simplified manageability and optimal performance
- What are common statistics, analytic and OLAP SQL queries?
The course is suitable for all popular DBMS platforms, including DB2, Oracle,
SQL Server, Sybase, etc. |
| AUDIENCE: |
Would-be data warehouse architects, IT developers, database administrators or anyone responsible for a data warehouse or related discipline. |
| OBJECTIVES: |
Upon completion of this course, the participant should be able to design a data warehouse using both star and snowflake schemas. And the delegate should understand the implication of such terms as cubes, dimensions, attributes, joins, hierarchies, measures, etc.
|
| COURSE CONTENT: |
- DATA WAREHOUSING OVERVIEW
- DEFINITION, ARCHITECTURE AND CONCEPTS
- Enterprise Data Model
- Operational vs. historical data
- Extract Transform Load (ETL)
- Metadata
- Data warehouse vs. data mart
- Data mining
- OLAP vs. OLTP
- Massive size implementation
- Logical design vs. physical design
- Normalization vs. denormalization
- Referential constraints
- DATA MODELLING OPTIONS
- Entity model
- Star schema
- Snowflake schema
- DIMENSIONAL MODELLING DEVELOPMENT LIFE CYCLE
- Requirements analysis
- Requirements gathering
- Requirements validation
- Requirements modelling
- Schema design
- Project definition
- Warehouse design
- Implementation
- Follow-up and review
- DIMENSIONAL MODELLING DESIGN
- Overview
- Metadata properties
- Star schema
- Snowflake schema
- Cubes
- Measures and facts
- Attributes and relationships
- Dimension
- Hierarchies
- Role-playing dimensions
- Joins
- Summary tables and aggregation
- Exercises
- CASE STUDY
- Project definition and scoping
- Specify the requirements
- Specify the grain (e.g., fact table types)
- Specify the dimensions (e.g., handling slowly changing dimensions)
- Specify the facts (e.g., conformed facts)
- IMPLEMENTATION OPTIONS
- Overview
- Top down
- Bottom up
- Sizing
- Cleaning
- Populating the data warehouse
- EXTRACT, TRANSFORM, LOAD (ETL) TERMS AND CONCEPTS
- Options
- Extraction options
- Transformation options
- Loading options
- Change Data Capture and publishing
- Staging areas
- EXTRACTING
- Logical-to-physical data mapping
- Disparate (heterogeneous) data sources
- Extracting changes data – delta or other
- DATA CLEANING AND CONFORMING
- Data quality criteria
- Design methods and alternatives
- Cleaning deliverables
- Conforming dimension tables
- Conforming fact tables
- DIMENSION TABLE DELIVERY
- Dimension table structure
- Surrogate key generation
- Dimension table grain
- Flat (denormalized) or snowflake?
- Data and time dimensions
- ‘Big’ vs. ‘small’ dimensions
- Dimensional roles
- Dimensions as subdimensions
- Degenerate dimensions
- SLOWLY CHANGING DIMENSIONS
- Type 1
- Type 2
- Type 3
- Hybrid
- Late arrivals
- MULTIVALUED DIMENSIONS
- FACT TABLE DELIVERY
- Fact table structure
- Referential integrity (RI)
- Surrogate key derivation and flow
- Fundamental grain
- Transaction fact tables
- Factless fact tables
- Periodic snapshots
- Accumulating snapshots
- FACT TABLE LOAD CONSIDERATIONS
- Index management
- Partition management
- Updates, deletes and inserts
- Recovery
- Summary tables
- Parallelism
- DATA WAREHOUSE PERFORMANCE DESIGN
- Materialized views
- Large concurrent reports
- Short running queries
- Long running queries
- Random queries
- Occasional updates
- On-line utilities
- Index options
- Partitioning and parallelism (e.g., LOADs)
- INTRODUCTION TO STATISTICS, ANALYTIC AND OLAP SQL QUERIES
- AVG
- CORRELATION
- COUNT
- COUNT_BIG
- CONVARIANCE
- MAX
- MIN
- RAND
- STDDEV
- SUM
- VARIANCE
- Regression function
- GROUPING, ROLLUP AND CUBE
- PHYSICAL DESIGN CONSIDERATIONS
- Denormalization
- Index choices
- Data placement
- Free space
- Summary tables
- Data compression
GOW/08 |
|
© 2007 Verhoef Training, Inc.
|