ETL testing and database testing both involve data validation, but they are not the same. ETL testing is usually performed on data in a data warehouse system, whereas database testing is usually performed on transactional systems where data flows from different applications into a transactional database.
In general, ETL testing is part of DWH testing. The data warehouse is built using data extraction, transformation and loading procedures. ETL processes extract data from sources, transform it to meet BI reporting requirements, and then load it into the target data warehouse.
Database testing focuses on data accuracy, data validity, and valid values. It includes the following operations:
- Checking if primary and foreign keys are supported.
- Checking the correctness of column values in the table.
- Checking the accuracy of data in columns.
- Checking for missing data in columns.
Common database testing tools include Selenium, QTP, etc.
A typical ETL-based data warehouse uses staging area, data integration, and access layers to perform its functions. This is usually a three-layer architecture:
- Intermediate level. A staging layer or staging database is used to store data retrieved from various source data systems.
- Data integration layer. The integration layer transforms data from the staging layer and moves the data into the database, where the data is organized into hierarchical groups, often called dimensions, into facts and facts are aggregated. The combination of fact tables and dimensions in the DW system is called a schema.
- Access level. The access layer is used by end users to obtain data for analytical reports and information.
ETL testing is performed before the data is moved to the production data warehouse system. It is sometimes also called table balancing or production reconciliation. It differs from database testing in terms of its scope and the steps you need to take to complete it.
The main purpose of ETL testing is to identify and correct data defects and common errors that occur before data is processed for analytical reporting. However, there can be problems when testing ETL:
- Loss of data in the ETL process.
- Invalid, incomplete, or duplicate data.
- The DW system contains historical data, so the amount of data is too large and extremely difficult to perform ETL testing on the target system.
- ETL testers usually do not have access to view the timelines in the ETL tool. They have little or no access to BI reporting tools to see the final layout of reports and data within reports.
- It is difficult to generate and create test cases as the amount of data is too large and complex.
- ETL testers are usually unaware of the end-user reporting requirements and business flow of information.
- ETL testing includes various complex SQL concepts for validating data on a target system.
- Sometimes testers are not provided with source-to-target matching information.
- An unstable testing environment delays the development and testing of the process.