"... continuous integration (CI) implements continuous processes of applying quality control — small pieces of effort, applied frequently" - http://en.wikipedia.org/wiki/Continuous_integration
In my blog post on test-driven development I discuss the benefits of writing automated unit tests for business intelligence systems. If you are unfamiliar with the concept of unit tests, please read that post first, or one of the many articles on the subject online or on Wikipedia.
Continuous integration is the capability to automatically and regularly run all unit tests across the entire data warehouse. This alerts the data warehouse administrators to any failing tests, which can in turn forewarn of issues in the data sources or in the data warehouse itself. Integrating this with a version control system (VCS), such as Subversion, is even better. Using a repository to store all the DDL, ETL, and report code gives you a complete, searchable and linked history of the data warehouse. A VCS will also tell you who made each change, to what and when. With a CI environment, each commit to the VCS automatically triggers a run of the unit tests, so developers know immediately if they have introduced any defects.
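To make the idea of a unit test for a data warehouse concrete, here is a minimal sketch in Python. It assumes a made-up ETL step (load_customer_dim) and made-up tables (stg_customer, dim_customer), and it uses an in-memory SQLite database in place of a real warehouse. None of these names come from a particular tool; they are purely illustrative.

```python
import sqlite3
import unittest


def load_customer_dim(conn):
    """Hypothetical ETL step: copy distinct, non-null customers from staging."""
    conn.execute(
        """
        INSERT INTO dim_customer (customer_id, name)
        SELECT DISTINCT customer_id, TRIM(name)
        FROM stg_customer
        WHERE customer_id IS NOT NULL
        """
    )


class LoadCustomerDimTest(unittest.TestCase):
    def setUp(self):
        # Build a small, known staging data set for every test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(
            """
            CREATE TABLE stg_customer (customer_id INTEGER, name TEXT);
            CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
            INSERT INTO stg_customer VALUES
                (1, ' Alice '), (1, ' Alice '), (2, 'Bob'), (NULL, 'Orphan');
            """
        )
        load_customer_dim(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_duplicates_and_nulls_are_dropped(self):
        rows = self.conn.execute(
            "SELECT customer_id, name FROM dim_customer ORDER BY customer_id"
        ).fetchall()
        self.assertEqual(rows, [(1, "Alice"), (2, "Bob")])


def run_suite():
    """Run the tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoadCustomerDimTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


def main():
    """Return a process exit code: 0 if every test passed, 1 otherwise."""
    return 0 if run_suite().wasSuccessful() else 1
```

A CI job triggered by a commit could run a script that ends with sys.exit(main()). Most CI servers treat a non-zero exit code as a failed build, which is what alerts the developer who made the commit.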
Another BI function which I use continuous integration for is the automatic and regular generation of user and system documentation. Assuming the data dictionary is kept up to date each time the data source or ETL is modified, we can use this to automatically generate the data warehouse help files and user manuals. With sufficient planning we could also generate the data dictionary automatically from appropriate markup in the ETL scripts themselves.
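As a rough sketch of generating the data dictionary from markup in the ETL scripts, the Python below scans a SQL script for specially formatted comments. The "-- @doc table.column: description" convention is an assumption invented for this example, not a standard, and the function names are hypothetical.

```python
import re

# Assumed markup convention (hypothetical): a SQL comment of the form
#   -- @doc <table>.<column>: <description>
DOC_PATTERN = re.compile(r"^\s*--\s*@doc\s+(\w+)\.(\w+)\s*:\s*(.+?)\s*$")


def extract_data_dictionary(sql_text):
    """Return sorted (table, column, description) tuples found in the script."""
    entries = []
    for line in sql_text.splitlines():
        match = DOC_PATTERN.match(line)
        if match:
            entries.append(match.groups())
    return sorted(entries)


def render_data_dictionary(entries):
    """Render the entries as a plain-text listing grouped by table."""
    lines = []
    current_table = None
    for table, column, description in entries:
        if table != current_table:
            lines.append(table)
            current_table = table
        lines.append(f"  {column}: {description}")
    return "\n".join(lines)


# Illustrative ETL script carrying the assumed markup.
EXAMPLE_ETL_SCRIPT = """\
-- @doc dim_customer.customer_id: Surrogate key for the customer
-- @doc dim_customer.name: Customer name, trimmed of whitespace
INSERT INTO dim_customer (customer_id, name)
SELECT DISTINCT customer_id, TRIM(name) FROM stg_customer;
-- @doc fact_sales.amount: Sale amount in local currency
"""
```

A scheduled CI job could call extract_data_dictionary on every script in the repository and publish the rendered output. The documentation then changes only when the markup in the ETL code changes.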
Lastly, the CI environment can control, automate and maintain the build and deployment processes between your development, test and production environments. Continuous integration systems provide other capabilities, such as code coverage and code standards, which are less useful in a BI context.
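The promotion of a build through development, test and production can be sketched as a simple ordered loop. This is only an illustration under assumed names: promote, deploy and verify are hypothetical, and a real CI server would supply its own deployment and verification steps.

```python
ENVIRONMENTS = ["development", "test", "production"]


def promote(build_id, deploy, verify, environments=ENVIRONMENTS):
    """Deploy a build to each environment in order.

    `deploy(build_id, env)` and `verify(build_id, env)` are caller-supplied
    callables (hypothetical hooks). Promotion stops at the first environment
    whose verification fails, so later environments are never touched.
    Returns the list of environments where the build was deployed and verified.
    """
    promoted = []
    for env in environments:
        deploy(build_id, env)
        if not verify(build_id, env):
            break
        promoted.append(env)
    return promoted
```

For example, if verify runs the unit tests against the environment that has just been deployed, a failure in test prevents the build from ever reaching production.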
Examples of CI tools include:
- CruiseControl (Open Source)
- Bamboo (Proprietary, integrated with Jira)
- Team Foundation Server (Proprietary, Microsoft)
There are several benefits to using a continuous integration system in a BI context. If you are writing unit tests, it will improve the management and execution of those tests. The tests help identify new issues early and, because all tests are run regularly, they can also identify issues that the latest changes have caused in older ETL code. Lastly, as a general rule, automating repetitive tasks frees your BI staff for higher-level tasks such as information analysis.
However, as with all things, there are disadvantages as well. There is an overhead to this process, as time needs to be invested in developing the unit tests and maintaining the CI environment. It is also an ongoing process: if the tests and documentation are not kept up to date, the original investment of time and effort is wasted. However, as the data warehouse becomes more complex, this upfront cost yields significant long-term savings: the investment in testing reduces the time spent on debugging and enhancing.
Any particular tools for ETL testing
Are there any particular tools you recommend for ETL testing?
e.g. jMock, JUnit, DbUnit, etc.
re: any particular tools for etl testing
You need to pick the tool to meet your circumstances and systems.
DbFit is simple but limited to pure SQL testing
Zuzena can theoretically run tests with synthesised data (though I have never used it myself)
*Unit works very well if you have developers on staff who can write the tests (it is a different skill set to writing ETL code)
Lastly, a few clients have written their unit tests directly in their ETL tool (in the cases I have seen, SAS and DataStage)
Implementing CI in a BI project
My project is a hard-core BI project, with ETL packages, a data mart and a cube (SQL scripts) as the back end and SharePoint as the front end. Our client wants us to implement CI for our project. We are using TFS and following the Scrum methodology. We need detailed steps and processes on how to implement CI to create builds and for QA. Any help would be highly appreciated!
RE: Implementing CI in a BI project
Sorry, this is one of those "how long is a piece of string" questions. I have worked with quite a number of organisations and they all implement CI differently, depending on their infrastructure, toolset and business requirements. I'm happy to answer specific questions or provide training or consulting, but your question is too open-ended.
I can recommend this book: http://martinfowler.com/books.html#duvall It is not BI-specific, but it is a very good introduction to continuous integration.