Yukon: What's Really Important
Data warehousing enhancements and analysis improvements round out a complete BI offering.
by Brian Connolly
TechEd, May 25, 2004
SQL Server 2005, code-named Yukon, promises some fundamental enhancements to Microsoft's core database technology, and it is a central focus of TechEd's Data Management track. The enhancement that has received the most attention is integration with the Common Language Runtime (CLR). This will give developers the ability to develop stored procedures and callable functions in languages such as VB and .NET.
If you're an experienced database developer, CLR integration might be of little interest—in fact, you might consider it downright dangerous. A successful application developer's skills are quite different from the skills you need to develop databases. Databases by their very nature are built to handle set operations. They require special skills to design logical and physical relationships, assign indexes for optimization, and tune queries for performance. Database engines are engineered for mass updates and inserts of numerous rows of data at a time, while traditional application models work on one object at a time.
Business Intelligence in Yukon
While CLR integration might be viewed with skepticism (see Resources), several Yukon features are quite compelling for data architects. Microsoft has made major enhancements to SQL Server and Analysis Services, and introduced several new user interface tools—all with the goal of delivering a comprehensive business intelligence (BI) platform.
BI is the practice of using a data warehouse, together with data mining, Online Analytical Processing (OLAP), and reporting services to give an organization detailed statistical information and forecasts on business trends and efficiency.
Data Warehouse Enhancements
Microsoft is making some changes to the core SQL Server offering to support its use as a data warehouse. Data warehouses are huge "write once, read often" stores of accumulated operational data. They are usually built using ETL (extraction, transformation, loading) tools, or in Microsoft's case, Data Transformation Services (DTS), to archive live data.
Data warehouses need terabyte-level scaling; to enable this, Microsoft is adding a table partitioning capability to SQL Server in order to improve query performance for machines with many CPUs. Partitions will also speed table loading. Microsoft is developing a new role for Report Server by providing SQL Server mirroring, which is a way of creating a trailing copy of a transaction server. The Report Server can perform analysis and reporting, so the main server can continue to support realtime transactions efficiently.
Extract, Transform, Load
ETL is a fundamental technology for enterprise data warehouses, and several vendors have filled missing pieces in Microsoft product offerings. Warehouses must be continuously fed from numerous tributaries of data, which might be anything from a transactional database, HTTP click streams, order fulfillment records, or data sets from suppliers. In Yukon, Microsoft adds some significant enhancements to DTS so it can supply a true warehouse ETL capability. These enhancements include the separation of control and data flow, event and message handling, queuing, and the ability to join and merge data streams.
Analysis Services
The reason you have a data warehouse is to do enterprise analysis and reporting. Microsoft has made significant and fundamental changes to Analysis Services 2000 to make it a competitive BI offering. Here's a brief description of a few major enhancements.
Normally, long latencies aren't a problem for reporting warehouses. These warehouses usually report on the organization's long-term trends, so lags in data updates are acceptable. But Microsoft wants to extend analysis and reporting so you can use it to monitor daily operational data as well. To this end, Microsoft supplemented the prior "pull" model—where a reporting cube was updated by pulling new data from an input stream—with an option to push DTS updates from the operational database to the cube.
Yukon will also allow definition of a caching policy for an OLAP cube dimension. The cache policy for a dimension specifies an allowable latency, and when the latency is exceeded the cube can automatically refresh itself from the current operational database.
Microsoft also added data mining models for clustering, sequence clustering, Naïve Bayes, associations, and time series. Analysts can use these models to identify patterns for business forecasting, classification, and prediction, and DTS can invoke these models to generate reports in real time.
CLR integration has received the most interest, but that interest is largely from non-database developers, because CLR integration will let them gain native access to data in their own language and development environment. Microsoft seems to have expended far more effort on composing a total BI product by making fundamental changes to the data engine, DTS, and Analysis Services. DBAs, analysts, and database developers should find that these enhancements are far more significant in the long run than CLR integration.
About the Author
Brian Connolly is an independent consultant specializing in large-scale transaction systems. Contact him at http://www.ideajungle.com.
|