Building Petabyte Databases With SQL Server and .NET
Database guru Jim Gray gives us a glimpse into tomorrow's petabyte databases.
by Lee Thé, Executive Editor, Visual Studio Magazine
Posted February 22, 2002
One week ago today, at the VSLive! 2002 San Francisco conference, Microsoft database guru Jim Gray treated SQL2TheMax attendees to a spectacular keynote on building petabyte databases using box-stock SQL Server 2000 and .NET. He mentioned that the next version of SQL Server will have some nice added goodies for storing and retrieving objects and a lot more, but he's already building huge databases with the current version.
He started with some predictions about the not-too-distant future of data storage in general. Disks will get 100 to 1,000 times more capacity and 10 to 30 times more bandwidth, with other technologies, such as MRAM and MEMS, waiting in the wings.
Data storage started out at kilobytes of data, then grew to megabytes, gigabytes, and even terabytes. But we'll soon be looking at petabyte-scale databases (a petabyte is a thousand terabytes), to be followed eventually by exabytes, zettabytes, yottabytes, and more, each step another factor of a thousand.
Even today, there are massive databases out there, many of them nonrelational, including the Library of Congress' book collection, some collections of a billion or more photos, and decades' worth of video.
So where will all this stuff be stored? In cyberspace, of course. Putting stuff on the 'Net keeps the cost per byte to a minimum. It shrinks time, because you can connect asynchronously. It shrinks space, because you can get at it from anywhere to anywhere. And you can automate processing with knowbots. You can make raw data available immediately and the digested version later; that is, data that's been located, processed, analyzed, summarized. And it can be delivered point to point or broadcast.
And to the distress of devotees of rows and tables, most of the data stored on computers today isn't the sort of stuff you can put into rows and tables. Instead, this data takes the form of documents, e-mail, video, photos, audio, scans, and so forth.
Of course, after storing it, we have to find it. That's where SQL comes in. More than a file system, it unifies data and metadata: you can use it to subset and reorganize data, perform online updates, and get automatic indexing and automatic replication.
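To make that concrete, here is a minimal sketch, not from the keynote, of what "data plus metadata" looks like in ordinary SQL Server 2000 T-SQL; the Documents table, its columns, and the index name are all hypothetical:

    -- Hypothetical table: raw content stored alongside searchable metadata.
    CREATE TABLE Documents (
        DocId     int IDENTITY PRIMARY KEY,
        Title     nvarchar(256) NOT NULL,
        Author    nvarchar(128) NULL,
        MimeType  varchar(64)   NOT NULL,            -- e.g. 'image/jpeg', 'video/mpeg'
        CreatedOn datetime      NOT NULL DEFAULT GETDATE(),
        Content   image         NULL                 -- the blob itself (SQL Server 2000 LOB type)
    )

    -- Automatic indexing: the engine maintains this as rows are inserted or updated.
    CREATE INDEX IX_Documents_Author_CreatedOn ON Documents (Author, CreatedOn)

    -- Subsetting and reorganizing: pull out just the metadata you care about.
    SELECT Title, Author, CreatedOn
    FROM   Documents
    WHERE  Author = N'Jim Gray'
      AND  CreatedOn >= '2001-01-01'
    ORDER BY CreatedOn DESC

The point isn't the particular schema; it's that the blob and the facts describing it live in one place, where the database can index, replicate, and query them together.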
And after you've found it, you've got to represent it in some way. A file metaphor is too primitive: basically it's just a blob. A table metaphor is too primitive as well: it's just records.
The first step is to attach metadata in a standardized format. Metadata lets you describe a document's context: its format, provenance (author, publisher, citations), rights, history, and related documents. For that standard format, it's obvious that XML and XML Schema will play a large role, and the world is now defining many standard schemas.
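As a rough illustration, continuing the hypothetical Documents table sketched above, SQL Server 2000's FOR XML clause can already emit that descriptive metadata as an XML fragment, one small step toward the standardized, schema-described formats Gray has in mind:

    -- Serialize a document's metadata as element-centric XML.
    SELECT Title, Author, MimeType, CreatedOn
    FROM   Documents
    WHERE  DocId = 42        -- hypothetical document id
    FOR XML AUTO, ELEMENTS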
Now in theory, XML gives us a lingua franca. XML documents are portable objects, and they can also be complex objects capable of handling the variety of data we need to store and access. But will all the implementations in Unix, SQL, C, and so on, match? We need conformance tests. Gray added the warning that "objects serialized as XML give us portable/mobile objects, but as Niklaus Wirth observes, 'algorithms+data=programs.' That is, we need both the objects and the methods to define the class. The new Web Services Interoperability initiative is a promising sign that the methods will be standardized and tested for interoperability." Gray was alluding to this week's historic agreement between Microsoft, IBM, Oracle, and just about everyone else but Sun.