Shedding Light

With WinFS postponed to an even later date, we’ll have to focus on Apple’s “Spotlight”:http://www.apple.com/macosx/tiger/spotlight.html for now. So far, most of what we’ve heard is fairly user-oriented. “It will help you find files much easier and faster!” That’s great, but how does it work? Does it use a database, like WinFS? Does it allow you to attach your own meta data to files? Is it easy to get more file types supported?

But let’s first get back to the basics. What the hell is it, and why do we need it? The fuss started when somebody asked himself (yeah, I’m pretty sure it was a he) a very simple question: “How come I can find a website on the whole friggin’ internet faster than a file on my hard drive?” And that’s a valid question. Ever used the find file functionality in Windows (and to a lesser extent in Mac OS X)? It’s unusably slow and most of the time only searches filenames. However, on the web, where there are a billion times more files, we can do full-text searches in less than a second. What’s the deal with that?

The deal is the lack of good software to do it, and of integration of that software into the OS. “X1”:http://www.x1.com has been around for a little while, but it’s too expensive. More recently we got “Google’s Desktop Search”:http://desktop.google.com, which is nice but only searches particular file types and offers zero OS integration. For a while now Microsoft and Apple have been working on this problem, and they have come up with two similar solutions. Microsoft’s solution is called WinFS (Windows Future/File Storage) and Apple’s solution is called Spotlight. WinFS was supposed to come out with Longhorn in 2006, but was recently postponed to some time later. Spotlight will come out with Mac OS X Tiger (10.4), which will probably ship in June 2005.

What did Apple and Microsoft come up with? It’s pretty simple. They’ll just use a database to store so-called meta data about all the files on your hard drive. Meta data is data about data. Confusing? Nah. File names, sizes, owners and last modified dates are all examples of meta data. It’s all data about other data (in this case data about a particular file). Both WinFS and Spotlight allow developers to add more meta data to a file than the basic set. So for MP3 files it would also store the artist, song title, album, year it came out and length. They’ll also do full-text searches on, for example, Microsoft Word documents and PDF files. Data like contacts from your address book and calendar events will also be indexed. Because all this data is stored in a specially tuned database, it can be searched really quickly. Apple’s Spotlight even searches as you type (incremental). Ever used iTunes, with the search field on the top right? Just like that, except that you search your whole hard drive.
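To make the “meta data plus search as you type” idea a bit more concrete, here is a minimal sketch in C. It is not how Spotlight or WinFS work internally; the @FileMeta@ struct and the @search_prefix@ function are names I made up for illustration, and the linear scan stands in for whatever index a real meta data store would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a toy meta data store: data about a file, not its contents.
   (Hypothetical layout, invented for this example.) */
typedef struct {
    const char *path;
    const char *artist;
    const char *title;
} FileMeta;

/* Returns 1 if `value` starts with `prefix`, else 0 (case-sensitive). */
static int has_prefix(const char *value, const char *prefix) {
    return strncmp(value, prefix, strlen(prefix)) == 0;
}

/* Incremental search: call this again with the longer string after every
   keystroke. Stores pointers to at most `max` matching rows in `out` and
   returns how many were stored. */
static size_t search_prefix(const FileMeta *rows, size_t n, const char *typed,
                            const FileMeta **out, size_t max) {
    assert(out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (has_prefix(rows[i].artist, typed) || has_prefix(rows[i].title, typed)) {
            out[found++] = &rows[i];
        }
    }
    return found;
}
```

Typing “R” and then “Rad” would simply mean two calls, with the second returning a shorter list. A real store would answer from an index instead of walking every row, which is where the speed comes from.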

Apple posted “Working with Spotlight”:http://developer.apple.com/macosx/tiger/spotlight.html, an article that covers the basics of developing with Spotlight and how it works internally.

According to the article, the technologies that power Spotlight are:

* A database consisting of a high-performance meta-data store and content index that is fully integrated into the file system.
* Programmatic APIs that are part of the CoreServices and Cocoa frameworks that let you query the meta-data store and content index.
* A set of importer plug-ins that are used to populate the meta-data store and content index with information about the files on the file system.
* A plug-in API allowing you to provide meta-data and content to be indexed for your application’s custom file formats.

The database is used to store and index all the meta data. The programmatic APIs are used to search the meta data from your own applications, if you like. The importer plug-ins are used to obtain the meta data from within the files. Without an importer, Spotlight will only find the most obvious file meta data: file name, size, owner, date last changed etc. The file-type-specific data, like artist information, album name etc., has to be obtained from the file by using importers. By default Spotlight will come with importers for at least the following file types:

* JPEG, PNG, TIFF, and GIF images
* MP3 and AAC audio files
* QuickTime movies
* PDF files
* Microsoft Word and Excel documents
* iChat transcripts
* Email messages
* Address Book contacts
* iCal calendar files
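Conceptually, this comes down to a lookup table from file type to importer. The sketch below is my own illustration in C, not Apple’s plug-in mechanism: @ImporterFn@, @registry@ and @find_importer@ are made-up names, and it keys on the file extension only to stay short, while Spotlight’s importer interface is handed a content type identifier (the @contentTypeUTI@ parameter).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical importer callback: returns a description of the meta data
   it would pull out of the file. */
typedef const char *(*ImporterFn)(const char *path);

static const char *import_audio(const char *path) { (void)path; return "artist, album, title"; }
static const char *import_image(const char *path) { (void)path; return "dimensions, color space"; }

typedef struct {
    const char *extension;
    ImporterFn importer;
} ImporterEntry;

/* Toy registry; a real system would discover installed plug-ins instead. */
static const ImporterEntry registry[] = {
    {"mp3", import_audio},
    {"aac", import_audio},
    {"jpg", import_image},
    {"png", import_image},
};

/* Returns the importer registered for the file's extension, or NULL when
   there is none and only the basic meta data would be available. */
static ImporterFn find_importer(const char *path) {
    assert(path != NULL);
    const char *dot = strrchr(path, '.');
    if (dot == NULL) return NULL;
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(dot + 1, registry[i].extension) == 0) return registry[i].importer;
    }
    return NULL;
}
```

A file with an unknown extension gets @NULL@ back, which corresponds to the case where only name, size, owner and dates end up in the store.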

If you’re a developer of an application that uses a different kind of file format, you’ll have to develop your own importer. Luckily that’s not that hard. It basically boils down to implementing one single interface method:

Boolean GetMetadataForFile(void *thisInterface,
                           CFMutableDictionaryRef attributes,
                           CFStringRef contentTypeUTI,
                           CFStringRef path)
{
    /* do the actual work of pulling meta data from the file */
    return TRUE;
}

Then you just have to package it the right way, put it in a particular directory and it will just work (apparently).

I don’t know about you, but I can’t wait to try these search features. In iTunes I use the search field all the time.

If you’re a Windows user you’ll have to stick to “X1”:http://www.x1.com and “Google’s Desktop Search”:http://desktop.google.com for now. Sorry. Can’t help it. Buy a Mac. Or X1, which may be the cheaper option (not as cool though).