Desktop Search: Not ready for prime time? - Bloggit's Journal

Mar. 20th, 2005 @ 02:32 pm
A long, long time ago, Lotus released a product called "Magellan", which was wonderful for scanning (indexing) disks and finding documents. Of course those of us who were very familiar either with the Norton Utilities (and "Text Search", later "File Find") or with Unix (and "grep") didn't need this product, but it was very convenient. And it died the typical Lotus death of no updates and no advertising.

Many moons later, Desktop Search is coming back to the forefront. I resisted a long time, but was finally interested due to the Indexer service in Windows XP and due to having a huge number of files and data in my life, a result of running a bunch of servers.

First the niceties: Despite having made much of my living over the years on non-Microsoft technologies (BSD, Linux, Java, Palm...), I quite like Microsoft products. Not, perhaps, compared to what they could be, but they do tend to win not just on marketing but also by being more than just "good enough". Microsoft Word, Excel, SQL Server and Visual Studio are all easy to use for the basics and very performant and powerful. Windows XP itself is a lot easier to set up than Linux and easier to dive right into. It is not as easy for extreme customization, but it serves the bulk of users better. On the other hand, I run a non-Microsoft mail server, non-Microsoft web servers, non-Microsoft DNS servers, and more. I'm not a pawn either way. So when I discovered that Indexing service:
  • Mostly works
  • Is hard to use for boolean queries
  • Requires iFilter DLLs that simply aren't available to expand to my needs
I decided to find a better solution. So I figured out my requirements and tried some out. And here's what I found.
My requirements are:
Index Configuration: Indexes get big. I really am from the old school of computer set-up; I put data on a different partition or drive from the system. My system drive isn't big enough for huge indexes. It doesn't need to be that big; data (and temp files) go elsewhere.
Simple requirement. Or so I thought.
Email: Most indexers and desktop search programs support email. By which they mean "Outlook". At this point, if you've read my previous blog entries, it won't surprise you to learn that I don't run Outlook. I run Thunderbird, with messages imported from Eudora from when I ran that. So I need Thunderbird indexing.
Zip files: I set up scripts to automatically zip data from my servers and ftp it to a local external hard drive on my desktop. (I was using gz, which again tells you my basic philosophy, but the Windows Explorer can look inside Zip files. It doesn't do .gz files.) Zipping saves transfer time not only through compression but, more dramatically, by reducing the file count. Being able to find content that needs updating by doing a quick search through these, rather than switching over to the servers (which aren't indexed for performance reasons), would be beneficial.
MP3, JPEG: Unlike most users of digital cameras and MP3 files, I use my headers. Some of my MP3s (which are mostly from CDs I own) have extensive ID tags denoting albums and such, and perhaps 2% have lyrics in them. Many of my JPEGs not only have my Canon camera data but also IPTC comments describing them, which then work with the gallery-creating program I wrote to automatically caption photos on upload.
PDF Files: Many of my sources wind up as PDFs, but more importantly, lots of APIs (example: the Palm Programming manuals) are distributed as PDFs. As are many manuals. What's the point of indexing data if I can't get those?
Booleans and Limits: This was part of why I started the search. What if I want to search for messages with, in Google terminology, "PALM AND SOUND AND NOT MP3", only across PDF files?
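To make the boolean requirement concrete, here is a minimal sketch in Python of an AND / AND NOT query limited to one file type. It is only an illustration, not how any of the tested products work: the function name boolean_search and the sample file names are made up, the text is assumed to have been extracted from each file already, and terms are matched as case-insensitive substrings rather than whole words.

```python
def boolean_search(documents, all_of=(), none_of=(), extension=None):
    """Return names of documents containing every all_of term and no none_of term.

    documents: mapping of file name -> already-extracted text (an assumption;
    real PDFs would need a text-extraction step first).
    extension: optional file-type limit such as ".pdf".
    """
    hits = []
    for name, text in documents.items():
        # Apply the file-type limit before looking at the content.
        if extension and not name.lower().endswith(extension.lower()):
            continue
        lowered = text.lower()
        has_all = all(term.lower() in lowered for term in all_of)
        has_excluded = any(term.lower() in lowered for term in none_of)
        if has_all and not has_excluded:
            hits.append(name)
    return sorted(hits)


# Hypothetical sample data, for illustration only.
docs = {
    "palm_sound.pdf": "Palm OS Sound Manager API reference",
    "palm_mp3.pdf": "Palm sound playback for MP3 files",
    "notes.txt": "palm sound",
}
# PALM AND SOUND AND NOT MP3, only across PDF files
print(boolean_search(docs, all_of=("PALM", "SOUND"), none_of=("MP3",), extension=".pdf"))
# prints ['palm_sound.pdf']
```

Substring matching keeps the sketch short, but it would also match "sound" inside "soundtrack"; a real indexer would split the text into words first.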
With those requirements in mind, I set out and downloaded Yahoo! Desktop, Copernic and Google Desktop. I did not download Microsoft's Desktop Search based on the strong suspicion that it won't handle Thunderbird email. The versions tested were the latest as of 19-March-05. Because this was a lengthy, involved test, I probably will not be updating these results regularly.

The Tests

The tests I set up and ran were:
  • Find an email in Thunderbird by subject
  • Find an email in Thunderbird by content
  • Find content inside a ZIP file
  • Find a song (MP3) by lyric excerpt from the Lyrics tag in the header
  • Find a song (MP3) by ID3 v2 Comment tag content
  • Find a song (MP3) by Album from the ID3 v2 tag (added after the previous two tests failed)
  • Find a photo (JPEG) by IPTC Comment content
  • Find a photo (JPEG) by EXIF Owner field (added after the above test failed)
  • Find a PDF by text excerpt (a Palm API was used)
  • Text search using boolean (AND, OR)
  • Text search using entire phrase (to ensure documents with the words out-of-order don't match)
  • Supports limiting searching by file type - search only MP3s or JPEGs for relevant tests above
  • Source code searching (as installed)
  • Palm Address Book and Memo Pad searching
Yeah, that's a lot of tests. It was initially shorter, meeting the requirements in the table above, but as each of the Desktop Search tools failed so many tests, I decided to try to find something they were actually good at. It wasn't easy.
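For reference, the "find content inside a ZIP file" test boils down to something like the following minimal sketch, which uses Python's standard zipfile module. The function name search_zip is made up for illustration, and the sketch assumes the archived files are plain text in an ASCII-compatible encoding; it says nothing about how the tested products handle archives internally.

```python
import zipfile


def search_zip(zip_path, phrase):
    """Return the names of archive members whose bytes contain phrase.

    Matching is a case-insensitive byte search, so it only works for
    members stored as plain text in an ASCII-compatible encoding.
    """
    needle = phrase.lower().encode("utf-8")
    hits = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries have no content to search
            if needle in archive.read(info).lower():
                hits.append(info.filename)
    return hits
```

Reading each member fully into memory is fine for small backups like configuration files, but large members would be better read in chunks through archive.open().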

The Results

| Test | Yahoo! | Google | Copernic | Microsoft Explorer/ |
|---|---|---|---|---|
| Thunderbird email | No support. Doesn't even scan them. | Passes. Does index and display appropriately, but searching just email, or by field, does not appear to be supported. | Perfect. Can easily limit search to just email, search by content or by Subject or Date. | Fail; doesn't even seem to be able to do a brute-force text search on them. |
| Zip File | Pass | Fail. Yep, Fail. Not a speck of success. Turns out you need to add a plug-in for this. | Fail. Complete failure. Ironically, they don't even claim to support ZIP, but do slightly support GZ and RAR. | Pass. No problems at all. |
| MP3 Lyrics | Fail | Fail | Fail | Pass. Yep, only Microsoft found the correct file. In fairness, I did install MP3Filter.dll the day before running these tests. But that's not even available for the others, and I don't know if it simply searched the entire file or if it knows the format. Either way, Microsoft won without Desktop |
| MP3 Album | Pass | Fail | Pass | Pass, see above |
| MP3 Notes (all four) | Of these four, only Microsoft could display the Lyrics appropriately. Not in the Search, but going to the results, in Properties - Summary, the Lyrics, Album and all other tags show appropriately in Explorer. Amazing, and even more so that the other three cannot do this even as theoretical enhancements over the operating system! | | | |
| JPEG - IPTC | Fail | Fail | Fail | Fail |
| JPEG - EXIF Owner | Fail | Fail | Fail | Fail |
| JPEG - EXIF Camera Model | Fail | Fail | Fail | Pass. Yep, again only Microsoft looks in the headers. |
| PDF Text | Pass | Fail | Pass | Fail |
| Text Boolean | Pass | Fail - no OR support | Pass | N/A |
| Text Phrase | Pass | Pass | Pass | Pass |
| Limit by Type | Pass | Fail - it's there but not via U.I. | Pass | Pass |
| Source Code | See Caveat | Fail | Pass | Pass |
| Palm Data | Fail | Fail | Fail | Fail |
| Index Location | Pass | Fail | Pass | Fail |
| U.I. Speed | Slow | Fast | Fast | Medium, depending on search type |
| U.I. Power | Second Best | Worst | Best | Third Place |
| Index Size | 224MB | 553MB | 345MB | N/A |

Other caveats:

  • Yahoo! includes the file path in the search. So if you're searching for a file with, for example, "WAP" in it and you have a folder called "Swap Meet", everything in that folder will match. Very inconvenient.
  • Google may have been impacted by their dreadful interface. I never did determine (though I didn't spend hours on it) how to ensure that it had indexed my media files. There just doesn't seem to be any option for controlling which disks it is looking at.
  • Some of the Microsoft searches were far slower than the others, largely because they were done on-the-fly rather than from index. Of course, they also succeeded where the others failed.
  • The EXIF and IPTC headers were clearly a bit beyond what anyone expected. But they are found by grep and by many imaging tools I use. Yahoo and Copernic claim to index MP3 or JPEG metadata; they just don't appear to do it in reality. This may be a matter of definitions; they may be defining date, file size and image size as the meta data that they index. Which would be technically accurate, but certainly not complete enough for anyone who actually uses the headers. In fact, Copernic's claim is:
    Music: Full metadata indexing of iTunes, MP3, OGG, WMA and WAV music files.
    Pictures: Full metadata indexing of EXIF, JPEG, GIF picture files.
    which obviously is not true. (I have an email in to them about this; if they respond, I will update this page.)
  • Source Code Listing: I was hoping for Doxygen/JavaDoc style parsing. None delivered. As installed, Copernic indexed source code files and Microsoft was able to find them via a search. Yahoo! supports adding the source extensions, which presumably would have caused it to pass.
  • Regardless of where installed, all of these default to putting the Index in the C:\Documents and Settings area. I don't tend to believe in that; I prefer my data to be in a different area. Google's index could not be moved; the others could.
  • Not tested but noted: Google can run in Opera, but brings up Internet Explorer regardless of what the default browser setting is. Meanwhile, Copernic recognizes IE and Firefox, but not Opera, for history scanning. Which is fine by me; I don't want my browse history scanned.
  • There were dramatic differences in how indexing occurred. Yahoo! put the list of files together fastest, but did not index the contents until later. Copernic was the first to index all of them. Google indexed fewer than I expected in total.
  • Google Desktop creates its own web server (port 4664 on my system) for the interface. The U.I. is very consistent with Google on the web, and results are displayed via IE (even if Opera is default) even when queried from the Google desktop bar.
  • Index Size: I confess to being confused as to how Google could simultaneously have the worst results, the fewest file types scanned and the largest index (by quite some margin).
Out of astonishment at how poorly these indexers did on the MP3 Lyric, I tried something simple. Being a Unix-type guy, I resorted to my old standby tool:
>grep "running out" *.mp3
Blondie_-_11-59.mp3 5 9:Time is running out.
Blondie_-_11-59.mp3 23 9:Time is running out.
Well, it's obviously there, and in plain-text. And the IPTC Comment in JPEG files is successfully searched also. Of course the output speed and formatting leave something to be desired, but still, the search is successful. Pathetic that these pups couldn't parse it.
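For anyone without grep handy, the same brute-force check can be sketched in a few lines of Python. This is only an illustration of the idea, not a tag parser: the function name grep_binary is made up, and it assumes the lyrics are stored as single-byte (Latin-1 or ASCII) text, as they evidently are in the file above. ID3v2 also allows UTF-16 text frames, which a plain byte search like this would miss.

```python
from pathlib import Path


def grep_binary(folder, phrase, pattern="*.mp3"):
    """Return names of files in folder whose raw bytes contain phrase.

    The whole file is searched as-is, with no knowledge of the MP3 or
    ID3 format, which is roughly what grep does on a binary file.
    """
    needle = phrase.encode("latin-1")
    hits = []
    for path in sorted(Path(folder).glob(pattern)):
        if needle in path.read_bytes():
            hits.append(path.name)
    return hits
```

Called as grep_binary(".", "running out"), it lists every MP3 in the current folder that contains the phrase anywhere in the file, tags included. Changing the pattern to "*.jpg" applies the same check to IPTC comments stored as plain text.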


All three of these systems hammer the system pretty hard during normal use. Having all three running at once is definitely not a good idea; you can watch them sucking down the processor on the Task Manager. They seem about equivalent to an active virus scanner. Uggah! It's so bad that I wound up turning them off simply to enable any real processing. (Although with only one running it probably wouldn't have seemed nearly as bad.)

Screen Shots

Two sets of screen shots here. The first set is the results of a search for a well-populated MP3 file. The interesting thing here is that Yahoo! provides the best data back on the song. (Google, as reported earlier, cannot parse it at all.)


Yahoo! Desktop did the best at displaying the MP3 hit. Remember it didn't pass the tests, but apparently some fields are parsed.
Yahoo! on Music
Yahoo! failed the email tests completely, so no screen-shot for that!


Copernic did almost as well as Yahoo! on music display:
Copernic Music
And Copernic also did fantastic on the email!
Copernic EMail


Google didn't do so well on music
Google's empty music results
But did quite well on email headers
Google's Email Headers
and email display
Google's Email Body
Even if it's a bit less convenient than Copernic to go through them.


Right now, the whole Desktop Search niche is just not ready for prime time. All three of the programs I downloaded oversold and underdelivered. Of the three, Google was clearly the loser. Copernic's interface and speed (lack of hanging) are nicer than Yahoo!, but Yahoo! supports Zip files while Copernic doesn't. Meanwhile, Copernic supports Thunderbird email, while Yahoo! doesn't. So of these three, Copernic is my winner by a nose, figuring I need email indexed a lot more than I need Zip files indexed.

Ironically, Microsoft Explorer (with Indexing) did just as well as they did at word processing files. And, to top it off, iFilter add-ins can easily extend Microsoft's capabilities; it's probably only a matter of time before someone, perhaps me, creates iFilters for the real ID3 tags, IPTC, EXIF and Thunderbird mbox mail. Which in turn means that I will probably be moving towards the Microsoft Desktop Search not because I consider it the best right now, but because I can make it the best for me, something I cannot do with the others.

That's an odd way to win a war.
Current Mood: distressed
Current Music: Dazed and Confused, Led Zeppelin
Date:March 22nd, 2005 05:35 pm (UTC)

Another desktop search engine

You might give Enfish Find a try. It isn't free, but there is a working demo that can be downloaded at enfish.com. I've used it for about a year and it works the best of any that I've tried. I haven't tried it with mp3s but email and documents are fantastic. It even does some EXIF indexing.
From: bloggit
Date:March 22nd, 2005 07:19 pm (UTC)


Enfish looks interesting, but no support for Thunderbird (nor Eudora) email, Zip files or MP3 files. Given the focus, it probably is the best tool for the typical office user, but my needs aren't quite typical.

Thanks for pointing it out though; I had completely overlooked it.
Date:March 23rd, 2005 01:07 pm (UTC)

Don't forget the plug-ins for Google

One thing to keep in mind about Google that could be interesting for your reviews over time is that Google took a very open approach to plug-in development and did so early. Microsoft has tools to open plug-ins for desktop search to filter additional types of content but only Google provides and promotes an XML interface to query desktop search from within their product and then promotes the plug-ins on their web site so that end-users can find and download the plug-ins.

At Viapoint (http://www.viapoint.com) we made a plug-in to combine desktop organization with Google Desktop Search and now it is available on the Google plug-in site (http://desktop.google.com/plugins/viapointorg.html) where the list is very rapidly expanding with new applications and connectivity to new data types.

BTW: Feel free to download Viapoint... it is free!

Dan Housman
Viapoint Corp.
Date:March 23rd, 2005 06:05 pm (UTC)

You ought to try out a professional desktop search app

Try out www.dtsearch.com; they have a 30-day trial period. It's been around for donkey's years, since before Google was even born.
From: bloggit
Date:April 3rd, 2005 02:09 am (UTC)

Re: You ought to try out a professional desktop search app

DTSearch claims to support Outlook and Eudora, no mention of Thunderbird. Thunderbird was #2 on my rather short list of requirements.
Date:March 23rd, 2005 09:52 pm (UTC)

Google Desktop Search

A lot of the problems you had with GDS are addressed here:

I'll answer a few that jumped out at me:
* GDS does index PDF. To search only PDF files, use 'filetype:pdf' in your search.
* Searching by filetype works fine in GDS. Again - use 'filetype:' and the extension. This is the same syntax Google web search uses.
* Searching email by field is supported. You can search by subject, to, from, cc and bcc. You can limit your search to email by using 'filetype:email'. See here: http://desktop.google.com/features.html#advancedsearch
* There is a plugin for searching MP3 files here: http://desktop.google.com/plugins/mp3tag.html
* For source code search, there is a plugin for searching any text file here:
* To control which drives GDS searches, see the tip on indexing network drives here: http://users.tns.net/~skingery/firefox/GDS_Tips.html#network
* I have no problem using GDS in Firefox. If the browser is closed, GDS opens it. If Firefox is open, GDS opens a new tab.

It's a good article - very comprehensive. I wanted to make sure you had all the facts.
From: bloggit
Date:April 3rd, 2005 02:04 am (UTC)

Re: Google Desktop Search

* GDS does index PDF. To search only PDF files, use 'filetype:pdf' in your search.
My tests were based on it claiming it completed indexing and then failing to find the data. If it has to be told to search PDF, that's not much of a feature, in my opinion.

* Searching by filetype works fine in GDS. Again - use 'filetype:' and the extension. This is the same syntax Google web search uses.
Yes, and I noted this when I wrote, "Fail - it's there but not via U.I." It's just not friendly enough.

Regarding plug-ins, I'm a bit cautious on judging a program on the basis of third-party add-ons. Consider that if I had loaded a plug-in and the resulting app constantly crashed, fans of that app would accuse me of setting it up for failure. It can't be had both ways... so until the plug-in is provided by Google (and, by extension, presumed approved by them), it's not part of it.

I broke this rule a bit with the Microsoft Index Service, but it has little impact since it's a more mature platform and since so much functionality is built into the O.S.

All that said, I am seriously considering dumping Copernic for Google on the basis of Copernic's inability to index Thunderbird mail that is not stored under the profile in the typical base (e.g. C:\Documents and Settings\%USER%\Application Data\Thunderbird\Profiles\default). Copernic just can't find them. I'll have to check Yahoo! and Google.
Date:April 11th, 2005 06:11 am (UTC)

You should've at least tested the latest from Microsoft

I'm not sure why you decided not to test the product from MSN. You say it's because it doesn't support Thunderbird but neither did Yahoo. It's a very comprehensive article except for the fact you skipped one of the main contenders in my opinion.

From: bloggit
Date:April 11th, 2005 02:54 pm (UTC)

I did, just not then!

Well, for starters, it never occurred to me that Yahoo! wouldn't support Thunderbird. I was quite surprised by that.

But... after noticing how well Indexer service and XP did without MSN Desktop Search, I then did review it, and quite like it.

For the latest updates, check from the top of the blog. Since writing the entry this comment responded to, I've reviewed several Google add-on interfaces that make using Google less painful and I've fallen in love with MSN Desktop Search and am trying to solve the Thunderbird problem on it.

Thanks for writing!