Bloggit's Journal

TVs Jul. 30th, 2005 @ 07:06 am
I see that Copernic has released V1.6 of CDS. They don't enumerate the improvements much; some toolbars for I.E. and Firefox, and "misc." usability and performance improvements. Trouble is, they never wrote me at all, cementing my opinion of their lack of desire to support developers. It's moot now anyhow of course... I am happily, though unexpectedly, entangled in Microsoft in several ways and one of them is my bemused discovery that MSN Desktop Search is very good and easily/reliably extensible.

My Palm IFilter is doing its job admirably. I really should write a Protocol Handler to better display the results, but since it's working and I have other home and work projects, that's a pretty low priority.

Which brings us to a recent development at home. After about eighteen months to two years of research and waiting, I've finally bought a new T.V., replacing one I bought in 1993 for my cat. It took so long because I'm pretty demanding.

We tested with our primary source material... primarily Babylon5 DVD scenes. These have a lot of dimly-lit CGI (graphics), but also dramatic switches from dark to light and back. The action scenes also are hard on slow systems. Here's what we found.

Initially...
  • Plasmas couldn't handle the low-light scenes. Heads and hands in spaceships would become disembodied. This is called "crushing black levels".
  • Plasmas also couldn't handle going from a low-light scene with a bright doorway, to entering that room. The white would turn grey very quickly. This is called "contrast ratios at varying APLs", APL standing for "average picture level".
  • LCDs looked a bit hazy
  • DLPs had an amazing amount of shimmer on the rotating station shot. This, it turned out, was an amazing test of RBE (rainbow effect)
  • DLPs also were a lot brighter in the center than on the edges, which was distracting in some scenes.
  • LCoS, at least the ones we saw, didn't do blacks at all. We saw greys.
As I said, this was several years ago. Technology marches on, and I decided to wait until at least one technology mostly got better. I expected it to be DLP. Here's why and what changed.

  • LCD
    • Cons
      • LCDs were hazy and had ghosting
      • The ghosting is a latency problem; pixels can only change so fast.
      • Contrast and a poor black level hurt picture quality
      • The white CFL backlight is energy efficient, but far from ideal for color accuracy.
      • LCDs are very expensive in larger sizes.
    • Pros
      • Energy Efficient
      • Generally highest available resolution
      • No Burn-In
    • Improvements
      • Sharp first, and then Samsung, raised the resolution and contrast level
      • Raising the resolution made pixels smaller, which made them faster.
  • DLP
    • Cons
      • DLPs use mirrors to bounce light away or on; if it reflects off the sides of the box, contrast is lowered.
      • Rear projectors (LCoS too) have more light in the centre.
      • RBE - Rainbow Effect; some people on some material can see flashing colors where pixels should be, due to the spinning color (rainbow) wheel.
      • Deeper cabinets than Plasma or LCD
    • Pros
      • No Burn-In
      • Cheaper than LCD or Plasma
      • Great blacks and dynamic range
      • Good colors
      • Fast pixels
    • Improvements
      • TI made additional improvements to contrast and speed in later generations.
      • More recently, resolution came up through "wobulation", the quick wriggling of pixels
      • But as of this writing, those units are still not common
      • And the brightness and the RBE (where you can see flashes of other colors) are still there.
  • Plasma
    • Cons
      • Burn-In
      • Lower resolution; each color pixel is made of three discrete capsules of gas that must be large enough to hold the gas, phosphors, be energized, etc.
      • Power-hungry
      • Poor low-level black performance ("crushed" blacks)
      • Inconsistent contrast at varying APLs (the more pixels that are on, the less "on" they can be).
      • Pricey
      • Fragile
      • Generate lots of heat
    • Pros
      • Very vivid colors
      • High contrast ratios (at low APLs)
      • Best viewing from an angle (LCD and rear projection fade fast as you move from right in front of them)
    • Improvements
      • Most of the Cons have been addressed in the last two generations
      • More efficient phosphors generate less heat
      • Because they're more efficient, they're less power hungry.
      • Lower power allows more low-level nuance, improving black performance
      • Higher efficiency reduces need to overdrive the pixels, reducing burn-in concerns.
      • LCDs are far more expensive in the 37" and up size ranges.
To my amazement, the new Panasonic plasmas looked fantastic on our source material. And on other test material we brought. So I bought one and set it up. After a bit of tweaking, it looks very film-like. Sure, I could get higher real resolution with LCD or higher simulated resolution with DLP, but the image put out by the plasma is, to our eyes, significantly better finally.

Buying It

We didn't expect to come home with a new T.V. that day. My wife was a smidge stir-crazy, and I was a bit bored, so I suggested checking out the newest sets to see if the new DLPs looked any better. A quick check of AVSForum indicated that Panasonic had just released new plasmas that apparently didn't make any improvements, but I figured they might be worth a check anyhow.

We went to Magnolia, a chain now owned by the parents of Best Buy, but a more boutique-like chain. Higher quality stuff, much better service and warranties. Not really higher prices, but that's only because Best Buy charges MSRP for everything; their name does not reflect their philosophy or pricing. Anyhow, Magnolia had just, that day, received the new Panasonic plasmas in Washington. The two stores we hit were in the process of unboxing their demos. Which looked fantastic, solved all the problems we had with plasmas. (So, in reference to the last AVSForum thread in the previous paragraph, it's important to realize that specs aren't everything. If maximum and minimum brightness are the same, but the ability to do amounts in-between smoothly is improved, the specs won't note it but your eyes will.) But Magnolia had an interesting pricing policy: The first store (which didn't have any in stock, just the demo) could get me one, at the list price of $2998. The second store had one in stock and had their tag up under their demo by the time we got there. And it said, I'm not making this up...
Was $2998
Now $2899
Huh? You haven't sold any, haven't had it a day, when was it $2998?

Didn't matter to me; I negotiated an extra discount anyhow. Took the thing home, tweaked it and was amazed.

Partially off-topic (as if the whole blog isn't), apparently the Voom satellites aren't accessible from my portion of the U.S. EchoStar (Dish Network) runs them now, but we can't get the programming from them until EchoStar propagates it to another bird.

Palm IFilter Released Jun. 25th, 2005 @ 03:52 pm
Well, at long last I've uploaded the files and code.

You may recall that this trek began with a simple comparison of various Desktop Search engines... excluding MSN Desktop because it lacked Thunderbird support. I settled on Copernic and decided to write a File Extractor to extend Copernic's support to the Palm Desktop files from my PDA, which I did in C# to improve my skills in that (new to me) language. But Copernic combined a few serious bugs with a deep silence, such that I gave up and resorted to trying MSN Desktop Search (which, by this time, had changed its name to "MSN Desktop with Windows Search".) A bit later, Benoit at Citeknet came out with a Thunderbird Protocol Handler for MSN Desktop, eliminating my primary objection to it. So I finished my IFilter in C# for the Palm Desktop files, and got it running much more quickly than the Copernic File Extractor because, while IFilters are a bit more convoluted, the support code seems bug-free. And I discovered that C# is not well suited for IFilters (or Protocol Handlers) and that database-type systems are beyond the scope of IFilter rendering (not indexing) for MSN Desktop. More specifically, Desktop cannot display much of the file (more than about 1200 bytes), even though it correctly indexes it, because that's not how they expected IFilters to be used.

Which brings us to today. I toyed with the idea of diving into a Protocol Handler to solve this problem "right", but the thing is, the indexing was the most important part to me. Once I know where the data is, I can easily enough go to that location to get it. And rewriting so much code just didn't fit into my schedule recently; it's been a busy several months. You can get some of that from the rest of my blog, but I've been on some massive projects at work too. So instead I wrote a Palm Desktop Preview GUI. Basically just a program that, when run against a Palm database that it recognizes, converts it to HTML and views it. This didn't take long because I'd already written the HTML conversion code during that brief period when SafeHTML was inadvertently allowing me to use my computer the way I want to rather than protecting me from myself. I'm not sure what caused that brief functionality to vanish, but as I reported before, I'm told it wasn't supposed to be there.

So the only trick was figuring out the GUI and determining how little to do. I did very little. No options. All it does is show a form of the file and let you search (via Ctrl-F) it. But that's enough, for me for now. Soon I'll probably add features like a next/previous record button and an actual Find box, but it does all I needed for now so that's what you get. Plus the source code; if you want to enhance it, you can.

The Files

  • PalmFilter_setup.exe is the installer, which includes the (Windows-obviously) binaries for the viewer and the IFilter, plus a readme file, uninstall capabilities, etc. The installer will also handle some registration details, so hang onto it.

  • PalmIFilter_Source.zip is a zip of the source C# files. I didn't include the assemblies, DLLs or solution files, so you'll need to know what you're doing. The important things are to add COM support to the project and import the Web Browser control into your Designer Toolbox.
Everything else you need to know is in the readme file... and most of that is covered in this or other blog entries. The other highlights are:
  • Don't forget to ensure that your hotsync directories are actually set to be indexed by MSN Desktop. And you might want to un-check temporary file directories while you're at it.

  • I'm using the MIT license because it's pretty non-restrictive. If you want to modify the code, you can. You don't have to give the changes back. If you want to sell your modifications, you can. I would prefer you didn't, but you can. That's not true of some of the other open source licenses.

  • The Palm data viewer will (if you allow the installer to register it) work not just by double-clicking on a file in MSN Desktop/Windows Search but also from Explorer in Windows.
Enjoy. And please post with any comments, improvements, etc. I will probably eventually tackle a Protocol Handler for this, but perhaps not until Autumn.

Whitesnake Concert Jun. 24th, 2005 @ 06:35 pm
So the wife took me to a Whitesnake concert. If the Joan Jett concert raised questions about gender security, the initial Whitesnake crowd (before the real fans arrive) calls into question the geriatric impact of drugs. There's just something wrong about so much of your audience using walkers. Some of the conversations were astonishing for their vapidity: beyond their inability to master polysyllabic words well into their 50s, they were comparing two other recent concerts - Van Halen and, I swear I'm not making this up, Yanni! And this guy was spitting into a cup too. Crass.

But by the time Whitesnake took the stage, after the opening act was done, the audience was more like that of Joan Jett. Maybe 55% male, but not dominantly so, mostly middle-aged fun-loving rockers.

Unlike the Emerald Queen, no food or drink are allowed in the Paramount. Pity; having a beer with Joan Jett was nice. But we see high class stuff at the Paramount too - the musicals we go to are there - so I understand.

At 8pm the lights went out and Point One came on. Both the guitar and bass player had thin mohawks. Amazing. They're a punk thrash band, and might have qualified as good in 1977, but I'm not sure there's a market for this now that we've moved on to better drugs. It was certainly loud; I was grateful for having brought ear plugs. All the musicians had them too (in both bands.) To be fair, while Point One needs better production values, they clearly have some skill. Get them in the studio with Mutt Lange and it might be amazing.

The show was sold out, but when Point One took the stage, the theatre was maybe 40 percent occupied. They played (thrashed?) about half an hour, at the end of which it was still only 50% occupied. But half an hour later (an hour after the show started), when Whitesnake took the stage, it was packed. The people next to us arrived in the middle of intermission so I asked about the timing. They had waited upstairs, watching on the monitors. Good choice!

The show was impressive. David Coverdale may not be the best singer, but he's got a wide range and is a fantastic showman. He seemed to be having a great time, which is contagious. The lead guitarist (Doug Aldrich) was great and had a lengthy "let's give the band a break" solo I thought had gone out of style 15 years ago. And then Tommy Aldridge did a long drum solo... the last half of which was bare-handed and with his head! It sounded good, it looked painful. Really memorable.

On an interesting side-note (since I've written so much about desktop search), I have quite a few Whitesnake songs ripped from my own (legal) CDs on my system. I set the mood to "Here I Go Again". But oddly, MSN Desktop/Windows Search (come on, guys, shorten the name already!) cannot find the query [whitesnake here i go] even though it can find [whitesnake "here i go"]. Too strange to make up. Copernic finds it either way.
Current Music: Here I Go Again

IFilters and Thunderbird support on MSN Desktop Jun. 10th, 2005 @ 05:07 pm
Two weeks ago, I said, "I'm going to try to take a break from programming this kind of stuff for a few weeks, but in June will try to write a Thunderbird IFilter, such that I can move completely to Windows Desktop Search." Betcha didn't believe me!

In the meantime I have discovered that I had lucked onto a foible or bug in getting the HTML display in MSN Desktop with Windows Search. An area P.M. at Microsoft has confirmed that all HTML tags, and with them all formatting, are stripped for security reasons. This seems to be the case after a recent update of it, leaving me back at about 1200 characters of preview, which is also confirmed by the P.M. According to him, the correct tool for this is a Protocol Handler. This makes sense; the Palm works off of databases, not files. Unlike an MP3 or application, each Palm file has many entries. So to best index them, they should be treated as many entries. Initially I had assumed IFilter would do this because it has GetChunk() and GetText(), allowing different text to associate with different parts of the file... but that's not how they use it. (Actually I have no idea how they do use it other than for GetValue() returns, because it seems singularly pointless to support "chunks" but not chunk the results.)
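
As a rough illustration of the "many entries" idea, here is a toy model in Python. It is not the real COM IFilter interface; `Chunk`, `chunks_for_records` and the sample memo data are hypothetical names made up for this sketch. It only shows what one chunk per database record would look like, so that a hit can be traced back to a record rather than to the whole file.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Chunk:
    """Hypothetical stand-in for one chunk of filter output."""
    chunk_id: int
    text: str


def chunks_for_records(records: List[str]) -> Iterator[Chunk]:
    """Emit one chunk per database record, numbered from 1."""
    for chunk_id, record in enumerate(records, start=1):
        yield Chunk(chunk_id, record)


# A pretend Palm memo database: one file, several entries.
memo_db = ["Buy cat food", "Plasma TV notes", "Call Toshiba support"]
chunks = list(chunks_for_records(memo_db))

# A search hit can now point at a record, not just at the file.
hits = [c.chunk_id for c in chunks if "plasma" in c.text.lower()]
```

With per-record results like `hits`, a preview could jump to the matching entry instead of showing the first 1200 characters of the file.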

Anyhow, that said, my C# IFilter for Palm Desktop still works, but only displays the beginning of the relevant file. At least you'll know where the information is. I'll post that, with code, soon.

And I'm giving up on IFilters in C#, both because GetValue() is not something I can do in C# - it requires pointers that C# protects you from, even in "unsafe" mode - and because IFilters aren't quite what I need. I'll try my hand at a Palm Protocol Handler.

Which brings us to Thunderbird. I had mentioned that I was preparing to write an IFilter for Thunderbird. Shortly after I posted that, Benoit at Citeknet sent me a preview of the new Beta Thunderbird Protocol Extractor. They seem to have run into a few of the same problems I did, but this works and works well. It detects the various Profiles and Folders more effectively than the Copernic Thunderbird handler, text is indexed, and forgiving some formatting issues, the first 1200 bytes of the message are previewed. (Same problem I ran into, which Microsoft is looking at.) Good stuff. I've been using this for well over a week now and have been delighted. No need for me to write one now!

Installing MSN Toolbar with Windows Desktop Search

This necessitated installing MSN Desktop Search on a system that had neither it nor Outlook. In the process I ran into a few issues.

If you don't have Outlook installed, it warns you... twice... that it requires Outlook and then Outlook Express if you wish to "search email messages, contacts and appointments on your computer." Of course I have solved the contacts/appointments with my Palm IFilter.

Installation resulted in an interesting error. I quote, "MSNFirstRunWiz.exe - Ordinal Not Found"; "The ordinal 198 could not be located in the dynamic link library MAPI32.dll." My guess is that they forgot that they couldn't find Outlook on my system. Silly guys! It repeated this with ordinal 49 also.

This and similar errors pop up regularly when running MSN Desktop Search. Assuming it might be because the MAPI library in place (from either Eudora or Thunderbird) doesn't match it, I disabled email searching (not the default, but easily done.) That didn't solve the problem. Nor did removing MAPI32.dll; it requires one. But restoring the original MAPI32.DLL did solve the problem.

Moving the index was absolutely trivial; select the option from the option window, choose new location, and it does the rest. They all should be this easy!

I have the ability right now to compare to Copernic. Keep in mind that Copernic has broken their API, doesn't respond anymore, and despite knowing it is broken, has not released a fix in a long time. Pity, because it was (other than their bugs) easy to write for; it's really a great product if you can live with the stock File Filters (and most people probably can), with a great interface, good speed, wide file format support, etc. A year ago this version (1.5) would have been the clear leader. Anyhow, Copernic was still on this box for the Thunderbird support. I discovered that, at least as currently set up, MSN Desktop with Windows Search is more effective than Copernic on my system for an unexpected reason: Whatever ZIP IFilter I've installed is searching inside the Zips, whereas Copernic merely indexes the existence of them. This was very helpful, because I tend to compress stuff when done with it to make moving it easier (as a single archive), and because many other servers are archived in zip format to an external drive attached to my primary system. I'm finding it ironic that initially I simply ruled Microsoft's entry out! (Even more so considering my relationship with them, notwithstanding the fact that I was doing embedded Linux 24 months ago and BSD internals programming 12 months ago.)
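
The difference between indexing that a Zip exists and indexing what is inside it can be shown with a small Python sketch. This is not how either product works internally; the function names and the sample archive are made up for illustration, and it uses only the standard `zipfile` module.

```python
import io
import zipfile


def index_zip_names_only(zip_bytes: bytes) -> set:
    """Record only which member files exist inside the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return set(archive.namelist())


def index_zip_contents(zip_bytes: bytes) -> dict:
    """Map each lowercase word to the member files that contain it."""
    index = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            text = archive.read(name).decode("utf-8", errors="ignore")
            for word in text.lower().split():
                index.setdefault(word, set()).add(name)
    return index


# Build a small sample archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "Whitesnake concert at the Paramount")
    archive.writestr("todo.txt", "Write a protocol handler")
sample = buffer.getvalue()

names = index_zip_names_only(sample)
words = index_zip_contents(sample)
```

A search for "whitesnake" finds nothing in `names`, but `words["whitesnake"]` points at `notes.txt`.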


Palm IFilter for Windows Desktop Search May. 23rd, 2005 @ 07:38 pm
It looks like MSN Desktop Search has already been renamed and is now Windows Desktop Search. Fine by me; it's still the last man standing in the battle.

I was overly pessimistic yesterday in my estimation of how long it would take to write an IFilter. It's done. Here's a picture:

The marshalling issue I was running into was solved by simply marshalling the data copy rather than the parameters. Everything else fell into place very quickly. A few minor bugs and work-arounds and all is well.

Note that the text is nicely formatted. Some of the others don't do this; I added a word-wrap feature for my Copernic extractor. Microsoft uses something called "SafeHTML" to handle theirs (I determined this by reading the resulting preview source), and it's got a minor flaw... at least on my output (UTF-16), it is including, but isn't converting, the carriage returns. It may be getting confused by the data type. But it was escaping the other HTML special characters. The result was a solid block of text. So I simply quickly wrapped my output in HTML also, which gave me the opportunity to boldface the category and such. All-in-all, I'll call that an undocumented feature, since it added power but is certainly undocumented. And now, thanks to Microsoft's HTML processing, it's wrapped beautifully too.
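
The wrapping step can be sketched in a few lines of Python. This is an illustration, not the C# code from the IFilter: `record_to_html` is a hypothetical helper, and the choice of `<p>`, `<b>` and `<br>` tags is an assumption. The idea is to escape the special characters yourself, turn the line breaks into explicit breaks, and boldface the category.

```python
import html


def record_to_html(category: str, body: str) -> str:
    """Escape the text, turn line breaks into <br>, and boldface the category."""
    # Normalise Windows and old-Mac line endings before converting them.
    normalised = body.replace("\r\n", "\n").replace("\r", "\n")
    escaped_body = html.escape(normalised).replace("\n", "<br>\n")
    return f"<p><b>{html.escape(category)}</b><br>\n{escaped_body}</p>"


preview = record_to_html("Business", "Call Bob & Sue\r\nre: <urgent> item")
```

Here `preview` keeps the two lines separate and shows `&` and `<urgent>` as literal text rather than markup.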

There is one big disadvantage compared to some of the other engines; the displayed text will have your search terms in them, but the Preview starts from the top of the output and doesn't have any term-highlighting. And a few bits of the interface aren't intuitive... for example, if your system is set to "small" icons rather than large, you won't see the match text in the results window under the filename, but if you have large icons, you will... but again the beginning of the file (at least on my data) and without the HTML interpreted if present. Also, <ctrl-F> does not bring up Find for the Preview pane... but a Tools menu off to the side does, so you can jump to the relevant text. Just don't close that window, because F3 pulls up the animated Search doggie rather than repeating this text-search. That's all minor stuff though; on average this was very easy and quick to write and runs well and fast.

Writing an IFilter in C# was a bit tricky because there are a lot of moving parts. It was made easier by FiltDump.exe and by the Desktop/Indexing code simply working. Not having hidden bugs in your host can be a real boon to productivity. If you're thinking of doing it, download my code (or, if I haven't gotten it up yet, email me for it.) I'm happy to share the wisdom-wealth.

I'm going to try to take a break from programming this kind of stuff for a few weeks, but in June will try to write a Thunderbird IFilter, such that I can move completely to Windows Desktop Search.
Other entries
» Update: Toshiba NetCam Support, Copernic Search Support
Since the topics-of-note lately have been Desktop Search and Network Camera issues, today's entry is on progress on those two fronts.

The last two weeks have required a lot of time on the job. My job is a lot of fun, but can be very time-consuming in crunch time. As a result, I haven't done much in these hobby areas.

Toshiba Network Camera

Toshiba has done a lot to earn my respect. In my last blog on the Toshiba network camera, I pointed out that Support was good, as was the image quality, but that the out-of-box experience was pretty dreadful and the FTP (and wireless) don't work. The support guys did speak to a developer and learned that there's no room left in the firmware to fix the problem. But they also arranged an exchange for me. I had them "cross-ship", which means I held onto my original until the replacement (secured with a credit card) arrived.

The replacement did arrive decently promptly. But it turns out to be older than mine. The one I bought, at the end of April 2005, was made apparently in 2003. The one they sent me, a refurb, was even older; the serial number was lower and they decoded it to 2003 too. Strange that they would consider an older model likely to fix the problem. (It did have the latest firmware, so perhaps the serial only applies to the actual case.) And anyhow, it didn't solve the problem. But on the bright side, when I spoke to them about this, they arranged free return shipping for me also.

I still don't have the FTP working perfectly, although I did improve it with some configuration changes. Since they mentioned the problem seems to be customer-based, not camera-based, it's probably a simple dumb buffer overrun or numeric overflow problem. If anyone is getting it working, the problem is avoidable. So here's what I tried.

  • There are three sets of User ID/Password in the camera: User (guest), Admin and FTP. I left the Admin ones long-and-complex (assuming they would have tested that), but shortened the User and FTP ones to under eight characters each.
  • The FTP settings default to Active Mode and disconnecting with each upload; I switched these to PASV and "Stay Connected".
  • The camera expects (requires) a string it embeds in the filename and a destination directory. I set these to a very short (two character) string and to "." (which means "current directory"; I set the FTP directory on the server. I previously had been leaving that value blank.)
This made things much better, extending the crap-out time by a factor of 12 or so.

It may take a lot more changes and trial-and-error, and quite frankly, it's not worth my time. My goal was to enable checking of the camera without the flakey Java applet the camera uploads. Toshiba's support was very helpful on this also; they suggest Active WebCam, which costs about $50 and amazingly, seems to work very well. (Actually, configuration crashes sometimes, but other than that it seems solid.) And it also works with my Sony television card, allowing me to put up live images of our CCTV surveillance system. I may simply register this program.

Another alternative was to write an HTML pull interface rather than rely on an FTP push. I started to do that but quickly realized it would take more than $50 of my time. Enough's enough. ;)

Anyhow, back to Toshiba: Their support staff (based in Sherman Oaks, which is part of "The Valley" in Los Angeles County) is helpful, pretty knowledgeable and speaks English very well. They're among the best phone staffs I've run into, right up there with the old Sony staff that was in Florida.

Copernic

Moving from among the best to among the worst, it's time to revisit the Desktop Search space again. As I mentioned last week, Copernic's Desktop Search has a few flaws which they seem bent on making worse. My primary gripes are:
  • They released an API for developers and didn't test it
  • It doesn't work due to quite a few bugs and errors
  • They are very slow to acknowledge the issues
  • They've made them worse rather than better
When I sent them data on the bugs, they did eventually admit that yes, such bugs appeared to exist. And stated that they "will be able to provide you with a version containing these bug fixes by the end of next week", for the week ending last week. I considered that an unreasonably long lead-time, since other developers may also be hosed by them in the interim, but then they put out that even more broken build.

Copernic has not responded to my subsequent message and inquiry; I suspect I've been blacklisted. But it doesn't matter either way because they also haven't provided the fix. So here's what the random developer should consider:
  • Copernic's API is broken
  • They know it's broken
  • They won't tell you it's broken
  • They don't appear to be interested in fixing it
  • If you complain, they'll stop responding to you
Odd position for a company that should be wanting add-on extractors. And it's really a pity. I like CDS; in my initial comparison, it did well, and after I worked around their Thunderbird Profile Location Bug, it has served most of my needs. But I don't see much point in writing add-ons that won't load due to bugs in the host program.

So I've been working on an IFilter extension for Microsoft Desktop Search. I haven't put much time into it yet, due to the work schedule and the fact that Copernic came a bit closer to what I wanted to start with. But I have come to the conclusion that doing an IFilter in C# was probably a poor choice; I'm struggling with marshalling the IFilter interface for lack of decent examples. But that's my fault for choosing the wrong language for the task. At least Microsoft provides decent tools for quickly testing it and their API seems to work just fine; even with the added complexity and marshalling issues, I doubt it will take as long as I've invested in the Copernic Extractor. I may wind up stepping back to C++ for this project. Stay tuned.
Update: Read the next blog entry. I was over-solving the Marshalling problem and it's now working with very little incremental work.
» Copernic Strikes (Progress on Palm)
My Palm File Extractor for Copernic Desktop Search is nearly ready. This handles Memo and Address Book data. It has been successfully indexing and previewing these files for over a week, and now has a nice installation interface. However I'm running into some new Copernic issues where CDS claims to be indexing the data (and I can see that it is getting it back), but on some computers the only thing it really indexed is the file name! (All the data is rendered inside though, again evidence that the Extractor is working. And Copernic seems to always index the filename, regardless of other data.) This may be related to the version of .NET or something other than Copernic's fault, which at this point would be a nice change.

Update: The problem is that Copernic broke functionality with a fix, moving from Build 644 (the released build) to Build 646! Uninstalling 646 on the non-working system and rolling back to 644 solved the indexing problem.

As I said last week, I was delaying this a bit to give Copernic a chance to help with some issues; this is because my preference was to be able to write a "They were helpful and forthcoming" summary. Unfortunately, you instead get the gory details of several bugs and downhill developments.

Copernic Overview

Copernic is, of course, a Desktop Search entry. Some of these, such as Google, extract the file contents once and cache them as necessary for subsequent view. This is slower on index, faster on render and takes more space. Copernic indexes the files and then calls the same extractor again later for render when the user asks to see those hits. So rendering is slower but the displayed data is always up-to-date and less disk space is needed.
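
The trade-off between the two designs can be modelled in a few lines of Python. Neither class reflects the actual internals of Google's or Copernic's product; the class names, the `extract` callback and the sample file are all made up for this sketch.

```python
class CachingIndex:
    """Extract once at index time and keep the text for later display."""

    def __init__(self, extract):
        self._extract = extract
        self._cache = {}

    def index(self, uri):
        self._cache[uri] = self._extract(uri)  # costs disk space

    def render(self, uri):
        return self._cache[uri]  # fast, but can be stale


class ReExtractingIndex:
    """Keep only search terms; call the extractor again to display a hit."""

    def __init__(self, extract):
        self._extract = extract
        self._terms = {}

    def index(self, uri):
        self._terms[uri] = set(self._extract(uri).lower().split())

    def render(self, uri):
        return self._extract(uri)  # slower, but always current


# A pretend file system with one file that changes after indexing.
files = {"memo.dat": "call the vet"}


def extract(uri):
    return files[uri]


cached = CachingIndex(extract)
fresh = ReExtractingIndex(extract)
for engine in (cached, fresh):
    engine.index("memo.dat")

files["memo.dat"] = "call the vet on Monday"
cached_view = cached.render("memo.dat")
fresh_view = fresh.render("memo.dat")
```

After the file changes, `cached_view` still shows the old text while `fresh_view` shows the new one, which is the "always up-to-date" property described above.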

The call to the external file extractor should be the same either way:
  • Call the Extractor
  • Set the URI (file name)
  • Get the Content Stream Handle
  • Seek to end and back to beginning if you want the file size before getting data
  • Read until you get no more data
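
As a rough sketch of that sequence, here it is in Python rather than the COM/C# of the real extractor. `open_content_stream` is a hypothetical stand-in for getting the content stream handle, not Copernic's API, and the 32KB chunk size is borrowed from the indexing behavior reported later in this entry.

```python
import io

CHUNK_SIZE = 32 * 1024  # 32KB, as reported for the indexing reads


def open_content_stream(data: bytes) -> io.BytesIO:
    """Hypothetical stand-in for 'get the content stream handle'."""
    return io.BytesIO(data)


def stream_size(stream) -> int:
    """Seek to the end for the size, then back to the beginning."""
    size = stream.seek(0, io.SEEK_END)
    stream.seek(0, io.SEEK_SET)
    return size


def read_all(stream, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read fixed-size chunks until a read returns no data."""
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)


stream = open_content_stream(b"x" * 41813)
size = stream_size(stream)
content = read_all(stream)
```

The point of the loop is that the caller never needs the size to finish reading; an empty read is the only stop condition.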

Preview Seek Error

First off, their API, which they documented and announced, quite simply doesn't currently work. There is a nasty bug in the Preview stage where it will seek past the end of the file data and then try to read from there. So no developer writing against the released Desktop (or any subsequent versions that Copernic has made me aware of) could have gotten a File Extractor working.

I got around it by implementing my own IStream interface with heavy logging so I could see what was going on. I knew my code worked, and the Indexing stage seemed to work, so the failure to render had me stumped. But I wouldn't expect many developers to try that; it's not something you should have to do.
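
The logging idea is simple enough to sketch in Python. This is not the C# IStream implementation itself: `LoggingStream` is a hypothetical wrapper around an in-memory stream, and the wording of the Read log line is an assumption.

```python
import io

_WHENCE_NAMES = {
    io.SEEK_SET: "beginning",
    io.SEEK_CUR: "current",
    io.SEEK_END: "end",
}


class LoggingStream:
    """Wraps a seekable binary stream and records every seek and read."""

    def __init__(self, data: bytes):
        self._size = len(data)
        self._inner = io.BytesIO(data)
        self.log = []

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        position = self._inner.seek(offset, whence)
        self.log.append(
            f"Seek {offset} bytes from {_WHENCE_NAMES[whence]}; "
            f"now {position}/{self._size}"
        )
        return position

    def read(self, count: int) -> bytes:
        data = self._inner.read(count)
        self.log.append(f"Read {count} bytes requested; got {len(data)}")
        return data


# Replay the kind of calls a host might make against a 41813-byte file.
stream = LoggingStream(b"\0" * 41813)
stream.seek(0, io.SEEK_CUR)
stream.seek(0, io.SEEK_END)
stream.seek(4222543, io.SEEK_SET)
stream.read(61440)
```

Seeking past the end is legal here, just as it is for a real stream; the log is what makes the resulting empty read easy to spot.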

The following is a commented log from my trial runs, Indexing first:
Seek 0 bytes from current; now 0/ 41813
Seek 0 bytes from end; now 41813/ 41813
Seek 4222543 bytes from beginning; now 4222543/ 41813
Seek 0 bytes from beginning; now 0/ 41813
It then starts reading in 32KB (32768 byte) chunks, ignoring the size it got back, until it gets back 0 bytes. And that works fine. But then we get to Preview (i.e. you've done a search, found a hit and want to display it.) That also starts with seeks, but the logic is screwed up and it lands way past the file end.
Seek 0 bytes from current; now 0/ 41813
Seek 0 bytes from end; now 41813/ 41813
Seek 4222543 bytes from beginning; now 4222543/ 41813
Seek 0 bytes from beginning; now 0/ 41813
Seek 0 bytes from current; now 0/ 41813
Seek 0 bytes from end; now 41813/ 41813
Seek 15422548 bytes from beginning; now 15422548/ 41813
And then it starts reading in 61440 byte (0xF000) chunks. Which, of course, fail. And when it fails, it retries it. 251 times! (I cover that later.) These Reads of 61440 are then followed, amazingly, by an attempt to read 1108 bytes.

After finding their bug and alerting them to it, I later discovered that if IStream (or my own implementation) had in fact not implemented Seek() (which would be incorrect), their code would handle it better. Nice going, guys.
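For anyone who wants to try the same kind of tracing, here is a rough Python analogue of a logging stream wrapper. It is not my C# IStream code; the class name and the wording of the read log line are invented, and only the seek line format mimics the log above.

```python
import io

WHENCE_NAMES = {
    io.SEEK_SET: "beginning",
    io.SEEK_CUR: "current",
    io.SEEK_END: "end",
}


class LoggingStream:
    """Wraps an in-memory stream and records every seek and read."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)
        self._size = len(data)
        self.log = []

    def seek(self, offset, whence=io.SEEK_SET):
        pos = self._inner.seek(offset, whence)
        self.log.append(
            f"Seek {offset} bytes from {WHENCE_NAMES[whence]}; "
            f"now {pos}/ {self._size}"
        )
        return pos

    def read(self, count):
        data = self._inner.read(count)
        # Hypothetical log format for reads.
        self.log.append(f"Read {count} bytes requested; got {len(data)}")
        return data


stream = LoggingStream(b"x" * 41813)
stream.seek(0, io.SEEK_CUR)
stream.seek(0, io.SEEK_END)
stream.seek(15422548, io.SEEK_SET)   # lands far past the end, as in Preview
past_end = stream.read(61440)        # nothing to read from there
print("\n".join(stream.log))
```

Seeking past the end is legal for this kind of stream; it is the read from there that comes back empty.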

Rendering Delay

Once data is read by the Preview mode (i.e. via my kludge, which detects when they seek past the end of the file right before a read and goes to the beginning instead), it takes about 20 seconds to display it (for my test bed of 45KB.) Initially I thought this was due to their read bug (next) but looking at the time logging, it isn't. CDS simply spins its wheels for 20 seconds before popping the data up.

But again, a developer is going to think it's in his code, because CDS displays internally rendered data very quickly. They must have done this on purpose, but I can't fathom what the purpose would be.

251 Reads

After finishing the Seek() process, the Indexing portion simply issues an IStream::Read() until it gets no data back. This is a normal behavior, perhaps a bit odd considering a file size check (the initial seeks) is done first, but that's the only oddity.

After finishing the Seek() process, the Preview portion issues 251 IStream::Read() calls (with a much larger buffer), regardless of reported filesize and of any returned data. 251! Amazing.

My initial guess was that this was the cause of the rendering delay, but when I added timing to my logging, it became clear that it isn't. The Read() calls are all done very quickly because my program knows that it's out of data and returns instantly. Nonetheless, this is yet another example of dreadful design on Copernic's part.

Quality Summary

As I said above, I was hoping not to have to write any of this. I gave Copernic a week to respond to my logs and proof. I even sent them my program! Demonstrating these issues is absolutely trivial. But it took Copernic a week even to acknowledge the bugs!

On the bright side, they did acknowledge them, but these are pretty big honking bugs. And let's review what the implication is:
  • Copernic announced with fanfare that they now support a File Extractor Plug-In
  • It obviously doesn't work, wasting the time of any developer attempting to use it
  • Copernic obviously doesn't have any test cases they use to verify it on their side.
  • They don't move quickly when presented with evidence of the issues.
  • Per my update above, when they move at all, it can be in the very worst direction.
In their defense, they do respond more quickly than, say, Microsoft, but Microsoft wouldn't release something this flawed, nor would Microsoft make it worse with a fix (see update at top. It wasn't intended to fix the Extractor issues, but that doesn't explain why it makes them worse.) Also in their defense, reaching a human at Google or Yahoo would be much harder, and Yahoo at least has far larger flaws. But this experience did illustrate to me why there are so few Copernic plug-in File Extractors available.

I will release the extractor (and source code to anyone who wants to see how I did it, perhaps so they can write other extractors) once Copernic improves the quality a bit and I've verified it (having just learned that they won't verify it themselves.) I'm afraid right now that there may be hidden bugs completely obscured by the blindingly-obvious ones they missed, and don't want to have my code blamed should someone suffer from them. On top of which, I haven't looked into the Build 646 issue enough to have any sense of whether I can overcome their newest foibles, and therefore cannot ensure that any given user will be able to use my extractor. (It works wonderfully for me on 644 though.) Judging from the responsiveness and bugs, Copernic may not want third-party File Extractors. If I had known up front that I would spend so much time debugging their problems with no likelihood that it will work with future versions, I wouldn't have written one. Take that into account before you write one.

More later... perhaps... if developments merit. Judging from the last week, don't expect any news for a few weeks. Suddenly MSN Desktop is looking mighty pretty!
» Copernic Extractor for the Palm Desktop
I've mentioned that what I really wanted, and didn't get, out of the Desktop Search Engines was Palm Desktop. I also required Thunderbird, (which eliminated MSN Desktop) and decent reliability (which eliminated Yahoo!) and a friendly experience (eliminating Google.) Initially, Thunderbird support eliminated Copernic also, until I found a work-around to a bug in their product regarding the handling of the Server Default Directory setting. Unfortunately, I never heard back from them on this after explaining the problem in their support form, which leads me to suspect they aren't very serious about their product.

However, work-around in place, this left Copernic, unless I could extend MSN Desktop to support Thunderbird. And in fairness, MSN Desktop really is superior in some ways. But it can't be easily extended for Thunderbird because the way they call IFilter interface seems to skip GetClassFile() and therefore is not compatible with files lacking extensions.

So on to Copernic. I mentioned a while ago that Copernic is extensible but only crudely so; no support for information tagging (or context.) On the other hand, this should make it easier to write for. (More on that in a few days, depending...) So I did. I have implemented a Palm Desktop file extractor for Copernic. And here are the results. Bottom line is, it works. Here's a screen shot of an address-book hit...

Hubers, incidentally, is Portland (Oregon)'s oldest restaurant and has fantastic drinks, excellent ambiance (though a bit loud on Friday nights) and good food. Probably the closest thing that town has to a must-see eatery.

And one of a memo pad hit...

Well... actually, I made that up for this blog. But you wouldn't want to read about my life anyhow. I really did write it in Palm Memo Pad and then have Copernic index it though.

An important point to which I alluded last week is that the information cannot be tagged. This is not the case with Google or MSN Desktop; in those, the contacts would be contacts. Here, as you can see, contacts are, in this case, just documents like any other file that isn't a picture, video, music or email. Copernic does support contacts, with nice formatting. You just can't get them in there from a File Extractor Plug-In. And as I wrote on this previously, Copernic has acknowledged this and is not planning any enhancements in this area.

Now on to the extractor. I wrote it purely in C#. It's the first working Copernic File Extractor plug-in I've seen, although there is another blogger who posted about their attempts. In the process, I ran into an issue that needs clarification; it didn't prevent me getting the Extractor working obviously, but there are hurdles. I have posted (and emailed, given the failure posting had on the Thunderbird issue mentioned above) Copernic; we'll see if they come through this time.

Testing the File Extractor was a bit painful, although mostly due to the above issues. (Meaning if not for the above issues, my normal testing would have been fine.) I have a very large Copernic index. What I wound up doing was disabling all but one directory and rebuilding the entire index on demand to force a rescan. Copernic is quite different from Google in how it renders; Google stores a representation of the matched document in a cache. Copernic matches by indexed keywords, but relies on the original extractor to re-extract for the Preview. Oddly, it seems to do this through an entirely distinct code path. But on the bright side, Copernic will display the current information, not a deprecated cache. And deleted data stays deleted.

So once done with some of the hurdles, I had to reindex. Good time to take the dawg for a long walk.

Once those last issues are handled (i.e. I've heard from Copernic and solutions have been found), I'll put the sourcecode and an installer up, probably on SourceForge, since that's where several of my other projects are. Perhaps Copernic can be our final answer!
» Utility of Desktop Search Engines
So far a lot more of this blog has been Desktop Search oriented than I ever would have guessed. Apparently it's a topic that I care about, even though I don't work in that space. I was a big fan of Lotus Magellan and continue to use grep for all sorts of things, so I guess I view it as an extension of how I use technology.

In a nutshell, I use it to compensate for my shortcomings. I'm very good at analysis and understanding, but have very poor short-term memory. Amazingly bad. I mean, the kind so bad that people usually think I'm pulling their leg. Unless something seems really important, I don't file it in my RAM. This is complicated by having excellent relational long-term memory... I may not have any clue that you told me about your new red Mustang convertible two days later... or even 20 minutes later... but six months later someone may mention wanting a new drop-top and you would then pop to mind... and I'd remember that you have a new red Mustang convertible. The information gets stored, but I don't have direct access.

So people assume I have an excellent memory and just am being rude or teasing them. Nope.

My compensation is to jot down all sorts of things, in any convenient electronic form, and build relationships between the data. I build it for everything; I have outlined lists for seasonal tasks (such as gutter-cleaning and fertilizing), for items I'm researching to buy (e.g. televisions), for what liquor I have in the cabinet. I use MP3 ID3 V2 tags to include notes about tracks (songs) and their lyrics, because those things will pop into my head. (This is why lyrics were on my tests initially.)

But because I put it in any convenient place, I now have a new problem: finding it. Common places include:
  • Email to myself
  • SMS to myself
  • Email to my cellphone
  • Palm Memo notes
  • SmartList2Go (database on the Palm) notes
  • Natura Bonsai (a Palm/Windows Outliner)
  • Text files
  • Outlook Tasks
  • As mentioned above, metadata in MP3 (and JPEG) files
  • Source code comments (including To Do items)
So lots of different P.C. places.

Now that I've figured out why I care so much about Desktop Search... because I need to span all these locations easily... I better understand why I care about the sources I do. But how often do I use it?

Thus far about three times a day, it turns out. Yesterday (just to pick a recent day that I can remember), I used it to:
  • Find out where I was supposed to meet a friend. (It turned out this was in Outlook, but in an email, not in a Calendar event as I'd initially guessed.)
  • Find out another friend's phone number (Turned out this too was in an email, but not, ironically, from him)
  • Find a document that I thought had been emailed to me (I had put it in a folder for its project)
  • Find my wife's travel plans (In email, and I knew it was, see below.)
  • Find who I ordered a specific item from (was in a text file; I expected it to have been in an email or in Quicken, which I searched first without Desktop.)
  • Check on an internet order (Wasn't sure if I saved the "invoice" to disk or if I got it via email)
That's just one (busy) day, but decently typical.

One interesting point of these is that all three of the Desktop Search programs I've been using over the last week (Microsoft on my work system, Google and Copernic on my home systems) are far better for finding contacts and email content than Outlook's search or Thunderbird's search is. Outlook and Thunderbird have slow and clumsy searches; I often can't find stuff I know is in there with their native searches, but the desktop search engines find it fine.

Would I be lost without Desktop Search? Not really. It saves me some time and effort, but mostly it's a convenience, just like being able to email a note to my cell phone.
» Building Filters/Indexers for Desktop Search Engines
The three Search Engines that are still in my cross-hairs are Copernic, MSN Desktop Search and Google Desktop Search. I have gripes with all three, basically boiling down to:
  • Google is unfriendly, non-configurable, makes bad assumptions and won't index longer text files
  • MSN Desktop Search won't do Thunderbird email, and IFilters are rather finicky to write to fix this due to the lack of file extensions on Thunderbird mailboxes
  • Copernic had troubles with moved Thunderbird mailboxes (see workaround in my previous blog)
  • None of them index full MP3, JPEG or other media metadata.
In addition, though, I want support for Palm Desktop. This is where my contacts, notes and appointments are stored. My Palm-compatible even synchronizes with two of my three cell phones, so this would be very beneficial. So I looked into writing filter extensions for each. (I looked into MSN Desktop Search filter extensions for Thunderbird support; maybe someday I'll take the serious time to get that working.)

There are two main ways file indexing could be done:
  • Generically - give the text over
  • Tagged - have the file index plug-in know something about the data
My belief is that the former is, well, stupid. Better than nothing, but at the end of the indexing, you don't know if the result is a contact, email, media or what. Presentation is unformatted. It's a start, but just that.

First things first: All three platforms use COM. That makes sense; .NET assemblies aren't widely used yet, and other API call mechanisms are still flakey. Microsoft's IFilter interface is the best-known, but also rather finicky and not well-documented. Certainly few good samples out there. But Microsoft's filters can be used in IIS and the Indexer service that comes with Windows in addition to MSN Desktop. Google's Indexer API is actually more complex to implement due to a larger number of moving pieces, but not significantly so. Copernic's is about as simple as a COM interface could be... not that this is a good thing.

MSN Desktop Search uses the IFilter COM interface with lots of support for tagging the results. Text can be extracted and returned in chunks (essentially records) with formatting, interpretation, types, etc. So a contact can not only be labelled a contact, but given different sub-tags (phone number, for example). Documents are known to be different from contacts. The beginning and end of a record is known. And the returned strings may either be literal (what was in the file) or be interpreted (what it meant). Bottom line: The IFilter interface is quite robust and I like what Microsoft did with it. (Now if only they would provide better sample code!)
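The difference between generic and tagged extraction is easier to see in a toy example. This is not the IFilter API; `Chunk`, `extract_generic` and `extract_tagged` are invented names, and the contact data is made up.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One tagged piece of a record (hypothetical shape)."""
    record_id: int
    tag: str
    text: str


def extract_generic(contact):
    """Generic: hand the text over with no structure at all."""
    return " ".join(contact.values())


def extract_tagged(record_id, contact):
    """Tagged: one chunk per field, so the indexer knows what each piece is."""
    return [Chunk(record_id, tag, text) for tag, text in contact.items()]


# Made-up sample record.
contact = {"name": "Hubers", "city": "Portland", "phone": "555-0100"}

flat = extract_generic(contact)
chunks = extract_tagged(1, contact)
phones = [c.text for c in chunks if c.tag == "phone"]
print(flat)
print(phones)
```

With the flat string, a search can only say "this file mentions 555-0100"; with the chunks, it can say "this contact's phone number is 555-0100".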

In contrast, Copernic's API uses the ICopernicDesktopSearchFileExtractor interface, which despite the long name has only three functions: LoadURI(), GetContentStream() and IsContentUnicode(). So although Copernic has the best user interface for distinguishing between types you're looking for (e.g. Contacts or Music), the add-in file extractor cannot actually tag the data appropriately. Copernic actually responded to my querying email that they only support Document-indexers being written currently. Unfortunate.
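As a rough picture of how little that interface carries, here is a Python stand-in using the same three method names. The real thing is a COM interface; the argument and return types below are my guesses, and `PlainTextExtractor` is a hypothetical class.

```python
import io
import os
import tempfile


class PlainTextExtractor:
    """Python stand-in for the three-method extractor; signatures are assumed."""

    def __init__(self):
        self._path = None

    def LoadURI(self, uri):
        # Remember which file we were asked about.
        self._path = uri

    def GetContentStream(self):
        # Everything the host gets back is one undifferentiated byte stream.
        with open(self._path, "rb") as handle:
            return io.BytesIO(handle.read())

    def IsContentUnicode(self):
        # This sketch always returns plain single-byte text.
        return False


with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"Name: Hubers\nCity: Portland\n")

extractor = PlainTextExtractor()
extractor.LoadURI(tmp.name)
text = extractor.GetContentStream().read().decode("ascii")
os.remove(tmp.name)
print(text)
```

Note there is nowhere in those three calls to say "this is a contact", which is exactly the limitation.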

Google's support is somewhere in the middle. Tags are called "schemas", and while significantly more limited than in MSN, they're a huge step ahead of Copernic. Ironically (considering I chose "Contacts" as an example solely because that's what I most want from Palm Desktop), Google's schemas do not include Contact forms. That's okay; Google's display and search options wouldn't support it anyhow. Google Desktop is the hammer approach to the problem. But that Copernic lacks the schema/tag support entirely even though it supports limiting searches on such, while Google provides a richer data world even though it doesn't fully utilize it, seems odd.


» Copernic / Thunderbird Update (and How-To)
As I mentioned in a previous entry, Google Desktop Search has a few very annoying limitations:
  • It only indexes the first few pages of text files
  • It isn't easily configurable as to which drives it supports
  • The interface is not as nice as many others
  • Most things that should be settings aren't, and even those that sort-of are require undocumented registry tweakings
Clearly I'm not a huge fan. But I've been using it because Thunderbird support is important to me. My two favorites, Copernic and MSN Desktop, don't handle Thunderbird. MSN for obvious reasons. Copernic does handle it, but not if the mail is in a non-default location, an amazing oversight.

Last week I downloaded PodSync's TweakGDS, hoping it would improve Google's settings. Not so much; TweakGDS cannot retrieve or accurately set drives to be scanned on my system with the current GDS. I tried several times, including time-consuming reindexes; it never got it. Don't know why; it's not that hard. Three registry tweaks later and I was back-in-business. But this annoyed me, especially because Google's indexing is slow. (Copernic is fast and MSN Desktop seems fast.)

I've already been working on an MSN Desktop Search IFilter for Thunderbird, but due to limited time around family constraints and troubles getting the registration to work through GetClassFile() due to lack of Thunderbird extensions, it's going slowly. So I set about seeing if I could determine why Copernic failed on moved Thunderbird mail files.

There are two ways to tell Thunderbird where to put mail. One of them is Thunderbird-specific, the other is Mozilla-generic and not accessible from Thunderbird normally. These are to either specify the location for a particular email account's folders, or to specify the location of an entire profile.

A profile is a Mozilla user, much like a Windows account under 2000/XP; all the settings are different. Few people have two of these even for their browsers; I can't imagine why anyone would want two for Thunderbird. And the normal installation process doesn't give you any input into it. To call up the profile manager, you even must exit Thunderbird entirely and restart it with a specific command-line parameter!

So while Thunderbird's deeper documentation mentions and supports profiles, they aren't the primary mechanism for telling it where to put your mail, especially after you've installed it. Instead, you simply go to the Account Settings for the desired email account, select "Server Settings" and change the "Local Directory". Absolutely trivial.

And absolutely unsupported by Copernic. But the profile data is stored in C:\Documents and Settings\%%USER%%\Application Data\Thunderbird, with both an "IsRelative" and "Path" entry. The MailBox files are, unless set otherwise, inside the profile Path folder in a Mail directory. The Profile Path is a random pseudo-guid string, so I figured Copernic must be reading the Profile entries and just not parsing Thunderbird's very odd configuration data. (Thunderbird uses neither simple ini nor registry settings for most settings; it's mostly on-the-fly generated scripting.) Changing the location of the entire profile, including Mail folder by Exiting Thunderbird, changing these and copying files over, and restarting Thunderbird worked fine; Copernic could still find the email accounts pointed to in the default location of the profile.
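Resolving a profile folder from those two entries might look like the sketch below. I am assuming the entries live in an ini-style file with `[Profile0]`-type sections, and that `IsRelative=1` means the path is relative to the Thunderbird application data folder; the sample text and folder names are invented.

```python
import configparser
from pathlib import PureWindowsPath


def profile_dirs(ini_text, app_data_dir):
    """Return the folder of every profile listed in the ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    dirs = []
    for section in parser.sections():
        if not section.startswith("Profile"):
            continue
        path = parser[section]["Path"]
        if parser[section].get("IsRelative", "1") == "1":
            # Relative paths hang off the application data folder.
            dirs.append(PureWindowsPath(app_data_dir) / path)
        else:
            dirs.append(PureWindowsPath(path))
    return dirs


def default_mail_dir(profile_dir):
    """Mail lives in a Mail folder inside the profile unless set otherwise."""
    return profile_dir / "Mail"


# Invented sample: one profile in the default place, one moved to D:.
SAMPLE = r"""
[Profile0]
Name=default
IsRelative=1
Path=Profiles/abcd1234.default

[Profile1]
Name=moved
IsRelative=0
Path=D:\Mail\Profile
"""

dirs = profile_dirs(SAMPLE, r"C:\AppData\Thunderbird")
for d in dirs:
    print(default_mail_dir(d))
```

A per-account "Local Directory" setting is stored elsewhere, so a reader that stops at this point would miss it, which fits the behavior I saw.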

So the bright side is that Copernic is now (after moving my primary email folders back into the now-moved profile) working with Thunderbird again. That's a big win, since I prefer it significantly over Google Desktop Search. And Copernic does index all the way into text files, a big win for my sheet music listings. This gets me almost all the way where I want to be. Still want Palm Desktop support though. That's the next entry.
» Another Google Interface
I recently ran across another Google Desktop interface, at http://www.bytegems.com/google-desktop-search.shtml This one mimics the horrible Microsoft Explorer interface, excepting the animated characters. I'm not convinced of the value, especially compared to GDSuite, but at least it's another choice.

GDSuite, incidentally, has the same problem GD Extreme does on updates; it doesn't remember where it was installed to. Stupid stupid stupid. Other than that, good program though.

Because I prefer the results, speed, power and interface of MSN Desktop Search, I've been working away on learning to build iFilters for MSN Desktop Search. These are a bit complex, partly because it's inherently a complex topic - parsing other file types and sending that to an indexer in tagged bits. But also because there aren't many examples, somehow I wasn't able to find any of the documented ones anywhere, and when I did find a few, they're still poorly documented. But I'm making progress. Not sure if I can do Thunderbird files because the Mozilla guys didn't give Thunderbird files an actual extension, which of course is how filetypes are generally discerned on Windows. (On *nix systems, it's via magic number at the beginning of the file, which is much more sensible but does require you open the file.)

Windows in concept supports the magic cookie formula; you use GetClassFile() which, if it can't find matching registered extensions, checks registered patterns in a CLSID portion of the registry. This actually works, except that Windows itself doesn't use it. Explorer doesn't, and as nearly as I've yet been able to tell, neither does MSN Desktop Search's indexer implementation. (My experiments do continue though.) This is odd since the LoadIFilter() documentation seems to indicate it should be essentially automatic. Oh well, another spitball.
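The idea of "extension first, then byte patterns" can be sketched like this. It is a toy model, not GetClassFile() itself: the tables, the class names and the lookup order (taken from my description above) are assumptions, and the mbox pattern relies on mailbox files starting with a "From " line.

```python
import os

# Hypothetical registrations, made up for illustration.
EXTENSION_CLASSES = {".doc": "word-document"}
# Each pattern: (offset, mask, expected bytes after masking, class name).
PATTERN_CLASSES = [(0, b"\xff" * 5, b"From ", "mbox-mail")]


def matches(header, offset, mask, expected):
    """Apply the mask to the bytes at offset and compare with expected."""
    window = header[offset:offset + len(mask)]
    if len(window) != len(mask):
        return False
    return bytes(b & m for b, m in zip(window, mask)) == expected


def classify(filename, header):
    """Try the extension table first, then fall back to byte patterns."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSION_CLASSES:
        return EXTENSION_CLASSES[ext]
    for offset, mask, expected, name in PATTERN_CLASSES:
        if matches(header, offset, mask, expected):
            return name
    return None


print(classify("report.doc", b""))
print(classify("Inbox", b"From - Sat Jul 30 07:06:00 2005\n"))
print(classify("notes", b"hello"))
```

The second call is the Thunderbird case: no extension to go on, so only the file contents can identify it.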

On yet another topic, I have found another reason to be woefully disappointed in Google Desktop Search. Recall that I consider the interface poor, and even with the add-ons it's not great, the configurability and indexing pathetic, and many of the features just plain weak. It's just that it supports Thunderbird better than anything else. Rather like driving a Jeep Wrangler on the freeway every day because once a month you go riverbedding.

Anyhow, to the problem: only the first few KB of text and HTML files are indexed. This might seem like a minor thing, but I have a lot of data in text and HTML. They are universal formats, ran on all my palmtops from the HP95LX on, and even now I tend to toss data in them. I found this out by looking for a string in a list I have of indexes to my music books. (i.e. songs I own the scores to.) Very useful to prevent me from buying it again. But... no such luck. After maybe 20KB in, which really isn't very far, the indexing simply stops.

Huh? I uninstalled Copernic and Yahoo, so I can't verify on them, but I'm pretty sure MSN Desktop Search isn't doing this.
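Here is a small simulation of what such a cutoff does to a long list. I don't know how Google implements its limit; the `index_words` helper, the 20KB figure (my rough estimate from above) and the generated song list are all just for illustration.

```python
def index_words(text, limit=None):
    """Index the words of a text, optionally looking only at the first bytes."""
    data = text if limit is None else text[:limit]
    return set(data.lower().split())


# A made-up 5000-line list, far longer than 20KB.
song_list = "\n".join(f"song{n:05d} book{n % 40}" for n in range(5000))

full = index_words(song_list)
capped = index_words(song_list, limit=20 * 1024)

print("song00000" in capped)   # near the top of the file
print("song04999" in capped)   # near the bottom of the file
print("song04999" in full)
```

Anything past the cutoff simply doesn't exist as far as the search is concerned, which is how I ended up nearly buying a score twice.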

So today's summary is: I still run Google on my home system due to Thunderbird support, with the aid of GDSuite, but find the interface and the failure to index very far into text files very limiting. I run MSN Desktop Search on my work system, and like it far better, but am having trouble extending it to support Thunderbird. Onward!
» Update to Google Add-In Interfaces
In the "Well, that was quick" column, Nathan at PodSync has already made some changes to Google Desktop Extreme. Things I noticed:
  • The installer (an MSI, so it should be smarter than this) has no idea where the program went before, instead defaulting to C:\Program Files\PodSync.com\Google Desktop Extreme\. Come on, Nathan, use a registry setting like everyone else.

  • No option to run the program at the end of the installer, nor does it run it automatically.

  • No obvious way to get the version number or build date except using Explorer's Version tab on the executable. (The version I'm looking at is 1.0.1920.20772.)

  • Even the installer doesn't have the version or build date; it merely states "Quickly find what you need with an enhanced interface for Google Desktop Search" and that the author is "nato". Oh, it does have a revision number. It is {2222AC15-6D82-4EC6-B0FE-E27889C554BD}, which I doubt Nathan would recognize.

  • The only way to do a second search is to go back to the search bar on the task bar. Adding an input bar in the results window would be nice.

  • Extreme hides the clock when the input bar is up. (I only noticed this because I'm scheduled to run off and do something soon.) This means, oddly, that I have two search input bars; a Google one and an Extreme one. Of course I could hide the Google one but still, hiding the clock?

  • Nathan mentioned he sped up the real-time input for this version. Initially that didn't seem to be the case; it wouldn't take mine at all. It seems I'm too fast a typist for it. But typing slower, it does seem to be better than before. Still, I'm best off with the real-time search disabled.

  • Nathan mentioned he had fixed the unhandled exception when using Thunderbird mail. That's true, I didn't get an unhandled exception. Instead, if "open emails in default application" is selected, I get a blank page opened in Opera with the URL of http://127.0.0.1:4664/openemail&product=57?id=2567370794+72460722+410%2D22005425184542406%40earthlink%2Enet&action=d&s=GNhd5TS5vq6bsegG02o0QnEjdww and Thunderbird opens the correct email in the background. But at least it works. If "open emails in default application" is not selected, I get the email displayed properly in Opera. Which, ironically, is better than Google Desktop, in as much as Google ignores my default browser setting and uses I.E. regardless.

  • Nathan also said in this thread on the Desktop Group that he made the program remember where it was put and allows it to show entire folder paths. These are probably very useful to some people, not relevant to me right now.
All that said, I am beginning to look at writing an iFilter for MSN Desktop for Thunderbird. MSN Desktop is seriously nice. (Although again, excepting my penchant for Thunderbird/Apache/Posadis/SciTe/Java, etc., I may have conflicted interests.)
» Search Engine Updates
The Desktop Search Engine arena is certainly immature. So far I have discovered...
  • That Yahoo! crashes and doesn't support Thunderbird at all
  • That Google cannot natively handle basics such as MP3s, moving the index or indexing USB or external drives
  • That Google provides a potentially-powerful but unfriendly user interface
  • That Copernic cannot index Thunderbird files unless they're in the default Documents and Settings/%USER% directory
  • That Copernic does not really index much metadata at all despite their claims on their website
Enough already? Nah...

Copernic's inability to deal with Thunderbird email not on the system drive caused me to reload Google to see how it would handle it, along with loading the free MP3 plugin to Google for better testing.

Google takes two chunks of your system drive. As I mentioned before, coming from the Linux/BSD world, I'm accustomed to partitioning differently, but Google doesn't allow that. On my system, it takes 3.5MB in c:\program files\google, and quite a bit more for the index in C:\Documents and Settings\%USER%\Local Settings\Application Data\Google

Both of these can theoretically be changed by tweaking the registry. I didn't bother tweaking the program location, figuring that would likely get broken upon subsequent refresh (which Google doesn't ask permission for), and because it would be a lot of registry settings. Lots of pointers to DLLs and classes in this folder. There is a primary registry setting in HKEY_CURRENT_USER\Software\Google\Google Desktop, in the install_dir key, but again the directory name is propagated heavily throughout the registry.

The index is in HKEY_CURRENT_USER\Software\Google\Google Desktop, in the data_dir key. C:\Documents and Settings\%USER%\Local Settings\Application Data\Google\Google Desktop Search (And duplicated in HKEY_USERS\S-1-5-21-532243027-1406638472-2499556169-1005\Software\Google\Google Desktop)

Unfortunately, Google does not appear to support index rebuilding nor does it pick up preference changes, at least in a timely manner, as to changed file types to scan. Since I was initially retesting Thunderbird (with changed location), I set indexing to only email. That somewhat worked; although it did a lot of other files such as text, it did exclude Office files and got the email quickly. But 18 hours after adding back the Office format, it still hadn't scanned even the base "My Documents" folder (which again is on a different drive, but Google picks it up fine and indexes that drive anyhow.) Deleting the index to force a rebuild seems the only way to effect a real change of preferences in the short term. GDS may do a full rescan eventually, but all of the other Desktop engines support both changing preferences and forcing of index updates much better.

Google does handle Thunderbird having email in a changed location. Do note though that Google handles this correctly for a bad reason... Copernic (which doesn't) appears to index the mbox files. This is what I would expect; it means that Copernic is processing them directly and consequently should be compatible with other mbox-format programs.

Google, in contrast, installs a DLL in Thunderbird. The first start-up after adding GDS or after removing it, Thunderbird fails with a DLL load problem. (It succeeds after.) This is good for Thunderbird support, but limits other mbox-format program support.
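For comparison, reading an mbox file directly, which is what Copernic appears to do, takes very little code. This sketch uses Python's standard `mailbox` module; the two sample messages and the `index_mbox` helper are made up, and it says nothing about how Copernic really parses mail.

```python
import mailbox
import os
import tempfile

# Two made-up messages in mbox form: each starts with a "From " line.
RAW = (
    "From - Sat Jul 30 07:06:00 2005\n"
    "From: a@example.com\n"
    "Subject: Travel plans\n"
    "\n"
    "Flight leaves at noon.\n"
    "\n"
    "From - Sat Jul 30 08:00:00 2005\n"
    "From: b@example.com\n"
    "Subject: Invoice\n"
    "\n"
    "Order shipped.\n"
)


def index_mbox(path):
    """Map each message subject to its body text."""
    box = mailbox.mbox(path, create=False)
    try:
        return {msg["Subject"]: msg.get_payload() for msg in box}
    finally:
        box.close()


with tempfile.TemporaryDirectory() as folder:
    # Thunderbird-style mailbox file: note there is no extension.
    path = os.path.join(folder, "Inbox")
    with open(path, "w", newline="\n") as handle:
        handle.write(RAW)
    index = index_mbox(path)

print(sorted(index))
```

Because nothing here depends on Thunderbird being installed, the same reader works on any program's mbox files, which is the upside of the direct approach.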

I then ran into another problem. From my previous entries, you can see that I tested on MP3 files... but they're all on an external USB2 drive. This wasn't an issue for any other indexer, but Google refused to touch it. Fixing this took a lot of work because Google ignores the external drives on purpose. The registry hack, which I found on the Google newsgroup, is better explained at http://users.tns.net/~skingery/firefox/GDS_Tips.html to make GDS index more.

Okay, so now we've:
  • Reinstalled Google
  • Tweaked the registry to move the index
  • Added a Plug-In
  • Tweaked the Registry to force reindexing
  • Tweaked the Registry to add an external drive.
This is not a program for the faint-of-heart. But on the bright side, Google now finds my media files and already handled Thunderbird email well. At least we've accomplished something.

Google User Interfaces

When perusing the Google Desktop newsgroup, I noticed that there are two interface program add-ons. Both are free. They are GDSuite and GD Extreme. Both improve the interface significantly.

GDSuite requires you accept the vbaccelerator license. vbAccelerator is Apache-based and open source, so this seems like a non-issue.

GDS Extreme requires .NET Framework v 1.1.4322. This necessitated a visit to Windows Update for me, and it is a 23MB download, which may be relevant if you're on a slow connection. But the license agreement is even friendlier than GDSuite's; it's simply "we're not liable". Cool.

Using Them

GDSuite has a somewhat clunky windows-like interface. Fully expanded, it's too long. But a big improvement on the browser.

GDSuite has a nice interface, with lots of options. It's not as effective as Copernic's (see previous blog entries), and it doesn't know a thing about music even with the plug-in, but it provides convenient access to the normal search options.

Things I'd like to see added would be the ability to choose multiple filetypes to include and to choose to exclude them, and a multi-line option for the results list.


Google Desktop Extreme is very different; it pops up from a search bar on the task bar, with buttons for selecting file types and showing matches by rank without their data (default setting) but with a snippet. Unfortunately, double-clicking on a Thunderbird email message item results in an unhandled exception. Okay, it's not ready for prime time yet. But, in fairness, this is the very first release and it's a great improvement over the start.

A bigger problem for me with Desktop Extreme is that I just don't see it bringing me much functionality. Using the "real time search" (a search-as-you-type option) was painfully slow on my 2.8GHz unburdened test mule, and few advanced search options were present. On the bright side, Desktop Extreme does have a "media" button, allowing searches to be limited to music and visuals. Still a great move forward for a V1, and very nice looking.


Both interfaces open Word files fine, both seem to provide about the same results. GDSuite is a bit more powerful for configuring the search parameters, closer to Copernic. GD Extreme appears to be an attempt to split the difference between providing a complex-but-powerful interface vs a simple-but-emasculated one.

Both provide just a snippet of the result in their view, with double-clicking opening the native application. Strangely, GDSuite opens email in the browser (as Google Desktop Search does on its own), while GD Extreme simply errors out (see above)... my guess is GD Extreme is trying to open the email application.

Microsoft MSN Desktop Search

I had not initially installed MSN Desktop Search simply because it obviously would not search Thunderbird files, and I don't use Outlook at home. I do, however, use Outlook in the office. And actually I have nothing against it except that it's far heavier than I need for home use and I've been using Eudora and then Thunderbird for years. Unlike many people with *nix backgrounds, I have nothing against Microsoft or MSN, love some of their products, am less thrilled by others, and although I don't work on or with MSN Search, I do have Microsoft ties. Although, again, if you've read my blog entries on Apache and such, you probably discount this entire paragraph.

Anyhow, so I installed MSN Desktop Search on my work system with Outlook and gave it a spin, just out of curiosity. And I'm quite surprised and a little bit annoyed. The interface is up there with Yahoo and Copernic, maybe better. Very nice interface.

One foible... everyone knows that MSN Desktop Search builds on the Microsoft indexing service. Right? Well, perhaps, but there's a complication. The indexing service has settings for directories as shown below...

But these settings are not picked up by the MSN Search indexer. Seriously. They must be set via the MSN Desktop settings.

On the bright side, MSN Desktop Search is the easiest app in which to reindex either everything or just a single directory. It's very easy, and the status is well-displayed. Search is intuitive and fast, with separate buttons for different types and an excellent display of found items...

Note that the results display is right up there with Yahoo! for information and far nicer looking. Sweet!

The biggest downsides to me are the lack of Thunderbird support and the complexity of writing an iFilter DLL to add support. Even though a sample (SMPFilt) is documented, I was unable to find it. Which leaves Copernic and Google in the lead for now for my purposes. But seriously, if iFilters were easier to write or if Thunderbird were supported, this would be my #1 choice. It didn't bog the system down at all, it's friendly, powerful, good looking, accurate and fast.
» Updates on the Desktop Indexers
There are some great comments on my previous Desktop Indexer blog. Among them are other suggestions (which don't really work for me) and the point that Google is extensible and has type-limitation support. True enough, but despite being a very long-time kernel developer, I am uninterested in remembering or digging up query strings to facilitate finding things. The U.I. should make it easy as well as powerful. Copernic and Yahoo do this well, and ironically, MSN Desktop seems to. Google may be after a different niche.

I don't actually consider Google any more extensible than MSN Desktop Search; both are extended via well-documented hooks.  Granted, Google's hooks are more open-source, but Microsoft's are better known, still well-documented, and so on.  However, keep in mind that Google fell down on the interface also.

In the meantime, I have installed MSN Desktop Search on a system with Outlook on it. Having already used the Index Service, I didn't expect much more from it. I haven't played with it much, but am so far pleasantly surprised, if you use Outlook. Of course, I prefer not to. It's fast, very powerful, displays the results in the expected application, and doesn't harm performance during indexing as much as the other systems. (Copernic is pretty good about this also.) I'd put the interface behind Copernic's but it's very different.

Copernic responded to my queries about their MP3/JPEG header support. It does seem that their marketing (well, they said "redaction") team didn't communicate well with their development team, and nobody actually thought about what the users would be interested in. Very little grasp of the likely fields to be used; no indexing of IPTC, for example, which is where the real descriptive text goes in journalistic photos, but they do index the artist and title of MP3s (which are typically in the file name anyhow) while ignoring the lyrics and comments. Helpful. (They also told me they would update the web site immediately, but ten days later have not. On the other hand, they have updated the program twice. Since they didn't provide a change-list, I'm not sure what they've done to it.) Copernic seems committed to responding to input, stating that while they won't change metadata indexing for 1.5 (still in beta), it will be considered later.

Unfortunately, Copernic has a nasty little bug too, in that if you've moved your Thunderbird email folders (e.g. to a larger disk), which Thunderbird supports quite well, Copernic cannot find them and cannot be told to index them. I'm guessing it hardcodes looking at the default locations per-user (C:\Documents and Settings\%USER%\Application Data\Thunderbird\Profiles\default) rather than checking the registry. Stupid stupid stupid. Even uninstall/reinstall didn't fix this.
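For what it's worth, Thunderbird keeps track of where each profile lives in a profiles.ini file in its Application Data folder, so an indexer could follow a moved profile by reading that file instead of hardcoding the default path. Below is a minimal Python sketch of the idea. The function name is hypothetical, and it assumes the usual [ProfileN] sections with Path and IsRelative keys; it is not how Copernic (or anyone else) actually does it.

```python
import configparser
from pathlib import Path


def thunderbird_profile_dirs(ini_text, base_dir):
    """Return the profile directories listed in a profiles.ini document.

    ini_text: the contents of profiles.ini
    base_dir: the folder holding profiles.ini (used for relative paths)
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    dirs = []
    for section in parser.sections():
        if not section.startswith("Profile"):
            continue  # skip [General] and any other non-profile sections
        path = parser.get(section, "Path", fallback=None)
        if path is None:
            continue
        if parser.get(section, "IsRelative", fallback="1") == "1":
            dirs.append(Path(base_dir) / path)  # default location
        else:
            dirs.append(Path(path))  # absolute path: the profile was moved
    return dirs
```

An indexer would then scan the mail folders under each returned directory, wherever the user has put them.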

At this point, Copernic is still narrowly my choice for stock Desktop Search programs. But if I can find or easily write an iFilter DLL for MSN Desktop Search to support Thunderbird mail and another to support Palm Desktop, I'll switch to MSN Desktop. It's a close second due to the very powerful query language, integration into the OS, convenience and speed. Copernic is nicer, MSN could easily be more powerful. But... I have run into a few Google Desktop U.I.s which I will discuss (with screenshots) soon that may turn the tables. Tune in again.
» Desktop Search: Not ready for prime time?
A long long time ago, Lotus released a product called "Magellan", which was wonderful for scanning (indexing) disks and finding documents. Of course those of us who either were very familiar with the Norton Utilities (and "Text Search", later "File Find") or with Unix (and "grep") didn't need this product, but it was very convenient. And it died the typical Lotus death of no updates and no advertising.

Many moons later, Desktop Search is coming back to the forefront. I resisted a long time, but was finally interested due to the Indexer service in Windows XP and due to having a huge number of files and data in my life, a result of running a bunch of servers.

First the niceties: Despite having made much of my living over the years on non-Microsoft technologies (BSD, Linux, Java, Palm...), I quite like Microsoft products. Not, perhaps, compared to what they could be, but they do tend to win not just on marketing but also by being more than just "good enough". Microsoft Word, Excel, SQL Server and Visual Studio are all easy to use for the basics and very performant and powerful. Windows XP itself is a lot easier to set up than Linux and easier to dive right into. It's not as easy for extreme customization, but it serves the bulk of users better. On the other hand, I run a non-Microsoft mail server, non-Microsoft web servers, non-Microsoft DNS servers, and more. I'm not a pawn either way. So when I discovered that Indexing service:
  • Mostly works
  • Is hard to use for boolean queries
  • Requires iFilter DLLs that simply aren't available to expand to my needs
I decided to find a better solution. So I figured out my requirements and tried some out. And here's what I found.
My requirements are:
What | Why
Index Configuration | Indexes get big. I really am from the old-school of computer set-up; I put data on a different partition or drive from the system. My system drive isn't big enough for huge indexes. It doesn't need to be that big; data (and temp files) go elsewhere. Simple requirement. Or so I thought.
Email | Most indexers and desktop search programs support email. By which they mean "Outlook". At this point, if you've read my previous blog entries, it won't surprise you to learn that I don't run Outlook. I run Thunderbird. With messages imported from Eudora when I ran that. So I need Thunderbird indexing.
Zip files | I set up scripts to automatically zip data from my servers and ftp it to a local external hard drive on my desktop. (I was using gz, which again tells you my basic philosophy, but the Windows Explorer can look inside Zip files. It doesn't do .gz files.) The zipping not only saves transfer time by compression but dramatically by reducing file count. Being able to find content that needs updating by doing a quick search through these, rather than switching over to the servers (which aren't indexed for performance reasons) would be beneficial.
MP3, JPEG | Unlike most users of digital cameras and MP3 files, I use my headers. Some of my MP3s (which are mostly from CDs I own) have extensive ID tags denoting albums and such and perhaps 2% have lyrics in them. Many of my JPEGs not only have my Canon camera data but also IPTC comments describing them. Which then work with the Gallery creating program I wrote to automatically caption photos on upload.
PDF Files | Many of my sources wind up as PDFs, but more importantly, lots of APIs (example: The Palm Programming manuals) are distributed as PDFs. As are many manuals. What's the point of indexing data if I can't get those?
Booleans and Limits | This was part of why I started the search. What if I want to search for messages with, in Google terminology, "PALM AND SOUND AND NOT MP3", only across PDF files?
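To make the "Booleans and Limits" requirement concrete, here is a rough Python sketch of what a query like "PALM AND SOUND AND NOT MP3, only across PDF files" boils down to. The function names and the docs mapping (file name to already-extracted text) are made up for illustration; a real indexer would use an inverted index rather than scanning text, and would tokenize more carefully than a whitespace split.

```python
def matches(text, all_of=(), none_of=()):
    """True if text contains every word in all_of and no word in none_of."""
    words = set(text.lower().split())
    return (all(w.lower() in words for w in all_of)
            and not any(w.lower() in words for w in none_of))


def search(docs, all_of=(), none_of=(), extension=None):
    """docs maps file name -> extracted text; returns matching file names."""
    hits = []
    for name, text in docs.items():
        # The "limit" part: skip anything that isn't the requested type.
        if extension and not name.lower().endswith(extension.lower()):
            continue
        if matches(text, all_of, none_of):
            hits.append(name)
    return sorted(hits)
```

With this, the example query is search(docs, all_of=["PALM", "SOUND"], none_of=["MP3"], extension=".pdf"). The point is only that the boolean part and the type limit are independent filters, which is why it's frustrating when a U.I. offers one but not the other.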
With those requirements in mind, I set out and downloaded Yahoo! Desktop, Copernic and Google Desktop. I did not download Microsoft's Desktop Search based on the strong suspicion that it won't handle Thunderbird email. The versions tested were the latest as of 19-March-05. Because this was a lengthy involved test, I probably will not be updating these results regularly.

The Tests

The tests I set up and ran were:
  • Find an email in Thunderbird by subject
  • Find an email in Thunderbird by content
  • Find content inside a ZIP file
  • Find a song (MP3) by lyric excerpt from the Lyrics tag in the header
  • Find a song (MP3) by ID3 v2 Comment tag content
  • Find a song (MP3) by Album from the ID3 v2 tag (added after the previous two tests failed)
  • Find a photo (JPEG) by IPTC Comment content
  • Find a photo (JPEG) by EXIF Owner field (added after the above test failed)
  • Find a PDF by text excerpt (a Palm API was used)
  • Text search using boolean (AND, OR)
  • Text search using entire phrase (to ensure documents with the words out-of-order don't match)
  • Supports limiting searching by file type - search only MP3s or JPEGs for relevant tests above
  • Source code searching (as installed)
  • Palm Address Book and Memo Pad searching
Yeah, that's a lot of tests. It was initially shorter, meeting the requirements in the table above, but as each of the Desktop Search tools failed so many tests, I decided to try to find something they were actually good at. It wasn't easy.
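As a point of reference for the ZIP test, looking inside an archive is not hard. The sketch below uses Python's standard zipfile module to list the members of an archive that contain a given byte string. The function name is hypothetical, and it does a brute-force scan rather than building an index, so it only shows that the content is reachable, not how any of these products work.

```python
import io
import zipfile


def find_in_zip(zip_source, needle):
    """Return names of archive members whose bytes contain needle.

    zip_source: a path, a file-like object, or the raw bytes of a ZIP file
    needle: the bytes to look for
    """
    if isinstance(zip_source, bytes):
        zip_source = io.BytesIO(zip_source)  # allow in-memory archives
    hits = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if needle in archive.read(info):  # decompresses the member
                hits.append(info.filename)
    return hits
```

For example, find_in_zip("backup.zip", b"ServerName") would list every file in a (hypothetical) backup.zip mentioning that string.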

The Results

Test | Yahoo! | Google | Copernic | Microsoft Explorer/Indexer
Thunderbird email | No support. Doesn't even scan them. | Passes. Does index and display appropriately, but searching just email, or by field does not appear to be supported. | Perfect. Can easily limit search to just email, search by content or by Subject or Date. | Fail; Doesn't even seem to be able to do a brute-force text search on them.
Zip File | Pass | Fail. Yep, Fail. Not a speck of success. Turns out you need to add a plug-in for this. | Fail. Complete failure. Ironically, they don't even claim to support ZIP, but do slightly support GZ and RAR. | Pass. No problems at all.
MP3 Lyrics | Fail | Fail | Fail | Pass. Yep, only Microsoft found the correct file. In fairness, I did install MP3Filter.dll the day before running these tests. But that's not even available for the others, and I don't know if it simply searched the entire file or if it knows the format. Either way, Microsoft won without Desktop.
MP3 Album | Pass | Fail | Pass | Pass, see above
MP3 Notes | Of these four, only Microsoft could display the Lyrics appropriately. Not in the Search, but going to the results, in Properties - Summary, the Lyrics, Album and all other tags show appropriately in Explorer. Amazing, and even more so that the other three cannot do this even as theoretical enhancements over the operating system!
JPEG - IPTC | Fail | Fail | Fail | Fail
JPEG - EXIF Owner | Fail | Fail | Fail | Fail
JPEG - EXIF Camera Model | Fail | Fail | Fail | Pass. Yep, again only Microsoft looks in the headers.
PDF Text | Pass | Fail | Pass | Fail
Text Boolean | Pass | Fail - no OR support | Pass | N/A
Text Phrase | Pass | Pass | Pass | Pass
Limit by Type | Pass | Fail - it's there but not via U.I. | Pass | Pass
Source Code | See Caveat | Fail | Pass | Pass
Palm Data | Fail | Fail | Fail | Fail
Index Location | Pass | Fail | Pass | Fail
U.I. Speed | Slow | Fast | Fast | Medium, depending on search type
U.I. Power | Second Best | Worst | Best | Third Place
Index Size | 224MB | 553MB | 345MB | N/A

Other caveats:

  • Yahoo! includes the file path in the search. So if you're searching for a file with, for example, "WAP" in it and you have a folder called "Swap Meet", everything in that folder will match. Very inconvenient.
  • Google may have been impacted by their dreadful interface. I never did determine (though I didn't spend hours on it) how to ensure that it had indexed my media files. There just doesn't seem to be any option for controlling which disks it is looking at.
  • Some of the Microsoft searches were far slower than the others, largely because they were done on-the-fly rather than from index. Of course, they also succeeded where the others failed.
  • The EXIF and IPTC headers were clearly a bit beyond what anyone expected. But they are found by grep and by many imaging tools I use. Yahoo and Copernic claim to index MP3 or JPEG metadata; they just don't appear to do it in reality. This may be a matter of definitions; they may be defining date, file size and image size as the meta data that they index. Which would be technically accurate, but certainly not complete enough for anyone who actually uses the headers. In fact, Copernic's claim is:
    Music: Full metadata indexing of iTunes, MP3, OGG, WMA and WAV music files.
    Pictures: Full metadata indexing of EXIF, JPEG, GIF picture files.
    which obviously is not true. (I have an email in to them about this; if they respond, I will update this page.)
  • Source Code Listing: I was hoping for Doxygen/JavaDoc style parsing. None delivered. As installed, Copernic indexed source code files and Microsoft was able to find via a search. Yahoo! supports adding the source extensions, which presumably would have caused it to pass.
  • Regardless of where installed, all of these default to putting the Index in the C:\Documents and Settings area. I don't tend to believe in that; I prefer my data to be in a different area. Google's index could not be moved; the others could.
  • Not tested but noted: Google can run in Opera, but brings up Internet Explorer regardless of what the default browser setting is. Meanwhile, Copernic recognizes IE and Firefox, but not Opera, for history scanning. Which is fine by me; I don't want my browse history scanned.
  • There were dramatic differences in how indexing occurred. Yahoo! put the list of files together fastest, but did not index the contents until later. Copernic was the first to index all of them. Google indexed fewer than I expected total.
  • Google Desktop creates its own web server (port 4664 on my system) for the interface. The U.I. is very consistent with Google on the web, and results are displayed via IE (even if Opera is default) even when queried from the Google desktop bar.
  • Index Size: I confess to being confused as to how Google could simultaneously have the worst results, the fewest file types scanned and the largest index (by quite some margin.)
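The Yahoo! path caveat in the list above is easy to reproduce. The Python sketch below contrasts a search that lumps the path in with the content (substring matching, which is my guess at what's going on, not anything Yahoo! has documented) against one that only matches whole words in the content. Both function names are made up for illustration.

```python
def naive_match(path, text, term):
    """Substring match over path and content together (the suspected behavior)."""
    return term.lower() in (path + " " + text).lower()


def content_match(text, term):
    """Whole-word match against the content only."""
    return term.lower() in text.lower().split()
```

With a file at D:\Swap Meet\prices.txt containing "tables and chairs", naive_match finds "WAP" (it's hiding inside "Swap") while content_match correctly does not.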
Out of astonishment at how poorly these indexers did on the MP3 Lyric, I tried something simple. Being a Unix-type guy, I resorted to my old standby tool:
>grep "running out" *.mp3
Blondie_-_11-59.mp3 5 9:Time is running out.
Blondie_-_11-59.mp3 23 9:Time is running out.
Well, it's obviously there, and in plain-text. And the IPTC Comment in JPEG files is successfully searched also. Of course the output speed and formatting leave something to be desired, but still, the search is successful. Pathetic that these pups couldn't parse it.
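For the non-Unix crowd, roughly the same brute-force check can be sketched in Python. This is an illustration of the grep above, not a proper tag parser: it scans the raw bytes and assumes the text is stored in a single-byte encoding such as ISO-8859-1. Tags written as UTF-16 (which ID3v2 also allows) would not match. The function names and the 20-byte context width are arbitrary choices of mine.

```python
from pathlib import Path


def grep_bytes(data, phrase, context=20):
    """Yield (offset, snippet) for each occurrence of phrase in raw bytes."""
    needle = phrase.encode("latin-1")
    start = 0
    while True:
        pos = data.find(needle, start)
        if pos == -1:
            return
        lo = max(0, pos - context)
        hi = pos + len(needle) + context
        # latin-1 maps every byte to a character, so decoding never fails
        yield pos, data[lo:hi].decode("latin-1")
        start = pos + 1


def grep_files(folder, pattern, phrase):
    """Print every hit for phrase in files under folder matching pattern."""
    for path in sorted(Path(folder).glob(pattern)):
        for offset, snippet in grep_bytes(path.read_bytes(), phrase):
            print(f"{path.name} @{offset}: {snippet!r}")
```

Calling grep_files(".", "*.mp3", "running out") gives the same kind of answer as the grep command, just slower and uglier. Which is rather the point: if a dozen lines can find it, an indexer ought to.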

Performance

All three of these systems hammer the system pretty hard during normal use. Having all three running at once is definitely not a good idea; you can watch them sucking down the processor on the Task Manager. They seem about equivalent to an active virus scanner. Uggah! It's so bad that I wound up turning them off simply to enable any real processing. (Although with only one running it probably wouldn't have seemed nearly as bad.)

Screen Shots

Two sets of screen shots here. The first set is the results of a search for a well-populated MP3 file. The interesting thing here is that Yahoo! provides the best data back on the song. (Google, as reported earlier, cannot parse it at all.)

Yahoo!

Yahoo! Desktop did the best at displaying the MP3 hit. Remember it didn't pass the tests, but apparently some fields are parsed.
[Screenshot: Yahoo! on Music]
Yahoo! failed the email tests completely, so no screen-shot for that!

Copernic

Copernic did almost as well as Yahoo! on music display:
[Screenshot: Copernic Music]
And Copernic also did fantastic on the email!
[Screenshot: Copernic EMail]

Google

Google didn't do so well on music:
[Screenshot: Google's empty music results]
But it did quite well on email headers:
[Screenshot: Google's Email Headers]
and email display:
[Screenshot: Google's Email Body]
Even if it's a bit less convenient than Copernic to go through them.

Summary

Right now, the whole Desktop Search niche is just not ready for prime time. All three of the programs I downloaded oversold and underdelivered. Of the three, Google was clearly the loser. Copernic's interface and speed (lack of hanging) are nicer than Yahoo!, but Yahoo! supports Zip files while Copernic doesn't. Meanwhile, Copernic supports Thunderbird email, while Yahoo! doesn't. So of these three, Copernic is my winner by a nose, figuring I need email indexed a lot more than I need Zip files indexed.

Ironically, Microsoft Explorer (with Indexing) did just as well as they did at word processing files. And, to top it off, iFilter add-ins can easily extend Microsoft's capabilities; it's probably only a matter of time before someone, perhaps me, creates iFilters for the real ID3 tags, IPTC, EXIF and Thunderbird mbox mail. Which in turn means that I will probably be moving towards the Microsoft Desktop Search not because I consider it the best right now, but because I can make it the best for me, something I cannot do with the others.

That's an odd way to win a war.