
Bloggit's Journal - Copernic Strikes (Progress on Palm)


May. 15th, 2005 @ 01:53 pm
My Palm File Extractor for Copernic Desktop Search (CDS) is nearly ready. It handles Memo and Address Book data. It has been successfully indexing and previewing these files for over a week, and now has a nice installation interface. However, I'm running into some new Copernic issues: CDS claims to be indexing the data (and I can see that it is getting it back), but on some computers the only thing it actually indexes is the file name! (All the data still renders inside CDS, though, which is again evidence that the Extractor is working. And Copernic seems to always index the filename, regardless of other data.) This may be related to the version of .NET or something else that isn't Copernic's fault, which at this point would be a nice change.

Update: The problem is that Copernic broke functionality with a fix, moving from Build 644 (the released build) to Build 646! Uninstalling 646 on the non-working system and rolling back to 644 solved the indexing problem.

As I said last week, I was delaying this a bit to give Copernic a chance to help with some issues; this is because my preference was to be able to write a "They were helpful and forthcoming" summary. Unfortunately, you instead get the gory details of several bugs and downhill developments.

Copernic Overview

Copernic is, of course, a Desktop Search entry. Some of these, such as Google, extract the file contents once and cache them as necessary for subsequent viewing. That approach is slower to index, faster to render, and takes more disk space. Copernic indexes the files and then calls the same extractor again later, at render time, when the user asks to see those hits. So rendering is slower, but the displayed data is always up-to-date and less disk space is needed.

The call to the external file extractor should be the same either way:
  • Call the Extractor
  • Set the URI (file name)
  • Get the Content Stream Handle
  • Seek to end and back to beginning if you want the file size before getting data
  • Read until you get no more data
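The last two steps can be sketched as follows. This is only a model of the calling pattern, not Copernic's code or the COM interface itself: Python's `io.BytesIO` stands in for the content stream, and `consume_stream` and the 41813-byte test content are names and values made up for illustration (the size matches the test file in the logs further down).

```python
import io

CHUNK_SIZE = 32 * 1024  # 32768 bytes, the chunk size seen in the Indexing log


def consume_stream(stream, chunk_size=CHUNK_SIZE):
    """Model of a well-behaved host reading an extractor's content stream."""
    # Seek to the end to learn the size, then return to the beginning.
    size = stream.seek(0, io.SEEK_END)
    stream.seek(0, io.SEEK_SET)

    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means there is no more data
            break
        chunks.append(chunk)
    return size, b"".join(chunks)


# Hypothetical stand-in for extracted content, sized like the test file in the logs.
content = io.BytesIO(b"x" * 41813)
size, data = consume_stream(content)
```

With a 41813-byte stream and 32768-byte chunks, the loop issues two reads that return data and a third that returns nothing.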

Preview Seek Error

First off, their API, which they documented and announced, quite simply doesn't currently work. There is a nasty bug in the Preview stage where it will seek past the end of the file data and then try to read from there. So no developer writing against the released Desktop (or any subsequent versions that Copernic has made me aware of) could have gotten a File Extractor working.

I got around it by implementing my own IStream interface with heavy logging so I could see what was going on. I knew my code worked, and the Indexing stage seemed to work, so the failure to render had me stumped. But I wouldn't expect many developers to try that; it's not something you should have to do.
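To give an idea of what such a logging stream looks like, here is a minimal sketch in Python rather than the real COM IStream. The class name `LoggingStream` is hypothetical, and the wording of the Read log line is my own invention; only the Seek line format is copied from the logs below.

```python
import io


class LoggingStream:
    """Wraps a block of bytes and records every Seek and Read call."""

    _ORIGINS = {io.SEEK_SET: "beginning", io.SEEK_CUR: "current", io.SEEK_END: "end"}

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.log = []

    def seek(self, offset, whence=io.SEEK_SET):
        bases = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: len(self._data)}
        # Positions past the end are allowed, which is what exposes the bug.
        self._pos = max(0, bases[whence] + offset)
        self.log.append(
            f"Seek {offset} bytes from {self._ORIGINS[whence]}; "
            f"now {self._pos}/ {len(self._data)}"
        )
        return self._pos

    def read(self, size):
        # Slicing past the end of the data simply yields an empty result.
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        self.log.append(f"Read {size} bytes requested; {len(chunk)} returned")
        return chunk


# Replay the seek sequence from the Indexing log against a 41813-byte stream.
stream = LoggingStream(b"x" * 41813)
stream.seek(0, io.SEEK_CUR)
stream.seek(0, io.SEEK_END)
stream.seek(4222543, io.SEEK_SET)
stream.seek(0, io.SEEK_SET)
```

Because the wrapper tracks its own position, a seek far past the end of the data shows up in the log immediately instead of being hidden inside the host.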

The following is a commented log from my trial runs, Indexing first:
Seek 0 bytes from current; now 0/ 41813
Seek 0 bytes from end; now 41813/ 41813
Seek 4222543 bytes from beginning; now 4222543/ 41813
Seek 0 bytes from beginning; now 0/ 41813
It then starts reading in 32KB (32768 byte) chunks, ignoring the size it got back, until it gets back 0 bytes. And that works fine. But then we get to Preview (i.e. you've done a search, found a hit and want to display it.) That also starts with seeks, but the logic is screwed up and it lands way past the file end.
Seek 0 bytes from current; now 0/ 41813
Seek 0 bytes from end; now 41813/ 41813
Seek 4222543 bytes from beginning; now 4222543/ 41813
Seek 0 bytes from beginning; now 0/ 41813
Seek 0 bytes from current; now 0/ 41813
Seek 0 bytes from end; now 41813/ 41813
Seek 15422548 bytes from beginning; now 15422548/ 41813
And then it starts reading in 61440 byte (0xF000) chunks. Which, of course, fail. And when it fails, it retries it. 251 times! (I cover that later.) These Reads of 61440 are then followed, amazingly, by an attempt to read 1108 bytes.

After finding their bug and alerting them to it, I later discovered that if IStream (or my own implementation) had in fact not implemented Seek() (which would be incorrect), their code would handle it better. Nice going, guys.

Rendering Delay

Once the Preview mode has read the data (thanks to my kludge, which detects when they seek to the end of the file right before a read and goes to the beginning instead), CDS takes about 20 seconds to display it (for my 45KB test bed). Initially I thought this was due to their read bug (next section), but the time logging shows that it isn't. CDS simply spins its wheels for 20 seconds before popping the data up.

But again, a developer is going to think it's in his code, because CDS displays internally rendered data very quickly. They must have done this on purpose, but I can't fathom what that purpose would be.
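For anyone curious about the kludge mentioned above, the idea can be sketched like this. It is a Python model under my own assumptions, not the actual extractor code: `RewindingStream` is a hypothetical name, and the rule "a read that directly follows a seek landing at or past the end starts over at the beginning" is my reading of the workaround.

```python
import io


class RewindingStream:
    """Byte stream that rewinds when a read follows a seek to (or past) the end."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self._last_op_was_seek = False

    def seek(self, offset, whence=io.SEEK_SET):
        bases = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: len(self._data)}
        self._pos = max(0, bases[whence] + offset)
        self._last_op_was_seek = True
        return self._pos

    def read(self, size):
        # Kludge: the host sought to or past the end and now wants data,
        # so serve it from the beginning instead.
        if self._last_op_was_seek and self._pos >= len(self._data):
            self._pos = 0
        self._last_op_was_seek = False
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


# Replay the tail of the Preview log: the bogus seek, then 61440-byte reads.
payload = b"x" * 41813
preview = RewindingStream(payload)
preview.seek(0, io.SEEK_END)
preview.seek(15422548, io.SEEK_SET)
first = preview.read(0xF000)   # served from the beginning thanks to the kludge
second = preview.read(0xF000)  # data exhausted, so this comes back empty
```

A read that reaches the end on its own is left alone, so the normal "read until you get nothing back" loop still terminates.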

251 Reads

After finishing the Seek() process, the Indexing portion simply issues IStream::Read() calls until it gets no data back. This is normal behavior. It is perhaps a bit odd given that a file size check (the initial seeks) is done first, but that's the only oddity.

After finishing the Seek() process, the Preview portion issues 251 IStream::Read() calls (with a much larger buffer), regardless of reported filesize and of any returned data. 251! Amazing.

My initial guess was that this was the cause of the rendering delay, but when I added timing to my logging, it became clear that it isn't. The Read() calls all complete very quickly because my program knows that it's out of data and returns instantly. Nonetheless, this is yet another example of dreadful design on Copernic's part.
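The timing check amounts to something like the following sketch. It is a Python stand-in, not my actual logging code: `read_at_eof` is a hypothetical placeholder for an extractor's Read() once its data is exhausted, and the only figures taken from the logs are the 251 calls and the 61440-byte request size.

```python
import time

READ_SIZE = 0xF000  # 61440 bytes, the Preview chunk size from the log
READ_COUNT = 251    # number of Read() calls the Preview stage issues


def read_at_eof(size):
    """Stand-in for an extractor's Read() after all data has been returned."""
    return b""


durations = []
for _ in range(READ_COUNT):
    start = time.perf_counter()
    read_at_eof(READ_SIZE)
    durations.append(time.perf_counter() - start)

# Total time spent inside the reads, in seconds.
total_seconds = sum(durations)
```

If the total comes out far below the delay you are chasing, the reads are not where the time is going.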

Quality Summary

As I said above, I was hoping not to have to write any of this. I gave Copernic a week to respond to my logs and proof. I even sent them my program! Demonstrating these issues is absolutely trivial. But it took Copernic a week even to acknowledge the bugs!

On the bright side, they did acknowledge them, but these are pretty big honking bugs. And let's review what the implication is:
  • Copernic announced with fanfare that they now support a File Extractor Plug-In
  • It obviously doesn't work, wasting the time of any developer attempting to use it
  • Copernic obviously doesn't have any test cases they use to verify it on their side
  • They don't move quickly when presented with evidence of the issues.
  • Per my update above, when they move at all, it can be in the very worst direction.
In their defense, they do respond more quickly than, say, Microsoft, but Microsoft wouldn't release something this flawed, nor would Microsoft make it worse with a fix (see the update at top; it wasn't intended to fix the Extractor issues, but that doesn't explain why it makes them worse). Also in their defense, reaching a human at Google or Yahoo would be much harder, and Yahoo at least has far larger flaws. But this experience did illustrate to me why there are so few Copernic plug-in File Extractors available.

I will release the extractor (and source code to anyone who wants to see how I did it, perhaps so they can write other extractors) once Copernic improves the quality a bit and I've verified it (having just learned that they won't verify it themselves). I'm afraid right now that there may be hidden bugs completely obscured by the blindingly obvious ones they missed, and I don't want to have my code blamed should someone suffer from them. On top of which, I haven't looked into the Build 646 issue enough to have any sense of whether I can overcome their newest foibles, and therefore cannot ensure that any given user will be able to use my extractor. (It works wonderfully for me on 644, though.) Judging from the responsiveness and bugs, Copernic may not want third-party File Extractors. If I had known up front that I would spend so much time debugging their problems with no likelihood that it would work with future versions, I wouldn't have written one. Take that into account before you write one.

More later... perhaps... if developments merit. Judging from the last week, don't expect any news for a few weeks. Suddenly MSN Desktop is looking mighty pretty!