January 25, 2003

I tested the image spider code yesterday and found that it worked for the first image and then kept fetching that same image over and over for the rest of the links. I thought there must be a bug in the code and spent most of my time trying to figure out where it was. (I couldn’t use the debugger, since each time I exited, Explorer would crash ... hard <g>, so I had to debug using message boxes, which I hate doing.) I tried various approaches and went through a lot of code, but all that time I had the sneaking suspicion that the bug was somewhere else ... and I was right :p I finally discovered it just as I was about to give up for the day: the URLs I got from the selected stuff on the page had some text encoding (the ampersand encoded as text, for instance) and that was throwing things off. What had confused me was that any URL brought up the first picture on the list (that was probably a server-side thing ...), and so I thought the problem was somewhere else. Ah well, I do have the solution now, but I still haven’t implemented it since I tried to be too smart about it :p
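
As a rough illustration of the fix (in Python, not the Delphi the spider is actually written in), decoding the text-encoded characters before fetching could look something like this. The function name `decode_url` and the example URL are made up for the sketch; only `html.unescape` is a real standard-library call.

```python
from html import unescape


def decode_url(raw_url):
    """Turn HTML-encoded characters in a URL back into their plain form.

    A URL lifted from page text can contain "&amp;" where the real URL
    has "&". Asking the server for the encoded form can break the query
    string, so decode it first.
    """
    return unescape(raw_url)


# Hypothetical URL, as it might appear in the selected page text.
encoded = "http://example.com/pic.php?id=2&amp;size=large"
print(decode_url(encoded))
```

With the encoded form, a server may see only the first query parameter and treat the rest as junk, which would fit the symptom of every link returning the first picture, though that part is a guess about the server.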

I had been using third-party HTML parsing code, but discovered yesterday that I could use the IE browser component’s built-in IHTMLDocument2 interface to get a list of the elements on the web page. I already had this working fine for parsing a full page by the time I discovered the problem with the URL text encoding. Since I was simply returning the selected stuff on the page as a text range, I decided to switch to another method which returns a control range. I reasoned that this would let me access the stuff in the selection element by element, and that the element attributes would probably be in the un-encoded form. However, I discovered only when I ran the code that the particular interface I was using to do the job was not supported by Delphi! I hate when that happens! I guess I’ll have to go back to the original method and translate the encoded stuff back to its un-encoded form for the moment. When I refine the code, though, I’m going to do away with the selection method altogether and use some sort of filtering, or simply return a list of links on the page and let the user select the ones that s/he wants to work with. But that’s for another day ...
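
The "list of links plus filtering" idea can be sketched roughly as follows. This is Python with its standard `html.parser` module, not IHTMLDocument2 or Delphi, and the names `LinkCollector`, `list_links` and `keep` are invented for the example. One nice side effect: this parser hands over attribute values with the encoded characters already decoded, which is the behaviour I was hoping to get from element attributes.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Attribute values arrive with "&amp;" already turned into "&".
            if name == "href" and value:
                self.links.append(value)


def list_links(page_html, keep=lambda url: True):
    """Return the links on a page, in order, that pass the `keep` filter."""
    collector = LinkCollector()
    collector.feed(page_html)
    collector.close()
    return [url for url in collector.links if keep(url)]


# Hypothetical page fragment for illustration.
sample = (
    '<a href="pic.php?id=1&amp;size=large">one</a>'
    '<a href="about.html">about</a>'
    '<a href="pic.php?id=2&amp;size=large">two</a>'
)
for url in list_links(sample, keep=lambda u: "pic.php" in u):
    print(url)
```

The filter is just a function here, so the same list could instead be shown to the user to pick from.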

Tags: General
Posted by Fahim at 6:45 am   Comments (0)