March 8, 2003

Still working on the web-update component, with frequent forays into side jobs :p I thought of a few enhancements for the data mining app I wrote for my friend (I call it Harvester) and implemented them today. I would like to go back to that app at a later time and try to come up with some form of template-based approach: somebody would choose a web page, define the values they want to extract by highlighting and naming them, and the program would then extract those values from any page which fits that particular template. It sounds easy enough, but it would be a heck of a lot harder to implement than you’d imagine because of the uncertainty factor: the web page layout can change at any time, the page might be dynamically generated, one page might be missing information that appears on another page of the same category, and so on. Still, it is an interesting concept and something that would tie in with quite a few of my other projects.
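
To make the template idea a bit more concrete, here is a rough Python sketch of the simplest version I can think of. It is not how Harvester works; the names `make_field` and `extract` and the "remember the text on either side of the highlight" trick are assumptions made up purely for illustration. Each named value is stored as the text just before and just after it on the sample page, and extraction looks for the same surroundings on other pages.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical template format: field name -> (text before the value, text after it).
Template = Dict[str, Tuple[str, str]]


def make_field(page: str, highlighted: str, context: int = 20) -> Tuple[str, str]:
    """Record the text surrounding a highlighted value on the sample page.

    Uses the first occurrence of the highlighted text; raises ValueError
    if the text is not on the page at all.
    """
    start = page.index(highlighted)
    end = start + len(highlighted)
    return page[max(0, start - context):start], page[end:end + context]


def extract(page: str, template: Template) -> Dict[str, Optional[str]]:
    """Pull each named value out of a page; None where the surroundings are missing."""
    results: Dict[str, Optional[str]] = {}
    for name, (before, after) in template.items():
        # Fall back to start/end-of-page anchors when the value sat at a page edge.
        head = re.escape(before) if before else r"\A"
        tail = re.escape(after) if after else r"\Z"
        match = re.search(head + r"(.*?)" + tail, page, re.DOTALL)
        results[name] = match.group(1) if match else None
    return results
```

You would build a template by calling `make_field` once per highlighted value on a sample page and then pass that template to `extract` for every other page. Even this toy version shows the uncertainty problem: if the layout shifts, or a neighbouring piece of information is missing, the surrounding text no longer matches and the field comes back as `None`.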

The only work I was able to do on the web-update component was to test it further. I discovered a few more bugs at the file download stage and fixed those, but I still haven’t gotten to the stage where all the selected components download successfully. The one thing I can be happy about is that I used the kind of test data that does create problems, which has helped me identify a few bugs and logical errors that otherwise might not have been found till much later :p

