Designing a Search Engine: Design Patterns for Crawlers | Alejandro Moreno López | Pulse | LinkedIn
One of the things I've been really passionate about for the last few years is crawl technology and how search engines work. In fact, it was probably during my third year at university, while specialising in Artificial Intelligence, that I first became interested in the field.
It was then that I wrote a small engine in Python which actively crawled your hard disk and indexed all the information in a database. The idea was that by the time the user searched for something, all the data was already there, ready to be displayed much faster than the technologies of the time could manage.
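The idea described above (crawl the disk ahead of time, index into a database, answer searches from the index) can be sketched roughly as follows. This is a minimal illustration, not the original engine; the table layout and function names are my own assumptions.

```python
import os
import sqlite3

def build_index(root, db_path=":memory:"):
    """Walk the directory tree under `root` and record every file
    in a SQLite table, so later searches hit the database instead
    of the (slower) filesystem."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS files (name TEXT, path TEXT)")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            conn.execute("INSERT INTO files VALUES (?, ?)",
                         (name, os.path.join(dirpath, name)))
    conn.commit()
    return conn

def search(conn, term):
    """Return the paths of indexed files whose name contains `term`."""
    rows = conn.execute("SELECT path FROM files WHERE name LIKE ?",
                        (f"%{term}%",))
    return [path for (path,) in rows]
```

The crawl is the expensive part, so it runs in the background; the user-facing query is then a single indexed lookup, which is why this approach felt so much faster than scanning the disk at search time.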
Time passed, I passed the exam for that class, macOS shipped something similar which is simply awesome (try searching for a file on your computer nowadays; ha, beat that!), and then my interests moved to the internet world... but I always kept an eye on, and my original passion for, crawling techniques. That's when I started CruiseHunter, a group of algorithms that crawl the web indexing the best offers and prices for... yes, Cruises.
In the beginning CruiseHunter was coded in Ruby, a language I found quite pleasant for dealing with XML/HTML files and all the problems you run into when crawling information from a site. Some time later, the project is still alive, more than ever I'd say, but it is now undergoing a huge rewrite, using Symfony, Drupal and proper software design principles.
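The kind of extraction a crawler like this does (pull a page, locate the elements carrying offers and prices) can be sketched with Python's standard-library HTML parser. The `class="price"` markup here is purely an assumption for illustration, not CruiseHunter's actual structure.

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect the text of every element whose class attribute
    contains 'price'. A minimal sketch of the extraction step;
    real crawled markup is messier and needs more defensive code."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if "price" in (dict(attrs).get("class") or ""):
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())
```

Feeding a fetched page into `PriceParser().feed(html)` fills `prices` with the candidate strings, which the indexing stage can then normalise and store.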
My time at Capgemini has fundamentally changed the way I see things, and I can say now that the software I write is much more maintainable... I'd say it is even beautiful.
Read full article from Designing a Search Engine: Design Patterns for Crawlers | Alejandro Moreno López | Pulse | LinkedIn