What You Need to Know About Search Engines
A search engine builds a document base from documents stored on a computer or network. Depending on the purpose of the search engine, the document base can cover all available documents or only a subset of the information sources, for example documents from a certain country, in a certain language, or in a certain format.
An index is created for the document base using keywords.
The document base and index are constantly extended and maintained by the search engine.
Search queries using keywords are processed.
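To make the indexing and query steps concrete, here is a minimal sketch of a keyword index (often called an inverted index) in Python. The function names `tokenize`, `build_index`, `add_document`, and `search`, as well as the sample documents, are assumptions made for illustration. Real search engines use far more elaborate tokenization and storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add_document(index, doc_id, text):
    """Extend an existing index with one more document."""
    for word in tokenize(text):
        index[word].add(doc_id)


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        add_document(index, doc_id, text)
    return index


def search(index, query):
    """Return the ids of documents that contain every keyword of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical document base, used only for this example.
documents = {
    1: "Search engines index documents by keywords",
    2: "An image search engine shows thumbnails",
    3: "Crawlers fetch documents from the network",
}

index = build_index(documents)
print(sorted(search(index, "search")))              # [1, 2]
print(sorted(search(index, "documents keywords")))  # [1]
```

Keeping `add_document` separate from `build_index` mirrors the point that the document base and the index are extended continuously rather than built once.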
In addition, the results are presented in as meaningful a form as possible and ordered by relevance. Depending on the type of document, different presentations are suitable.
For example, a search in text documents usually presents a text excerpt, while an image search engine presents matching images as thumbnails.
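For text documents, ordering by relevance and showing an excerpt can be sketched as follows. This is only an illustration under simple assumptions: relevance is taken to be the number of keyword occurrences, and the excerpt is a fixed-width window around the first hit. The names `score`, `excerpt`, and `present` are hypothetical, and production engines use much richer relevance signals.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(text, query_words):
    """Assumed relevance measure: how often the query keywords occur."""
    words = tokenize(text)
    return sum(words.count(q) for q in query_words)


def excerpt(text, query_words, width=30):
    """Return a short window of text around the first keyword hit."""
    lowered = text.lower()
    positions = [lowered.find(q) for q in query_words if q in lowered]
    if not positions:
        return text[: 2 * width]
    pos = min(positions)
    start = max(0, pos - width)
    end = min(len(text), pos + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


def present(documents, query):
    """Return (doc_id, score, excerpt) tuples, most relevant first."""
    query_words = tokenize(query)
    hits = []
    for doc_id, text in documents.items():
        s = score(text, query_words)
        if s > 0:
            hits.append((doc_id, s, excerpt(text, query_words)))
    # Sort by descending score; ties fall back to the document id.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Hypothetical documents, used only for this example.
documents = {
    1: "Crawlers fetch pages.",
    2: "A search engine answers search queries.",
    3: "Image search shows thumbnails.",
}

for doc_id, s, text in present(documents, "search"):
    print(doc_id, s, text)
```

An image search engine would keep the same ranking step but swap `excerpt` for a function that returns a thumbnail.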
That is the basic idea. But we want to dig deeper…