Search Engine Tech Nerd with Geek attribute

Web Crawler System Design

1 Min read

Distributed Web Crawler System Design to Crawl Billions of Web Pages. Learn web crawler system design and software architecture to design a distributed web crawler that will crawl all the pages on the internet. Let’s learn how to build a Google spider bot or Google distributed web crawler. Crawler System Design Spider Systemdesigntips Bot Systemdesign Search Engine Computerscience Learn...

Dogfooding

By vanGato

In General, Know-how

3 Min read

Dogfooding: a Quick Guide to Internal Testing. Eating your own dog food, also called dogfooding, occurs when an organization uses its own product. 1970, Alpo’s dog food advertising: Alpo, with the help of Lorne Greene, convinced consumers to buy its products because they themselves used them. 1980, memo from the Apple CEO: “We believe the typewriter is obsolete. Let’s prove it inside before we try...

PHP dynamic Favicon Script

By vanGato

In Backend, Search

2 Min read

What is a dynamic counter favicon, you will ask? No, you already know! We have all seen the small icon in the browser tab called a favicon. If you want to read more about this tiny little thing, take a look at the Favicon-Cheat-Sheet on the GitHub page of Audrey Roy Greenfeld. But today I want to direct your attention to another topic: dynamic content in such a favicon. This can be used...

2 comments

User-Agents of the Top 10 Web-Crawler

By vanGato

In Crawler

2 Min read

There are thousands of bots and web crawlers working the internet, but below is my list of the user-agents of 10 popular search engines. If you browse the logfiles of your website, you will always see requests for a file called “robots.txt”. These are usually calls from search engines: their web crawlers, identified by their user-agents, read the robots.txt file (hopefully you have one). They...
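As a rough illustration of how a crawler's user-agent relates to robots.txt, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the bot names `ExampleBot` and `OtherBot`, and the `example.com` URLs are made up for this example and are not taken from the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# ExampleBot has its own rule group; every other bot falls back to "*".
print(is_allowed("ExampleBot", "https://example.com/private/page"))
print(is_allowed("OtherBot", "https://example.com/private/page"))
```

A well-behaved crawler would run a check like this with its own user-agent string before requesting a page.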

Linklist you need if you want to build a search engine

By vanGato

In Links

5 Min read

Linklist. Here is some background information about how exactly a search engine works. We shed light on what is difficult to crack if we try to build our own web crawler search engine from scratch: Giga Blast. This page is a bit outdated (2004), but here you can read, from the developer Matt Wells personally, all the steps the search engine GigaBlast went through during the development process. After that...

Stop-Word List

By vanGato

In Parser

5 Min read

What is a stop-word list, and what is the advantage of removing stop words? Stop words are extremely common words. A stop word is a word without essential information content, such as “and”, “the”, or “www”. In English, the terms “stopword” or “stopwords” are used for this purpose. They are used very often, but do not really provide any...
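To make the idea concrete, here is a minimal Python sketch of stop-word removal. The tiny `STOP_WORDS` set and the helper name `remove_stop_words` are assumptions for illustration only; a real stop-word list is much longer and language-specific.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = {"and", "the", "www", "a", "of", "is", "to"}


def remove_stop_words(text):
    """Lowercase text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(remove_stop_words("The history of the search engine and the web"))
# prints ['history', 'search', 'engine', 'web']
```

Only the tokens that carry information remain, so a parser that filters this way has fewer terms to index.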

Writing Your Own Search Engine is Hard

By vanGato

In Links

2 Min read

Why is it so hard? Anna Patterson, a software engineer who contributed to search engines and artificial intelligence at Google and co-founded Cuil, makes the following statement about developing search engines: “There must be 4,000 programmers typing away in their basements trying to build the next ‘world’s most scalable’ search engine. It has been done only a few times. It has never...

1 comment

Author: vanGato
