Where Do Search Engines Store Their Data?
Storing the huge volume of data a search engine collects is one of the main challenges of building one.
A search engine typically runs several types of servers:
- Web Server for the Search
- Bot Server for the Crawler
- Index Server for the inverted index (also called a reverse index)
- Data Server for the Documents
- Ad Server
- Spellcheck Server
We are most interested in the Index and Data Servers.
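To make the split between the two servers more concrete, here is a minimal Python sketch of an inverted index: the documents stand in for what a data server holds, and the term-to-document mapping for what an index server holds. The function names and the sample documents are made up for illustration and are not taken from any real search engine.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical documents (the "data server" side).
documents = {
    1: "search engines store big data",
    2: "big data needs fast storage",
    3: "search needs an index",
}

# The "index server" side: term -> document IDs.
index = build_inverted_index(documents)
```

A query such as `search(index, "big data")` only touches the index; the full documents are fetched from the data store afterwards, by ID.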
At first you might think you need an SQL-like database, but the content is mostly written once and then read many times. That is why Google, for example, uses “Bigtable”, a NoSQL database that it developed in-house and that is said to be twice as fast as competing systems.
Doug Cutting, the creator of Hadoop and Lucene, once said:
“You know, people today think that search and big data are separate, but in two or three years, everyone will wonder why we ever thought that.”
When it comes to big data, the Hadoop Distributed File System (HDFS) usually comes up as well. HDFS is increasingly becoming the preferred tool to help enterprise storage users overcome big data problems.
But let's start with a smaller decision:
Database vs. Filesystem
Pros and Cons of Database
+ Consistency check
+ Automatic sync
+ Backups are more automated
+ More secure
– Slow performance
– Access is more difficult
– Data must be converted to a database format (e.g. BLOB)
– Backups are big and heavy
– Inefficient use of memory
Pros and Cons of the File System
+ Better performance
+ Saving and downloading is simple
+ Migrating is easy
+ Cost effective
+ Cloud storage is easy
– No consistency check
– Low security
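The two lists can be tried out on a small scale. The following is a minimal sketch, assuming SQLite stands in for the database and a temporary directory for the file system; the table name, the helper functions and the `.bin` file naming are hypothetical choices made only for this example.

```python
import sqlite3
import tempfile
from pathlib import Path


def save_to_database(conn, doc_id, content):
    """Store the raw bytes as a BLOB (the 'convert to db format' step)."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, body) VALUES (?, ?)",
        (doc_id, content),
    )
    conn.commit()


def load_from_database(conn, doc_id):
    """Return the stored bytes, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


def save_to_filesystem(root, doc_id, content):
    """Write the raw bytes to one file per document."""
    path = Path(root) / f"{doc_id}.bin"
    path.write_bytes(content)
    return path


def load_from_filesystem(root, doc_id):
    """Return the file's bytes, or None if the file does not exist."""
    path = Path(root) / f"{doc_id}.bin"
    return path.read_bytes() if path.exists() else None


page = b"<html>example page</html>"  # made-up document content

# Database variant: in-memory SQLite with a BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body BLOB)")
save_to_database(conn, 1, page)

# File system variant: one file per document in a temporary directory.
storage_root = tempfile.mkdtemp()
save_to_filesystem(storage_root, 1, page)
```

Both loaders return the same bytes for document 1. Timing the two paths on your own data is an easy way to check the performance points listed above.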
So make your own decision, but in my opinion the file system is the faster, cheaper, and therefore better way to store big data!
Google owns huge datacenters all over the world: Data center locations
Bing uses Microsoft datacenters: Microsoft Data Center Tour