Search Tech Blog

Category: Parser

What is a parser used for in a search engine?

A parser takes the content and splits its text into word fragments (tokens). Linguistic algorithms such as the Porter stemmer, as well as the removal of stop words, are also applied here. The resulting tokenized word list is then prepared for insertion into the forward and inverted indices.

The preparation of such a word list is one application of natural language processing (NLP).
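The parsing steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine implements it: the regular-expression tokenizer, the tiny `STOP_WORDS` set, and the function names `tokenize`, `parse` and `build_indices` are all assumptions made for this example. Stemming (for instance with the Porter stemmer) is left out to keep the sketch short.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "or", "to", "in", "is", "www"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse(text):
    """Return the tokens of `text` with stop words removed."""
    return [token for token in tokenize(text) if token not in STOP_WORDS]


def build_indices(docs):
    """Build a forward index (doc id -> tokens) and an
    inverted index (token -> set of doc ids) from a dict of documents."""
    forward = {}
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = parse(text)
        forward[doc_id] = tokens
        for token in tokens:
            inverted[token].add(doc_id)
    return forward, dict(inverted)


docs = {
    1: "The parser splits the text into tokens.",
    2: "Tokens and stop words: the parser removes stop words.",
}
forward, inverted = build_indices(docs)
```

The forward index answers "which words does this document contain?", while the inverted index answers "which documents contain this word?", which is the lookup a search query needs.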

NLP: indexing, parsing and tokenization are also known as:

content or text analysis
lexing or lexical analysis
concordance generation
speech segmentation
text segmentation
text mining

NLP is the subject of continuous research and technological improvement, and tokenization therefore presents many challenges. Tokenization for indexing also involves multiple technologies, the implementations of which are commonly kept as corporate secrets.

But we want to shed some light! Let’s go for it…

Stop-Word List

By vanGato

In Parser

5 Min read

What is a stop-word list, and what is the advantage of removing stop words? Stop words are extremely common words. A stop word is a word without essential information content, such as “and”, “the”, or “www”. In English, the terms “stopword” or “stopwords” are used for this purpose. They are used very often, but do not really provide any...
