Kasun’s Blog

Kasun Indrasiri

  • Info

    Department of CSE
    University of Moratuwa,
    Sri Lanka

Googling, Searching and Information Retrieval

Posted by kasun04 on March 8, 2009

Like many computer-addicted people, I am also a “Google” junkie. I believe that most computer users can’t exist without Google, or at least they need a good alternative like ‘LiveSearch’ or ‘Yahoo’. I don’t want to go into a search engine comparison, just to show how much we depend on distributed information providers.

In modern parlance, the word ‘Search’ is a very ambiguous one. An ordinary user is most likely to interpret it as ‘Google’ or some other search engine. So the people proficient in the search engine field replace the word ‘search’ with ‘Information Retrieval (IR)’.

Information Retrieval (IR) is defined as finding materials (items/documents) of an unstructured nature (text) that satisfy an information need from within large collections (stored on computers).

It’s obvious that IR is not confined to web search, yet web search is its dominant member. Nowadays Information Retrieval is fast becoming the dominant form of information access, overtaking traditional database-style search.

Although the definition is restricted to ‘unstructured data’, IR systems are capable of processing ‘semi-structured’ data as well. For example, a book may be structured as ‘Title’, ‘Preface’, ‘Chapters’ etc. Information Retrieval also supports users in browsing or filtering document collections, or in further processing them (similar to arranging books on a shelf based on their topics). The classification process is more or less automated in IR systems.
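To make the ‘semi-structured’ idea a bit more concrete, here is a minimal Python sketch of searching a book by field. The `Book` class, the `matches` function and the sample book are all hypothetical names made up for illustration; a real IR system would index the fields rather than scan them.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """A semi-structured document: free text, but split into named fields."""
    title: str
    preface: str
    chapters: list = field(default_factory=list)  # chapter headings


def matches(book, term, fields=("title", "preface", "chapters")):
    """Return True if `term` occurs (case-insensitively) in any selected field."""
    term = term.lower()
    for name in fields:
        value = getattr(book, name)
        # A field holds either one string or a list of strings.
        texts = value if isinstance(value, list) else [value]
        if any(term in text.lower() for text in texts):
            return True
    return False


# Sample data, invented for this example.
book = Book(
    title="Introduction to Information Retrieval",
    preface="A textbook on search.",
    chapters=["Boolean retrieval", "Index construction"],
)

print(matches(book, "retrieval", fields=("title",)))  # search the title only
print(matches(book, "index"))                         # search every field
```

Restricting the search to the title is exactly what the structure buys you; with purely unstructured text there would be no ‘title’ to restrict to.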

IR systems can be classified into three prominent categories.

Web Search

In web search the IR system has to deal with billions of documents distributed among millions of computers and serve billions of users across the web. So performance is a major issue, and the system is focused on handling billions of documents and serving billions of users in the most optimal manner. The hardware and other resources are provided at large scale, and managing them optimally is another issue.

Personal IR

This is the counterpart of web search. Personal IR focuses on information retrieval on a single computer and serves a single user at a time. So it’s obvious that resources are limited and the scale of the system is small. Yet we need to provide an easy-to-use and efficient IR system to the user. The most suitable example of a personal IR system is the ‘search’ utility provided by your OS. These IR systems are extremely lightweight (hit F3 to invoke :)).

Enterprise, Institutional and Domain-Specific Search

In these IR systems, retrieval might be provided for collections such as the ‘internal documents’ of a company, a collection of research articles etc. In this scenario the documents are stored locally, distributed across an internal distributed file system, and a handful of dedicated computers may be provided to the system.

Of those three categories, web search is the most widely used and has immense influence on typical computer users. Despite the fact that all the IR categories are based on a similar kind of design (document feeding, processing, indexing, ranking etc.), detailed designs and implementations of web-search-specific IR are quite rare and hardly ever published.
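The common design stages mentioned above (feeding, processing, indexing, ranking) can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not how any real search engine works: the function names, the sample documents and the ranking rule (a plain sum of term frequencies) are all made up for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexing: build an inverted index, term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():  # feeding: documents enter one by one
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Ranking: score each document by the summed frequency of the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


# Sample collection, invented for this example.
docs = {
    "d1": "Web search serves billions of users",
    "d2": "Desktop search serves a single user",
    "d3": "Enterprise search covers internal documents and research articles",
}
index = build_index(docs)
print(search(index, "internal search"))  # ['d3', 'd1', 'd2']
```

The same skeleton applies to all three categories; what changes is the scale. A personal IR system can keep such an index on one machine, while a web search engine has to spread it over many machines and use a far more elaborate ranking.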

However, companies like ‘Google’ claim (see Sergey’s speech) that they have published a great deal of information about the ‘Google’ design etc., but it’s really hard to find, apart from the research article that Sergey Brin and Lawrence Page published at Stanford University during the early days of Google (http://infolab.stanford.edu/~backrub/google.html).
