A large amount of data on the internet is publicly accessible and can be indexed by search engines. Programs known as web crawlers explore the World Wide Web efficiently. However, a large amount of data remains beyond the reach of conventional search engines. This is known as the Deep Web or Invisible Web. It consists of hidden web pages that are created dynamically in response to queries sent to particular web databases. Because of this structure, it is almost impossible for traditional web crawlers to access deep web content, and retrieving that content is a challenge in itself. This paper discusses methods and tools for crawling the web that is hidden beneath the surface.
Keywords
Deep web, surface web, search engines, crawling, information retrieval