IJCATR Volume 5 Issue 2

An Effective Approach for Document Crawling With Usage Pattern and Image Based Crawling

Ankur Tailang
10.7753/IJCATR0502.1002
Keywords: WebCrawler, Page Ranking, Indexer, Usage Pattern, Relevant Search, Domain Profile, Migrating Agent, Image Based Crawling

As the Web continues to grow, with new pages uploaded every second, it has become difficult for users to find relevant and necessary information using traditional retrieval approaches. Because the amount of information on the World Wide Web keeps increasing, information retrieval tools such as search engines have become a necessity for locating desired information on the Internet. However, the result sets returned by search engines that rely on existing crawling, indexing, and page-ranking techniques lack accuracy, efficiency, and precision. The returned results often do not satisfy the user's request, which leads to frustration on the user's side. A large number of irrelevant links and pages being fetched, unwanted information, topic drift, and load on servers are further issues that must be identified and rectified in order to develop an efficient and smart search engine. The main objective of this paper is to present a solution that improves the existing crawling methodology and attempts to reduce server load by using computational software processes known as “Migrating Agents” to download only those pages that are relevant to a particular topic. Each downloaded page is then assigned a unique positive number, i.e., it is ranked, taking into consideration combinations of words that are synonyms and other related words, user preferences captured in domain profiles together with the user's field of interest, and past knowledge of a web page's relevance, namely the average amount of time users spend on it. A solution is also given for image-based web crawling that associates digital image processing techniques with crawling.
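The abstract names the ranking signals (synonym and related-word matching, domain-profile preferences, and the average time users spend on a page) but not how they are combined. The following is a minimal sketch assuming a simple weighted sum; the `SYNONYMS` table, the weights, the 300-second dwell cap, the 0.2 topic threshold, and all function names are hypothetical and are not taken from the paper. `filter_on_topic` merely stands in for the migrating agent's topic-restricted downloading.

```python
# Hypothetical synonym/related-word table; the paper does not name its source.
SYNONYMS = {
    "crawler": {"spider", "bot"},
    "car": {"automobile", "vehicle"},
}


def expand_terms(terms):
    """Return the query terms plus their (hypothetical) synonyms and related words."""
    expanded = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded


def term_score(text, terms):
    """Fraction of expanded terms that appear as words in the page text (0..1)."""
    expanded = expand_terms(terms)
    words = set(text.lower().split())
    return len(words & expanded) / len(expanded) if expanded else 0.0


def filter_on_topic(pages, topic_terms, threshold=0.2):
    """Keep only pages whose term score reaches the threshold (assumed value),
    standing in for an agent that downloads topic-relevant pages only."""
    return [p for p in pages if term_score(p["text"], topic_terms) >= threshold]


def dwell_score(avg_seconds, cap=300.0):
    """Normalize the average time users spend on a page to 0..1 (cap is assumed)."""
    return min(avg_seconds, cap) / cap


def rank_pages(pages, query, profile, weights=(0.5, 0.3, 0.2)):
    """Combine term, domain-profile, and dwell-time scores with assumed weights,
    then assign unique positive ranks 1, 2, 3, ... (1 = most relevant)."""
    w_term, w_profile, w_dwell = weights
    scored = []
    for p in pages:
        score = (w_term * term_score(p["text"], query)
                 + w_profile * profile.get(p["domain"], 0.0)
                 + w_dwell * dwell_score(p["avg_dwell"]))
        scored.append((score, p["url"]))
    # Highest score first; ties broken by URL so every rank is distinct.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return {url: rank for rank, (_, url) in enumerate(scored, start=1)}


# Usage with made-up pages and a made-up user domain profile.
pages = [
    {"url": "https://example.com/a", "text": "a fast web spider for crawling",
     "domain": "cs", "avg_dwell": 120},
    {"url": "https://example.com/b", "text": "cooking recipes",
     "domain": "food", "avg_dwell": 600},
]
profile = {"cs": 0.9, "food": 0.1}
print(rank_pages(pages, ["crawler"], profile))
# {'https://example.com/a': 1, 'https://example.com/b': 2}
print([p["url"] for p in filter_on_topic(pages, ["crawler"])])
# ['https://example.com/a']
```

A weighted sum keeps each signal's contribution visible and easy to tune; sorting on score with a URL tie-break guarantees that every page receives a distinct positive rank, as the abstract describes.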
@article{a522016ijcatr05021002,
Title = "An Effective Approach for Document Crawling With Usage Pattern and Image Based Crawling",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "5",
Number = "2",
Pages = "49--55",
Year = "2016",
Author = "Ankur Tailang"}
  • A new and improved algorithm has been proposed for document crawling.
  • A method for image-based crawling has also been proposed.
  • Morphological image processing has been combined with the crawling technique.
  • Relevance based on user personalization has been proposed.
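The highlights state that morphological image processing is combined with crawling but give no details. As a hedged illustration, assuming crawled images have already been binarized into equal-sized 0/1 pixel grids, the sketch below implements binary erosion, dilation, and opening with a square structuring element, then compares a crawled image with a query image using Jaccard overlap. The helper names and the Jaccard comparison are hypothetical choices, not the paper's method.

```python
def _apply(img, k, combine):
    """Slide a (2k+1)x(2k+1) square structuring element over a binary image.
    Pixels outside the image count as background (0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
            ]
            out[y][x] = int(combine(window))
    return out


def erode(img, k=1):
    """A pixel stays foreground only if its whole window is foreground."""
    return _apply(img, k, all)


def dilate(img, k=1):
    """A pixel becomes foreground if any pixel in its window is foreground."""
    return _apply(img, k, any)


def opening(img, k=1):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img, k), k)


def jaccard(a, b):
    """Overlap of foreground pixels in two same-sized binary images (0..1)."""
    inter = sum(p and q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    union = sum(p or q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return inter / union if union else 1.0


# Usage: a 3x3 square as the query image, and a crawled copy with one noise pixel.
query = [[0] * 6 for _ in range(6)]
for y in range(1, 4):
    for x in range(1, 4):
        query[y][x] = 1
crawled = [row[:] for row in query]
crawled[5][5] = 1  # isolated noise pixel

print(jaccard(query, crawled))                    # 0.9
print(jaccard(opening(query), opening(crawled)))  # 1.0
```

Opening before comparison removes small noise that would otherwise lower the similarity between a crawled image and the query image; larger structuring elements (bigger `k`) remove larger specks at the cost of eroding fine detail.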