IJCATR Volume 3 Issue 7

Efficient Web Data Extraction

Yogita R.Chavan
10.7753/IJCATR0307.1018
Keywords: auxiliary information, data extraction, DOM tree, record alignment.

Web data extraction is an important problem for information integration: multiple web pages may present the same or similar information in completely different formats or syntaxes, which makes integrating that information challenging. A system that automatically extracts information from web pages is therefore vital. Several techniques have been proposed and used in the past; some operate at the record level, while others operate at the page level. This paper presents work that aims to extract useful information from web pages using the concepts of tags and values. An efficient algorithm is proposed to avoid discarding a non-matching first node that represents non-auxiliary information in the data region.
@article{y372014ijcatr03071018,
Title = "Efficient Web Data Extraction",
Journal = "International Journal of Computer Applications Technology and Research (IJCATR)",
Volume = "3",
Issue = "7",
Pages = "483--487",
Year = "2014",
Author = "Yogita R.Chavan"}