IJCATR Volume 6 Issue 4

Unstructured Datasets Analysis: Thesaurus Model

Parvathy Gopakumar, Neethu Maria John, Vinodh P Vijayan
Keywords: Hadoop; MapReduce; HDFS; NoSQL; Hadoop-Streaming

Mankind has stored more than 295 billion gigabytes (295 exabytes) of data since 1986, according to a report by the University of Southern California. Storing and monitoring this data around the clock in widely distributed environments is a huge task for global service organizations. Because these datasets are stored in an unstructured format, they demand processing power that traditional databases cannot offer. Although the MapReduce paradigm can address this problem through Java-based Hadoop, that approach alone does not provide maximum functionality. Its drawbacks can be overcome with Hadoop Streaming techniques, which allow users to supply non-Java executables for processing these datasets. This paper proposes a THESAURUS model that enables faster and easier business analysis.
Citation: Parvathy Gopakumar, Neethu Maria John, Vinodh P Vijayan, "Unstructured Datasets Analysis: Thesaurus Model", International Journal of Computer Applications Technology and Research (IJCATR), Volume 6, Issue 4, Pages 172 - 212, 2017.
  • Studies the existing big data analysis frameworks
  • Specifies the advantages of Hadoop Streaming and why to use it
  • Proposes a new big data analysis framework for industry needs: the Thesaurus Model
  • Specifies a road map of the analysis framework with an example
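
To illustrate what "non-Java executables" means in the Hadoop Streaming setting described above, here is a minimal sketch of a word-count mapper and reducer written in Python. It is not the Thesaurus Model itself. It only shows the general streaming contract, in which each stage reads lines from standard input and writes tab-separated key/value records to standard output. The function names `map_lines` and `reduce_lines`, and the convention of selecting the stage with a `map` or `reduce` argument, are assumptions made for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key.

    Hadoop Streaming sorts mapper output by key before the reduce phase,
    so records with equal keys arrive next to each other.
    """
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if "\t" in line  # skip blank or malformed lines
    )
    for word, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention for this sketch: the first argument
    # selects the stage, either "map" or "reduce".
    stage = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in stage(sys.stdin):
        print(record)
```

Because both stages communicate only through standard input and output, the same script could be tried locally by piping a text file through the map stage, a sort, and the reduce stage before submitting it to a cluster through the `-mapper` and `-reducer` options of the Hadoop Streaming jar.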