IJCATR Volume 7 Issue 12

Advanced Characteristic Analysis of Real Time Junk Occurrences in Twitter

Ancy S , Aruna Jasmine.J
10.7753/IJCATR0712.1001
Keywords: cloud computing; Twitter; Natural Language Tool Kit; Spam; statistical features

Spam on Twitter has become a major threat in recent years. This work uses Twitter as the input data source to address the problem of detecting spam in real time. Because Twitter data contains a large amount of spam, we built a dictionary of words to remove spam from tweets. To solve this problem, we first carry out a deep analysis of the statistical features of training data sets to differentiate spam tweets from non-spam tweets. We then propose an approach based on NLTK (Natural Language Tool Kit). The proposed approach can discover "changed" spam posts among unlabeled posts and incorporate them into the classifier's training process. Several experiments were carried out to evaluate the proposed scheme. The results show that the proposed NLTK-based approach can markedly improve spam detection accuracy in real-world scenarios.
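The dictionary-plus-statistical-features idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the word list `SPAM_WORDS`, the feature names, the regex tokenizer, and the threshold in `looks_like_spam` are all hypothetical choices made for illustration, and only the Python standard library is used (an NLTK tokenizer could presumably be substituted for the regex).

```python
import re

# Hypothetical spam dictionary; the paper's actual word list is not given.
SPAM_WORDS = {"free", "win", "winner", "click", "prize", "followers"}

# Matches URLs first, then hashtags, mentions, and plain words.
TOKEN_RE = re.compile(r"https?://\S+|[@#]?\w+")


def is_url(token):
    return token.startswith(("http://", "https://"))


def tweet_features(text):
    """Return simple statistical features for one tweet (illustrative only)."""
    tokens = TOKEN_RE.findall(text.lower())
    # Plain words exclude URLs, hashtags, and mentions.
    words = [t for t in tokens if not is_url(t) and t[0] not in "@#"]
    spam_hits = sum(1 for w in words if w in SPAM_WORDS)
    return {
        "n_urls": sum(1 for t in tokens if is_url(t)),
        "n_hashtags": sum(1 for t in tokens if t.startswith("#")),
        "n_mentions": sum(1 for t in tokens if t.startswith("@")),
        "n_words": len(words),
        "spam_ratio": spam_hits / len(words) if words else 0.0,
    }


def looks_like_spam(text, ratio_threshold=0.2):
    """Assumed rule: many dictionary words plus at least one link."""
    f = tweet_features(text)
    return f["spam_ratio"] >= ratio_threshold and f["n_urls"] >= 1


if __name__ == "__main__":
    print(looks_like_spam("Click here to WIN a FREE prize http://spam.example/x #win"))
    print(looks_like_spam("Meeting the team for lunch at noon"))
```

In a fuller pipeline, the feature dictionaries returned by `tweet_features` would feed a trained classifier instead of a fixed threshold, and posts newly flagged as spam could be added back to the training set, which is the kind of update loop the abstract alludes to.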
@article{a7122018ijcatr07121001,
Title = "Advanced Characteristic Analysis of Real Time Junk Occurrences in Twitter",
Journal ="International Journal of Computer Applications Technology and Research(IJCATR)",
Volume = "7",
Issue ="12",
Pages ="412 - 434",
Year = "2018",
Authors ="Ancy S , Aruna Jasmine.J"}