Real and Fake News Classification Using Natural Language Processing
DOI:
https://doi.org/10.47750/pnr.2022.13.S03.236
Keywords:
Natural Language Processing, Classification Purpose, Proposed System
Abstract
Real and fake news classification and detection is still at an early stage of development compared with similar problems in this domain. Machine Learning (ML) plays a central part in this project: its algorithms help users work through difficult, stubborn problems and support the building of smart Artificial Intelligence and Machine Learning systems to tackle them. In this research, we use Natural Language Processing (NLP) together with two popular ML algorithms, Logistic Regression and the Decision Tree Classifier, to classify news as real or fake. Other algorithms, such as Random Forest and Support Vector Machine, could also be used. The project does not simply classify news articles, since one cannot just apply ML algorithms and then predict whether a news item is real. Instead, it combines data science tools with ML concepts for both the classification and the prediction of fake news. Several ML models are implemented for this prediction. The classification process uses data science tools to pre-process the text and then uses the pre-processed dataset to build an improved model. The major obstacle tackled during the project was the lack of a properly processed dataset and of a pre-defined model to differentiate between the two categories of news named in the title of the paper. For simplicity's sake, some of the more commonly known ML algorithms and classifiers have been implemented on datasets that are available on the internet.
The results obtained when the ML models were applied to the datasets are very encouraging and may prove useful for any future work on this project or in this domain.
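The pipeline outlined in the abstract (pre-process the text, then train a classifier on the result) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the bag-of-words features, the function names, the hyperparameters, and the six toy headlines are all assumptions made for illustration, and only Logistic Regression is shown, written in plain Python so that each step is visible.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on", "for", "that"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_vocabulary(documents):
    """Map every token seen in the training documents to a column index."""
    vocab = sorted({tok for doc in documents for tok in preprocess(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(preprocess(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(text, vocab, w, b):
    """Return 1 (labelled 'fake' in this sketch) if the probability is >= 0.5, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, vectorize(text, vocab))) + b
    return 1 if sigmoid(z) >= 0.5 else 0

# Made-up toy headlines, NOT taken from the paper's datasets. 1 = fake, 0 = real.
train_texts = [
    "Shocking miracle cure doctors hate",
    "Aliens secretly control the government shocking",
    "Miracle pill melts fat overnight shocking truth",
    "Parliament passes budget bill after debate",
    "Central bank raises interest rates",
    "City council approves new budget for schools",
]
train_labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocabulary(train_texts)
X = [vectorize(t, vocab) for t in train_texts]
w, b = train_logistic_regression(X, train_labels)
```

A new headline would then be scored with `predict("some headline", vocab, w, b)`. In practice, a library classifier and a held-out test set would replace the hand-written training loop and the toy data; the sketch only makes the pre-process, vectorize, train, and predict steps concrete.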