The goal of this project is to build a model that accurately classifies a piece of news as REAL or FAKE.
Using scikit-learn (sklearn), we build a TfidfVectorizer on the provided dataset, then initialize a PassiveAggressiveClassifier and fit the model. Finally, the accuracy score and the confusion matrix show how well the model performs.
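A minimal sketch of the steps above with scikit-learn. The ten-sentence toy dataset, the 80/20 split, `random_state=7`, `stop_words="english"`, `max_df=0.7` and `max_iter=50` are assumptions chosen for illustration, not settings taken from train.py or the notebook. The function name `train_and_evaluate` is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the article text and labels in news.csv.
SAMPLE_TEXTS = [
    "Senate passes budget bill after long debate",
    "Central bank holds interest rates steady this quarter",
    "City council approves new public transit funding",
    "Scientists publish peer reviewed study on climate trends",
    "Election officials certify results in every county",
    "Aliens secretly control the world banks insiders claim",
    "Miracle fruit cures every disease doctors hate it",
    "Celebrity clone spotted replacing famous singer on tour",
    "Shocking secret proves moon landing was staged",
    "Government hiding time travel machine in desert base",
]
SAMPLE_LABELS = ["REAL"] * 5 + ["FAKE"] * 5


def train_and_evaluate(texts, labels, test_size=0.2, seed=7):
    """Fit TF-IDF features and a PassiveAggressiveClassifier, then score them."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed
    )

    # Learn the vocabulary and IDF weights from the training split only.
    # stop_words and max_df (ignore terms found in >70% of documents) are assumed settings.
    vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
    tfidf_train = vectorizer.fit_transform(x_train)
    tfidf_test = vectorizer.transform(x_test)

    # Passive-aggressive learner: leaves the weights alone when a sample is
    # classified with enough margin, otherwise updates them toward correcting it
    # (step size capped by the regularization parameter C).
    model = PassiveAggressiveClassifier(max_iter=50, random_state=seed)
    model.fit(tfidf_train, y_train)

    y_pred = model.predict(tfidf_test)
    accuracy = accuracy_score(y_test, y_pred)
    # Rows are true labels and columns are predicted labels, both ordered FAKE, REAL.
    matrix = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
    return vectorizer, model, accuracy, matrix


if __name__ == "__main__":
    _, _, accuracy, matrix = train_and_evaluate(SAMPLE_TEXTS, SAMPLE_LABELS)
    print(f"Accuracy: {accuracy:.2%}")
    print(matrix)
```

With the real dataset you would pass the article text and REAL/FAKE label columns of news.csv instead of the toy lists (their exact column names are not stated here). On ten samples the printed numbers say nothing about real-world accuracy; they only show the shape of the output.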
- news.zip: Unzip this archive to get news.csv.
- news.csv: Dataset of fake and real news articles.
- Real_or_Fake_News.ipynb: Jupyter Notebook containing the full explanation and my work.
- train.py: Run this file to train the model and save the vocabulary and the model.pkl file for later use.
  - Needs to be run only once.
  - Takes the dataset file name as a command-line argument.
  - Usage:
    python train.py news.csv
- predict.py: Run this file as often as you like. It loads the saved vocabulary and model instead of retraining, so it runs much faster.
  - Usage:
    python predict.py
  - P.S. The longer the input text, the better the chance of an accurate prediction.
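The train-once, predict-many split between train.py and predict.py can be sketched with Python's pickle module. This is a hypothetical illustration, not the repository's code: it assumes news.csv has columns named `text` and `label`, and the file name `vectorizer.pkl` and the functions `train_and_save` and `load_and_predict` are invented here (only model.pkl is named in this README; the vocabulary file's name is not given). The vectorizer settings are the same assumed ones as in a typical TF-IDF setup.

```python
import csv
import pickle
import sys
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# "vectorizer.pkl" is a hypothetical name; the repo's vocabulary file may differ.
VECTORIZER_FILE = "vectorizer.pkl"
MODEL_FILE = "model.pkl"


def train_and_save(csv_path, out_dir="."):
    """Sketch of the train.py step: fit on the whole CSV and pickle the results."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Assumed column names: "text" holds the article, "label" is REAL or FAKE.
    texts = [row["text"] for row in rows]
    labels = [row["label"] for row in rows]

    vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
    model = PassiveAggressiveClassifier(max_iter=50, random_state=7)
    model.fit(vectorizer.fit_transform(texts), labels)

    out = Path(out_dir)
    # Pickling the fitted vectorizer keeps its vocabulary and IDF weights,
    # so new text is mapped to the same feature columns at prediction time.
    with open(out / VECTORIZER_FILE, "wb") as f:
        pickle.dump(vectorizer, f)
    with open(out / MODEL_FILE, "wb") as f:
        pickle.dump(model, f)


def load_and_predict(text, model_dir="."):
    """Sketch of the predict.py step: reload the pickles and classify one text."""
    model_dir = Path(model_dir)
    with open(model_dir / VECTORIZER_FILE, "rb") as f:
        vectorizer = pickle.load(f)
    with open(model_dir / MODEL_FILE, "rb") as f:
        model = pickle.load(f)
    # transform(), not fit_transform(): reuse the saved vocabulary unchanged.
    return model.predict(vectorizer.transform([text]))[0]


if __name__ == "__main__":
    # Mirrors the README usage: python train.py news.csv
    if len(sys.argv) != 2:
        print("usage: python train.py <dataset.csv>")
    else:
        train_and_save(sys.argv[1])
```

Saving the whole fitted vectorizer, rather than only its `vocabulary_` dictionary, also preserves the learned IDF weights, which is what lets the prediction step skip retraining. Only unpickle files you trust, since loading a pickle can execute arbitrary code.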