Community Prediction with NLP: Text Classification Algorithms Using Reddit Data

ANLY-512 Group 4

Victor De Lima (vad49)
Matt Moriarty (mdm341)

Abstract

In this study, we use Natural Language Processing (NLP) to test whether Reddit posts can be accurately classified into their respective subreddits based solely on the language in each post's text. We build features with term-frequency methods and train supervised learning algorithms on them. Our findings show that the language individuals use when participating in community discussions provides enough information for models to predict the source community accurately. We also describe how the models were constructed and discuss the implications of the results.
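As a rough illustration of the general idea (term-frequency features fed to a supervised classifier), the sketch below trains a small multinomial Naive Bayes model in plain Python. It is not the code in this repository: the toy posts, the "cooking" and "gaming" labels, and the function names are all hypothetical, and Naive Bayes is only one of many supervised algorithms that could be used.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in words if w]

def train_naive_bayes(posts, labels):
    """Count term frequencies per label (hypothetical helper)."""
    term_counts = defaultdict(Counter)  # label -> term -> frequency
    doc_counts = Counter(labels)        # label -> number of posts
    vocab = set()
    for text, label in zip(posts, labels):
        tokens = tokenize(text)
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return {"term_counts": term_counts, "doc_counts": doc_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-posterior score."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    # Term frequencies of the new post, ignoring unseen words.
    tf = Counter(t for t in tokenize(text) if t in model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["term_counts"][label]
        total_terms = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for term, freq in tf.items():
            # Add-one (Laplace) smoothing avoids log(0) for unseen terms.
            likelihood = (counts[term] + 1) / (total_terms + vocab_size)
            score += freq * math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data standing in for real subreddit posts (hypothetical).
posts = [
    "How do I knead sourdough bread dough",
    "My bread did not rise in the oven",
    "Best recipe for pizza dough and sauce",
    "Which graphics card runs this game best",
    "The new game patch broke my keyboard controls",
    "Building a gaming computer with a fast graphics card",
]
labels = ["cooking"] * 3 + ["gaming"] * 3

model = train_naive_bayes(posts, labels)
print(predict(model, "my dough will not rise"))         # cooking
print(predict(model, "new graphics card for my game"))  # gaming
```

A real pipeline would also hold out a test set and report accuracy per subreddit; this sketch only shows how word counts alone can separate communities.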

Repository Contents

Our repository is organized with the following structure:

Code/: This folder contains all code associated with our analysis, including data collection, cleaning, and modeling
Data/: This folder contains the data associated with our analysis, excluding the raw Reddit data, which totals 300 MB
Output/: This folder contains images associated with the results of our analysis, grouped by the type of model used
Poster/: This folder contains the poster associated with our analysis
Report/: This folder contains all files required for rendering our report in Quarto

vdelimad/community-prediction-with-nlp
