vdelimad/community-prediction-with-nlp

Community Prediction with NLP: Text Classification Algorithms Using Reddit Data

ANLY-512 Group 4

  • Victor De Lima (vad49)
  • Matt Moriarty (mdm341)

Abstract

In this study, we use Natural Language Processing (NLP) to determine whether Reddit posts can be accurately classified into their respective subreddits by analyzing the language in each post's text. We conducted the research using Term Frequency methods and supervised learning algorithms. Our findings show that the language individuals use when participating in discussions within a community provides sufficient information for models to make excellent predictions. We also give an overview of how the models were constructed and explore the implications of the results.
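
As a rough illustration of the general idea (term counts feeding a supervised classifier), here is a minimal sketch in plain Python. It is not the code from this repository: the multinomial Naive Bayes classifier, the function names `tokenize`, `train`, and `predict`, and the toy posts and labels are all assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(posts):
    """Count term frequencies per label from (text, label) pairs."""
    term_counts = defaultdict(Counter)  # label -> term -> count
    doc_counts = Counter()              # label -> number of posts
    vocab = set()
    for text, label in posts:
        tokens = tokenize(text)
        term_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return term_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-score."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_terms = sum(term_counts[label].values())
        # Start from the log prior of the label.
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            numerator = term_counts[label][token] + 1
            denominator = total_terms + len(vocab)
            score += math.log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy data; the labels stand in for subreddit names.
toy_posts = [
    ("my sourdough starter finally doubled overnight", "baking"),
    ("what flour gives the best crust on a baguette", "baking"),
    ("which gpu should I pair with this cpu", "pc_building"),
    ("my new build will not post after swapping the gpu", "pc_building"),
]

model = train(toy_posts)
prediction = predict(model, "is this gpu enough for my build")
print(prediction)
```

On real data, one would hold out a test set and compare several classifiers; this sketch only shows how word counts alone can separate communities with distinct vocabularies.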

Repository Contents

The repository is organized as follows:

  • Code/: This folder contains all code associated with our analysis, including data collection, cleaning, and modeling
  • Data/: This folder contains the data associated with our analysis, excluding the raw Reddit data, which totals 300 MB
  • Output/: This folder contains images associated with the results of our analysis, grouped by the type of model used
  • Poster/: This folder contains the poster associated with our analysis
  • Report/: This folder contains all files required for rendering our report in Quarto
