Restaurant Recommendation System - Delta Architecture

ETL Data Engineering + Machine Learning Project

Last updated: 20 Nov 2022

Introduction

As an avid user of recommendation systems and a big fan of food in general, I have always been curious about two things:

What makes certain restaurants popular? Is there a way to quantify, and even predict, what makes a restaurant popular?
How does a recommendation algorithm work? In this project, we explore which attributes determine a restaurant's popularity and whether those attributes affect the accuracy of a recommendation system.

We then found the Yelp dataset, a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge, which gives students a chance to conduct research or analysis on Yelp's data and share their discoveries. The most recent dataset contains information about businesses across 8 metropolitan areas in the USA and Canada.

Given the vast number of reviews of Yelp businesses, we want to be able to handle large amounts of data. We therefore decided to use this as an opportunity to learn and use Apache Spark, following the Delta Architecture for the ETL process. The project runs on Databricks, and the data is also stored on Databricks itself, in Delta Lake. The machine learning models come from the pyspark.ml package, which works with PySpark DataFrames.
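The layered refinement behind a Delta architecture can be illustrated without Spark. The following is a minimal plain-Python sketch, assuming the common convention that Bronze holds raw ingested records, Silver holds cleaned and deduplicated records, and Gold holds aggregates. The sample records and the function names (`to_bronze`, `to_silver`, `to_gold`) are hypothetical, and the project's notebooks, which use Spark and Delta Lake, may define the layers differently. The Platinum layer is left out because the text does not describe it.

```python
import json
from collections import defaultdict

# Hypothetical raw lines, loosely shaped like Yelp review JSON records.
RAW_LINES = [
    '{"review_id": "r1", "business_id": "b1", "user_id": "u1", "stars": 5}',
    '{"review_id": "r2", "business_id": "b1", "user_id": "u2", "stars": 3}',
    '{"review_id": "r2", "business_id": "b1", "user_id": "u2", "stars": 3}',
    '{"review_id": "r3", "business_id": "b2", "user_id": "u1", "stars": null}',
    'not valid json',
]


def to_bronze(lines):
    """Bronze: keep every ingested line untouched, plus minimal metadata."""
    return [{"line_no": n, "raw": line} for n, line in enumerate(lines)]


def to_silver(bronze):
    """Silver: parse, drop malformed or incomplete rows, deduplicate by review_id."""
    seen, rows = set(), []
    for record in bronze:
        try:
            row = json.loads(record["raw"])
        except json.JSONDecodeError:
            continue  # malformed line stays in Bronze only
        if row.get("stars") is None or row.get("review_id") in seen:
            continue  # missing rating or duplicate review
        seen.add(row["review_id"])
        rows.append(row)
    return rows


def to_gold(silver):
    """Gold: aggregate to one row per business (review count and mean stars)."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in silver:
        totals[row["business_id"]][0] += 1
        totals[row["business_id"]][1] += row["stars"]
    return {b: {"n_reviews": n, "avg_stars": s / n} for b, (n, s) in totals.items()}


bronze = to_bronze(RAW_LINES)
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the raw layer untouched means a later fix to the cleaning rules can be replayed from Bronze without ingesting the source again.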

This project will be split into the following parts:

Methodology
Data Engineering - ETL pipeline to stream data from the Yelp datasets into Delta Lake (Bronze, Silver, Gold, Platinum)
EDA
ALS model to recommend restaurants
Conclusion
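To give a feel for what the ALS (alternating least squares) step in the outline above does, here is a minimal plain-Python sketch. It assumes a single latent factor per user and per restaurant (rank 1) so that each update has a closed form. The ratings, the function names, and the `reg` and `iterations` values are hypothetical. This is not the pyspark.ml implementation used in the project, only an illustration of the alternating idea.

```python
# Hypothetical (user, restaurant, stars) ratings; not taken from the Yelp data.
RATINGS = [
    ("u1", "b1", 5.0), ("u1", "b2", 4.0),
    ("u2", "b1", 4.0), ("u2", "b3", 2.0),
    ("u3", "b2", 5.0), ("u3", "b3", 3.0),
]


def objective(ratings, user_f, item_f, reg):
    """Regularized squared error over the observed ratings (rank-1 case)."""
    err = sum((r - user_f[u] * item_f[i]) ** 2 for u, i, r in ratings)
    penalty = sum(x * x for x in user_f.values()) + sum(x * x for x in item_f.values())
    return err + reg * penalty


def als_rank1(ratings, reg=0.1, iterations=20):
    """Alternate between solving for user factors and item factors."""
    user_f = {u: 1.0 for u, _, _ in ratings}
    item_f = {i: 1.0 for _, i, _ in ratings}
    for _ in range(iterations):
        # Hold item factors fixed; each user's factor is a 1-D ridge regression.
        for u in user_f:
            rated = [(item_f[i], r) for uu, i, r in ratings if uu == u]
            user_f[u] = sum(f * r for f, r in rated) / (reg + sum(f * f for f, _ in rated))
        # Hold user factors fixed; solve for each item the same way.
        for i in item_f:
            rated = [(user_f[u], r) for u, ii, r in ratings if ii == i]
            item_f[i] = sum(f * r for f, r in rated) / (reg + sum(f * f for f, _ in rated))
    return user_f, item_f


def recommend(user, ratings, user_f, item_f, n=1):
    """Rank restaurants the user has not rated by predicted score."""
    seen = {i for u, i, _ in ratings if u == user}
    scores = {i: user_f[user] * f for i, f in item_f.items() if i not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:n]


user_f, item_f = als_rank1(RATINGS)
top_for_u1 = recommend("u1", RATINGS, user_f, item_f)
```

Because each half-step solves its least-squares problem exactly, the regularized objective never increases from one iteration to the next. With more than one latent factor, each update becomes a small linear solve instead of a single division.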

Architecture

Databricks Notebook

This notebook is the main file for this project and runs on the Databricks platform. The project also includes 4 Kafka files.
For the training notebook, visit this notebook.

Running project

Import the .dbc files into Databricks before running, or import Delta-Architecture.dbc to import the full source files. Run the files in sequential order: Install_Kafka > Kafka_Server_Start > Kafka_Producer > Kafka_Consumer
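One plausible reading of that run order is that each file depends on the one before it: the broker must exist before the producer publishes, and the consumer reads what the producer wrote. The sketch below mimics that flow in plain Python, with `queue.Queue` standing in for a Kafka topic. All function names and records are hypothetical, and none of this is the Kafka client API or the content of the project's notebooks.

```python
import json
import queue


def start_broker():
    """Stand-in for Kafka_Server_Start: create the shared 'topic'."""
    return queue.Queue()


def produce(topic, records):
    """Stand-in for Kafka_Producer: serialize each record and publish it."""
    for record in records:
        topic.put(json.dumps(record).encode("utf-8"))


def consume(topic):
    """Stand-in for Kafka_Consumer: drain the topic and decode each message."""
    rows = []
    while not topic.empty():
        rows.append(json.loads(topic.get().decode("utf-8")))
    return rows


topic = start_broker()
produce(topic, [{"business_id": "b1", "stars": 5}, {"business_id": "b2", "stars": 3}])
received = consume(topic)
```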
