This project is a search engine built from the ground up that can handle tens of thousands of web pages from UC Irvine's Information and Computer Sciences department under harsh operational constraints, while keeping query response time under 300 ms.
We created two separate programs: an indexer and a search component. After running the indexer across the entire collection of crawled pages, we were able to prompt the user for a query through a web GUI and respond with a list of URLs in which the query appeared.
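The split between an indexer and a search component can be illustrated with a minimal in-memory inverted index. This is only a sketch under stated assumptions, not the project's actual code: the names `tokenize`, `build_index`, and `search` are hypothetical, the pages are assumed to be already reduced to plain text, and a query is assumed to match a page only when every query term appears in it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexer step: map each token to the set of URLs that contain it.

    `pages` is a hypothetical dict of {url: page_text}.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Search step: return the sorted URLs that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the first token's postings and intersect with the rest.
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return sorted(results)


if __name__ == "__main__":
    # Made-up sample pages, for illustration only.
    sample_pages = {
        "https://example.org/a": "Machine learning research",
        "https://example.org/b": "Machine shop hours",
    }
    sample_index = build_index(sample_pages)
    print(search(sample_index, "machine learning"))
```

A real index for tens of thousands of pages would likely be written to disk and ranked rather than held in a dictionary, but the indexer/search division stays the same.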
Use the package manager pip to install nltk, BeautifulSoup, pandas, and Flask.
pip install --user -U nltk
pip install beautifulsoup4
pip install pandas
pip install Flask
To run the program, run gui.py. To check that the app is running, open localhost:5000/ in a browser.
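The same check can be done from a script instead of a browser. The sketch below uses only the Python standard library; the helper name `is_app_running` is hypothetical, and it assumes the app answers a plain GET request on localhost:5000/ with HTTP 200.

```python
import urllib.request


def is_app_running(url="http://localhost:5000/", timeout=2.0):
    """Return True if a GET request to `url` succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses
        # (urllib.error.URLError is a subclass of OSError).
        return False


if __name__ == "__main__":
    print("app is running" if is_app_running() else "app is not reachable")
```

Run it in a second terminal after starting gui.py.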