Deployment Guide
This web scraping program uses a combination of LinkedIn and Bing APIs, along with a custom web scraping algorithm to generate a list of companies related to specific interest tags. The results are displayed using a custom algorithm to put companies Open Ecosystem Network would likely target (startups that are rapidly growing) first. The program uses a simple web interface to search and navigate the responses.
-
NEED TO DOWNLOAD AND UNZIP CHROMEDRIVER FOR YOUR PLATFORM:
depending on your chrome version you may also need to download a different version of the chrome driver. See the notes.txt files to see which version of Chrome is supported
Our code was tested on Mac OSX with Chrome v62.0.3202.94 with the following chromedriver: https://chromedriver.storage.googleapis.com/index.html?path=2.35/
-
Install Python 3.6.5: https://www.python.org/downloads/
-
Install Python Packages:
pip install -r requirements.txt
- To start the webserver, implement the following command:
python3 app.py
- In a web browser navigate to:
- https://localhost:5000
- Enter Query in search bar
GET: https://api.linkedin.com/v2/search?q=companiesV2
GET: https://api.cognitive.microsoft.com/bing/v7.0/search[?q][&count][&offset][&mkt][&safesearch]
GET: https://api.cognitive.microsoft.com/bing/v7.0/entities/[?q][&mkt][&count][&offset][&safesearch]
GET: https://logo.clearbit.com/:domain
GET: https://company.clearbit.com/v2/companies/find?domain=:domain
An accout is need to use the LinkedIn API. The Client ID and Secret are needed to get OAuth2 authorization. It is also recomended to set up a paid account. LinkedIn trottles API queries and a paid account will remove a lot of the limitations.
- Create a develop account for Linkedin: https://www.linkedin.com/developer/apps
- Enter your login information (username and password), client id and secrete in
config.py
on lines 17-20 - Run:
python3 get_linkedin_token.py
- Your token will be displated on stdout and will be valid for 60 days.
- Copy and paste this token into
config.py
on line 20.
Micosoft has a very long free trial period, but it is recommend to sign up for a paid account.
This program uses the bing entity search (https://azure.microsoft.com/en-us/services/cognitive-services/bing-entity-search-api/) and bing custom search (https://azure.microsoft.com/en-us/services/cognitive-services/bing-custom-search/)
The custom seach is used to use the search the entire internet for companies matching the interest tags. The entity search is used to retrive detailed information on the companies found using the custom search.
This program uses Bing instead of Google because there is no google API equivalent to a standard google search.
Clearbit API: https://clearbit.com
Clearbit has a family of APIs that can be used to get information about any company. They allow 50 free querries a month. Prices for a paid plan range from $100/month (2,500 queries/month) to $500/month (25,000 queries/month)
The also offer additional packages. One of their biggest advantages is their integration with SalesForce.
To use the API, copy your Clearbit key to the file app_clearbit.py