This is a mini project that crawls basic information about the newest IT job postings on TopCV. The crawled data is imported into a PostgreSQL database.
Data crawled from each job posting includes:

- `job_id`: Job posting ID, as stored in TopCV's backend.
- `job_title`: Job title.
- `company`: Recruiting company.
- `salary_min`, `salary_max`: Salary range (in million VND).
- `yrs_of_exp_min`, `yrs_of_exp_max`: Years of experience required.
- `job_city`: Working location (city).
- `due_date`: Application deadline.
- `jd`: Job description.
Moreover, each entry in the PostgreSQL database also has:

- `created_at`: Timestamp at which the record was created (in GMT+07).
- `last_modified`: Timestamp of the most recent modification to the record (in GMT+07).
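For illustration, a single crawled record could be represented in Python roughly as the dictionary below. The field names follow the list above; the values are entirely made up and do not come from this project:

```python
from datetime import date

# Hypothetical example of one crawled job record (all values are made up).
sample_job = {
    "job_id": 1234567,                        # ID from TopCV's backend
    "job_title": "Backend Developer (Python)",
    "company": "Example Software JSC",
    "salary_min": 20,                         # million VND
    "salary_max": 35,                         # million VND
    "yrs_of_exp_min": 2,
    "yrs_of_exp_max": 4,
    "job_city": "Ha Noi",
    "due_date": date(2024, 12, 31),           # application deadline
    "jd": "Develop and maintain backend services...",
}
```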
To run this project, you need `git` and `docker` with `docker-compose` installed.
Clone this repository:

```bash
git clone https://github.com/minkminkk/scraping-topcv.git
```
To initialize the database and the crawler, run:

```bash
docker compose up
```
A PostgreSQL database with the required table will be set up, and then the crawler will start crawling.
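Once the containers are up, you can inspect the crawled data directly. The sketch below is only an illustration: the connection parameters (host, port, database, user, password) and the table name `jobs` are assumptions, so check `docker-compose.yml` for the values this project actually uses:

```python
import psycopg2  # assumes psycopg2-binary is installed

# All connection parameters and the table name below are assumptions;
# check docker-compose.yml for the actual values used by this project.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="topcv",
    user="postgres",
    password="postgres",
)

with conn, conn.cursor() as cur:
    # Show the 5 most recently inserted job postings (table name assumed).
    cur.execute(
        "SELECT job_id, job_title, company, created_at "
        "FROM jobs ORDER BY created_at DESC LIMIT 5;"
    )
    for row in cur.fetchall():
        print(row)

conn.close()
```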
After you are done, run:
```bash
docker compose down
```
The containers and network will be deleted.
The crawler cannot yet crawl all available data because TopCV limits the request rate. In the future, crawling with rotating proxies could be implemented to overcome this, as sketched below.
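This feature is not implemented yet; the following is only a minimal sketch of what proxy rotation could look like using the `requests` library. The proxy URLs are placeholders, not real endpoints:

```python
import itertools
import requests

# Placeholder proxy URLs -- real rotating-proxy endpoints would go here.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]
proxy_cycle = itertools.cycle(PROXIES)

def fetch(url: str) -> requests.Response:
    """Fetch a URL, routing each request through the next proxy in the pool."""
    proxy = next(proxy_cycle)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
```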