Automatically download all PDF files of search results & their patent families found on Google Patents.

wenyalintw/Google-Patents-Scraper

Google Patents Scraper

(1) Automatically download all PDF files of search results & their patent families.
(2) Generate an overview report of the search results.

Application Demo

Introduction

This application scrapes Google Patents in two steps:

  • Set Proxy (Optional)
  • Search & Download Patents

Set Proxy (Optional)

  • Set a proxy to keep your current IP address from being blocked by Google Patents
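
How the application applies the proxy internally is not shown in this README. As a minimal sketch of the general idea, assuming plain HTTP(S) requests made with Python's standard library, a proxy can be attached to an opener as below. The function name `build_proxy_opener` and the address `http://127.0.0.1:8080` are hypothetical placeholders, not values taken from the project.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    # Map both schemes to the same proxy address.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical local proxy address; replace it with a real proxy before use.
opener = build_proxy_opener("http://127.0.0.1:8080")
# opener.open(url) would now go through the proxy; no request is made here.
```

Building a dedicated opener keeps the proxy setting local to the scraper rather than changing it for the whole Python process.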

Search & Download Patents

  • Select an output directory to store the downloaded/generated files
  • Search for anything you like (search terms use the same format as Google Patents)
  • Download the PDF files of the search results & their patent families

The PDF files and the auto-generated overview.md are then stored in the selected directory.
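
The README does not describe what overview.md contains. The sketch below only illustrates one way such a report could be assembled from search results. The column layout, the dictionary keys (`id`, `title`, `family`) and the function name `render_overview` are assumptions made for illustration, and the title is a placeholder rather than real patent data.

```python
import tempfile
from pathlib import Path


def render_overview(query: str, results: list[dict]) -> str:
    """Build the Markdown text of an overview report (hypothetical layout)."""
    lines = [
        f"# Overview: {query}",
        "",
        "| Patent | Title | Family members |",
        "| --- | --- | --- |",
    ]
    for result in results:
        # One table row per search result, with the size of its patent family.
        lines.append(
            f"| {result['id']} | {result['title']} | {len(result['family'])} |"
        )
    return "\n".join(lines) + "\n"


# Patent numbers reused from the file structure example; the title is a placeholder.
sample = [
    {"id": "CN104321947A", "title": "(title placeholder)", "family": ["EP2850716B1"]},
]

# Write the report into a throwaway directory standing in for the output directory.
with tempfile.TemporaryDirectory() as output_dir:
    report = Path(output_dir) / "overview.md"
    report.write_text(render_overview("example query", sample), encoding="utf-8")
    written = report.read_text(encoding="utf-8")
```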

File Structure of Output Directory

├── PDFs
│   ├── CN104321947A.pdf
│   ├── ...
│   └── readme.txt
├── Family_PDFs
│   ├── CN104321947A's\ Family
│   │   ├── EP2850716B1.pdf
│   │   ├── ...
│   │   └── readme.txt
│   ├── ...
│   └── ...
└── overview.md
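
The layout above can be expressed as two small path helpers. This is a sketch rather than the project's own code: the function names are hypothetical, and it assumes the backslash in `CN104321947A's\ Family` is only a shell escape for the space, so the directory is literally named `CN104321947A's Family`.

```python
from pathlib import Path


def pdf_path(output_dir: Path, patent_id: str) -> Path:
    """Location of a search result's PDF inside the output directory."""
    return output_dir / "PDFs" / f"{patent_id}.pdf"


def family_pdf_path(output_dir: Path, patent_id: str, member_id: str) -> Path:
    """Location of a family member's PDF, grouped under its parent patent."""
    return output_dir / "Family_PDFs" / f"{patent_id}'s Family" / f"{member_id}.pdf"


# "output" is a placeholder for the directory selected in the application.
root = Path("output")
main_pdf = pdf_path(root, "CN104321947A")
family_pdf = family_pdf_path(root, "CN104321947A", "EP2850716B1")
```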

Built With

Modules besides Python built-ins

Getting Started

Prerequisites

Installation

  • Clone the repo
git clone https://github.com/wenyalintw/Google-Patents-Scraper.git
  • Install the dependencies
pip install -r /path/to/requirements.txt
  • Ready to go
cd src
python main.py

Acknowledgments

MIT License (2019), Wen-Ya Lin
