Home
This extension is an interactive web scraping tool targeting text- and image-heavy sites (such as web novels, documentation, blogs, etc.).
I specifically wrote this tool to create Epubs for offline reading. As of now, I have gotten it to work on:
- Novel Updates
- Wuxiaworld
- Royal Road
- Python documentation
- Firefox: https://addons.mozilla.org/en-US/firefox/addon/epublifier/
- Chrome: https://chrome.google.com/webstore/detail/epublifier/eopjnahefjhnhfanplcjpbbdkpbagikk
- Supports novels with many chapters (tested up to 300 chapters)
- Downloads and embeds images
- Can selectively parse/compile chapters with check boxes
- Automatically catches main content with readability.js
- Cover image, author, title, and description are auto-parsed from some sites:
  - Novel Updates
  - Royal Road
- Configurable parsers for list of links or webapps.
- 🛈 (For novelupdates) Click on the ☰ menu button (Show all chapters) above the chapter list
- Click Epublifier's icon on your browser's extension bar, which will open a popup. It will automatically try to load the series' metadata.
- Select some chapters (or all of them)
- 🛈 You may use Shift+Click to select a range of chapters to include or delete.
- Click "Parse". If all is well, the parsed column should turn from a circle to a checkmark.
- Click "Epub" to generate the ePub as a download.
- Navigate to the first chapter of a series
- Click Epublifier's icon on your browser's extension bar, which will open a popup.
- Go to the "Add Page Parser" options tab and configure the time to wait (this depends on your internet speed; decimals are supported) and the maximum number of chapters to parse with each click.
- Click the "Add This Page" button to parse the number of chapters defined by the maximum-chapters setting.
- Select chapters with the check boxes or Shift+Click.
- Click "Epub" to generate the ePub as a download.
- Go to any series table of contents page
- Click Epublifier's icon on your browser's extension bar, which will open a popup. It will automatically try to load the series' metadata.
- Select some chapters (or all of them)
- 🛈 You may use Shift+Click to select a range of chapters to include or delete.
- Click "Parse". If all is well, the parsed column should turn from a circle to a checkmark.
- Click "Epub" to generate the ePub as a download.
If a website has a list of links that can be identified with a query selector and a regex on the link text, you can try the "Chapter Links" parser.
- Click Epublifier's icon on your browser's extension bar, which will open a popup. It will say "No parser available"
- Go to the "Links Parser" tab and select "Chapter Links" in the list box.
- Configure the regex and query selector for links.
- Click the "(Re)Parse links" button.
- Select some chapters (or all of them)
- 🛈 You may use Shift+Click to select a range of chapters to include or delete.
- Click "Parse". If all is well, the parsed column should turn from a circle to a checkmark.
- Click "Epub" to generate the ePub as a download.
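
As a rough illustration of how a query selector and a regex on the link text can combine to pick out chapter links, here is a minimal sketch. It is not the extension's actual code: `findChapterLinks`, its parameters, and the `{ title, url }` result shape are hypothetical names chosen for this example, and `fakeRoot` is a stand-in for `document` so the snippet runs outside a browser.

```javascript
// Hypothetical helper, not the extension's actual code.
// `root` is anything with a querySelectorAll method (in a browser: `document`).
function findChapterLinks(root, linkSelector, textPattern) {
  const regex = new RegExp(textPattern, "i"); // case-insensitive, no "g" flag
  const chapters = [];
  for (const anchor of root.querySelectorAll(linkSelector)) {
    const title = (anchor.textContent || "").trim();
    // Keep only links whose visible text matches the regex.
    if (regex.test(title)) {
      chapters.push({ title, url: anchor.href });
    }
  }
  return chapters;
}

// Stand-in for a real DOM so the sketch runs outside a browser.
const fakeRoot = {
  querySelectorAll: () => [
    { textContent: " Chapter 1: Arrival ", href: "https://example.com/c1" },
    { textContent: "About the author", href: "https://example.com/about" },
    { textContent: "Chapter 2: Departure", href: "https://example.com/c2" },
  ],
};

const chapterLinks = findChapterLinks(fakeRoot, "a", "^chapter\\s+\\d+");
console.log(chapterLinks.map((c) => c.title));
```

A narrower selector (for example one scoped to the table of contents container) reduces how much work the regex has to do; a looser selector with a stricter regex is the alternative when the page markup is inconsistent.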
If a webapp has a "Next" button, you can try the "Add Page" parser.
- Start from the first chapter/page of the book
- Click Epublifier's icon on your browser's extension bar, which will open a popup. It will say "No parser available"
- Select Parse as "app".
- Open the "Add Page Parser" tab.
- Click the search button next to the "Next Element" textbox and select the Next button; it should be highlighted in red.
- Click the search button next to the "Title Element" textbox and select the title text, or leave it blank to let the parser try to auto-detect it.
- Click the "Add This Page" button until you reach the end.
Note: You DO NOT need to click the "Parse" button for webapps, the "Add Page" parser does the parsing automatically.
- Select chapters with the check boxes or Shift+Click.
- Click "Epub" to generate the ePub as a download.
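
The interaction between the wait time, the maximum chapters per click, and the Next button can be pictured as a simple loop. The sketch below is an assumption about the general technique, not the extension's implementation: `addPages`, `parseCurrent`, `clickNext`, and `makeFakePage` are hypothetical names, and the fake page replaces a real browser tab so the code runs on its own.

```javascript
// Hypothetical sketch of an "add page" loop; not the extension's actual code.
const sleep = (seconds) =>
  new Promise((resolve) => setTimeout(resolve, seconds * 1000));

// `page` must provide parseCurrent() and clickNext() (both assumed names).
async function addPages(page, waitSeconds, maxChapters) {
  const chapters = [];
  for (let i = 0; i < maxChapters; i++) {
    chapters.push(page.parseCurrent()); // capture the page currently shown
    if (!page.clickNext()) break;       // no Next button: end of the book
    await sleep(waitSeconds);           // give the next page time to load
  }
  return chapters;
}

// Stand-in for a webapp with three pages.
function makeFakePage(titles) {
  let index = 0;
  return {
    parseCurrent: () => ({ title: titles[index] }),
    clickNext: () => {
      if (index + 1 >= titles.length) return false;
      index += 1;
      return true;
    },
  };
}

const fakePage = makeFakePage(["Ch 1", "Ch 2", "Ch 3"]);
const firstBatch = addPages(fakePage, 0.01, 2); // decimals work for the wait
firstBatch.then((chapters) => console.log(chapters.map((c) => c.title)));
```

In this sketch each call leaves the page advanced past the last parsed chapter, so calling it again continues where the previous batch stopped. A longer wait trades speed for reliability on slow connections.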
Warning: Advanced configuration requires JavaScript knowledge.
Currently this extension does not save any modifications to the parser definition, so keep a copy locally.
main_def - This object defines all the parsers in the current file. If you add a new parser, it must go in here.
Detector - The detector function tries to detect which parser to set for the current page.
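
To make the detector idea concrete, here is a minimal sketch of a function that picks a parser from the page URL. The object layout is purely hypothetical: `exampleDef`, its `parsers` field, `hostPattern`, and `detectParser` are names invented for this example, and the real `main_def` fields and detector signature may differ.

```javascript
// Hypothetical shape; the extension's real main_def fields may differ.
const exampleDef = {
  parsers: {
    royalroad: { type: "links", hostPattern: /(^|\.)royalroad\.com$/ },
    chapter_links: { type: "links", hostPattern: null }, // manual fallback only
  },
};

// Returns the name of the first parser whose host pattern matches, else null.
function detectParser(def, pageUrl) {
  const host = new URL(pageUrl).hostname;
  for (const [name, parser] of Object.entries(def.parsers)) {
    if (parser.hostPattern && parser.hostPattern.test(host)) {
      return name;
    }
  }
  return null; // the situation in which the popup reports no parser
}

console.log(detectParser(exampleDef, "https://www.royalroad.com/fiction/1"));
console.log(detectParser(exampleDef, "https://example.com/book"));
```

Matching on the hostname is only one option; a detector could equally inspect the DOM for site-specific elements.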
There are two types of parsers:
- Links parser - This type of parser detects a list of links given a website URL and DOM
- Text parser - This type of parser extracts text content from a URL and DOM
A links parser returns a list of chapters; see one of the two examples.
A text parser mostly just extracts text.
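
The difference between the two parser types can be sketched as two functions that take a URL and a DOM and return different things. Everything here is illustrative: the function names, the `{ title, url }` and `{ title, text }` return shapes, and the plain-object `dom` stand-ins are assumptions, not the extension's actual interface.

```javascript
// Hypothetical signatures and return shapes, for illustration only.

// A links parser: (url, dom) -> list of chapters.
function exampleLinksParser(url, dom) {
  return dom.links.map((link) => ({
    title: link.text.trim(),
    url: new URL(link.href, url).href, // resolve relative links against the page
  }));
}

// A text parser: (url, dom) -> title and text content of a single chapter.
function exampleTextParser(url, dom) {
  return {
    title: dom.heading.trim(),
    text: dom.paragraphs.join("\n\n"),
  };
}

// Plain objects standing in for real DOM documents.
const tocDom = { links: [{ text: " Chapter 1 ", href: "/c1" }] };
const chapterDom = { heading: "Chapter 1", paragraphs: ["First.", "Second."] };

const chapterList = exampleLinksParser("https://example.com/toc", tocDom);
const chapterBody = exampleTextParser(chapterList[0].url, chapterDom);
console.log(chapterList, chapterBody);
```

In this sketch the output of the links parser feeds the text parser: each chapter URL found on the table of contents page is later fetched and passed to the text parser.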