Skip to content

Puppeteer is one of the leading tools for automating browsers. Puppeteer is great for E2E testing but, when it comes to scarping / automation, there are few things we need to do.

Notifications You must be signed in to change notification settings

linvo-io/safer-puppeteer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

Puppeteer is one of the leading tools for automating browsers.
Check out also Selenium and Cypress.

This article assumes you have already worked with Puppeteer before.

Puppeteer is great for E2E testing but, when it comes to scarping / automation, there are few things we need to do.

So let’s start from the basics.

1). User-Agent:

By default your Puppeteer User-Agent is

“Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36”

You can understand that: This is headless chrome on Linux.

Make sure you change it to something closer to:

“‘Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36′”.

2). Using a proxy:

By default, Puppeteer allows you to proxy your entire browser.
That isn’t good because we don’t want to open a new browser for every task.

In case you want to proxy a single web page, you can check puppeteer-page-proxy.
It has problems with Content Security Policy, and I don’t really find it safe.

We don’t want to open a new browser for every task (it will consume all your memory).

We need to use a backbone network.

What is a backbone network?

You can look at it as a proxy for proxies.

You have one web-address.
When you proxy this address, it will ask you for a username and password.

According to your username and password it will forward your request to a different IP.

Your Puppeteer launch should look like this:

puppeteer.launch({

headless: true,

args: [

“–proxy-server=http://proxy-backbone.server”,

]

});

After you open a new page, you should write:

page.authenticate({

username: ‘username’,

password: ‘password’

});

In case you want to manage your proxies yourself, check Apify library proxy-chain.

If your automation is done via a simple HTTP website (not SSL), check out Nikolai Tschacher tutorial about proxies.

We are currently using Webshare.io.
Check out also PrivateProxy.me.

3). Libraries that will boost-up your safety

Puppeteer-extra is an excellent extension Library for Puppeteer**.**

You can use it with puppeteer-extra-plugin-stealth,

They solve the Mouse & Cat problem.

Also, make sure you use puppeteer-extra-plugin-minmax.
It just another thing to make your behavior looks more natural.

4). Use more of Puppeteer native clicks and typing.
Try to avoid page.evaluate to make actions.
Clicking on Elements without Puppeteer is inhuman, which might be detectable.

5). Try to avoid patterns

Don’t scrape a website every X seconds, do it more randomly.

Change your actions steps in every try to make it looks more random.

6). Make sure you don’t modify the elements on the page.

The article can be found at Linvo.io

About

Puppeteer is one of the leading tools for automating browsers. Puppeteer is great for E2E testing but, when it comes to scarping / automation, there are few things we need to do.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published