Web Automation with Puppeteer: A Complete Guide


Quick Summary: Learn Puppeteer and web automation in Node.js with our complete guide! Through step-by-step instructions and real-world examples, you can unlock the potential of seamless web automation. This guide covers everything from installing Puppeteer to automating interactions, scraping data, and handling dynamic content. Start using Puppeteer to boost your Node.js applications today.

Introduction

Why are you here? Probably because you were searching for ways to automate the web, most likely in Node.js.

If so, you are in the right place!

The ability to automate web interactions is no longer just a benefit but a requirement in this age of technological change. Whether you are a developer, a tester, or someone seeking to simplify repetitive tasks or provide Node.js development services, I will explain how to do web automation and scraping with Puppeteer.

Compared with other technologies, Node.js is a strong choice for web automation.

Here, I will explain web automation with clear explanations and real-world examples.

Just read on!

What is Puppeteer?

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default but can be configured to run full (non-headless) Chrome or Chromium.

So basically, Puppeteer lets you drive a real browser from Node.js. Its API mirrors what a user can do in the browser, so you can carry out different operations from a script.

What can we do with Puppeteer?

Generating PDFs from a webpage.
Generating screenshots from a webpage.
Testing Chrome extensions.
Web scraping.
Automating form submission, UI testing, keyboard input, and other tasks.
Accessing web pages and extracting information using the standard DOM API.

Let’s Start

Setup

Make a folder (name it whatever you like).
Open the folder in your terminal or command prompt.
Run npm init -y. This will generate a package.json file.
Then run npm install puppeteer. This will install Puppeteer, which includes Chromium.

When Puppeteer is installed, it downloads a recent version of Chromium that is guaranteed to work with its API.
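To confirm that the installation worked, a small check like the one below can help. It is only a minimal sketch: checkInstall is a hypothetical helper name (not part of Puppeteer), and it receives the puppeteer module as a parameter so the function stays easy to try out in isolation.

```javascript
// Hypothetical helper: launches a browser, reads its version string, and closes it.
// `puppeteer` is the module returned by require('puppeteer').
async function checkInstall(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    // browser.version() resolves to a string identifying the browser build
    return await browser.version();
  } finally {
    // Always close the browser, even if reading the version fails
    await browser.close();
  }
}

// Usage, assuming Puppeteer is installed in the current folder:
//   checkInstall(require('puppeteer')).then(console.log);
```

If the browser download failed during installation, the launch call is where the error would typically surface.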

Usage

Now we will learn how to use puppeteer with some code examples!

Examples

Example Code #1 – Take a screenshot and save the image

Let’s start with the first example, where we navigate to https://www.wikipedia.org/, take a screenshot of the homepage, and save it as example.png in the same directory.

const puppeteer = require('puppeteer');

(async () => {
  // Launch a visible (non-headless) browser window
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://www.wikipedia.org/');
  // Capture the full scrollable page and save it as example.png
  await page.screenshot({
    path: 'example.png',
    type: 'png',
    fullPage: true
  });
  await browser.close();
})();

First, we create a new browser instance using the launch() API of the puppeteer module.

const page = await browser.newPage()

A browser can hold many pages. The Browser's newPage() method creates a new page in the default browser context. A page is an instance of the Page class.
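Since one browser can hold several pages, a script can open more than one tab at a time. The sketch below illustrates the idea; collectTitles is a hypothetical helper name, and it assumes a browser object obtained from puppeteer.launch().

```javascript
// Hypothetical helper: opens one tab per URL in the same browser
// and returns the title of each page, in the same order as `urls`.
async function collectTitles(browser, urls) {
  return Promise.all(
    urls.map(async (url) => {
      const page = await browser.newPage(); // each call creates a separate Page object
      try {
        await page.goto(url);
        return await page.title();
      } finally {
        await page.close(); // close the tab but keep the browser running
      }
    })
  );
}

// Usage, inside an async function with a launched browser:
//   const titles = await collectTitles(browser, ['https://www.wikipedia.org/']);
```

Opening the tabs concurrently is a design choice made for brevity here; for a long list of URLs, processing them in small batches would likely be gentler on memory.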

Now, using the page object, we load (navigate to) the webpage that we want to take a screenshot of.

Here, we are loading the Wikipedia home page. By default, the goto() method resolves when the browser's load event fires, indicating that the page has loaded.
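The point at which goto() resolves can be changed with its waitUntil option, and the value it resolves to is the response for the main document. The following is a rough sketch of both ideas; openAndCheck is a hypothetical helper name, and the "ok" range used here (200 to 399) is an assumption chosen for illustration.

```javascript
// Hypothetical helper: navigates to a URL and reports the HTTP status of the main document.
async function openAndCheck(page, url) {
  // 'domcontentloaded' resolves earlier than the default 'load' event
  const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
  // goto() can resolve to null, for example when only the URL hash changes
  const status = response ? response.status() : null;
  // Assumption for this sketch: treat 2xx and 3xx statuses as success
  return { url, status, ok: status !== null && status >= 200 && status < 400 };
}

// Usage, inside an async function with an open page:
//   const result = await openAndCheck(page, 'https://www.wikipedia.org/');
//   if (!result.ok) console.warn('Unexpected status', result.status);
```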

The screenshot method takes in some configurations:

path: The file path where we want to save the image. Here, we save it in the current working directory.

type: The image encoding to use, either png or jpeg.

fullPage: When true, captures the full scrollable page rather than only the visible viewport.

Save this code as example.js and execute it with the command node example.js. This generates the screenshot as example.png.
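These options can also be assembled programmatically. The sketch below picks the encoding from the file extension and adds the quality option, which applies only to JPEG output. The helper name screenshotOptions and its default quality of 80 are assumptions made for illustration.

```javascript
const path = require('node:path');

// Hypothetical helper: builds an options object for page.screenshot().
function screenshotOptions(filePath, { fullPage = true, quality = 80 } = {}) {
  const ext = path.extname(filePath).toLowerCase();
  // Choose the encoding from the file extension; default to PNG
  const type = ext === '.jpg' || ext === '.jpeg' ? 'jpeg' : 'png';
  const options = { path: filePath, type, fullPage };
  // quality (0-100) is only meaningful for JPEG, so it is left out for PNG
  if (type === 'jpeg') options.quality = quality;
  return options;
}

// Usage, inside an async function with an open page:
//   await page.screenshot(screenshotOptions('home.jpg', { quality: 60 }));
```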

Example Code #2 – Scrape Google search and get result links

Let’s see the second example, where we navigate to https://www.google.com, run a search, and collect the result links.

const puppeteer = require("puppeteer");

let browser; // declared outside so the finally handler can close it
(async () => {
  const searchQuery = "stack overflow";
  browser = await puppeteer.launch({ headless: false });
  // Reuse the tab that the browser opens on launch
  const [page] = await browser.pages();
  await page.goto("https://www.google.com/");
  await page.waitForSelector('input[aria-label="Search"]', { visible: true });
  await page.type('input[aria-label="Search"]', searchQuery);
  // Start waiting for the navigation before pressing Enter
  await Promise.all([
    page.waitForNavigation(),
    page.keyboard.press("Enter"),
  ]);
  // .LC20lb is the class this example uses to find result titles
  await page.waitForSelector(".LC20lb", { visible: true });
  const searchResults = await page.evaluate(() =>
    [...document.querySelectorAll(".LC20lb")].map(e => ({
      title: e.innerText,
      link: e.parentNode.href
    }))
  );
  console.log(searchResults);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

page.waitForSelector(selector)

selector (string): A selector of the element to wait for.
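When the element never appears, waitForSelector() rejects once its timeout elapses. One way to handle that is sketched below: waitOrNull is a hypothetical helper name, and the check on the error's name assumes Puppeteer's timeout errors are named 'TimeoutError'.

```javascript
// Hypothetical helper: waits for a visible element and returns null
// instead of throwing when the wait times out.
async function waitOrNull(page, selector, timeoutMs = 5000) {
  try {
    return await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
  } catch (err) {
    // Assumption: timeout failures carry the name 'TimeoutError'
    if (err.name === 'TimeoutError') return null;
    throw err; // anything else is unexpected, so let it propagate
  }
}

// Usage, inside an async function with an open page:
//   const banner = await waitOrNull(page, '#cookie-banner', 2000);
//   if (banner) await banner.click();
```

The '#cookie-banner' selector in the usage comment is a made-up example, not something taken from a real page.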

page.type(selector, text[, options])

selector: A selector of the element to type into. If more than one element matches the selector, the first one is used.

text: The text to type into the focused element.

options: An object with a delay property (number): the time to wait between key presses in milliseconds. Defaults to 0.
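As a small illustration of the delay option, the sketch below types with a pause between key presses and then submits with Enter, following the same pattern as Example #2. The helper name typeAndSubmit and the 100 ms default are assumptions.

```javascript
// Hypothetical helper: types text with a pause between key presses, then submits with Enter.
async function typeAndSubmit(page, selector, text, delayMs = 100) {
  // delay is the wait between key presses, in milliseconds
  await page.type(selector, text, { delay: delayMs });
  await Promise.all([
    page.waitForNavigation(), // start waiting before the key press triggers navigation
    page.keyboard.press('Enter'),
  ]);
}

// Usage, inside an async function with an open page:
//   await typeAndSubmit(page, 'input[aria-label="Search"]', 'stack overflow');
```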

page.evaluate(pageFunction[, ...args])

pageFunction: The function to be evaluated in the page context.

...args: Arguments to pass to pageFunction.
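The role of the extra arguments is easiest to see in a short sketch. Because pageFunction runs inside the browser page, it cannot read variables from the Node.js script, so any value it needs is passed after the function. The helper name collectLinks is hypothetical.

```javascript
// Hypothetical helper: collects the unique link URLs of elements matching a selector.
async function collectLinks(page, selector) {
  // The callback runs in the page, so `selector` is handed over as an argument
  const hrefs = await page.evaluate(
    (sel) => [...document.querySelectorAll(sel)].map((el) => el.href),
    selector
  );
  // Back in Node.js: drop empty values and duplicates
  return [...new Set(hrefs.filter(Boolean))];
}

// Usage, inside an async function with an open page:
//   const links = await collectLinks(page, 'a');
```

The return value of pageFunction has to be serializable, which is why the sketch returns plain strings rather than DOM elements.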

Now save this code as example2.js and execute it with the command below to fetch the result links.

node example2.js

Here is the result of the scraping:

[
        {
        title: 'Stack Overflow - Where Developers Learn, Share, & Build ...',
        link: 'https://stackoverflow.com/'
        },{
        title: '',
        link: 'https://whatis.techtarget.com/definition/stack-overflow'
        }, {
        title: '',
        link: 'https://medium.com/swlh/the-best-and-worst-ways-to-use-stack-overflow-711a077f2892'
        },{
        title: '',
        link: 'https://stackoverflow.blog/2010/12/17/introducing-programmers-stackexchange-com/'
        }, {
        title: '',
        link: 'https://stackoverflow.blog/2021/03/17/stack-overflow-for-teams-is-now-free-forever-for-up-to-50-users/'
        }, {
        title: 'Stack Overflow Blog - Essays, opinions, and advice on the act ...',
        link: 'https://stackoverflow.blog/'
        },{
        title: 'Stack Overflow - Wikipedia',
        link: 'https://en.wikipedia.org/wiki/Stack_Overflow'
        },{
        title: 'Stack Overflow | LinkedIn',
        link: 'https://www.linkedin.com/company/stack-overflow'
        }, {
        title: 'Logo - Stacks',
        link: 'https://stackoverflow.design/brand/logo/'
        }, {
        title: 'Stack Overflow - Crunchbase Company Profile & Funding',
        link: 'https://www.crunchbase.com/organization/stack-overflow'
        }
        ]

Example Code #3 – Create a PDF of the page

Let’s see the third example, where we navigate to https://www.wikipedia.org/, generate a PDF of the page, and save it as example3.pdf in the same directory.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true, // run without a visible browser window
    pipe: true,     // talk to the browser over a pipe instead of a WebSocket
    args: [
      '--disable-gpu',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage'
    ],
  });

  const page = await browser.newPage();
  // networkidle2: wait until there are no more than 2 network
  // connections for at least 500 ms
  await page.goto('https://www.wikipedia.org/', { waitUntil: 'networkidle2' });

  await page.pdf({ path: 'example3.pdf', format: 'A4' });
  await browser.close();
})();

Here are several options you can use with the pdf() method:

printBackground: When this option is true, Puppeteer prints any background colors or images used on the web page to the PDF.

path: Specifies where to save the generated PDF file. If you omit path, nothing is written to disk and pdf() resolves with the PDF data in memory.

format: Sets the paper format to one of the given options: Letter, A4, A3, A2, etc.

margin: Specifies the margins of the generated PDF.
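The options above can be combined in a single call. The following is a minimal sketch rather than a recommended configuration: renderPdf is a hypothetical helper name, and the 20mm default margin is an arbitrary value chosen for illustration.

```javascript
// Hypothetical helper: renders the current page to an A4 PDF and resolves with its contents.
async function renderPdf(page, { path, margin = '20mm' } = {}) {
  const options = {
    format: 'A4',
    printBackground: true, // include background colors and images
    margin: { top: margin, right: margin, bottom: margin, left: margin },
  };
  // Only write to disk when a path is given; otherwise keep the PDF in memory
  if (path) options.path = path;
  return page.pdf(options);
}

// Usage, inside an async function with an open page:
//   const data = await renderPdf(page);                        // in memory only
//   await renderPdf(page, { path: 'report.pdf', margin: '1in' }); // also saved to disk
```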

Now save this code as example3.js and run it with node example3.js; you will see that a PDF is generated:

https://drive.google.com/file/d/14yToS3Fd7jKxYOQCIRASkJGJSxgbVxzr/view

Now you have some idea of how Puppeteer works. For more, refer to https://pptr.dev/

Conclusion

Puppeteer is a Node.js library, and its scripting capabilities make it possible to automate almost anything on the web. If you want to take your e-commerce automation further, BigScal may be an option: it builds on the possibilities of Puppeteer with a user-friendly platform for managing automation at scale. BigScal also offers script management, distributed end-to-end operation, and analytics to simplify automation and help you tackle complex jobs.

FAQ

Can we use XPath in Puppeteer?

Yes, XPath can be used with Puppeteer to locate elements on a webpage. Although Puppeteer primarily locates elements with CSS selectors, it also provides the page.$x() function for running XPath queries (recent releases replace page.$x() with the xpath/ selector prefix). This lets you traverse the DOM and interact with elements using XPath syntax. However, CSS selectors are recommended for performance and compatibility reasons unless a particular situation demands XPath.
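As a rough illustration, the sketch below runs an XPath query and reads the text of each match. It assumes a Puppeteer release that accepts the xpath/ selector prefix in page.$$(); on older releases, page.$x(expression) plays the same role. The helper name textsByXPath is hypothetical.

```javascript
// Hypothetical helper: returns the trimmed text of every element matching an XPath expression.
// Assumption: the installed Puppeteer version supports the 'xpath/' selector prefix.
async function textsByXPath(page, expression) {
  const handles = await page.$$(`xpath/${expression}`);
  const texts = [];
  for (const handle of handles) {
    // handle.evaluate() runs the callback in the page with the element as its argument
    texts.push(await handle.evaluate((el) => el.textContent.trim()));
    await handle.dispose(); // release the element handle once it is no longer needed
  }
  return texts;
}

// Usage, inside an async function with an open page:
//   const headings = await textsByXPath(page, '//h1');
```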

What is Puppeteer in automation?

Puppeteer is a Node.js library whose API provides a high-level abstraction for automating browser actions through JavaScript. It is an open-source library developed by Google. Put simply, it lets developers control Chrome or Chromium programmatically. Puppeteer is popular among web developers for web scraping and UI testing, and it can also generate screenshots and PDFs and perform any other task that requires interacting with webpages. It gives developers powerful mechanisms for building scenarios and extracting data from websites.

What is XPath in Puppeteer?

XPath in Puppeteer means using XPath expressions to find and manipulate elements on a webpage. Most of Puppeteer's functionality is based on CSS selectors, but it also provides the page.$x() function for executing XPath queries.

Which is better: Playwright or Puppeteer?

Playwright and Puppeteer are both browser automation libraries. Playwright has the edge in cross-browser support: it can drive Chromium, Firefox, and WebKit, and it offers features for more robust automation, whereas Puppeteer is simple and widely adopted. Playwright's higher level of abstraction lets it cover more complex use cases. Which one to choose depends mainly on the project's needs, the browser compatibility required, and the features expected.

Does Puppeteer work in Docker?

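Yes, Puppeteer works in a Docker container. The image must include the system libraries that Chromium depends on, and sandbox-related launch flags such as --no-sandbox and --disable-dev-shm-usage are often needed. With that setup, Puppeteer tests can run successfully against your applications.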
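One possible way to keep container-specific flags out of local runs is sketched below. The flags are the ones used in Example #3; the helper name launchOptions and its inContainer switch are assumptions made for illustration. Note that disabling the sandbox reduces isolation, so it is best limited to environments where you trust the pages being loaded.

```javascript
// Hypothetical helper: returns launch options, adding container-friendly flags on request.
function launchOptions({ inContainer = false } = {}) {
  const options = { headless: true };
  if (inContainer) {
    options.args = [
      '--no-sandbox',            // the sandbox often cannot start when running as root
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage', // avoid relying on a small /dev/shm inside the container
    ];
  }
  return options;
}

// Usage, assuming Puppeteer is installed:
//   const browser = await puppeteer.launch(launchOptions({ inContainer: true }));
```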