Steer into Web Automation with Puppeteer in Node.js

A Complete Guide to Web Automation with Puppeteer in Node.js

Quick Summary: Learn web automation with Puppeteer in Node.js through this step-by-step guide and its real-world examples. It covers everything from installing Puppeteer to automating interactions, scraping data, and handling dynamic content, so you can put Puppeteer to work in your own Node.js applications.

Introduction

Why are you here? Probably because you were searching for ways to automate the web, and most likely for web automation in Node.js.

If so, you are in the right place!

The capacity to automate web interactions is not just a benefit but a requirement in this age of technological change. Whether you are a developer, a tester, or someone seeking to simplify repetitive tasks or provide Node.js development services, this guide explains how to do web automation and scraping with Puppeteer.

If you already work with Node.js, Puppeteer is a strong option for web automation compared with tools built on other technologies.

Here, I will explain web automation with clear explanations and real-world examples.

Just read on!

What is Puppeteer?

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default but can be configured to run full (non-headless) Chrome or Chromium.

So basically, Puppeteer lets you drive a real browser from Node.js. Its API mirrors what a user can do in the browser, such as opening pages, clicking, typing, and reading the DOM, so you can carry out these operations from code.
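As a small sketch of the headless versus non-headless choice described above, the helper below builds the options object that is passed to puppeteer.launch(). The helper name buildLaunchOptions and its debug flag are assumptions made for this illustration; headless and slowMo are Puppeteer launch options.

```javascript
// Hypothetical helper: pick launch options depending on whether we want to watch the browser.
function buildLaunchOptions(debug = false) {
  return debug
    ? { headless: false, slowMo: 50 } // visible window, every operation slowed down by 50 ms
    : { headless: true };             // no visible window
}

// Usage (needs `npm install puppeteer`):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(true));
```

Running with a visible window is mostly useful while developing a script; for unattended jobs the headless setting is usually the better fit.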

What can we do with Puppeteer?

  • Generating PDF from a webpage.
  • Generating screenshots from a webpage.
  • Testing Chrome extensions.
  • Web scraping.
  • Form submission, UI testing, keyboard input, & other tasks may all be automated.
  • Access web pages & extract information using the standard DOM API.
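To make the form submission and DOM extraction items above more concrete, here is a minimal sketch. The helper name submitForm and all selectors passed to it are hypothetical; they depend entirely on the page you automate. It assumes a page object obtained from browser.newPage().

```javascript
// Hypothetical helper: fill in a field, submit the form, and read a value from the next page.
async function submitForm(page, { inputSelector, submitSelector, resultSelector, text }) {
  await page.type(inputSelector, text); // simulate keyboard input into the field
  await Promise.all([
    page.waitForNavigation(),           // wait for the page that follows the submit
    page.click(submitSelector),         // click the submit button
  ]);
  // $eval runs the callback inside the page against the first element matching the selector
  return page.$eval(resultSelector, (el) => el.textContent.trim());
}

// Usage, with placeholder selectors:
//   const message = await submitForm(page, {
//     inputSelector: '#name', submitSelector: '#send', resultSelector: '.message', text: 'Ada',
//   });
```

Starting waitForNavigation() before the click avoids missing a navigation that finishes very quickly.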


Let’s Start

Setup

1. Make a folder (name it whatever you like).
2. Open the folder in your terminal or command prompt.
3. Run npm init -y. This generates a package.json file.
4. Then run npm install puppeteer. This installs Puppeteer, which includes Chromium.

When you install Puppeteer, it downloads a recent version of Chromium that is guaranteed to work with the Puppeteer API.
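One quick way to confirm that the installation works is to launch the browser once and print its version. The sketch below is only an illustration: the function name checkInstall is made up, and the puppeteer module is passed in as an argument so the function itself stays easy to try out.

```javascript
// Hypothetical helper: launch the bundled browser, report its version, and close it again.
async function checkInstall(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    return await browser.version(); // a version string reported by the browser
  } finally {
    await browser.close();          // always close, even if version() fails
  }
}

// Usage, after `npm install puppeteer`:
//   checkInstall(require('puppeteer')).then(console.log);
```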

Usage

Now we will learn how to use puppeteer with some code examples!

Examples

Example Code #1 – Take a screenshot and save the image

Let’s start with the first example, where we navigate to https://www.wikipedia.org/, take a screenshot of the homepage, and save it as example.png in the same directory.

const puppeteer = require('puppeteer');

(async () => {
  // Launch a visible (non-headless) browser window
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://www.wikipedia.org/');
  // Capture the whole scrollable page as a PNG file
  await page.screenshot({
    path: 'example.png',
    type: 'png',
    fullPage: true,
  });
  await browser.close();
})();

First, we create a new browser instance with puppeteer.launch(). Passing headless: false opens a visible browser window.

const page = await browser.newPage()

A browser can hold many pages. The Browser newPage() method creates a new page in the default browser context. A page is an instance of the Page class.

Next, using the page object, we navigate to the webpage that we want to take a screenshot of:

await page.goto('https://www.wikipedia.org/');

Here, we are loading the Wikipedia home page. By default, the goto() method resolves when the browser’s load event fires, which indicates that the page has loaded.
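The moment at which goto() resolves can be changed with its waitUntil option. The sketch below is an illustration only; the helper name openPage is an assumption, while waitUntil (with values such as 'load', 'domcontentloaded', 'networkidle0', and 'networkidle2') and timeout are options of page.goto().

```javascript
// Hypothetical helper: open a new tab and navigate, choosing when navigation counts as finished.
async function openPage(browser, url, waitUntil = 'load') {
  const page = await browser.newPage();
  // timeout is in milliseconds; goto() rejects if the event has not fired by then
  await page.goto(url, { waitUntil, timeout: 30000 });
  return page;
}

// Usage:
//   const page = await openPage(browser, 'https://www.wikipedia.org/', 'domcontentloaded');
```

'domcontentloaded' resolves earlier than 'load', which can be enough when you only need the initial HTML; the networkidle values wait longer and suit pages that keep fetching data after they load.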

The screenshot method takes a configuration object:

path: The file path where we want to save the image. Here, we save it in the current working directory.

type: The image encoding to use, either png or jpeg.

fullPage: When true, the screenshot covers the full scrollable page rather than only the visible viewport.
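As a sketch of how these options fit together, the helper below derives the screenshot options from the target file name. The helper name screenshotOptions and the quality value of 80 are assumptions for illustration; path, type, fullPage, and quality are options of page.screenshot().

```javascript
const path = require('node:path');

// Hypothetical helper: build screenshot options based on the file extension.
function screenshotOptions(filePath, fullPage = true) {
  const ext = path.extname(filePath).toLowerCase();
  const isJpeg = ext === '.jpg' || ext === '.jpeg';
  const options = { path: filePath, type: isJpeg ? 'jpeg' : 'png', fullPage };
  if (isJpeg) {
    options.quality = 80; // quality (0-100) is not applicable to png images
  }
  return options;
}

// Usage:
//   await page.screenshot(screenshotOptions('example.jpg'));
```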

Save this code as example.js and run it with the following command to generate the screenshot: node example.js. This creates example.png in the current working directory.

Example Code #2 – Scrape Google search and get result links

Let’s look at the second example, where we navigate to https://www.google.com, run a search, and collect the result links.

const puppeteer = require("puppeteer");

let browser;
(async () => {
  const searchQuery = "stack overflow";
  browser = await puppeteer.launch({ headless: false });
  // Reuse the tab that the browser opens on launch
  const [page] = await browser.pages();
  await page.goto("https://www.google.com/");
  await page.waitForSelector('input[aria-label="Search"]', { visible: true });
  await page.type('input[aria-label="Search"]', searchQuery);
  // Start waiting for the navigation before pressing Enter so it is not missed
  await Promise.all([
    page.waitForNavigation(),
    page.keyboard.press("Enter"),
  ]);
  // The selectors match Google's markup when this was written and may change
  await page.waitForSelector(".LC20lb", { visible: true });
  const searchResults = await page.evaluate(() =>
    [...document.querySelectorAll(".LC20lb")].map((e) => ({
      title: e.innerText,
      link: e.parentNode.href,
    }))
  );
  console.log(searchResults);
})()
  .catch((err) => console.error(err))
  // browser is undefined if launch() itself failed
  .finally(() => browser?.close());

page.waitForSelector(selector[, options])

selector: A selector of the element to wait for.

page.type(selector, text[, options])

selector: A selector of the element to type into. If more than one element matches the selector, the first one is used.

text: The text to type into the focused element.

options: An object with a delay property (number), the time to wait between key presses in milliseconds. Defaults to 0.

page.evaluate(pageFunction[, ...args])

pageFunction: The function to be evaluated in the page context.

...args: Arguments to pass to pageFunction.
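The three calls above can be combined as in the following sketch, which also shows the delay option and how an argument travels from Node.js into the page function. The helper name typeAndRead is hypothetical, and the 100 ms delay is an arbitrary value chosen for illustration.

```javascript
// Hypothetical helper: wait for an input, type into it slowly, and read its value back.
async function typeAndRead(page, selector, text) {
  await page.waitForSelector(selector, { visible: true }); // wait until the element is shown
  await page.type(selector, text, { delay: 100 });         // 100 ms between key presses
  // The second argument of evaluate() is passed to the page function as `sel`.
  // The function body runs in the browser, so `document` refers to the page's DOM.
  return page.evaluate((sel) => document.querySelector(sel).value, selector);
}

// Usage:
//   const typed = await typeAndRead(page, 'input[aria-label="Search"]', 'stack overflow');
```

Variables from the Node.js side are not visible inside the page function, which is why the selector is handed over as an argument instead of being captured from the outer scope.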

Now save this code as example2.js and run it with the command below. The scraped links are printed to the console.

node example2.js

Here is the result of the scraping:

[
  {
    title: 'Stack Overflow - Where Developers Learn, Share, & Build ...',
    link: 'https://stackoverflow.com/'
  },
  {
    title: '',
    link: 'https://whatis.techtarget.com/definition/stack-overflow'
  },
  {
    title: '',
    link: 'https://medium.com/swlh/the-best-and-worst-ways-to-use-stack-overflow-711a077f2892'
  },
  {
    title: '',
    link: 'https://stackoverflow.blog/2010/12/17/introducing-programmers-stackexchange-com/'
  },
  {
    title: '',
    link: 'https://stackoverflow.blog/2021/03/17/stack-overflow-for-teams-is-now-free-forever-for-up-to-50-users/'
  },
  {
    title: 'Stack Overflow Blog - Essays, opinions, and advice on the act ...',
    link: 'https://stackoverflow.blog/'
  },
  {
    title: 'Stack Overflow - Wikipedia',
    link: 'https://en.wikipedia.org/wiki/Stack_Overflow'
  },
  {
    title: 'Stack Overflow | LinkedIn',
    link: 'https://www.linkedin.com/company/stack-overflow'
  },
  {
    title: 'Logo - Stacks',
    link: 'https://stackoverflow.design/brand/logo/'
  },
  {
    title: 'Stack Overflow - Crunchbase Company Profile & Funding',
    link: 'https://www.crunchbase.com/organization/stack-overflow'
  }
]

Example Code #3 – Create a PDF of the page

Let’s look at the third example, where we navigate to https://www.wikipedia.org/, generate a PDF of the page, and save it as example3.pdf in the same directory.

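const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    // Run headless: page.pdf() has historically been supported only in headless mode,
    // so this replaces the conflicting headless: false plus '--headless' argument
    headless: true,
    pipe: true,
    args: [
      '--disable-gpu',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
    ],
  });

  const page = await browser.newPage();
  // networkidle2: wait until at most two network connections remain open
  await page.goto('https://www.wikipedia.org/', {
    waitUntil: 'networkidle2',
  });

  await page.pdf({ path: 'example3.pdf', format: 'A4' });
  await browser.close();
})();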

Here are several options you can use with the pdf() method:

printBackground: When this option is true, Puppeteer includes any background colors or images used on the web page in the PDF.

path: Specifies where to save the generated PDF file. If you omit it, nothing is written to disk and the PDF data is only returned by pdf(), so you can keep it in memory.

format: Sets the paper format to one of the given options: Letter, A4, A3, A2, etc.

margin: Specifies the margins of the generated PDF.
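The options above might be combined as in this sketch. The helper name savePdf and the margin values are assumptions chosen for illustration; it expects a page that has already navigated to the target URL.

```javascript
// Hypothetical helper: write the current page to a PDF with backgrounds and margins.
async function savePdf(page, filePath) {
  return page.pdf({
    path: filePath,        // where the file is written
    format: 'A4',          // paper format
    printBackground: true, // include background colors and images
    margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
  });
}

// Usage:
//   await savePdf(page, 'example3.pdf');
```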

Now save this code as example3.js and run it with node example3.js. You will then see that a PDF is generated.

https://drive.google.com/file/d/14yToS3Fd7jKxYOQCIRASkJGJSxgbVxzr/view

Now you have some idea of how it works. For more, you can refer to https://pptr.dev/


Conclusion

So, you are now aware that Puppeteer in Node.js can effectively accomplish web automation. To make your web automation even more rewarding, consider leveraging BigScal. BigScal amplifies Puppeteer’s potential by providing a user-friendly platform that simplifies automation at scale. From effortless script management to distributed execution and insightful analytics, BigScal streamlines your automation processes, empowering you to handle complex tasks easily.

FAQ

Can I use XPath in Puppeteer?

Yes, you can use XPath in Puppeteer to locate elements on a web page. While Puppeteer primarily uses CSS selectors for element identification, older releases also provide the page.$x() function to execute XPath queries, and recent releases accept XPath expressions through the regular selector methods instead. This allows you to navigate the DOM and interact with elements using XPath syntax. However, CSS selectors are generally recommended unless XPath is specifically needed.

What is Puppeteer?

Puppeteer is a Node.js library developed by Google that provides a high-level API for automating web browser tasks. It enables developers to control Chrome or Chromium programmatically, in headless mode by default. Puppeteer is commonly used for tasks such as web scraping, automating UI testing, taking screenshots, generating PDFs, and interacting with web pages. It offers powerful tools for simulating user interactions and extracting data from websites.

What is XPath in Puppeteer?

XPath in Puppeteer refers to the ability to use XPath expressions to locate and interact with elements on web pages. It lets developers traverse the Document Object Model (DOM) of a webpage using XPath syntax, offering an alternative to CSS selectors for selecting and interacting with elements.

Is Playwright better than Puppeteer?

Playwright and Puppeteer are both browser automation libraries, and Playwright is often considered the more feature-rich of the two. Playwright supports multiple browser engines (Chromium, Firefox, WebKit) and has built-in features for robust automation. Puppeteer is simpler and more widely known, while Playwright’s capabilities make it a favorable choice for complex scenarios. The choice between them depends on the project’s requirements, browser compatibility needs, and desired features.

Can Puppeteer work in Docker?

Yes, Puppeteer can work in a Docker container. However, you need to set up the necessary dependencies and configuration correctly. This includes installing the system libraries that the browser requires and choosing appropriate sandboxing settings. Some additional setup might be needed for smooth operation within the Docker environment, but Puppeteer can be used effectively in containerized applications.
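As a rough sketch of the sandboxing settings mentioned for Docker, the helper below collects the Chromium flags already used in Example Code #3 into one launch options object. The helper name dockerLaunchOptions is an assumption, and whether you need each flag depends on your image and on how the container is run.

```javascript
// Hypothetical helper: launch options that are commonly used inside containers.
function dockerLaunchOptions() {
  return {
    headless: true,
    args: [
      '--no-sandbox',             // disables Chromium's sandbox; only use with content you trust
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',  // avoid relying on /dev/shm, which is often small in containers
    ],
  };
}

// Usage:
//   const browser = await require('puppeteer').launch(dockerLaunchOptions());
```

Disabling the sandbox weakens isolation between the browser and the host, so it is worth checking whether your container setup can run the browser with the sandbox enabled before using these flags.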