A Complete Guide For Web Automation With Puppeteer In Node.js
What is Puppeteer?
Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default but can be configured to run full (non-headless) Chrome or Chromium.
In other words, Puppeteer lets you drive a real browser from a Node.js script. Its API mirrors what a user can do in the browser, such as opening pages, clicking, typing, and reading the DOM, so you can script those operations.
What can we do with Puppeteer?
• Generating PDFs from a webpage.
• Generating screenshots of a webpage.
• Testing Chrome extensions.
• Web scraping.
• Automating form submission, UI testing, keyboard input, and other tasks.
• Accessing web pages and extracting information using the standard DOM API.
Getting Started
Setup
1. Make a folder (the name does not matter).
2. Open the folder in your terminal or command prompt.
3. Run npm init -y. This generates a package.json file.
4. Run npm install puppeteer. This installs Puppeteer, which includes Chromium.
When you install Puppeteer, it downloads a recent version of Chromium that is guaranteed to work with the API.
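As a quick sanity check after the install step, a small script along these lines can confirm that Node.js is able to find the package. This is only a sketch: the helper name isInstalled and the file name check-install.js are made up for illustration, and it relies on Node's built-in require.resolve, which throws when a module cannot be found.

```javascript
// check-install.js (hypothetical file name)

// Returns true if Node.js can resolve the given module from this folder.
function isInstalled(moduleName) {
  try {
    require.resolve(moduleName); // throws if the module cannot be found
    return true;
  } catch (err) {
    return false;
  }
}

console.log(
  isInstalled('puppeteer')
    ? 'puppeteer is installed'
    : 'puppeteer was not found; run "npm install puppeteer" first'
);
```

Run it from the project folder with node check-install.js.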
Usage
Now we will learn how to use Puppeteer with some code examples.
Examples
Example Code #1 – Take a screenshot and save the image
Let's start with the first example, where we navigate to https://www.wikipedia.org/, take a screenshot of the homepage, and save it as example.png in the same directory.
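const puppeteer = require('puppeteer');

(async () => {
  // Launch a visible (non-headless) browser window
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // Navigate to the page and wait for its load event
  await page.goto('https://www.wikipedia.org/');

  // Capture the whole page and save it in the current working directory
  await page.screenshot({ path: 'example.png', type: 'png', fullPage: true });

  await browser.close();
})();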
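const browser = await puppeteer.launch({ headless: false });
We create a new browser instance by calling the launch() method on the puppeteer object. Because headless is set to false, a visible browser window opens.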
const page = await browser.newPage()
A browser can hold many pages. The browser's newPage() method creates a new page in the default browser context and returns it as an instance of the Page class.
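To illustrate that one browser can hold several pages at once, here is a minimal sketch that opens one page per URL and collects the titles. The function name titlesOf is hypothetical, and the function expects a browser object that was already created with puppeteer.launch(); newPage(), goto(), title() and close() are the Puppeteer calls it relies on.

```javascript
// Opens one page per URL in the given Puppeteer browser and returns the page titles.
async function titlesOf(browser, urls) {
  return Promise.all(
    urls.map(async (url) => {
      const page = await browser.newPage(); // each call creates a separate Page
      try {
        await page.goto(url);
        return await page.title();
      } finally {
        await page.close(); // close the page even if navigation fails
      }
    })
  );
}
```

Inside the async function of a script, this could be called as const titles = await titlesOf(browser, ['https://www.wikipedia.org/']);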
Now, using the page object, we navigate to the webpage that we want to take a screenshot of:
await page.goto('https://www.wikipedia.org/');
Here, we are loading the Wikipedia home page. By default, the goto() method resolves when the page's load event fires, indicating that the page has loaded.
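The load event is only the default. goto() also accepts a waitUntil option, and the sketch below picks a value per use case. The helper names, the strategy labels, and the 30-second timeout are illustrative assumptions; 'load', 'domcontentloaded', 'networkidle0' and 'networkidle2' are the values Puppeteer accepts for waitUntil.

```javascript
// Maps a (hypothetical) use-case label to a Puppeteer waitUntil value.
const WAIT_STRATEGIES = {
  fast: 'domcontentloaded', // HTML parsed; images and styles may still be loading
  normal: 'load',           // the default: the page's load event has fired
  quiet: 'networkidle2',    // no more than 2 network connections for at least 500 ms
};

// Builds the options object passed as the second argument to page.goto().
function gotoOptions(strategy = 'normal', timeoutMs = 30000) {
  const waitUntil = WAIT_STRATEGIES[strategy];
  if (!waitUntil) {
    throw new Error(`Unknown strategy: ${strategy}`);
  }
  return { waitUntil, timeout: timeoutMs };
}

// Navigates an existing Puppeteer page and returns the HTTP status code.
async function openPage(page, url, strategy) {
  const response = await page.goto(url, gotoOptions(strategy));
  // goto() can resolve with null, for example when navigating to about:blank
  return response ? response.status() : null;
}
```

With the page object from Example #1, this would be called as const status = await openPage(page, 'https://www.wikipedia.org/', 'quiet');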
await page.screenshot({ path: 'example.png', type: 'png', fullPage: true });
The screenshot() method takes a configuration object:
path: The file path where the image is saved. A relative path such as example.png is resolved against the current working directory.
type: The image encoding to use, either png or jpeg.
fullPage: When true, the screenshot covers the full scrollable page rather than only the visible viewport.
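To make these options concrete, here is a small sketch that derives them from the output file name. The helper names screenshotOptions and capture are hypothetical, and the quality value of 80 is an arbitrary choice; path, type, quality and fullPage are real screenshot() options, and quality (0 to 100) does not apply to PNG output.

```javascript
const path = require('path');

// Builds screenshot() options from the output file name (hypothetical helper).
function screenshotOptions(filePath, fullPage = true) {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === '.jpg' || ext === '.jpeg') {
    // quality is only meaningful for JPEG output
    return { path: filePath, type: 'jpeg', quality: 80, fullPage };
  }
  if (ext === '.png') {
    return { path: filePath, type: 'png', fullPage };
  }
  throw new Error(`Unsupported image extension: ${ext || '(none)'}`);
}

// Takes a screenshot with an existing Puppeteer page object.
async function capture(page, filePath, fullPage) {
  return page.screenshot(screenshotOptions(filePath, fullPage));
}
```

With the page object from Example #1, await capture(page, 'example.jpg') would save a full-page JPEG instead of a PNG.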
Now save this code as example.js and run it with the command below. A screenshot named example.png is generated in the same directory.
node example.js
(Image: the resulting screenshot, example.png)
Example Code #2 – Scrape Google search and get result links
In the second example, we navigate to https://www.google.com, run a search, and collect the titles and links of the results.
const puppeteer = require("puppeteer");

let browser;

(async () => {
  const searchQuery = "stack overflow";

  browser = await puppeteer.launch({ headless: false });
  const [page] = await browser.pages(); // reuse the tab that opens on launch
  await page.goto("https://www.google.com/");

  // Type the query into the search box
  await page.waitForSelector('input[aria-label="Search"]', { visible: true });
  await page.type('input[aria-label="Search"]', searchQuery);

  // Submit the search and wait for the results page to load
  await Promise.all([
    page.waitForNavigation(),
    page.keyboard.press("Enter"),
  ]);

  // .LC20lb matches the result titles; each title sits inside its link element
  await page.waitForSelector(".LC20lb", { visible: true });
  const searchResults = await page.evaluate(() =>
    [...document.querySelectorAll(".LC20lb")].map(e => ({
      title: e.innerText,
      link: e.parentNode.href
    }))
  );

  console.log(searchResults);
})()
  .catch(err => console.error(err))
  .finally(async () => {
    // browser is still undefined if launch() itself failed
    if (browser) await browser.close();
  });
page.waitForSelector(selector[, options])
selector: A selector of the element to wait for. With the option visible: true, the method also waits until the element is visible.
page.type(selector, text[, options])
selector: A selector of the element to type into. If more than one element matches the selector, the first one is used.
text: The text to type into the focused element.
options: An object with an optional delay property, the time in milliseconds to wait between key presses. Defaults to 0.
page.evaluate(pageFunction[, ...args])
pageFunction: The function to be evaluated in the page context.
...args: Arguments to pass to pageFunction.
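The three calls above can be combined into reusable helpers. The following is a sketch only: searchAndCollect and collectLinks are hypothetical names, the 50 ms delay is an arbitrary choice, and the selectors are passed in by the caller because site markup changes often. It shows how extra arguments to page.evaluate() are handed to the page function, which runs in the browser and cannot see Node.js variables directly.

```javascript
// Runs inside the browser: collects the title and link for each matching element.
// It receives `selector` as an argument because closures do not cross
// the boundary between Node.js and the page.
function collectLinks(selector) {
  return [...document.querySelectorAll(selector)].map((el) => ({
    title: el.innerText,
    link: el.parentNode.href,
  }));
}

// Types a query into a search box, submits it, and returns the result links.
// `page` is an existing Puppeteer page that has already loaded the search form.
async function searchAndCollect(page, inputSelector, resultSelector, query) {
  await page.waitForSelector(inputSelector, { visible: true });
  // delay: wait 50 ms between key presses (the default is 0)
  await page.type(inputSelector, query, { delay: 50 });
  // Start waiting for the navigation before pressing Enter to avoid a race
  await Promise.all([page.waitForNavigation(), page.keyboard.press('Enter')]);
  await page.waitForSelector(resultSelector, { visible: true });
  // resultSelector is serialized and passed to collectLinks inside the page
  return page.evaluate(collectLinks, resultSelector);
}
```

With the selectors from Example #2, the call would be await searchAndCollect(page, 'input[aria-label="Search"]', '.LC20lb', 'stack overflow');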
Now save this code as example2.js and run it with the command below. The script prints the scraped titles and links.
node example2.js
Here is the result of the scraping:
[
  { title: 'Stack Overflow - Where Developers Learn, Share, & Build ...', link: 'https://stackoverflow.com/' },
  { title: '', link: 'https://whatis.techtarget.com/definition/stack-overflow' },
  { title: '', link: 'https://medium.com/swlh/the-best-and-worst-ways-to-use-stack-overflow-711a077f2892' },
  { title: '', link: 'https://stackoverflow.blog/2010/12/17/introducing-programmers-stackexchange-com/' },
  { title: '', link: 'https://stackoverflow.blog/2021/03/17/stack-overflow-for-teams-is-now-free-forever-for-up-to-50-users/' },
  { title: 'Stack Overflow Blog - Essays, opinions, and advice on the act ...', link: 'https://stackoverflow.blog/' },
  { title: 'Stack Overflow - Wikipedia', link: 'https://en.wikipedia.org/wiki/Stack_Overflow' },
  { title: 'Stack Overflow | LinkedIn', link: 'https://www.linkedin.com/company/stack-overflow' },
  { title: 'Logo - Stacks', link: 'https://stackoverflow.design/brand/logo/' },
  { title: 'Stack Overflow - Crunchbase Company Profile & Funding', link: 'https://www.crunchbase.com/organization/stack-overflow' }
]
Example Code #3 – Create a PDF of the page
In the third example, we navigate to https://www.wikipedia.org/, generate a PDF of the page, and save it as example3.pdf in the same directory.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true, // run without a visible window
    pipe: true,     // talk to the browser over a pipe instead of a WebSocket
    args: [
      '--disable-gpu',
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
    ],
  });
  const page = await browser.newPage();

  // Wait until the network is nearly idle so the page is fully rendered
  await page.goto('https://www.wikipedia.org/', { waitUntil: 'networkidle2' });

  await page.pdf({ path: 'example3.pdf', format: 'A4' });

  await browser.close();
})();
Here are some of the options that the pdf() method accepts. The example above uses path and format:
printBackground: When this option is set to true, Puppeteer prints any background colors or images used on the web page to the PDF.
path: Specifies where to save the generated PDF file. If you omit path, nothing is written to disk and the PDF is only returned by pdf() as an in-memory buffer.
format: Sets the paper format to one of the given options: Letter, A4, A3, A2, etc.
margin: Specifies the margins of the generated PDF as an object with top, right, bottom, and left values.
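As a sketch of how these options fit together, the helper below builds an options object with background printing and uniform margins, and a second helper renders the PDF to memory by leaving out path. The helper names and the 1cm default margin are assumptions for illustration; printBackground, format, margin and path are the pdf() options described above.

```javascript
// Builds pdf() options with the same margin on every side (hypothetical helper).
function pdfOptions({ path, format = 'A4', margin = '1cm' } = {}) {
  const options = {
    format,
    printBackground: true, // include background colors and images
    margin: { top: margin, right: margin, bottom: margin, left: margin },
  };
  if (path) {
    options.path = path; // only write to disk when a path is given
  }
  return options;
}

// Renders the current page to PDF in memory and returns its size in bytes.
// `page` is an existing Puppeteer page object.
async function pdfSizeInBytes(page) {
  const data = await page.pdf(pdfOptions()); // no path: nothing is written to disk
  return data.length;
}
```

In Example #3, the call await page.pdf(pdfOptions({ path: 'example3.pdf' })) would save the same file with backgrounds and margins applied.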
Now save this code as example3.js and run it with the command below. A PDF named example3.pdf is generated in the same directory.
node example3.js
Here is the generated PDF:
https://drive.google.com/file/d/14yToS3Fd7jKxYOQCIRASkJGJSxgbVxzr/view
Now you have some idea of how Puppeteer works. For more, refer to the official documentation: https://pptr.dev/