You can use Puppeteer, a Node.js library for controlling headless browsers, to extract text content from a webpage. The following example loads a page and prints the text content of its <body> element:
const puppeteer = require('puppeteer');

(async () => {
  // Launch a headless browser and open a new tab
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate to the target page (resolves on the page's load event by default)
  await page.goto('https://example.com');

  // Wait until the <body> element is present in the DOM
  await page.waitForSelector('body');

  // Run a function inside the page to read the text content of <body>
  const textContent = await page.evaluate(() => {
    return document.querySelector('body').textContent;
  });

  console.log(textContent);
  await browser.close();
})();
In this code snippet, we first launch a headless browser, create a new page, and navigate to the desired webpage. We then use page.waitForSelector to wait until the <body> element is present in the DOM (this alone does not guarantee that dynamically loaded content has finished rendering), and use page.evaluate to run a function in the context of the page that returns the text content of the <body> element.
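If the content you need is rendered by JavaScript after the initial load, waiting for <body> may not be enough. The sketch below is one possible approach, not the only one: gotoAndWait is a hypothetical helper name, and the 30-second default timeout is an assumption. It relies on the waitUntil option of page.goto, on page.waitForSelector with a timeout, and on page.$eval.

```javascript
// Hypothetical helper (the name and default timeout are assumptions):
// navigate, wait for a specific element, then return its rendered text.
async function gotoAndWait(page, url, selector, timeoutMs = 30000) {
  // 'networkidle2' resolves once there have been no more than
  // 2 network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });

  // Resolves when an element matching `selector` is in the DOM,
  // and rejects if that does not happen within `timeoutMs`.
  await page.waitForSelector(selector, { timeout: timeoutMs });

  // innerText reflects rendered text, so it generally omits the
  // <script> and <style> contents that textContent would include.
  return page.$eval(selector, (el) => el.innerText);
}
```

With a page created as in the example above, this could be called as: const text = await gotoAndWait(page, 'https://example.com', 'h1');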
Finally, we log the extracted text content and close the browser. To extract text from a different element, change the selector passed to document.querySelector. Note that document.querySelector returns null when nothing matches, so check the result before reading textContent.
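To read text from several elements at once, one option is page.$$eval, which runs a callback in the page with an array of all elements matching a selector. The following is a minimal sketch under that assumption; extractTexts is a hypothetical helper name, and collapsing whitespace and dropping empty strings are illustrative choices rather than anything Puppeteer requires.

```javascript
// Hypothetical helper (the name is an assumption): collect the rendered
// text of every element matching `selector`.
async function extractTexts(page, selector) {
  // page.$$eval passes all matches to the callback as an array.
  // When nothing matches, the callback receives an empty array,
  // so no null check is needed here.
  const texts = await page.$$eval(selector, (els) =>
    els.map((el) => (el.innerText || '').replace(/\s+/g, ' ').trim())
  );

  // Drop elements that contained only whitespace.
  return texts.filter((t) => t.length > 0);
}
```

With a page created as in the example above, a call such as const headings = await extractTexts(page, 'h1, h2'); would return an array of heading strings, or an empty array if the page has no matching elements.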