pyppeteer is similar to Selenium: it lets you drive the Chrome browser from Python.
Docs: https://miyakogi.github.io/pyppeteer/index.html
GitHub: https://github.com/miyakogi/pyppeteer
Requirements:
Python 3.6+
pip install pyppeteer
```python
# -*- coding: utf-8 -*-
import asyncio

from pyppeteer import launch
from pyquery import PyQuery as pq

# Best to point at your local Chrome; otherwise pyppeteer downloads
# its own Chromium on first run, which is slow...
executable_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


# Example 1: render a page
async def crawl_page():
    # open the browser
    browser = await launch(executablePath=executable_path)
    # open a tab
    page = await browser.newPage()
    # type the URL and hit enter
    await page.goto('http://quotes.toscrape.com/js/')
    # grab the rendered HTML and parse it
    doc = pq(await page.content())
    print('Quotes:', doc('.quote').length)
    # close the browser
    await browser.close()


# Example 2: screenshot, save as PDF, run JavaScript
async def save_pdf():
    browser = await launch(executablePath=executable_path)
    page = await browser.newPage()
    await page.goto('http://quotes.toscrape.com/js/')
    # save a screenshot of the page
    await page.screenshot(path='example.png')
    # export the page as a PDF
    await page.pdf(path='example.pdf')
    # run JavaScript inside the page
    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')
    print(dimensions)
    await browser.close()


if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(crawl_page())
    # asyncio.get_event_loop().run_until_complete(save_pdf())
```
Async programming: the code is littered with these keywords, which can be dizzying at first.
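The keywords boil down to a few roles, which a minimal pure-asyncio sketch (no pyppeteer involved) shows:

```python
import asyncio

# "async def" declares a coroutine; calling it only creates a coroutine object
async def fetch(n):
    # "await" suspends this coroutine until the awaited thing finishes,
    # letting the event loop run other tasks in the meantime
    await asyncio.sleep(0)
    return n * 2

async def main():
    # gather() runs several coroutines concurrently on one event loop
    results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
    print(results)  # [2, 4, 6]

# asyncio.run() is the modern equivalent of
# asyncio.get_event_loop().run_until_complete() used in the examples above
asyncio.run(main())
```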
Reference: 《别只用 Selenium,新神器 Pyppeteer 绕过淘宝更简单!》("Don't just use Selenium — the new tool Pyppeteer makes bypassing Taobao easier!")