I want to scrape the "Services/Products" section from this page: https://www.yellowpages.com/deland-fl/mip/ryan-wells-pumps-20533306?lid=1001782175490
The text is inside a dd element, which always comes after the dt element whose text contains "Services/Products".
import requests
from lxml import html
url = "https://www.yellowpages.com/deland-fl/mip/ryan-wells-pumps-20533306?lid=1001782175490"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
session = requests.Session()
r = session.get(url, timeout=30, headers=headers)
t = html.fromstring(r.content)
# First text node of the dd that follows the "Services/Products" dt; empty string if absent
matches = t.xpath('//dd[preceding-sibling::dt[contains(.,"Services/Products")]]/text()[1]')
products = matches[0] if matches else ''
Is there a way to get the same text using BeautifulSoup (with CSS selectors, if possible) instead of lxml and XPath?
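
For reference, here is a rough sketch of what I imagine the BeautifulSoup version might look like. It assumes a soupsieve version that supports the `:-soup-contains()` pseudo-class (2.1 or later, as far as I know; older releases used `:contains()`). The function name `extract_products` and the sample markup are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup


def extract_products(markup):
    """Return the first direct text node of the dd following the
    "Services/Products" dt, or '' if no such dd exists.

    Hypothetical helper, intended to mirror the XPath text()[1] lookup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # "~" is the general sibling combinator: any dd that comes after the
    # matching dt under the same parent (like preceding-sibling in XPath).
    dd = soup.select_one('dt:-soup-contains("Services/Products") ~ dd')
    if dd is None:
        return ''
    # Only direct child strings, so text nested in child tags is skipped.
    direct_text = dd.find_all(string=True, recursive=False)
    return str(direct_text[0]) if direct_text else ''


# Made-up sample markup (an assumption about the page's dt/dd layout).
sample = """
<dl>
  <dt>General Info</dt>
  <dd>Family owned</dd>
  <dt>Services/Products</dt>
  <dd>Well drilling, pump repair<br>Irrigation</dd>
</dl>
"""

print(extract_products(sample))
```

With the real page, the idea would be to call `extract_products(r.content)` on the response from the requests code above. If the whole text of the dd is wanted rather than only the first text node, `dd.get_text(" ", strip=True)` might be a simpler choice.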