
selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 298,437


I am doing web scraping with Selenium in Python, using the following code:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def get_all_search_details(URL):
    SEARCH_RESULTS = {}
    options = Options()
    options.headless = True
    options.add_argument("--remote-debugging-port=9222")
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(URL)
    print(f"Scraping {driver.current_url}")
    try:
        medias = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'result-row')))
    except:
        print(f">> Selenium Exception: {URL}")
        return
    for media_idx, media_elem in enumerate(medias):
        outer_html = media_elem.get_attribute('outerHTML')
        result = scrap_newspaper(outer_html) # some function to extract results
        SEARCH_RESULTS[f"result_{media_idx}"] = result
    return SEARCH_RESULTS
if __name__ == '__main__':
    in_url = "https://digi.kansalliskirjasto.fi/search?query=%22heimo%20kosonen%22&orderBy=RELEVANCE"
    my_res = get_all_search_details(in_url)

I applied try/except to ensure I would not get trapped in a Selenium timeout exception; however, here is the error I obtained:

Scraping https://digi.kansalliskirjasto.fi/search?query=%22heimo%20kosonen%22&orderBy=RELEVANCE
>> Selenium Exception: https://digi.kansalliskirjasto.fi/search?query=%22heimo%20kosonen%22&orderBy=RELEVANCE
Traceback (most recent call last):
  File "nationalbiblioteket_logs.py", line 274, in <module>
    run()
  File "nationalbiblioteket_logs.py", line 262, in run
    all_queries(file_=get_query_log(QUERY=args.query),
  File "nationalbiblioteket_logs.py", line 218, in all_queries
    df = pd.DataFrame( df.apply( check_urls, axis=1, ) )    
  File "/home/xenial/anaconda3/envs/py37/lib/python3.7/site-packages/pandas/core/frame.py", line 8740, in apply
    return op.apply()
  File "/home/xenial/anaconda3/envs/py37/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
    return self.apply_standard()
  File "/home/xenial/anaconda3/envs/py37/lib/python3.7/site-packages/pandas/core/apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/xenial/anaconda3/envs/py37/lib/python3.7/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)
  File "nationalbiblioteket_logs.py", line 217, in <lambda>
    check_urls = lambda INPUT_DF: analyze_(INPUT_DF)
  File "nationalbiblioteket_logs.py", line 200, in analyze_
    df["search_results"] = get_all_search_details(in_url)
  File "/home/xenial/WS_Farid/DARIAH-FI/url_scraping.py", line 27, in get_all_search_details
    driver.get(URL)
  File "/home/xenial/anaconda3/envs/py37/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 441, in get
    self.execute(Command.GET, {'url': url})
  File "/home/xenial/anaconda3/envs/py37/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "/home/xenial/anaconda3/envs/py37/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 298,437
  (Session info: headless chrome=109.0.5414.74)
Stacktrace:
#0 0x561d3b04c303 <unknown>
#1 0x561d3ae20d37 <unknown>
#2 0x561d3ae0b549 <unknown>
#3 0x561d3ae0b285 <unknown>
#4 0x561d3ae09c77 <unknown>
#5 0x561d3ae0a408 <unknown>
#6 0x561d3ae1767f <unknown>
#7 0x561d3ae182d2 <unknown>
#8 0x561d3ae28fd0 <unknown>
#9 0x561d3ae2d34b <unknown>
#10 0x561d3ae0a9c5 <unknown>
#11 0x561d3ae28d7f <unknown>
#12 0x561d3ae95aa0 <unknown>
#13 0x561d3ae7d753 <unknown>
#14 0x561d3ae50a14 <unknown>
#15 0x561d3ae51b7e <unknown>
#16 0x561d3b09b32e <unknown>
#17 0x561d3b09ec0e <unknown>
#18 0x561d3b081610 <unknown>
#19 0x561d3b09fc23 <unknown>
#20 0x561d3b073545 <unknown>
#21 0x561d3b0c06a8 <unknown>
#22 0x561d3b0c0836 <unknown>
#23 0x561d3b0dbd13 <unknown>
#24 0x7f80a698a6ba start_thread

Is there a better alternative to get rid of such a Selenium timeout exception? To be more specific, I added:

options.add_argument("--disable-extensions")
options.add_argument("--no-sandbox")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

But they did not help tackle selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 298,437!

Here are some more details regarding the libraries I use:

>>> selenium.__version__
'4.5.0'
>>> webdriver_manager.__version__
'3.8.4'
I would try with the disable-gpu option/flag. You should also remove remote-debugging-port=9222 unless you are already running the site in a debugger/IDE. (This is usually used to tie into an existing dev-mode session....)
– pcalkins
Jan 12 at 23:44

Hasn't 'disable-gpu' already been deprecated? I also run the web scraping in the VS Code IDE! So do I still have to remove 'remote-debugging-port'?!
– Farid Alijani
Jan 13 at 9:53

Not sure if disable-gpu is deprecated, but it's sometimes necessary for headless mode. You should just let the driver/browser choose the dev port. No need to set that unless you have a specific need for it. (This is more to troubleshoot one option at a time... it may not be causing any problems, but if you let it choose its own it should choose an available one.)
– pcalkins
Jan 13 at 17:24
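
Putting these comments together, a minimal sketch of the suggested option setup (keeping headless mode, adding --disable-gpu, and letting the driver pick its own debugging port; flag behavior may vary by Chrome build) might look like:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.headless = True
options.add_argument("--disable-gpu")  # sometimes needed for headless mode
# no --remote-debugging-port argument: let the driver/browser choose a free port

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)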

I think the problem is in this URL:

https://digi.kansalliskirjasto.fi/search?query=%22heimo%20kosonen%22&orderBy=RELEVANCE

It is not fetching the expected data and shows no records.

Try with this URL:

https://digi.kansalliskirjasto.fi/search?query=heimo%20kosonen&orderBy=RELEVANCE

in_url = "https://digi.kansalliskirjasto.fi/search?query=heimo%20kosonen&orderBy=RELEVANCE"
Thanks for the answer, but I cannot preprocess, i.e., modify the query phrases in my web scraping; I must handle the URL as it is! I need to return None in scenarios in which no information can be retrieved. I thought try/except would handle such an exception.
– Farid Alijani
                Jan 12 at 18:45
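
For the return-None requirement, one option (a sketch, not tested against this site; get_result_rows is a hypothetical helper) is to wait briefly and translate a timeout into None instead of letting the exception propagate:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_result_rows(driver, timeout=5):
    # wait briefly for result rows; return None when none appear in time
    try:
        return WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, 'result-row'))
        )
    except TimeoutException:
        return None  # no records retrieved for this query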

Here is how I fixed the problem: I added more except blocks, rearranged the code, and moved the for loop inside the try block to ensure it would not get stuck in a TimeoutException or StaleElementReferenceException, as mentioned here:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common import exceptions
def get_all_search_details(URL):
    SEARCH_RESULTS = {}
    options = Options()
    options.headless = True
    options.add_argument("--remote-debugging-port=9222") #
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-gpu")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--disable-extensions")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(URL)
    print(f"Scraping {driver.current_url}")
    try:
        medias = WebDriverWait(driver, timeout=5).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'result-row')))
        for media_idx, media_elem in enumerate(medias):
            outer_html = media_elem.get_attribute('outerHTML')      
            result = scrap_newspaper(outer_html) # some external functions
            SEARCH_RESULTS[f"result_{media_idx}"] = result
    except exceptions.StaleElementReferenceException as e:
        print(f">> {type(e).__name__}: {e.args}")
        return
    except exceptions.NoSuchElementException as e:
        print(f">> {type(e).__name__}: {e.args}")
        return
    except exceptions.TimeoutException as e:
        print(f">> {type(e).__name__}: {e.args}")
        return
    except exceptions.SessionNotCreatedException as e:
        print(f">> {type(e).__name__}: {e.args}")
        return
    except exceptions.WebDriverException as e:
        print(f">> {type(e).__name__}: {e.args}")
        return
    except Exception as e:
        print(f">> {type(e).__name__} line {e.__traceback__.tb_lineno} of {__file__}: {e.args}")
        return
    return SEARCH_RESULTS
        
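Note that in the traceback above the TimeoutException was raised by driver.get(URL), which is still outside the try block here. A sketch of one way to guard that call as well, using Selenium's set_page_load_timeout (safe_get and the 30-second value are illustrative assumptions):

from selenium.common import exceptions

def safe_get(driver, URL, page_load_timeout=30):
    # bound how long driver.get() may block waiting on the renderer
    driver.set_page_load_timeout(page_load_timeout)
    try:
        driver.get(URL)
        return True
    except exceptions.TimeoutException as e:
        print(f">> {type(e).__name__} while loading {URL}: {e.args}")
        return False

Wrapping driver.quit() in a finally block is also worth considering, so each headless Chrome process gets cleaned up even when a page load fails.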
