I tried the sample provided within the documentation of the requests library for python.

With async.map(rs), I get the response codes, but I want to get the content of each page requested. This, for example, does not work:

out = async.map(rs)
print out[0].content
    Most answers are outdated. In the year 2021 the current bandwagon-effect winner is: docs.aiohttp.org/en/stable – guettli Jun 7, 2021 at 13:59

The below answer is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

I've left this answer as is to reflect the original question which was about using requests < v0.13.0.

To do multiple tasks with async.map asynchronously you have to:

  • Define a function for what you want to do with each object (your task)
  • Add that function as an event hook in your request
  • Call async.map on a list of all the requests / actions
  • Example:

    from requests import async
    # If using requests > v0.13.0, use
    # from grequests import async
    urls = [
        'http://python-requests.org',
        'http://httpbin.org',
        'http://python-guide.org',
        'http://kennethreitz.com'
    ]
    # A simple task to do to each response object
    def do_something(response):
        print response.url
    # A list to hold our things to do via async
    async_list = []
    for u in urls:
        # The "hooks = {..." part is where you define what you want to do
        # Note the lack of parentheses following do_something, this is
        # because the response will be used as the first argument automatically
        action_item = async.get(u, hooks = {'response' : do_something})
        # Add the task to our list of things to do via async
        async_list.append(action_item)
    # Do our list of things to do via async
    async.map(async_list)
    Nice idea to have left your comment: due to compatibility issues between the latest requests and grequests (lack of the max_retries option in requests 1.1.0) I had to downgrade requests to retrieve async, and I found that the asynchronous functionality was moved in versions 0.13+ (pypi.python.org/pypi/requests) – outforawhile Feb 20, 2013 at 16:13

    from grequests import async does not work.. and this definition of do_something works for me: def do_something(response, **kwargs):. I found it at stackoverflow.com/questions/15594015/… – Allan Ruin Nov 10, 2014 at 1:50

    If the async.map call still blocks, then how is this asynchronous? Besides the requests themselves being sent asynchronously, the retrieval is still synchronous? – bryanph May 27, 2015 at 11:21


    async is now an independent module: grequests.

    See here: https://github.com/spyoungtech/grequests

    And there: Ideal method for sending multiple HTTP requests over Python?

    installation:

    $ pip install grequests
    

    usage:

    build a stack:

    import grequests
    urls = [
        'http://www.heroku.com',
        'http://tablib.org',
        'http://httpbin.org',
        'http://python-requests.org',
        'http://kennethreitz.com'
    ]
    rs = (grequests.get(u) for u in urls)
    

    send the stack

    grequests.map(rs)
    

    result looks like

    [<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
    

    grequests doesn't seem to set a limit on concurrent requests, i.e. when multiple requests are sent to the same server.

    With regards to the limitation on concurrent requests - you can specify a pool size when running map()/imap(), i.e. grequests.map(rs, size=20) to have 20 concurrent grabs. – synthesizerpatel Nov 17, 2012 at 8:46

    I don't quite understand the async part. If I write results = grequests.map(rs), the code after this line blocks, so how can I see the async effect? – Allan Ruin Nov 10, 2014 at 1:46

    On the github repo, the author of grequests recommends using requests-threads or requests-futures instead. – theberzi May 22, 2021 at 8:12

    Does someone know if the response list is ordered by response time or in the same order as the url list that we pass as an arg to the map func? – luisvenezian Dec 19, 2022 at 19:41
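    For illustration, a minimal sketch of the pool-size option mentioned in the comment above (the size argument caps concurrency; the URL is just a placeholder):

    import grequests

    urls = ['http://httpbin.org/delay/1'] * 50
    rs = (grequests.get(u) for u in urls)

    # size=20 means at most 20 requests are in flight at any moment;
    # failed requests come back as None unless an exception_handler is given
    responses = grequests.map(rs, size=20)
    print([r.status_code if r is not None else None for r in responses])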

    I tested both requests-futures and grequests. Grequests is faster but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests into ThreadPoolExecutor and it was almost as fast as grequests, but without external dependencies.

    import requests
    import concurrent.futures

    def get_urls():
        return ["url1", "url2"]

    def load_url(url, timeout):
        return requests.get(url, timeout=timeout)

    resp_ok = 0
    resp_err = 0

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                resp_err = resp_err + 1
            else:
                resp_ok = resp_ok + 1
    Sorry, I don't understand your question. Use only a single url in multiple threads? Only one case: DDoS attacks )) – Hodza Nov 27, 2015 at 8:14

    I don't understand why this answer got so many upvotes. The OP question was about async requests. ThreadPoolExecutor runs threads. Yes, you can make requests in multiple threads, but that will never be an async program, so how could it be an answer to the original question? – nagylzs Feb 14, 2019 at 17:52

    Actually, the question was about how to load URLs in parallel. And yes, a thread pool executor is not the best option, it is better to use async io, but it works well in Python. And I don't understand why threads couldn't be used for async? What if you need to run a CPU-bound task asynchronously? – Hodza Feb 15, 2019 at 10:33
    

    Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that will make the underlying requests no less synchronous. If you want true async requests, you must use other tooling that provides it. One such solution is aiohttp (Python 3.5.3+). It works well in my experience using it with the Python 3.7 async/await syntax. Below I write three implementations of performing n web requests using:

  • Purely synchronous requests (sync_requests_get_all) using the Python requests library
  • Synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio
  • A truly asynchronous implementation (async_aiohttp_get_all) with the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio
  • Tested in Python 3.5.10

    import time
    import asyncio
    import requests
    import aiohttp
    from asgiref import sync

    def timed(func):
        """records approximate durations of function calls"""
        def wrapper(*args, **kwargs):
            start = time.time()
            print('{name:<30} started'.format(name=func.__name__))
            result = func(*args, **kwargs)
            duration = "{name:<30} finished in {elapsed:.2f} seconds".format(
                name=func.__name__, elapsed=time.time() - start
            )
            print(duration)
            timed.durations.append(duration)
            return result
        return wrapper

    timed.durations = []

    @timed
    def sync_requests_get_all(urls):
        """performs synchronous get requests"""
        # use session to reduce network overhead
        session = requests.Session()
        return [session.get(url).json() for url in urls]

    @timed
    def async_requests_get_all(urls):
        """asynchronous wrapper around synchronous requests"""
        session = requests.Session()
        # wrap requests.get into an async function
        def get(url):
            return session.get(url).json()
        async_get = sync.sync_to_async(get)

        async def get_all(urls):
            return await asyncio.gather(*[
                async_get(url) for url in urls
            ])
        # call get_all as a sync function to be used in a sync context
        return sync.async_to_sync(get_all)(urls)

    @timed
    def async_aiohttp_get_all(urls):
        """performs asynchronous get requests"""
        async def get_all(urls):
            async with aiohttp.ClientSession() as session:
                async def fetch(url):
                    async with session.get(url) as response:
                        return await response.json()
                return await asyncio.gather(*[
                    fetch(url) for url in urls
                ])
        # call get_all as a sync function to be used in a sync context
        return sync.async_to_sync(get_all)(urls)

    if __name__ == '__main__':
        # this endpoint takes ~3 seconds to respond,
        # so a purely synchronous implementation should take
        # little more than 30 seconds and a purely asynchronous
        # implementation should take little more than 3 seconds.
        urls = ['https://postman-echo.com/delay/3']*10
        async_aiohttp_get_all(urls)
        async_requests_get_all(urls)
        sync_requests_get_all(urls)
        print('----------------------')
        [print(duration) for duration in timed.durations]

    On my machine, this is the output:

    async_aiohttp_get_all          started
    async_aiohttp_get_all          finished in 3.20 seconds
    async_requests_get_all         started
    async_requests_get_all         finished in 30.61 seconds
    sync_requests_get_all          started
    sync_requests_get_all          finished in 30.59 seconds
    ----------------------
    async_aiohttp_get_all          finished in 3.20 seconds
    async_requests_get_all         finished in 30.61 seconds
    sync_requests_get_all          finished in 30.59 seconds
    Your async_aiohttp_get_all() is a nice solution. I came up with something similar, but had an extra async def fetch_all(urls): return await asyncio.gather(*[fetch(url) for url in urls]) outside of it, which had my solution creating separate aiohttp.ClientSession() instances for each URL, whereas by embedding a local function, you're able to reuse the same session... much more Pythonic IMO. Can you remind me of the benefit of using sync.async_to_sync() with the existence of get_all() vs. asyncio.run() without get_all()? – wescpy Nov 25, 2021 at 9:04

    @CpILL it wraps a function which returns a coroutine (the result of asyncio.gather) so it can be called in a synchronous context. I like doing it that way. You can instead use, e.g., asyncio.run to execute the result of asyncio.gather directly. – DragonBobZ Jul 22, 2022 at 15:22
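    To illustrate the asyncio.run alternative mentioned in the comment above, here is a hedged sketch of the same aiohttp pattern without asgiref's sync.async_to_sync (assuming Python 3.7+; aiohttp_get_all is just an illustrative name):

    import asyncio
    import aiohttp

    async def get_all(urls):
        async with aiohttp.ClientSession() as session:
            async def fetch(url):
                async with session.get(url) as response:
                    return await response.json()
            return await asyncio.gather(*[fetch(url) for url in urls])

    def aiohttp_get_all(urls):
        # asyncio.run creates and closes the event loop for you,
        # so no sync.async_to_sync wrapper is needed
        return asyncio.run(get_all(urls))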
    

    Maybe requests-futures is another choice.

    from requests_futures.sessions import FuturesSession
    session = FuturesSession()
    # first request is started in background
    future_one = session.get('http://httpbin.org/get')
    # second request is started immediately
    future_two = session.get('http://httpbin.org/get?foo=bar')
    # wait for the first request to complete, if it hasn't already
    response_one = future_one.result()
    print('response one status: {0}'.format(response_one.status_code))
    print(response_one.content)
    # wait for the second request to complete, if it hasn't already
    response_two = future_two.result()
    print('response two status: {0}'.format(response_two.status_code))
    print(response_two.content)
    

    It is also recommended in the official documentation. If you don't want to involve gevent, it's a good one.
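    As noted in the comments below, concurrency is controlled by the session's worker pool. A hedged sketch, assuming FuturesSession accepts a max_workers argument and a custom executor (both appear in the project's README, but verify against your installed version):

    from concurrent.futures import ThreadPoolExecutor
    from requests_futures.sessions import FuturesSession

    # either raise the built-in pool size...
    session = FuturesSession(max_workers=10)
    # ...or supply your own executor explicitly
    session = FuturesSession(executor=ThreadPoolExecutor(max_workers=10))

    futures = [session.get('http://httpbin.org/get') for _ in range(10)]
    print([f.result().status_code for f in futures])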

    One of the easiest solutions. The number of concurrent requests can be increased by defining the max_workers parameter. – Jose Cherian Sep 20, 2015 at 22:22

    It'd be nice to see an example of this scaled so we're not using one variable name per item to loop over. – user1717828 Nov 13, 2017 at 21:36

    Having one thread per request is a hell of a waste of resources! It is not possible to do, for example, 500 requests simultaneously; it will kill your CPU. This should never be considered a good solution. – Corneliu Maftuleac Feb 7, 2018 at 8:52

    @CorneliuMaftuleac good point. Regarding the thread usage, you definitely need to care about it, and the library provides an option to enable the threading pool or processing pool. ThreadPoolExecutor(max_workers=10) – Dreampuf Feb 13, 2018 at 20:02

    You can use httpx for that:

    import asyncio
    import httpx

    async def get_async(url):
        async with httpx.AsyncClient() as client:
            return await client.get(url)

    urls = ["http://google.com", "http://wikipedia.org"]

    # Note that you need an async context to use `await`.
    await asyncio.gather(*map(get_async, urls))

    If you want a functional syntax, the gamla lib wraps this into get_async.

    Then you can do

    await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])

    The 10 is the timeout in seconds.

    (disclaimer: I am its author)

    Hi @Uri, I am getting the below error when trying the code you mentioned in this answer: await asyncio.gather(*map(get_async, urls)) ^ SyntaxError: invalid syntax. Please guide. – AJ. Oct 5, 2020 at 8:42

    I have a lot of issues with most of the answers posted - they either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic on the execution of the request, making it difficult to error handle. If they do not fall into one of the above categories, they're 3rd party libraries or deprecated.

    Some of the solutions work alright for pure HTTP requests, but they fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

    Simply using Python's built-in asyncio library is sufficient to perform asynchronous requests of any type, as well as providing enough fluidity for complex, use-case-specific error handling.

    import asyncio
    import requests

    loop = asyncio.get_event_loop()

    def do_thing(params):
        async def get_rpc_info_and_do_chores(id):
            # do things
            response = perform_grpc_call(id)
            do_chores(response)

        async def get_httpapi_info_and_do_chores(id):
            # do things
            response = requests.get(URL)
            do_chores(response)

        async_tasks = []
        for element in list(params.list_of_things):
            async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(id)))
            async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(ch_id)))

        loop.run_until_complete(asyncio.gather(*async_tasks))
    

    How it works is simple. You're creating a series of tasks you'd like to occur asynchronously, and then asking a loop to execute those tasks and exit upon completion. No extra libraries subject to lack of maintenance, no lack of functionality required.

    If I understand correctly, this will block the event loop while doing the GRPC and HTTP call? So if these calls take seconds to complete, your entire event loop will block for seconds? To avoid this, you need to use GRPC or HTTP libraries that are async. Then you can, for example, do await response = requests.get(URL). No? – Coder Nr 23 Feb 27, 2020 at 7:32

    Unfortunately, when trying this out, I found that making a wrapper around requests is barely faster (and in some cases slower) than just calling a list of URLs synchronously. E.g., requesting an endpoint that takes 3 seconds to respond 10 times using the strategy above takes about 30 seconds. If you want true async performance, you need to use something like aiohttp. – DragonBobZ Jun 6, 2020 at 4:36

    @arshbot Yes, if your chores are asynchronous, then you will see speed-ups, despite waiting on synchronous calls to requests.get. But the question is how to perform asynchronous requests with the python requests library. This answer does not do that, so my criticism stands. – DragonBobZ Jul 30, 2020 at 18:19

    I think this should be bumped. Using async event loops seems enough to fire asynchronous requests. No need to install external dependencies. – iedmrc Aug 21, 2020 at 8:29

    @iedmrc sadly, this is not the case. For a task to be non-blocking it has to be implemented using the newer async tools in Python, and this is not the case with the requests library. If you just stick requests tasks in an async event loop, those would still be blocking. That being said, you can (as suggested in other responses) use things like gevent or threads with requests, but certainly not asyncio. – Sergio Chumacero Mar 9, 2021 at 1:27
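    As a hedged illustration of the work-around hinted at in the comments above (threads with requests, driven from asyncio), here is a minimal sketch assuming Python 3.9+ for asyncio.to_thread; it keeps the event loop free while the blocking requests.get calls run in worker threads:

    import asyncio
    import requests

    async def fetch(url):
        # run the blocking call in a thread so the event loop is not blocked
        return await asyncio.to_thread(requests.get, url)

    async def fetch_all(urls):
        return await asyncio.gather(*(fetch(u) for u in urls))

    responses = asyncio.run(fetch_all(['http://httpbin.org/get'] * 5))
    print([r.status_code for r in responses])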

    I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

    list_of_requests = ['http://moop.com', 'http://doop.com', ...]
    from simple_requests import Requests
    for response in Requests().swarm(list_of_requests):
        print response.content
    

    The docs are here: http://pythonhosted.org/simple-requests/

    @YSY Feel free to post an issue: github.com/ctheiss/simple-requests/issues; I literally use this library thousands of times a day. – Monkey Boson Apr 9, 2015 at 16:06

    Boson, how do you handle 404/500 errors? what about https urls? will appreciate a snippet that supports thousands of urls. can you please paste an example? thanks – YSY Apr 12, 2015 at 8:37

    @YSY By default 404/500 errors raise an exception. This behaviour can be overridden (see pythonhosted.org/simple-requests/…). HTTPS urls are tricky due to the reliance on gevent, which currently has an outstanding bug on this (github.com/gevent/gevent/issues/477). There is a shim in the ticket you can run, but it will still throw warnings for SNI servers (but it will work). As for a snippet, I'm afraid all my usages are at my company and closed. But I assure you we execute thousands of requests over tens of jobs. – Monkey Boson Apr 13, 2015 at 15:03

    Library looks sleek with respect to interaction. Is Python3+ usable? Sorry, could not see any mention. – Isaac Philip Jan 28, 2020 at 3:16

    @Jethro absolutely right, the library would need a total re-write since the underlying technologies are quite different in Python 3. For right now, the library is "complete" but only works for Python 2. – Monkey Boson Apr 8, 2020 at 14:05

    DISCLAIMER: The following code creates a different thread for each function.

    This might be useful for some cases, as it is simpler to use. But know that it is not async; it gives the illusion of async using multiple threads, even though the decorator name suggests it is.

    You can use the following decorator to run a callback once the execution of the function is completed; the callback must handle the processing of the data returned by the function.

    Please note that after the function is decorated it will return a Future object.

    import asyncio
    ## Decorator implementation of async runner !!
    def run_async(callback, loop=None):
        if loop is None:
            loop = asyncio.get_event_loop()
        def inner(func):
            def wrapper(*args, **kwargs):
                def __exec():
                    out = func(*args, **kwargs)
                    callback(out)
                    return out
                return loop.run_in_executor(None, __exec)
            return wrapper
        return inner
    

    Example of implementation:

    urls = ["https://google.com", "https://facebook.com", "https://apple.com", "https://netflix.com"]
    loaded_urls = []  # OPTIONAL, used for showing realtime, which urls are loaded !!
    def _callback(resp):
        print(resp.url)
        print(resp)
        loaded_urls.append((resp.url, resp))  # OPTIONAL, used for showing realtime, which urls are loaded !!
    # Must provide a callback function, callback func will be executed after the func completes execution
    # Callback function will accept the value returned by the function.
    @run_async(_callback)
    def get(url):
        return requests.get(url)
    for url in urls:
        get(url)
    

    If you wish to see which urls are loaded in real time, you can add the following code at the end as well:

    while True:
        print(loaded_urls)
        if len(loaded_urls) == len(urls):
            break
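    Alternatively, a hedged sketch that is not part of the original answer: since each decorated call returns an asyncio Future bound to the default event loop (which the decorator above already assumes is available), you can collect the futures and wait for them explicitly instead of polling loaded_urls:

    futures = [get(url) for url in urls]
    loop = asyncio.get_event_loop()  # the same default loop the decorator captured
    # running the loop lets the executor-backed futures complete;
    # gather returns the requests.Response objects in order
    responses = loop.run_until_complete(asyncio.gather(*futures))
    print([r.status_code for r in responses])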
    This works but it generates a new thread for each request, which seems to defeat the purpose of using asyncio. – rtaft Jan 29, 2021 at 20:42
    

    I second the suggestion above to use HTTPX, but I often use it in a different way so am adding my answer.

    I personally use asyncio.run (introduced in Python 3.7) rather than asyncio.gather and also prefer the aiostream approach, which can be used in combination with asyncio and httpx.

    As in this example I just posted, this style is helpful for processing a set of URLs asynchronously even despite the (common) occurrence of errors. I particularly like how that style clarifies where the response processing occurs, and how it helps with error handling (which I find async calls tend to need more of).

    It's easier to post a simple example of just firing off a bunch of requests asynchronously, but often you also want to handle the response content (compute something with it, perhaps with reference to the original object that the URL you requested was to do with).

    The core of that approach looks like:

    async with httpx.AsyncClient(timeout=timeout) as session:
        ws = stream.repeat(session)
        xs = stream.zip(ws, stream.iterate(urls))
        ys = stream.starmap(xs, fetch, ordered=False, task_limit=20)
        process = partial(process_thing, things=things, pbar=pbar, verbose=verbose)
        zs = stream.map(ys, process)
        return await zs
    

    where:

  • process_thing is an async response content handling function
  • things is the input list (which the urls generator of URL strings came from), e.g. a list of objects/dictionaries
  • pbar is a progress bar (e.g. tqdm.tqdm) [optional but useful]
  • All of that goes in an async function async_fetch_urlset which is then run by calling a synchronous 'top-level' function named e.g. fetch_things which runs the coroutine [this is what's returned by an async function] and manages the event loop:

    def fetch_things(urls, things, pbar=None, verbose=False):
        return asyncio.run(async_fetch_urlset(urls, things, pbar, verbose))
    

    Since a list passed as input (here it's things) can be modified in place, you can effectively get output back (as we're used to from synchronous function calls).
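    For completeness, here is a hedged sketch of what the helpers assumed above might look like; fetch and process_thing are hypothetical stand-ins, not the author's actual code (fetch awaits the httpx call, process_thing handles one result):

    async def fetch(session, url):
        response = await session.get(url)
        response.raise_for_status()
        return url, response.json()  # httpx's .json() is synchronous

    def process_thing(result, things, pbar=None, verbose=False):
        url, payload = result
        if verbose:
            print(f"fetched {url}")
        if pbar is not None:
            pbar.update()
        # e.g. attach the payload to the matching input object in `things`
        return payload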

    None of the answers above helped me because they assume that you have a predefined list of requests, while in my case I need to be able to listen to requests and respond asynchronously (in a similar way to how it works in nodejs).

    import grequests

    def handle_finished_request(r, **kwargs):
        print(r)

    def main():
        while True:
            address = listen_to_new_msg()  # based on your server

            # schedule async requests and run 'handle_finished_request' on response
            req = grequests.get(address, timeout=1, hooks=dict(response=handle_finished_request))
            job = grequests.send(req)  # does not block! for more info see https://stackoverflow.com/a/16016635/10577976

    main()
    

    The handle_finished_request callback would be called when a response is received. Note: for some reason a timeout (or no response) does not trigger an error here.

    This simple loop can trigger async requests similarly to how it would work in a nodejs server.

    I would highly recommend hyper_requests (https://github.com/edjones84/hyper-requests) for this, which allows a list of urls and parameters to be generated and the requests then to be run asynchronously:

    import hyper_requests
    # Define the request parameters
    params = [
        {'url': 'http://httpbin.org/get', 'data': 'value1'},
        {'url': 'http://httpbin.org/get', 'data': 'value3'},
        {'url': 'http://httpbin.org/get', 'data': 'value5'},
        {'url': 'http://httpbin.org/get', 'data': 'value7'},
        {'url': 'http://httpbin.org/get', 'data': 'value9'}
    ]
    # Create an instance of AsyncRequests and execute the requests
    returned_data = hyper_requests.get(request_params=params, workers=10)
    # Process the returned data
    for response in returned_data:
        print(response)
            
