elm.web.search.run.search_all_se

async search_all_se(queries, search_engines=('PlaywrightGoogleLinkSearch', 'PlaywrightDuckDuckGoLinkSearch', 'DuxDistributedGlobalSearch'), num_urls=None, ignore_url_parts=None, browser_semaphore=None, task_name=None, **kwargs)

Retrieve URLs for a set of search queries, falling back to additional search engines if needed.

Parameters:
  • queries (collection of str) – Collection of strings representing search queries. Search results for the top num_urls URLs (from all of these queries _combined_) are returned from this function.

  • search_engines (iterable of str) – Ordered collection of search engine names to attempt for web search. If the first search engine in the list returns a set of URLs, iteration ends and those results are returned. Otherwise, the next engine in the list is used to run the web search, and so on until an engine succeeds. If all web searches fail, an empty list is returned. See SEARCH_ENGINE_OPTIONS for supported search engine options. By default, ("PlaywrightGoogleLinkSearch", "PlaywrightDuckDuckGoLinkSearch", "DuxDistributedGlobalSearch").

  • num_urls (int, optional) – Number of unique top search results to return. The search results from all queries are interleaved, and the top num_urls unique URLs are returned. If this number is less than len(queries), some of your queries may not contribute to the final output. By default, None, which sets num_urls = 3 * len(queries).

  • ignore_url_parts (iterable of str, optional) – Optional URL components to blacklist. For example, supplying ignore_url_parts={"wikipedia.org"} will ignore all URLs that contain "wikipedia.org". By default, None.

  • browser_semaphore (asyncio.Semaphore, optional) – Semaphore instance that can be used to limit the number of Playwright browsers open concurrently. If None, no limits are applied. By default, None.

  • task_name (str, optional) – Optional task name to use in asyncio.create_task(). By default, None.

  • **kwargs – Keyword-argument pairs to initialize search engines. This input can include any or all of the following keywords:

    • ddg_api_kwargs

    • google_cse_api_kwargs

    • google_serper_api_kwargs

    • google_serpapi_kwargs

    • tavily_api_kwargs

    • ddgs_kwargs

    • cf_google_se_kwargs

    • pw_bing_se_kwargs

    • pw_ddg_se_kwargs

    • pw_google_cse_kwargs

    • pw_google_se_kwargs

    • pw_yahoo_se_kwargs

    • pw_launch_kwargs

    Each of these inputs should be a dictionary of keyword-argument pairs used to initialize the search engines in the search_engines input. If pw_launch_kwargs is detected, it will be added to the kwargs for all of the Playwright-based search engines so that you do not have to repeatedly specify the launch parameters. For example, you may specify pw_launch_kwargs={"headless": False} to have all Playwright-based searches show the browser and _also_ specify google_serper_api_kwargs={"api_key": "..."} to specify the API key for the Google Serper search.
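The fallback behavior described for search_engines can be pictured with a small, self-contained sketch. It is not ELM's implementation: the engine interface (an async callable that maps queries to one list of result dicts per query), the function name search_with_fallback_sketch, the demo engines, and the choice to treat a raised exception as a failed search are all assumptions made for illustration. ValueError stands in for the documented ELMInputError.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable, Sequence

# Hypothetical engine signature for this sketch only (not ELM's interface):
# an async callable mapping queries to one list of result dicts per query.
SearchEngine = Callable[[Sequence[str]], Awaitable[list]]


async def search_with_fallback_sketch(
    queries: Sequence[str],
    engines: Iterable[tuple[str, SearchEngine]],
) -> list:
    """Try each (name, engine) pair in order; return the first non-empty results."""
    engines = list(engines)
    if not engines:
        # Stand-in for the documented ELMInputError
        raise ValueError("search_engines input is empty")
    for name, engine in engines:
        try:
            results = await engine(queries)
        except Exception:
            continue  # assumption: an engine error counts as a failed search
        if any(results):  # at least one query produced a result
            for query_results in results:
                for result in query_results:
                    result.setdefault("search_engine", name)
            return results
    return []  # every engine failed or returned nothing


# Hypothetical demo engines used only to exercise the sketch
async def blocked_engine(queries):
    raise RuntimeError("captcha page")


async def empty_engine(queries):
    return [[] for _ in queries]


async def working_engine(queries):
    return [[{"url": f"https://example.com/{i}", "query": q, "query_rank": 1}]
            for i, q in enumerate(queries)]


if __name__ == "__main__":
    engines = [("blocked", blocked_engine), ("empty", empty_engine),
               ("working", working_engine)]
    print(asyncio.run(search_with_fallback_sketch(["solar siting"], engines)))
```

Ordering matters here: cheaper or more reliable engines placed first are tried first, and later engines only run when every earlier one came back empty or errored.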
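The num_urls and ignore_url_parts descriptions combine three steps: interleave ranked results across queries, drop duplicates and blacklisted URLs, and stop after num_urls unique URLs. The following is a minimal sketch of that selection under the assumption that each query's results are available as a ranked list of URL strings; select_top_urls is a hypothetical helper, not part of ELM.

```python
from itertools import chain, zip_longest
from typing import Iterable, Optional, Sequence


def select_top_urls(
    urls_per_query: Sequence[Sequence[str]],
    num_urls: Optional[int] = None,
    ignore_url_parts: Optional[Iterable[str]] = None,
) -> list:
    """Round-robin ranked URLs across queries and keep the top unique ones."""
    if num_urls is None:
        num_urls = 3 * len(urls_per_query)  # documented default
    ignore = tuple(ignore_url_parts or ())
    # Rank 1 of every query, then rank 2 of every query, and so on
    interleaved = chain.from_iterable(zip_longest(*urls_per_query))
    selected, seen = [], set()
    for url in interleaved:
        if len(selected) >= num_urls:
            break
        if url is None or url in seen:
            continue  # padding from a shorter list, or a duplicate
        if any(part in url for part in ignore):
            continue  # URL contains a blacklisted component
        seen.add(url)
        selected.append(url)
    return selected


if __name__ == "__main__":
    ranked = [
        ["https://a.org/1", "https://en.wikipedia.org/wiki/X", "https://a.org/2"],
        ["https://b.org/1", "https://a.org/1"],
    ]
    print(select_top_urls(ranked, num_urls=3, ignore_url_parts={"wikipedia.org"}))
```

Because selection is round-robin, a num_urls smaller than len(queries) is filled before later queries get a turn, which is why some queries may not contribute.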
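The browser_semaphore and task_name parameters rely on standard asyncio tools: an asyncio.Semaphore caps how many coroutines are inside the guarded section at once, and asyncio.create_task accepts a name keyword. The sketch below uses a hypothetical fake_browser_search coroutine in place of a real Playwright search and records the peak number of simultaneously open "browsers"; the task-naming scheme (task_name plus an index) is likewise an assumption for illustration.

```python
import asyncio
from typing import Optional


class _NoLimit:
    """No-op async context manager used when no semaphore is supplied."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        return False


async def fake_browser_search(query: str, semaphore: Optional[asyncio.Semaphore],
                              tracker: dict) -> str:
    """Hypothetical stand-in for one Playwright-based search."""
    async with (semaphore if semaphore is not None else _NoLimit()):
        tracker["open"] += 1  # a "browser" is now open
        tracker["peak"] = max(tracker["peak"], tracker["open"])
        await asyncio.sleep(0.01)  # pretend the browser is loading results
        tracker["open"] -= 1
        return f"results for {query!r}"


async def run_searches(queries, browser_semaphore=None, task_name=None):
    """Launch one named task per query and report peak concurrency."""
    tracker = {"open": 0, "peak": 0}
    tasks = [
        asyncio.create_task(
            fake_browser_search(query, browser_semaphore, tracker),
            name=f"{task_name}-{i}" if task_name else None,
        )
        for i, query in enumerate(queries)
    ]
    results = await asyncio.gather(*tasks)
    return results, tracker["peak"], [task.get_name() for task in tasks]


async def main():
    # Create the semaphore inside the running event loop
    semaphore = asyncio.Semaphore(2)  # at most two "browsers" at once
    return await run_searches(["a", "b", "c", "d"], semaphore, task_name="se")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Sharing one semaphore across several concurrent calls is what makes the limit global; a fresh semaphore per call would only limit browsers within that call.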
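The pw_launch_kwargs convenience amounts to copying one shared set of launch options into every Playwright engine's kwargs. Here is a hedged sketch of that merge. It assumes the Playwright engines are exactly the pw_*_kwargs entries in the keyword list above, that each engine receives launch options under a hypothetical launch_kwargs key, and that engine-specific launch options take precedence; none of these details are confirmed by this page.

```python
from typing import Any

# Playwright engine kwarg names taken from the keyword list above
# (assumption: the "pw_" prefix marks the Playwright-based engines)
PW_ENGINE_KWARG_NAMES = (
    "pw_bing_se_kwargs",
    "pw_ddg_se_kwargs",
    "pw_google_cse_kwargs",
    "pw_google_se_kwargs",
    "pw_yahoo_se_kwargs",
)


def merge_pw_launch_kwargs(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Copy shared Playwright launch options into each Playwright engine's kwargs.

    Uses a hypothetical ``launch_kwargs`` key; an engine that already has its
    own ``launch_kwargs`` entry keeps it unchanged.
    """
    merged = {key: dict(value) for key, value in kwargs.items()}  # shallow copies
    launch_kwargs = merged.pop("pw_launch_kwargs", None)
    if not launch_kwargs:
        return merged
    for name in PW_ENGINE_KWARG_NAMES:
        engine_kwargs = merged.setdefault(name, {})
        engine_kwargs.setdefault("launch_kwargs", dict(launch_kwargs))
    return merged


if __name__ == "__main__":
    print(merge_pw_launch_kwargs({
        "pw_launch_kwargs": {"headless": False},
        "google_serper_api_kwargs": {"api_key": "..."},
    }))
```

Non-Playwright entries such as google_serper_api_kwargs pass through untouched, so both kinds of configuration can be supplied in a single call.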

Returns:

list of list of dict – List of search results for each query, where each search result is represented as a dictionary containing the following keys:

  • url: URL of the search result

  • query: The search query that resulted in this search result

  • search_engine: The search engine that returned this result

  • query_rank: The rank of this search result for the query

Other keys such as “attrs” may also be included depending on the search engine.
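Because the return value is nested per query, a common follow-up step is to flatten it. The sketch below works on a hypothetical, made-up sample shaped like the documented output (url, query, search_engine, query_rank) and collapses it into unique URLs ordered by best rank, recording which queries found each URL. It assumes a lower query_rank means a higher position; summarize_results is an illustrative helper, not part of ELM.

```python
from collections import defaultdict

# Hypothetical sample shaped like the documented return value (made-up URLs)
sample_results = [
    [
        {"url": "https://a.example.org", "query": "q1",
         "search_engine": "PlaywrightGoogleLinkSearch", "query_rank": 1},
        {"url": "https://b.example.org", "query": "q1",
         "search_engine": "PlaywrightGoogleLinkSearch", "query_rank": 2},
    ],
    [
        {"url": "https://b.example.org", "query": "q2",
         "search_engine": "PlaywrightGoogleLinkSearch", "query_rank": 1},
    ],
]


def summarize_results(results):
    """Collapse per-query results into unique URLs ordered by best rank.

    Assumes a lower query_rank means a higher position in the results.
    """
    best_rank = {}
    found_by = defaultdict(set)
    for query_results in results:
        for result in query_results:
            url, rank = result["url"], result["query_rank"]
            found_by[url].add(result["query"])
            best_rank[url] = min(rank, best_rank.get(url, rank))
    # Sort by best rank, breaking ties alphabetically by URL
    ordered = sorted(best_rank, key=lambda url: (best_rank[url], url))
    return ordered, dict(found_by)


if __name__ == "__main__":
    urls, queries_per_url = summarize_results(sample_results)
    print(urls)
    print(queries_per_url)
```

Only the four documented keys are read, so engine-specific extras such as "attrs" are simply ignored.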

Raises:

ELMInputError – If search_engines input is empty.