
search_google

thvid.search_google

Functions:

  • Distance

    Get the Google distance between two words.

  • get_user_agent

    Get a random user agent string.

  • get_hits

    Return the number of hits for a search query.

  • Download

    Download URLs as HTML files.

  • search

    Search Google and yield the URLs found.

Distance(term1, term2)

Get the Google distance between two words. Returns a float.
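The page does not say how Distance is computed. A widely used definition of "Google distance" is the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, which combines hit counts of the kind get_hits returns: NGD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))). The sketch below assumes Distance follows NGD; normalized_google_distance and the index size total_pages are hypothetical and not part of thvid.

```python
import math


def normalized_google_distance(hits_x: int, hits_y: int, hits_xy: int, total_pages: float) -> float:
    """Hypothetical NGD helper (assumes Distance uses the NGD formula).

    hits_x, hits_y: hit counts for each term alone.
    hits_xy: hit count for a query containing both terms.
    total_pages: assumed size N of the search index (an estimate; Google does not report it).
    """
    if min(hits_x, hits_y, hits_xy) <= 0:
        # log(0) is undefined; treat terms that never (co-)occur as infinitely distant
        return math.inf
    if total_pages <= max(hits_x, hits_y):
        raise ValueError("total_pages must exceed the individual hit counts")
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    log_n = math.log(total_pages)
    # 0 means the terms always appear together; larger values mean less related terms
    return (max(log_x, log_y) - log_xy) / (log_n - min(log_x, log_y))
```

Terms that always co-occur get a distance of 0; the value grows as the joint hit count falls relative to the individual counts.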

get_user_agent()

Get a random user agent string. Returns a string.
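A random user agent is usually picked uniformly from a fixed pool of browser strings. The sketch below shows that idea only; the pool contents and the name random_user_agent are hypothetical, since the module's actual list is not documented here.

```python
import random

# Hypothetical pool of browser user agent strings; the real module's list is not documented.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_4) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
]


def random_user_agent() -> str:
    """Return one entry of USER_AGENTS, chosen uniformly at random."""
    return random.choice(USER_AGENTS)
```

Rotating user agents this way is a common courtesy measure for scrapers; it does not replace an adequate pause between requests.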

get_hits(query, tld='com', lang='sv', tbs='0', safe='off', extra_params={}, tpe='', user_agent=None)

Return the number of hits for a search query. Returns an int.
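How get_hits obtains the count is not documented. One plausible step is reading the results-count line Google prints above the results (for example "About 1,230,000 results"). The sketch below covers only that parsing step and assumes the text has already been fetched; parse_hit_count is a hypothetical name. Because get_hits defaults to lang='sv', it also accepts space-separated thousands.

```python
import re


def parse_hit_count(result_stats: str) -> int:
    """Hypothetical helper: extract the hit count from a results-count line.

    Assumes the count is the first run of digits, possibly grouped with commas,
    periods or whitespace (\\s also matches non-breaking spaces). Returns 0 if no digits.
    """
    match = re.search(r"\d[\d,.\s]*", result_stats)
    if match is None:
        return 0
    # Drop the grouping characters and keep only the digits
    return int(re.sub(r"\D", "", match.group()))
```

Locale formats differ, so a real implementation should be checked against the pages it actually receives.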

Download(url_list, out_format, download_dir)

Download the URLs in url_list as files in out_format (for example html) to download_dir. Returns the output folder.
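A minimal standard-library sketch of what Download describes, assuming each response body is written unchanged to download_dir and that URLs which fail to open are skipped. The name download_pages and the numbered file names are hypothetical, not the module's real naming scheme.

```python
import os
import urllib.request


def download_pages(url_list, download_dir="downloads", out_format="html"):
    """Hypothetical sketch: save each URL as <download_dir>/<index>.<out_format>.

    URLs that cannot be opened are skipped. Returns download_dir.
    """
    os.makedirs(download_dir, exist_ok=True)
    for index, url in enumerate(url_list):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                data = response.read()
        except (OSError, ValueError):
            # URLError is a subclass of OSError; ValueError covers malformed URLs
            continue
        path = os.path.join(download_dir, f"{index}.{out_format}")
        with open(path, "wb") as handle:
            handle.write(data)
    return download_dir
```

Writing bytes unchanged avoids guessing the page encoding; converting to other formats (such as PDF) would need an extra step this sketch does not attempt.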

search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0, stop=10, pause=2.0, only_standard=False, extra_params={}, tpe='', user_agent=None, type='text', rights='', download=False, download_dir='downloads', out_format='html')

Search Google. This is a simplified implementation of a search function. I added some parameters to make it more generic across the imported google and google_search_image libraries. I have not experimented with all the parameters; the code is based on examples from those libraries' GitHub repositories.

Arguments:

  • query (str) – Query string. Must NOT be URL-encoded.
  • tld (str) – Top-level domain.
  • lang (str) – Language.
  • tbs (str) – Time limits (e.g. “qdr:h” => last hour, “qdr:d” => last 24 hours, “qdr:m” => last month).
  • safe (str) – Safe search.
  • num (int) – Number of results per page.
  • start (int) – First result to retrieve.
  • stop (int or None) – Last result to retrieve. Use None to keep searching forever.
  • (list of str or None) – A list of web domains to constrain the search.
  • pause (float) – Time to wait between HTTP requests. Too long a pause makes the search slow, but too short a pause may cause Google to block your IP. Your mileage may vary!
  • only_standard (bool) – If True, returns only the standard results from each page. If False, returns every possible link from each page, except those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module.
  • extra_params (dict of str to str) – A dictionary of extra HTTP GET parameters, which must be URL-encoded. For example, if you don’t want Google to filter similar results, set extra_params to {‘filter’: ‘0’}, which appends ‘&filter=0’ to every query.
  • tpe (str) – Search type (images, videos, news, shopping, books, apps). Use the following values: {videos: ‘vid’, images: ‘isch’, news: ‘nws’, shopping: ‘shop’, books: ‘bks’, applications: ‘app’}.
  • user_agent (str or None) – User agent for the HTTP requests. Use None for the default.
  • type – Selects which underlying search function to use (default 'text').

For images only:

  • download_dir – If download is active, download_dir specifies the output directory.
  • rights (str) – Values: labeled-for-reuse-with-modifications, labeled-for-reuse, labeled-for-noncommercial-reuse-with-modification, labeled-for-nocommercial-reuse.
  • download – Download HTML, PDF or image results: takes a set of URLs and tries to download them to download_dir. If download_dir is None, nothing is saved to disk. Returns a list of references to the images.

Read more here: https://python-googlesearch.readthedocs.io/en/latest/
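Several of the arguments above (query, tld, lang, tbs, safe, num, start, tpe, extra_params) end up as fields of the Google search URL. As a rough illustration only, not the module's code (which hands these arguments to the imported libraries), the hypothetical build_search_url below shows how they map onto a query string. Mapping tpe to Google's tbm field is an assumption of this sketch.

```python
from urllib.parse import urlencode


def build_search_url(query, tld="com", lang="en", tbs="0", safe="off",
                     num=10, start=0, tpe="", extra_params=None):
    """Hypothetical helper: build a Google search URL from a subset of search() arguments."""
    params = {"hl": lang, "q": query, "num": num, "start": start, "tbs": tbs, "safe": safe}
    if tpe:
        params["tbm"] = tpe  # assumed field for the search type, e.g. 'isch' for images
    # New keys from extra_params are appended after the standard ones,
    # e.g. {'filter': '0'} -> '&filter=0'
    params.update(extra_params or {})
    # urlencode does the escaping, which is why query must not be URL-encoded beforehand
    return f"https://www.google.{tld}/search?" + urlencode(params)
```

For example, build_search_url("hello world", extra_params={"filter": "0"}) produces a URL that contains q=hello+world and ends with &filter=0.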

Returns:

  • Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever.
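Because the generator never ends when stop is None, the caller has to bound it. The sketch below does this with itertools.islice. fake_search is a hypothetical offline stand-in so the example runs without network access; with the real function the same pattern would be islice(search(query, stop=None), 5).

```python
from itertools import islice
from typing import Iterator, Optional


def fake_search(query: str, stop: Optional[int] = 10) -> Iterator[str]:
    """Hypothetical stand-in for search(): yields placeholder URLs, forever if stop is None."""
    count = 0
    while stop is None or count < stop:
        yield f"https://example.com/{query}/{count}"
        count += 1


# stop=None would loop forever, so cap the number of results on the consumer side
first_five = list(islice(fake_search("python", stop=None), 5))
```

islice stops pulling from the generator after five items, so with the real search() no further result pages would be requested.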