vid

thml.vid

Modules:

audio_book

download_youtube

Functions:

videos_from_channel(channel_URL, period=1)

Get all video links from a channel published in the last "period" days.

download_video(URLs, only_video=False, only_audio=False)

Download videos from a list of URLs.

Parameters:

  • URLs (list of str) –

    list of URLs.

  • only_video (bool, default: False ) –

    download video only.

  • only_audio (bool, default: False ) –

    download audio only.

download_srt_caption(url, lang='en', out_file='transcript_srt')

Download video caption in .srt format.

Parameters:

  • download_cap (bool, default: False ) –

    download caption.

  • lang (str, default: 'en' ) –

    language of caption.

Note

Currently raises an error, see https://github.com/pytube/pytube/issues/1085

download_json_caption(URL, out_file='transcript_json.json')

Download video caption in .json format

make_video

Functions:

mknews_video_intro(vid_size=(1280, 720), lang='vi', rate=150, bg_video='default', bg_audio='default', bg_audio_factor=0.3, out_file='vid_intro.mp4')

Make INTRO video

Parameters:

  • vid_size (tuple, default: (1280, 720) ) –

    Video size.

  • lang (str, default: 'vi' ) –

    language of news

  • rate (float, default: 150 ) –

    speed of voice

  • bg_video (str, default: 'default' ) –

    filenames of video/image background. Possible: 'default', 'filename'

  • bg_audio (str, default: 'default' ) –

    file name of audio background. Possible: 'default', 'filename'

  • bg_audio_factor (float, default: 0.3 ) –

    volume factor of the background audio relative to the main voice.

Info

Approximate file size of a 1-minute video:

Type             Resolution   File Size
Ultra HD or 4K   3840x2160    320 MB
Full HD          1920x1080    149 MB
HD               1280x720     105 MB
SD               720x480      26 MB
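
The table can be turned into a rough size estimate. This is a minimal sketch, assuming file size grows linearly with duration; `SIZE_PER_MINUTE_MB` and `estimate_size_mb` are hypothetical names used for illustration and are not part of thml.vid.

```python
# Approximate size of a 1-minute video in MB, taken from the table above.
SIZE_PER_MINUTE_MB = {
    (3840, 2160): 320,
    (1920, 1080): 149,
    (1280, 720): 105,
    (720, 480): 26,
}


def estimate_size_mb(vid_size: tuple[int, int], minutes: float) -> float:
    """Estimate the file size, assuming it grows linearly with duration."""
    if vid_size not in SIZE_PER_MINUTE_MB:
        raise ValueError(f"no reference size for resolution {vid_size}")
    return SIZE_PER_MINUTE_MB[vid_size] * minutes


# Rough size of a 2-minute video at the default vid_size=(1280, 720).
print(estimate_size_mb((1280, 720), 2))
```

Real sizes depend on codec and bitrate, so the result is only an order-of-magnitude guide when choosing vid_size.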

mknews_video_outro(vid_size=(1280, 720), lang='vi', rate=150, bg_video='default', bg_audio='default', bg_audio_factor=0.3, out_file='vid_outro.mp4')

Make OUTRO video

Parameters:

  • vid_size (tuple, default: (1280, 720) ) –

    Video size.

  • lang (str, default: 'vi' ) –

    language of news

  • rate (float, default: 150 ) –

    speed of voice

  • bg_video (str, default: 'default' ) –

    filenames of video/image background. Possible: 'default', 'filename'

  • bg_audio (str, default: 'default' ) –

    file name of audio background. Possible: 'default', 'filename'

  • bg_audio_factor (float, default: 0.3 ) –

    volume factor of the background audio relative to the main voice.

mknews_audio(file_text, lang='vi', rate=150, greet_word='', end_word='', out_file='audio_news.mp3')

create audio from file_text

Parameters:

  • file_text (str) –

    plain text file.

  • lang (str, default: 'vi' ) –

    language of news

  • greet_word (str, default: '' ) –

    Add speech at the beginning of the text. Possible: 'intro', 'middle', ''

  • end_word (str, default: '' ) –

    Add speech at the end of the text. Possible: 'outro', ''

Returns:

  • out_file ( Obj ) –

    audio file

mknews_1_video(lang='vi', rate=150, greet_word='', end_word='', vid_size=(1280, 720), img_duration=15, bg_video='', bg_audio='random', bg_audio_factor=0.2, out_file='vid_news.mp4')

Make a video with the following concept:
  • put a text file and all videos and images into a folder
  • the function converts the text to audio
  • the video is made based on the length of the audio
  • videos are used first; if their duration is not enough, images are added to the video
  • if bg_video is set: make the video with only a background
    • random_short: use short videos from the computer
    • random_download: randomly download long videos from a predefined list
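
The "videos first, then images" step can be sketched as a small planning function. This is only an illustration under assumptions: `plan_media` is a hypothetical helper, durations are passed in as plain numbers in seconds, and how thml.vid actually fills the remaining time is not documented here.

```python
import math


def plan_media(audio_len: float, video_lens: list[float],
               n_images: int, img_duration: float = 15) -> tuple[int, int]:
    """Return (videos_used, images_used) needed to cover audio_len seconds."""
    covered = 0.0
    videos_used = 0
    # Use videos first, in order, until the audio length is covered.
    for length in video_lens:
        if covered >= audio_len:
            break
        covered += length
        videos_used += 1
    # If the videos are not long enough, fill the rest with images.
    remaining = max(0.0, audio_len - covered)
    images_used = min(n_images, math.ceil(remaining / img_duration))
    return videos_used, images_used


# 60 s of audio, two short videos, five images available.
print(plan_media(60, [20, 10], n_images=5))
```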

Parameters:

  • lang (str, default: 'vi' ) –

    language of news

  • rate (float, default: 150 ) –

    speed of voice

  • vid_size (tuple, default: (1280, 720) ) –

    Video size.

  • img_duration (float, default: 15 ) –

    duration of an image in video.

  • bg_video (str, default: '' ) –

    filenames of video/image background. Possible: '', 'random_short', 'random_long'

  • bg_audio (str, default: 'random' ) –

    file name of audio background. Possible: "filename", 'random'

  • bg_audio_factor (float, default: 0.2 ) –

    volume factor of the background audio relative to the main voice.

  • greet_word (str, default: '' ) –

    Add speech at the beginning of the text. Possible: 'intro', 'middle', ''

  • end_word (str, default: '' ) –

    Add speech at the end of the text. Possible: 'outro', ''

Note

Video/image files should begin with a number to specify their order: '1_video_...' or '3_image_...'. Only the first ".txt" file is used.
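
The naming rule can be illustrated with a short sketch. `order_media` is a hypothetical helper, and taking the alphabetically first ".txt" file is an assumption; the library may pick the first file in a different order.

```python
import re
from pathlib import Path
from typing import Optional


def order_media(filenames: list[str]) -> tuple[Optional[str], list[str]]:
    """Return (first .txt file, media files sorted by their leading number)."""
    texts = sorted(f for f in filenames if f.lower().endswith(".txt"))
    media = []
    for name in filenames:
        if name.lower().endswith(".txt"):
            continue
        # Only files named like '1_video_...' or '3_image_...' are ordered.
        match = re.match(r"(\d+)_", Path(name).name)
        if match:
            media.append((int(match.group(1)), name))
    media.sort()
    return (texts[0] if texts else None), [name for _, name in media]


files = ["3_image_a.jpg", "news.txt", "1_video_b.mp4", "10_image_c.png"]
print(order_media(files))
```

Sorting on the parsed integer rather than the raw string keeps '10_image_...' after '3_image_...'.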

mknews_lists_videos(sub_folder='news*', lang='vi', rate=150, greet_word='', end_word='', vid_size=(1280, 720), img_duration=15, bg_video='', bg_audio='', bg_audio_factor=0.2, padding=0, logo='STV', logo_pos='left', out_file='vid_all_news.mp4')

Make videos from subfolders:

Parameters:

  • sub_folder (str, default: 'news*' ) –

    keyword to search subfolders.

  • bg_audio (str, default: '' ) –

    file name of audio background. Possible: "filename", 'random'

  • padding (float, default: 0 ) –

    gap between successive videos

set_bg_audio(file_video, bg_audio='random', bg_audio_factor=0.2, keep_original=False)

Set background_audio for video

Parameters:

  • bg_audio (str, default: 'random' ) –

    file name of audio background. Possible: "filename", 'random'

  • bg_audio_factor (float, default: 0.2 ) –

    volume factor of the background audio relative to the main voice.

  • keep_original (bool, default: False ) –

    keep original video or not

mknews_video_toc(sub_folder='news*', file_video_news='vid_news.mp4', vid_size=(1280, 720), bg_video='default', bg_audio='default', bg_audio_factor=0.3, border_factor=0.2, with_title=False, out_file='vid_TOC.mp4')

Make a TOC video:

Parameters:

  • sub_folder (str, default: 'news*' ) –

    keyword to search subfolders.

  • file_video_news (str, default: 'vid_news.mp4' ) –

    filename of breakingNews in each subfolder.

concate_audio_files(list_files, padding=0, out_file='concate_audio.mp3')

Concatenate a list of audio files

Parameters:

  • list_files (list) –

    list contains all audio files.

Returns:

  • file ( obj ) –

    audio file.

concate_video_files(list_files, padding=0, out_file='concate_videoNews.mp4')

Concatenate a list of video files

Parameters:

  • list_files (list) –

    list contains all video files.

  • vid_size (tuple) –

    Video size.

Returns:

  • file ( obj ) –

    video file.

add_logo_spokeman(file_video, vid_size=(1280, 720), logo='STV', logo_pos='left', spokeman='', spokeman_pos='left', h_spokeman=320, keep_original=False)

Add a logo to a video

Parameters:

  • file_video (str) –

    video filename.

  • vid_size (tuple, default: (1280, 720) ) –

    Video size.

  • logo (str, default: 'STV' ) –

    Put logo on video. Possible: "N5_1", "N5_2", 'X7', 'STV', ""

  • logo_pos (str, default: 'left' ) –

    Position of logo. Possible: "left", "right"

  • spokeman (str, default: '' ) –

    Spokeman on video. Possible: '', 'Anonymous'

  • h_spokeman (float, default: 320 ) –

    height of Spokeman

  • background (str) –

    file name of video/image background. Possible: "filename", 'random'

  • bg_audio (str) –

    file name of audio background

split_video(video_file, n=3)

Split video into n parts
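
A minimal sketch of the time ranges involved, assuming the video is cut into n parts of equal duration (the documentation does not say how the parts are sized). `split_ranges` is a hypothetical helper that only computes the boundaries; it does not read or write video files.

```python
def split_ranges(duration: float, n: int = 3) -> list[tuple[float, float]]:
    """Return n (start, end) pairs, in seconds, covering [0, duration]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    part = duration / n
    return [(i * part, (i + 1) * part) for i in range(n)]


# Boundaries for a 90-second video split into 3 parts.
for start, end in split_ranges(90, n=3):
    print(f"{start:.1f} s -> {end:.1f} s")
```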

speech_word_by_word(text, lang='VN', rate=150, vol=1.0, audio_file='word_by_word.mp3')

speech_1_pair_lang(text1, text2, lang1='VN', lang2='EN', voice_name1='', voice_name2='', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, out_file='pair_lang_audio.mp3')

Returns:

  • file ( file ) –

    audio file, if out_file is not None.

  • clip_audio ( Obj ) –

    audio file, if out_file is None.

speech_list_pair_lang(df, lang1='VN', lang2='EN', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, out_file='pair_lang_audio_all.mp3')

Parameters:

  • df (DataFrame) –

    contains 2 columns for langs.

mkvid_1_pair_lang(text1, text2, lang1='VN', lang2='EN', voice_name1='', voice_name2='', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, vid_size=(1280, 720), font_size=80, text_color1='blue', text_color2='black', bg_color1='azure3', bg_color2='azure4', padding=1, show_flag=True, out_file='pair_lang.mp4')

Make a video with concept:

Args:

mkvid_list_pair_lang_from_df(df, lang1='VN', lang2='EN', voice_name1='', voice_name2='', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, vid_size=(1280, 720), font_size=80, text_color1='blue', text_color2='black', bg_color1='azure4', bg_color2='CadetBlue4', padding=1, show_flag=True, out_file='vid_pair_lang.mp4')

Parameters:

  • df (DataFrame) –

    contains 2 columns for langs.

news

Modules:

making_news

Functions:

  • filter_images

    Filter images based on their width and height.

  • AI_rewrite_text

    Using LLM to rewrite an article text in a limited length.

filter_images(filenames: list[str], width: float = 300, height: float = 300) -> list[str]

Filter images based on their width and height.
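
A minimal sketch of such a filter, assuming width and height are minimum sizes in pixels (the documentation does not state whether they are minimums or maximums). `filter_by_min_size` is a hypothetical helper that takes already-known sizes so the example runs without image files.

```python
def filter_by_min_size(sizes: dict[str, tuple[int, int]],
                       width: float = 300, height: float = 300) -> list[str]:
    """Keep the filenames whose (width, height) reach the given minimums."""
    return [
        name
        for name, (img_w, img_h) in sizes.items()
        if img_w >= width and img_h >= height
    ]


# With Pillow installed, sizes could be collected with Image.open(path).size,
# which returns a (width, height) tuple.
sizes = {"a.jpg": (640, 480), "b.png": (200, 500), "c.jpg": (300, 300)}
print(filter_by_min_size(sizes))
```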

AI_rewrite_text(llm: object, text: str, max_length: int = 300, target_language: str = 'English') -> str

Using LLM to rewrite an article text in a limited length.

search_google

Functions:

  • Distance

    Get the Google distance between two words

  • get_user_agent

    Get a random user agent string.

  • get_hits

    Return the number of hits for a search query

  • Download

    Download URLs as HTML files

  • search

    SEARCH

Distance(term1, term2)

Get the Google distance between two words. Returns a float.

get_user_agent()

Get a random user agent string. Returns a string.

get_hits(query, tld='com', lang='sv', tbs='0', safe='off', extra_params={}, tpe='', user_agent=None)

Return the number of hits for a search query. Returns an int.

Download(url_list, out_format, download_dir)

Download URLs as HTML files. Returns the folder.

search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0, stop=10, pause=2.0, only_standard=False, extra_params={}, tpe='', user_agent=None, type='text', rights='', download=False, download_dir='downloads', out_format='html')

SEARCH

This is a simplified search function implementation. I added some parameters to make it more generic towards the google and google_search_image imports. I have not experimented with all the different parameters. The code is based on examples from the imported libraries' GitHub repos.

Parameters:

  • query (str) –

    Query string. Must NOT be url-encoded.

  • tld (str) –

    Top level domain.

  • lang (str) –

    Language.

  • tbs (str) –

    Time limits (i.e. "qdr:h" => last hour, "qdr:d" => last 24 hours, "qdr:m" => last month).

  • safe (str) –

    Safe search.

  • num (int) –

    Number of results per page.

  • start (int) –

    First result to retrieve.

  • stop (int or None) –

    Last result to retrieve. Use None to keep searching forever.

  • (list of str or None) –

    A list of web domains to constrain the search.

  • pause (float) –

    Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary!

  • only_standard (bool) –

    If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module.

  • extra_params (dict of str to str) –

    A dictionary of extra HTTP GET parameters, which must be URL encoded. For example, if you don't want Google to filter similar results you can set extra_params to {'filter': '0'}, which will append '&filter=0' to every query.

  • tpe (str) –

    Search type (images, videos, news, shopping, books, apps). Use the following values: {videos: 'vid', images: 'isch', news: 'nws', shopping: 'shop', books: 'bks', applications: 'app'}

  • user_agent (str or None) –

    User agent for the HTTP requests. Use None for the default.

  • type –

    Changes which function to use.

For images only:

  • download_dir –

    if download is active, download_dir describes the output directory (where files are saved).

  • rights (str) –

    Values: labeled-for-reuse-with-modifications, labeled-for-reuse, labeled-for-noncommercial-reuse-with-modification, labeled-for-nocommercial-reuse

  • download –

    Download html, pdf or image. Takes a set of URLs and tries to download them to download_dir. If download_dir is None, nothing is saved to disk. Returns a reference list to the images.

Read more here: https://python-googlesearch.readthedocs.io/en/latest/

Returns:

  • Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever.
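
How extra GET parameters end up in a query string can be shown with the standard library alone. This is a sketch, not the module's code: `build_query` is a hypothetical helper, and the `q` and `tbs` parameter names are assumptions about the request URL.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(query: str, tbs: str = "0",
                extra_params: Optional[dict] = None) -> str:
    """Build a URL-encoded query string, appending any extra parameters."""
    params = {"q": query, "tbs": tbs}
    # Extra parameters are appended after the standard ones.
    params.update(extra_params or {})
    return urlencode(params)


# Results from the last 24 hours, without filtering of similar results.
print(build_query("python news", tbs="qdr:d", extra_params={"filter": "0"}))
```

Passing the raw query to `urlencode` matches the rule that the query must not be url-encoded beforehand; encoding it twice would corrupt it.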

search_image

Functions:

search_image(keywords, safe=False, download=False, num=10, pause=2.0, output_dir='download_image', time='past-7-days', time_range=None, rights='', similar_images=False, img_format=None, color=None, color_type=None, size='>640*480', img_type=None, url=None, specific_site=None, single_image=None, ignore_urls=None)

Search and download images

Parameters:

  • keywords (str) –

    Query string. Must NOT be url-encoded.

  • tld (str) –

    Top level domain.

  • format (str) –

    format/extension of the image. Possible values: jpg, gif, png, bmp, svg, webp, ico, raw

  • safe (str) –

    Safe search.

  • num (int) –

    Number of results per page.

  • start (int) –

    First result to retrieve.

  • stop (int or None) –

    Last result to retrieve. Use None to keep searching forever.

  • (list of str or None) –

    A list of web domains to constrain the search.

  • pause (float) –

    Lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary!

  • only_standard (bool) –

    If True, only returns the standard results from each page. If False, it returns every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module.

  • extra_params (dict of str to str) –

    A dictionary of extra HTTP GET parameters, which must be URL encoded. For example, if you don't want Google to filter similar results you can set extra_params to {'filter': '0'}, which will append '&filter=0' to every query.

  • tpe (str) –

    Search type (images, videos, news, shopping, books, apps). Use the following values: {videos: 'vid', images: 'isch', news: 'nws', shopping: 'shop', books: 'bks', applications: 'app'}

  • user_agent (str or None) –

    User agent for the HTTP requests. Use None for the default.

  • type –

    Changes which function to use.

  • output_dir –

    if download is active, output_dir describes the output directory (where files are saved).

  • rights (str) –

    Values: labeled-for-reuse-with-modifications, labeled-for-reuse, labeled-for-noncommercial-reuse-with-modification, labeled-for-nocommercial-reuse

  • download –

    Download html, pdf or image. Takes a set of URLs and tries to download them to output_dir. If output_dir is None, nothing is saved to disk. Returns a reference list to the images.

Returns:

  • Generator (iterator) that yields found URLs. If the stop parameter is None the iterator will loop forever.

text_tool

Functions:

Attributes:

DATA_PATH = os.path.dirname(os.path.abspath(__file__)) + '/data' module-attribute

pre_defined_term = {'\\[\\d+\\]': '', '\\s+\\.': '.', ' ': ' '} module-attribute

replace_term(text: str, *, term_dict: dict[str] = None, term_file: str = None, use_predefine_term: bool = False) -> str

Replace terms in text using term_dict.

Parameters:

  • term_dict (dict, default: None ) –

    dict of terms to be replaced. Each key is a term to be replaced by its value. A key can be a regex pattern.

  • term_file (str, default: None ) –

    path to a yaml file that contains term_dict. Use this if term_dict is None.

  • use_predefine_term (bool, default: False ) –

    use pre_defined_term for post-processing.
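
The idea of a term dictionary whose keys may be regex patterns can be sketched with `re.sub`. This is an illustration only: `replace_terms` is a hypothetical stand-in, and the real replace_term may apply the terms differently. The two patterns come from pre_defined_term.

```python
import re


def replace_terms(text: str, term_dict: dict[str, str]) -> str:
    """Apply each pattern -> replacement pair of term_dict in order."""
    for pattern, replacement in term_dict.items():
        text = re.sub(pattern, replacement, text)
    return text


# Remove citation markers such as [3], then drop spaces before a period.
terms = {r"\[\d+\]": "", r"\s+\.": "."}
print(replace_terms("See note[3] .", terms))
```

Because every key is treated as a pattern, literal terms containing characters such as "." or "(" would need `re.escape` first.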

mark_heading(text: str = None) -> str

Split markdown text into chapters

remove_paragraph(text: str, start_term: str, end_term: str, keep_end_term=True) -> str

Remove the paragraph that starts with 'start_term' and ends with 'end_term'.
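
A minimal sketch of this behaviour using plain string search. `remove_between` is a hypothetical helper, and the meaning given to keep_end_term (leave 'end_term' itself in the text) is an assumption based on the parameter name.

```python
def remove_between(text: str, start_term: str, end_term: str,
                   keep_end_term: bool = True) -> str:
    """Remove the span from start_term up to end_term (first occurrence)."""
    start = text.find(start_term)
    if start == -1:
        return text  # nothing to remove
    end = text.find(end_term, start + len(start_term))
    if end == -1:
        return text  # no closing term, leave the text untouched
    if not keep_end_term:
        end += len(end_term)
    return text[:start] + text[end:]


print(remove_between("A [ad] B END C", "[ad]", "END"))
```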

pre_text(filename=None)

Load preTEXT

Parameters:

  • filename (Str, default: None ) –

    name of preTEXT file

Returns:

  • ds ( Series ) –

    Series

read_text_pair(filename, separator=':')

Read a file formatted as pairs, separated by a colon (:).

Parameters:

  • filename (Str) –

    name of text file

Returns:

  • df ( DataFrame ) –

    DataFrame containing 2 columns, c1 and c2, corresponding to the text pairs
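
The pair format can be sketched without any dependency. `parse_pairs` is a hypothetical helper that returns plain rows; skipping blank lines and lines without a separator is an assumption.

```python
def parse_pairs(lines: list[str], separator: str = ":") -> list[dict[str, str]]:
    """Split each 'left: right' line into a row with keys c1 and c2."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or separator not in line:
            continue  # skip blank lines and lines without a separator
        c1, c2 = line.split(separator, 1)
        rows.append({"c1": c1.strip(), "c2": c2.strip()})
    return rows


rows = parse_pairs(["xin chào: hello", "", "cảm ơn: thank you"])
print(rows)
# With pandas installed, pandas.DataFrame(rows) gives the columns c1 and c2.
```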

read_text_column(filename, separator=':', column_line=0)

Read a file formatted as pairs, separated by a colon (:).

Parameters:

  • filename (Str) –

    name of text file

Returns:

  • df ( DataFrame ) –

    DataFrame containing 2 columns, c1 and c2, corresponding to the text pairs

read_text_news(file_text, whole_text=False, replace_Abbrev=False, file_list_abbrev=DATA_PATH + '/list_abbrev.txt')

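Read text.

Parameters:

  • file_text (str) –

    plain text file.

  • whole_text (bool, default: False ) –

    read the whole text without decomposing it into author, title, ...

  • replace_Abbrev (bool, default: False ) –

    choose whether to automatically replace abbreviations.

  • file_list_abbrev (str, default: DATA_PATH + '/list_abbrev.txt' ) –

    filename of the .json file that contains the list of abbreviations.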

Note

no empty line between 1st and 2nd lines

read_text_subtitle(filename, format_=None)

Read a caption file. Supported formats: json, rst

json_caption_to_text(transcript_dict, out_file=None)

transcript_dict (dict): transcript downloaded by youtubesearchpython.Transcript["segments"]

json_caption_unify(transcript_dict)

transcript_dict (dict): transcript downloaded by youtubesearchpython.Transcript["segments"]

convert_json_to_srt(transcript_dict, fps=25, out_file='transcript_rst.rst')

Parameters:

  • transcript_dict (dict) –

    transcript downloaded by youtubesearchpython.Transcript["segments"]

  • fps (int, default: 25 ) –

    Frames per second. Can be checked using the moviepy package.
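
The conversion to SRT can be sketched as follows. This is an illustration under assumptions: each segment is taken to be a dict with "startMs", "endMs" and "text" keys, `ms_to_srt_time` and `segments_to_srt` are hypothetical helpers, and the fps parameter is not used here.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text: a counter, a time range and the caption per block."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = ms_to_srt_time(int(seg["startMs"]))
        end = ms_to_srt_time(int(seg["endMs"]))
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


print(segments_to_srt([{"startMs": "0", "endMs": "1500", "text": "Hello"}]))
```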

tts

Functions:

tts_edge(text: str, voice: str = 'vi-VN-NamMinhNeural', rate: int = 0, volume: int = 0, pitch: int = 0, audio_file: str = 'voice.mp3')

TTS using Edge browser.

Parameters:

  • text (str) –

    text string

  • voice (str, default: 'vi-VN-NamMinhNeural' ) –

    name of voice.

  • rate (int, default: 0 ) –

    voice speed, as a positive/negative percentage. E.g., "+5%" or "-10%"

  • volume (int, default: 0 ) –

    voice volume, as a positive/negative percentage.

  • pitch (int, default: 0 ) –

    voice pitch, as positive/negative Hz. E.g., "+5Hz" or "-10Hz". Pitch determines how high or low a sound is (determined by the frequency of the sound waves). Adjusting the pitch can change the perceived tone or melody. For example, a high-pitched sound might be like a whistle or a child's voice, while a low-pitched sound might resemble a bass drum or an adult male voice.

  • audio_file (str, default: 'voice.mp3' ) –

    name of output audio file.
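
Since rate, volume and pitch are integers in the signature but are described as signed strings, the conversion can be sketched like this. `as_percent` and `as_hertz` are hypothetical helpers; whether tts_edge performs this exact conversion internally is an assumption.

```python
def as_percent(value: int) -> str:
    """Format an integer as a signed percentage string."""
    return f"{value:+d}%"


def as_hertz(value: int) -> str:
    """Format an integer as a signed Hz string."""
    return f"{value:+d}Hz"


# Values in the documented form: rate and volume in percent, pitch in Hz.
print(as_percent(5), as_percent(-10), as_hertz(-10))
```

The explicit "+" sign is kept for zero and positive values, which matches the "+5%" form shown in the parameter description.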

tts_edge_voices(lang: str = 'vi')

Get available voices in Edge browser.

tts_gTTS(text, lang='en', audio_file='voice.mp3')

Parameters:

  • text (str) –

    text string

tts_off_pyttsx3(text, lang='vi', voice_name='', voice_id=None, rate=150, vol=1.0, audio_file='voice.mp3')

Convert text to speech using Windows' voices.

Parameters:

  • lang (str, default: 'vi' ) –

    select the language. Possible with all voices available on the local computer: 'vi', 'VN', 'US', ...

  • text (str) –

    string of text

  • rate (float, default: 150 ) –

    voice speed

  • voice_name (str, default: '' ) –

    name of speaker.

  • voice_id (int, default: None ) –

    id of the voice in Windows; this parameter sets both the language and the voice.