vid
thml.vid
Modules:
- audio_book
- download_youtube
- make_video
- news
- search_google
- search_image
- text_tool
- tts
audio_book
download_youtube
Functions:
- videos_from_channel: Get all video links from a channel within the last "period" days.
- download_video: Download videos from a list of URLs.
- download_srt_caption: Download a video caption in .srt format.
- download_json_caption: Download a video caption in .json format.
videos_from_channel(channel_URL, period=1)
Get all video links from a channel within the last "period" days.
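Reading "in period days" as "published within the last period days", the date filter can be sketched as below. This assumes the channel's videos and their publish dates were already fetched; filter_recent and its (url, published_at) input are hypothetical, not part of thml.vid.

```python
from datetime import datetime, timedelta


def filter_recent(videos, period=1, now=None):
    """Keep links of videos published within the last `period` days.

    `videos` is a list of (url, published_at) tuples (hypothetical shape).
    """
    now = now if now is not None else datetime.now()
    cutoff = now - timedelta(days=period)
    return [url for url, published_at in videos if published_at >= cutoff]


now = datetime(2024, 5, 10, 12, 0)
videos = [
    ("video_a", datetime(2024, 5, 10, 8, 0)),  # 4 hours earlier
    ("video_b", datetime(2024, 5, 8, 12, 0)),  # 2 days earlier
]
print(filter_recent(videos, period=1, now=now))  # ['video_a']
```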
download_video(URLs, only_video=False, only_audio=False)
Download videos from a list of URLs.
Args:
- URLs (list of str): list of URLs.
- only_video (bool, default False): download video only.
- only_audio (bool, default False): download audio only.
download_srt_caption(url, lang='en', out_file='transcript_srt')
Download a video caption in .srt format.
Args:
- download_cap (bool, default False): download the caption.
- lang (str, default 'en'): language of the caption.
Note: this currently raises an error; see https://github.com/pytube/pytube/issues/1085
download_json_caption(URL, out_file='transcript_json.json')
Download video caption in .json format
make_video
Functions:
- mknews_video_intro: Make INTRO video.
- mknews_video_outro: Make OUTRO video.
- mknews_audio: Create audio from file_text.
- mknews_1_video: Make a video from a text file and the videos/images in a folder.
- mknews_lists_videos: Make videos from subfolders.
- set_bg_audio: Set the background audio for a video.
- mknews_video_toc: Make a TOC (table of contents) video.
- concate_audio_files: Concatenate a list of audio files.
- concate_video_files: Concatenate a list of video files.
- add_logo_spokeman: Add a logo on videos.
- split_video: Split a video into n parts.
- speech_word_by_word
- speech_1_pair_lang
- speech_list_pair_lang
- mkvid_1_pair_lang: Make a video.
- mkvid_list_pair_lang_from_df
mknews_video_intro(vid_size=(1280, 720), lang='vi', rate=150, bg_video='default', bg_audio='default', bg_audio_factor=0.3, out_file='vid_intro.mp4')
Make INTRO video
Parameters:
- vid_size (tuple, default (1280, 720)): video size.
- lang (str, default 'vi'): language of the news.
- rate (float, default 150): speed of the voice.
- bg_video (str, default 'default'): filename of the video/image background. Possible: 'default' or a filename.
- bg_audio (str, default 'default'): filename of the audio background. Possible: 'default' or a filename.
- bg_audio_factor (float, default 0.3): factor of the background audio relative to the main voice.
Info: video size for a 1-minute video (ref)
| Type | Resolution | File size (1 min) |
|---|---|---|
| Ultra HD or 4K | 3840x2160 | 320 MB |
| Full HD | 1920x1080 | 149 MB |
| HD | 1280x720 | 105 MB |
| SD | 720x480 | 26 MB |
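At a roughly constant bitrate, file size grows linearly with duration, so the table can be scaled to other lengths. The sketch below assumes constant bitrate and 1 MB = 10**6 bytes; real encoders vary the bitrate with content, and both helper names are hypothetical.

```python
# File size of a 1-minute video at each resolution, taken from the table above.
SIZE_PER_MINUTE_MB = {"4K": 320, "Full HD": 149, "HD": 105, "SD": 26}


def estimate_size_mb(video_type, duration_s):
    """Scale the 1-minute reference size linearly to duration_s seconds."""
    return SIZE_PER_MINUTE_MB[video_type] * duration_s / 60


def implied_bitrate_mbps(video_type):
    """Average bitrate (Mbit/s) implied by the table, with 1 MB = 10**6 bytes."""
    return SIZE_PER_MINUTE_MB[video_type] * 8 / 60


print(estimate_size_mb("HD", 90))  # 157.5
print(round(implied_bitrate_mbps("HD"), 1))  # 14.0
```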
mknews_video_outro(vid_size=(1280, 720), lang='vi', rate=150, bg_video='default', bg_audio='default', bg_audio_factor=0.3, out_file='vid_outro.mp4')
Make OUTRO video.
Args:
- vid_size (tuple): video size.
- lang (str): language of the news.
- rate (float): speed of the voice.
- bg_video (str): filename of the video/image background. Possible: 'default' or a filename.
- bg_audio (str): filename of the audio background. Possible: 'default' or a filename.
- bg_audio_factor (float): factor of the background audio relative to the main voice.
mknews_audio(file_text, lang='vi', rate=150, greet_word='', end_word='', out_file='audio_news.mp3')
Create audio from file_text.
Parameters:
- file_text (str): plain text file.
- lang (str, default 'vi'): language of the news.
- greet_word (str, default ''): speech added at the beginning of the text. Possible: 'intro', 'middle', ''.
- end_word (str, default ''): speech added at the end of the text. Possible: 'outro', ''.
Returns:
- out_file (Obj): audio file.
mknews_1_video(lang='vi', rate=150, greet_word='', end_word='', vid_size=(1280, 720), img_duration=15, bg_video='', bg_audio='random', bg_audio_factor=0.2, out_file='vid_news.mp4')
Make a video with this concept:
- Put a text file and all videos and images into a folder.
- The function converts the text to audio.
- The video length is based on the length of the audio.
- Videos are used first; if their total duration is not enough, images are added.
- If bg_video is set, make the video with only the background:
  - random_short: use short videos from the computer.
  - random_download: randomly download long videos from a predefined list.
Parameters:
- lang (str, default 'vi'): language of the news.
- rate (float, default 150): speed of the voice.
- vid_size (tuple, default (1280, 720)): video size.
- img_duration (float, default 15): duration of each image in the video.
- bg_video (str, default ''): video/image background. Possible: '', 'random_short', 'random_long'.
- bg_audio (str, default 'random'): filename of the audio background. Possible: a filename or 'random'.
- bg_audio_factor (float, default 0.2): factor of the background audio relative to the main voice.
- greet_word (str, default ''): speech added at the beginning of the text. Possible: 'intro', 'middle', ''.
- end_word (str, default ''): speech added at the end of the text. Possible: 'outro', ''.
Note: video/image filenames should begin with a number that sets their order, e.g. '1_video_...' or '3_image_...'. Only the first ".txt" file is used.
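The concept above can be sketched as a planning step: order files by their numeric prefix, then cover the narration with videos first and images after. This is a hypothetical outline (order_key and plan_clips are not thml functions) that only computes durations; trimming the last clip and repeating images when they run out are assumptions, and no media is read or rendered.

```python
import re
from itertools import cycle


def order_key(filename):
    """Numeric prefix of names like '1_video_x.mp4'; unnumbered files sort last."""
    match = re.match(r"(\d+)_", filename)
    return int(match.group(1)) if match else float("inf")


def plan_clips(audio_duration, videos, images, img_duration=15):
    """Fill audio_duration seconds with videos first, then images.

    videos: list of (filename, duration_s); images: list of filenames.
    Returns a list of (filename, seconds_used); the last clip is trimmed.
    """
    plan, remaining = [], audio_duration
    for name, duration in sorted(videos, key=lambda v: order_key(v[0])):
        if remaining <= 0:
            break
        used = min(duration, remaining)
        plan.append((name, used))
        remaining -= used
    if remaining > 0 and images and img_duration > 0:
        # Reuse images in order until the audio is covered.
        for name in cycle(sorted(images, key=order_key)):
            if remaining <= 0:
                break
            used = min(img_duration, remaining)
            plan.append((name, used))
            remaining -= used
    return plan


videos = [("2_video_b.mp4", 10), ("1_video_a.mp4", 15)]
images = ["3_image_c.jpg"]
print(plan_clips(40, videos, images))
# [('1_video_a.mp4', 15), ('2_video_b.mp4', 10), ('3_image_c.jpg', 15)]
```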
mknews_lists_videos(sub_folder='news*', lang='vi', rate=150, greet_word='', end_word='', vid_size=(1280, 720), img_duration=15, bg_video='', bg_audio='', bg_audio_factor=0.2, padding=0, logo='STV', logo_pos='left', out_file='vid_all_news.mp4')
Make videos from subfolders.
Parameters:
- sub_folder (str, default 'news*'): keyword to search subfolders.
- bg_audio (str, default ''): filename of the audio background. Possible: a filename or 'random'.
- padding (float, default 0): gap between successive videos.
set_bg_audio(file_video, bg_audio='random', bg_audio_factor=0.2, keep_original=False)
Set the background audio for a video.
Parameters:
- bg_audio (str, default 'random'): filename of the audio background. Possible: a filename or 'random'.
- bg_audio_factor (float, default 0.2): factor of the background audio relative to the main voice.
- keep_original (bool, default False): whether to keep the original video.
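A common reading of bg_audio_factor is a linear gain applied to the background track before it is mixed under the voice. The sketch below assumes that, and also assumes the background is looped or cut to the voice length; it works on plain lists of float samples so it runs without an audio library, and mix_background is a hypothetical name.

```python
from itertools import cycle, islice


def mix_background(voice, background, bg_audio_factor=0.2):
    """Sum the voice with a looped background scaled by bg_audio_factor.

    Both tracks are lists of float samples at the same sample rate.
    """
    if not background:
        return list(voice)
    # Repeat (or cut) the background so it matches the voice length.
    looped = islice(cycle(background), len(voice))
    return [v + bg_audio_factor * b for v, b in zip(voice, looped)]


mixed = mix_background([1.0, 1.0, 1.0], [0.5, -0.5], bg_audio_factor=0.2)
print([round(x, 2) for x in mixed])  # [1.1, 0.9, 1.1]
```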
mknews_video_toc(sub_folder='news*', file_video_news='vid_news.mp4', vid_size=(1280, 720), bg_video='default', bg_audio='default', bg_audio_factor=0.3, border_factor=0.2, with_title=False, out_file='vid_TOC.mp4')
Make a TOC (table of contents) video.
Parameters:
- sub_folder (str, default 'news*'): keyword to search subfolders.
- file_video_news (str, default 'vid_news.mp4'): filename of the breaking news video in each subfolder.
concate_audio_files(list_files, padding=0, out_file='concate_audio.mp3')
Concatenate a list of audio files.
Parameters:
- list_files (list): list of all audio files.
Returns:
- file (obj): audio file.
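The padding parameter (documented for mknews_lists_videos as the gap between successive videos) implies a timeline like the one computed below. This sketch assumes padding is a gap in seconds inserted only between files, not after the last one; clip_offsets is a hypothetical helper, not part of thml.

```python
def clip_offsets(durations, padding=0):
    """Start time of each clip when joined with `padding` seconds between clips.

    Returns (starts, total_duration).
    """
    starts, t = [], 0
    for i, d in enumerate(durations):
        if i > 0:
            t += padding  # gap before every clip except the first
        starts.append(t)
        t += d
    return starts, t


print(clip_offsets([10, 5, 8], padding=1))  # ([0, 11, 17], 25)
```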
concate_video_files(list_files, padding=0, out_file='concate_videoNews.mp4')
Concatenate a list of videos.
Parameters:
- list_files (list): list of all video files.
- vid_size (tuple): video size.
Returns:
- file (obj): video file.
add_logo_spokeman(file_video, vid_size=(1280, 720), logo='STV', logo_pos='left', spokeman='', spokeman_pos='left', h_spokeman=320, keep_original=False)
Add a logo on videos.
Parameters:
- file_video (str): video filename.
- vid_size (tuple, default (1280, 720)): video size.
- logo (str, default 'STV'): logo to put on the video. Possible: "N5_1", "N5_2", 'X7', 'STV', "".
- logo_pos (str, default 'left'): position of the logo. Possible: "left", "right".
- spokeman (str, default ''): spokesperson shown on the video. Possible: '', 'Anonymous'.
- h_spokeman (float, default 320): height of the spokesperson.
- background (str): filename of the video/image background. Possible: a filename or 'random'.
- bg_audio (str): filename of the audio background.
split_video(video_file, n=3)
Split video into n parts
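Splitting into n parts presumably means n parts of equal length. The sketch below only computes the cut points under that assumption; cutting the file itself would need a video library, and split_bounds is a hypothetical helper.

```python
def split_bounds(duration, n=3):
    """(start, end) times in seconds of n equal parts of a `duration`-second clip."""
    if n < 1:
        raise ValueError("n must be >= 1")
    duration = float(duration)
    step = duration / n
    # Use the exact duration as the last end so rounding never drops the tail.
    return [(i * step, duration if i == n - 1 else (i + 1) * step) for i in range(n)]


print(split_bounds(90, 3))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0)]
```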
speech_word_by_word(text, lang='VN', rate=150, vol=1.0, audio_file='word_by_word.mp3')
speech_1_pair_lang(text1, text2, lang1='VN', lang2='EN', voice_name1='', voice_name2='', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, out_file='pair_lang_audio.mp3')
Returns:
- file (file): audio file, if out_file is not None.
- clip_audio (Obj): audio file, if out_file is None.
speech_list_pair_lang(df, lang1='VN', lang2='EN', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, out_file='pair_lang_audio_all.mp3')
Parameters:
- df (DataFrame): contains 2 columns, one per language.
mkvid_1_pair_lang(text1, text2, lang1='VN', lang2='EN', voice_name1='', voice_name2='', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, vid_size=(1280, 720), font_size=80, text_color1='blue', text_color2='black', bg_color1='azure3', bg_color2='azure4', padding=1, show_flag=True, out_file='pair_lang.mp4')
Make a video with concept:
Args:
mkvid_list_pair_lang_from_df(df, lang1='VN', lang2='EN', voice_name1='', voice_name2='', rate1=130, rate2=120, repeat_slow=60, repeat_fast=60, vid_size=(1280, 720), font_size=80, text_color1='blue', text_color2='black', bg_color1='azure4', bg_color2='CadetBlue4', padding=1, show_flag=True, out_file='vid_pair_lang.mp4')
Parameters:
- df (DataFrame): contains 2 columns, one per language.
news
Modules:
- making_news
Functions:
- filter_images: Filter images based on their width and height.
- AI_rewrite_text: Use an LLM to rewrite an article text within a limited length.
filter_images(filenames: list[str], width: float = 300, height: float = 300) -> list[str]
Filter images based on their width and height.
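The docstring does not say whether filter_images keeps images above or below the limits. The sketch below assumes it keeps images at least width by height pixels (the usual way to drop icons and thumbnails). filter_by_size and its filename-to-size mapping are hypothetical; reading sizes from the files themselves would need an imaging library.

```python
def filter_by_size(sizes, width=300, height=300):
    """Keep filenames whose image is at least `width` x `height` pixels.

    `sizes` maps filename -> (width, height) in pixels.
    """
    return [name for name, (w, h) in sizes.items() if w >= width and h >= height]


sizes = {"logo.png": (120, 80), "photo.jpg": (1280, 720), "banner.jpg": (1200, 200)}
print(filter_by_size(sizes))  # ['photo.jpg']
```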
AI_rewrite_text(llm: object, text: str, max_length: int = 300, target_language: str = 'English') -> str
Use an LLM to rewrite an article text within a limited length.
search_google
Functions:
- Distance: Get the Google distance between words.
- get_user_agent: Get a random user agent string.
- get_hits: Return the number of hits for a search query.
- Download: Download URLs as HTML files.
- search: Search Google.
Distance(term1, term2)
Get the Google distance between words. Returns: float.
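"Google distance" usually refers to the Normalized Google Distance of Cilibrasi and Vitányi, computed from hit counts such as those returned by get_hits: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The sketch below shows that formula with hit counts passed in directly. Whether Distance uses exactly this formula, and which total page count N it assumes, is not documented, so treat this as an assumption.

```python
import math


def normalized_google_distance(f_x, f_y, f_xy, n_pages):
    """NGD from hit counts.

    f_x, f_y: hits for each term alone; f_xy: hits for both terms together;
    n_pages: estimated total number of indexed pages.
    """
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_pages) - min(log_x, log_y))


print(round(normalized_google_distance(100, 100, 10, 10_000), 3))  # 0.5
```

A value of 0 means the terms always co-occur; larger values mean they rarely appear together.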
get_user_agent()
Get a random user agent string. Returns: str.
get_hits(query, tld='com', lang='sv', tbs='0', safe='off', extra_params={}, tpe='', user_agent=None)
Return the number of hits for a search query. Returns: int.
Download(url_list, out_format, download_dir)
Download URLs as HTML files. Returns: the download folder.
search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0, stop=10, pause=2.0, only_standard=False, extra_params={}, tpe='', user_agent=None, type='text', rights='', download=False, download_dir='downloads', out_format='html')
SEARCH. This is a simplified search function. I added some parameters to make it more generic across the google and google_search_image imports. I have not tested all the parameters; the code follows examples from the imported libraries' GitHub repos.
ARGUMENTS:
- query (str): query string. Must NOT be URL-encoded.
- tld (str): top-level domain.
- lang (str): language.
- tbs (str): time limits (e.g. "qdr:h" => last hour, "qdr:d" => last 24 hours, "qdr:m" => last month).
- safe (str): safe search.
- num (int): number of results per page.
- start (int): first result to retrieve.
- stop (int or None): last result to retrieve. Use None to keep searching forever.
- domains (list of str or None): a list of web domains to constrain the search.
- pause (float): lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary!
- only_standard (bool): if True, only return the standard results from each page. If False, return every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module.
- extra_params (dict of str to str): a dictionary of extra HTTP GET parameters, which must be URL-encoded. For example, if you don't want Google to filter similar results, set extra_params to {'filter': '0'}, which appends '&filter=0' to every query.
- tpe (str): search type (images, videos, news, shopping, books, apps). Use the following values: {videos: 'vid', images: 'isch', news: 'nws', shopping: 'shop', books: 'bks', applications: 'app'}.
- user_agent (str or None): user agent for the HTTP requests. Use None for the default.
- type: changes which function to use.
For images only:
- download_dir: output directory used when download is active.
- rights (str): usage rights. Values: labeled-for-reuse-with-modifications, labeled-for-reuse, labeled-for-noncommercial-reuse-with-modification, labeled-for-nocommercial-reuse.
- download: download HTML, PDF or image results. Takes a set of URLs and tries to download them to download_dir; if download_dir is None, nothing is saved to disk. Returns a reference list to the images.
Read more here: https://python-googlesearch.readthedocs.io/en/latest/
Returns:
- Generator (iterator) that yields found URLs. If the stop parameter is None, the iterator loops forever.
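To see how tbs and extra_params end up in the request, the sketch below builds a search URL with urllib.parse.urlencode, appending extra_params after the standard parameters. The q, hl, num and tbs names are Google's public query parameters; build_query_url is hypothetical and is not the googlesearch library's internal URL template.

```python
from urllib.parse import urlencode


def build_query_url(query, tld="com", lang="en", num=10, tbs="0", extra_params=None):
    """Build a Google search URL; extra_params are appended after the standard ones."""
    params = {"q": query, "hl": lang, "num": num, "tbs": tbs}
    params.update(extra_params or {})
    return f"https://www.google.{tld}/search?" + urlencode(params)


print(build_query_url("python docs", tbs="qdr:d", extra_params={"filter": "0"}))
# https://www.google.com/search?q=python+docs&hl=en&num=10&tbs=qdr%3Ad&filter=0
```

Note that urlencode encodes the values itself, so values passed to this helper must not be pre-encoded.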
search_image
Functions:
- search_image: Search and download images.
search_image(keywords, safe=False, download=False, num=10, pause=2.0, output_dir='download_image', time='past-7-days', time_range=None, rights='', similar_images=False, img_format=None, color=None, color_type=None, size='>640*480', img_type=None, url=None, specific_site=None, single_image=None, ignore_urls=None)
Search and download images
Args:
- keywords (str): query string. Must NOT be URL-encoded.
- tld (str): top-level domain.
- format (str): format/extension of the image. Possible values: jpg, gif, png, bmp, svg, webp, ico, raw.
- safe (str): safe search.
- num (int): number of results per page.
- start (int): first result to retrieve.
- stop (int or None): last result to retrieve. Use None to keep searching forever.
- domains (list of str or None): a list of web domains to constrain the search.
- pause (float): lapse to wait between HTTP requests. A lapse too long will make the search slow, but a lapse too short may cause Google to block your IP. Your mileage may vary!
- only_standard (bool): if True, only return the standard results from each page. If False, return every possible link from each page, except for those that point back to Google itself. Defaults to False for backwards compatibility with older versions of this module.
- extra_params (dict of str to str): a dictionary of extra HTTP GET parameters, which must be URL-encoded. For example, if you don't want Google to filter similar results, set extra_params to {'filter': '0'}, which appends '&filter=0' to every query.
- tpe (str): search type (images, videos, news, shopping, books, apps). Use the following values: {videos: 'vid', images: 'isch', news: 'nws', shopping: 'shop', books: 'bks', applications: 'app'}.
- user_agent (str or None): user agent for the HTTP requests. Use None for the default.
- type: changes which function to use.
- output_dir: output directory used when download is active.
- rights (str): usage rights. Values: labeled-for-reuse-with-modifications, labeled-for-reuse, labeled-for-noncommercial-reuse-with-modification, labeled-for-nocommercial-reuse.
- download: download HTML, PDF or image results. Takes a set of URLs and tries to download them to output_dir; if output_dir is None, nothing is saved to disk. Returns a reference list to the images.
Returns:
- Generator (iterator) that yields found URLs. If the stop parameter is None, the iterator loops forever.
text_tool
Functions:
- replace_term: Replace terms in text by term_dict.
- mark_heading: Split markdown text into chapters.
- remove_paragraph: Remove a paragraph that starts with 'start_term' and ends with 'end_term'.
- pre_text: Load preTEXT.
- read_text_pair: Read a file formatted as pairs separated by a colon (:).
- read_text_column: Read a file formatted as pairs separated by a colon (:).
- read_text_news: Read text.
- find_abbrev
- read_text_subtitle: Read a caption file. Supported formats: json, rst.
- json_caption_to_text: transcript (dict): downloaded by youtubesearchpython.Transcript["segments"].
- json_caption_unify: transcript (dict): downloaded by youtubesearchpython.Transcript["segments"].
- convert_json_to_srt: transcript (dict): downloaded by youtubesearchpython.Transcript["segments"].
Attributes:
- DATA_PATH = os.path.dirname(os.path.abspath(__file__)) + '/data' (module attribute)
- pre_defined_term = {'\\[\\d+\\]': '', '\\s+\\.': '.', ' ': ' '} (module attribute)
replace_term(text: str, *, term_dict: dict[str] = None, term_file: str = None, use_predefine_term: bool = False) -> str
Replace terms in text by term_dict.
Args:
- term_dict (dict): dict of terms to replace. Each key is replaced by its value; keys can be regex patterns.
- term_file (str): path to a YAML file that contains term_dict. Used if term_dict is None.
- use_predefine_term (bool): use pre_defined_term for post-processing. Default is False.
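Because keys can be regex patterns, the replacement presumably runs each pattern through re.sub in turn, so the order of the dict matters. The sketch below assumes that sequential behaviour and leaves out the YAML term_file option; replace_terms is a hypothetical stand-in for replace_term, and the third rule in the example dict is an extra added for illustration.

```python
import re


def replace_terms(text, term_dict):
    """Apply each regex pattern -> replacement in term_dict, in insertion order."""
    for pattern, replacement in term_dict.items():
        text = re.sub(pattern, replacement, text)
    return text


# The first two entries mirror pre_defined_term (drop "[12]"-style citation
# marks, remove whitespace before a period); the third, which collapses runs
# of spaces, is an extra rule for this example.
terms = {r"\[\d+\]": "", r"\s+\.": ".", r" {2,}": " "}
print(replace_terms("Hello [12] world .", terms))  # Hello world.
```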
mark_heading(text: str = None) -> str
Split markdown text into chapters
remove_paragraph(text: str, start_term: str, end_term: str, keep_end_term=True) -> str
Remove a paragraph that starts with 'start_term' and ends with 'end_term'.
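A plausible reading is a non-greedy match from start_term to the next end_term, spanning line breaks, with keep_end_term deciding whether end_term survives. The sketch below assumes that; remove_between is hypothetical, and whether the real function removes every match or only the first is not documented.

```python
import re


def remove_between(text, start_term, end_term, keep_end_term=True):
    """Delete every span from start_term up to the next end_term."""
    pattern = re.escape(start_term) + r".*?" + re.escape(end_term)
    replacement = end_term if keep_end_term else ""
    # A function replacement avoids backslash processing in end_term.
    return re.sub(pattern, lambda m: replacement, text, flags=re.DOTALL)


text = "Intro\nADVERT buy now\nmore ads\nEND story continues"
print(repr(remove_between(text, "ADVERT", "END")))
# 'Intro\nEND story continues'
```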
pre_text(filename=None)
Load preTEXT.
Parameters:
- filename (str, default None): name of the preTEXT file.
Returns:
- ds (Series): Series.
read_text_pair(filename, separator=':')
Read a file formatted as pairs separated by a colon (:).
Parameters:
- filename (str): name of the text file.
Returns:
- df (DataFrame): DataFrame with 2 columns, c1 and c2, corresponding to the pair texts.
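The pair format can be sketched as splitting each line at the first separator. This version returns a list of tuples rather than a DataFrame so it runs without pandas; passing the result to pandas.DataFrame(pairs, columns=["c1", "c2"]) would give the documented columns. Skipping blank or malformed lines is an assumption, and parse_pairs is hypothetical.

```python
def parse_pairs(lines, separator=":"):
    """Split each non-empty line at the first separator into a (c1, c2) pair."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or separator not in line:
            continue  # skip blank or malformed lines
        c1, c2 = line.split(separator, 1)
        pairs.append((c1.strip(), c2.strip()))
    return pairs


print(parse_pairs(["xin chào: hello", "", "cảm ơn : thank you"]))
# [('xin chào', 'hello'), ('cảm ơn', 'thank you')]
```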
read_text_column(filename, separator=':', column_line=0)
Read a file formatted as pairs separated by a colon (:).
Parameters:
- filename (str): name of the text file.
Returns:
- df (DataFrame): DataFrame with 2 columns, c1 and c2, corresponding to the pair texts.
read_text_news(file_text, whole_text=False, replace_Abbrev=False, file_list_abbrev=DATA_PATH + '/list_abbrev.txt')
Read text.
Args:
- file_text (str): plain text file.
- whole_text (bool): read the whole text instead of decomposing it into author, title, ...
- replace_Abbrev (bool): choose whether to auto-replace abbreviations.
- file_list_abbrev (str): filename of the file that contains the list of abbreviations.
Note: there must be no empty line between the 1st and 2nd lines.
find_abbrev(text)
read_text_subtitle(filename, format_=None)
Read a caption file. Supported formats: json, rst.
json_caption_to_text(transcript_dict, out_file=None)
transcript_dict (dict): transcript downloaded by youtubesearchpython.Transcript["segments"].
json_caption_unify(transcript_dict)
transcript_dict (dict): transcript downloaded by youtubesearchpython.Transcript["segments"].
convert_json_to_srt(transcript_dict, fps=25, out_file='transcript_rst.rst')
transcript_dict (dict): transcript downloaded by youtubesearchpython.Transcript["segments"]. fps (int): frames per second; can be checked with the moviepy package.
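SRT stores each caption as a counter, a "start --> end" line with HH:MM:SS,mmm timestamps, and the caption text, with blank lines between entries. The sketch below produces that layout from (start_ms, end_ms, text) tuples. Mapping youtubesearchpython's segment dicts to these tuples, and how convert_json_to_srt uses fps, are left out because the segment keys are not documented here; srt_timestamp and segments_to_srt are hypothetical helpers.

```python
def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    hours, rest = divmod(int(ms), 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from (start_ms, end_ms, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Joining with "\n" leaves one blank line between entries.
    return "\n".join(blocks)


print(segments_to_srt([(0, 2500, "Hello"), (2500, 3723004, "world")]))
```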
tts
Functions:
- tts_edge: TTS using the Edge browser.
- tts_edge_voices: Get available voices in the Edge browser.
- tts_gTTS
- tts_off_pyttsx3: Convert text to speech using Windows' voices.
tts_edge(text: str, voice: str = 'vi-VN-NamMinhNeural', rate: int = 0, volume: int = 0, pitch: int = 0, audio_file: str = 'voice.mp3')
TTS using the Edge browser.
Args:
- text (str): text string.
- voice (str): name of the voice.
- rate (int): voice speed, as a positive/negative percentage, e.g. "+5%" or "-10%".
- volume (int): voice volume, as a positive/negative percentage.
- pitch (int): voice pitch, as a positive/negative value in Hz, e.g. "+5Hz" or "-10Hz". Pitch determines how high or low a sound is (set by the frequency of the sound waves), so adjusting it changes the perceived tone or melody. For example, a high-pitched sound might be like a whistle or a child's voice, while a low-pitched sound might resemble a bass drum or an adult male voice.
- audio_file (str): name of the output audio file.
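The type hints say int, while the descriptions show strings such as "+5%" and "+5Hz"; the edge-tts package takes rate, volume and pitch as such signed strings (for example edge_tts.Communicate(text, voice, rate="+5%")). A plausible bridge is formatting the integers with an explicit sign, as sketched below. edge_offset is hypothetical, and whether tts_edge converts its arguments this way is an assumption.

```python
def edge_offset(value, unit):
    """Format a signed integer offset as an edge-tts style string, e.g. +5% or -10Hz."""
    return f"{int(value):+d}{unit}"


# Hypothetical mapping of tts_edge's integer arguments to edge-tts strings.
settings = {
    "rate": edge_offset(5, "%"),
    "volume": edge_offset(-10, "%"),
    "pitch": edge_offset(0, "Hz"),
}
print(settings)  # {'rate': '+5%', 'volume': '-10%', 'pitch': '+0Hz'}
```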
tts_edge_voices(lang: str = 'vi')
Get available voices in Edge browser.
tts_gTTS(text, lang='en', audio_file='voice.mp3')
Parameters:
- text (str): text string.
tts_off_pyttsx3(text, lang='vi', voice_name='', voice_id=None, rate=150, vol=1.0, audio_file='voice.mp3')
Convert text to speech using Windows' voices.
Args:
- lang (str): select the language. Possible: any voice available on the local computer, e.g. 'vi', 'VN', 'US', ...
- text (str): string of text.
- rate (float): voice speed.
- voice_name (str): name of the speaker.
- voice_id (int): id of the voice in Windows; this parameter sets both the language and the voice.