API

thutil

The package for general utilities.

Developed and maintained by C.Thang Nguyen

Modules:

Attributes:

__description__ = 'Python package' module-attribute

__long_description__ = 'ML based applications ' module-attribute

__author__ = 'thangckt' module-attribute

__version

Attributes:

TYPE_CHECKING = False module-attribute

VERSION_TUPLE = Tuple[Union[int, str], ...] module-attribute

version: str = '0.1.dev150+gb76f61a.d20241231' module-attribute

__version__: str = '0.1.dev150+gb76f61a.d20241231' module-attribute

__version_tuple__: VERSION_TUPLE = (0, 1, 'dev150', 'gb76f61a.d20241231') module-attribute

version_tuple: VERSION_TUPLE = (0, 1, 'dev150', 'gb76f61a.d20241231') module-attribute

config

Functions:

validate_config(config_dict=None, config_file=None, schema_dict=None, schema_file=None, allow_unknown=False, require_all=False)

Validate the config file with the schema file.

Parameters:

  • config_dict (dict, default: None ) –

    config dictionary. Defaults to None.

  • config_file (str, default: None ) –

    path to the YAML config file; overrides config_dict when given. Defaults to None.

  • schema_dict (dict, default: None ) –

    schema dictionary. Defaults to None.

  • schema_file (str, default: None ) –

    path to the YAML schema file; overrides schema_dict when given. Defaults to None.

  • allow_unknown (bool, default: False ) –

    whether to allow unknown fields in the config file. Defaults to False.

  • require_all (bool, default: False ) –

    whether to require all fields in the schema file to be present in the config file. Defaults to False.

Raises:

  • ValueError

    if the config file does not match the schema
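
To illustrate how allow_unknown and require_all interact, here is a minimal pure-Python stand-in. It is not thutil's implementation: the function name validate_sketch and the simplified schema format (each key mapped to a Python type) are assumptions made for this example only.

```python
def validate_sketch(config: dict, schema: dict,
                    allow_unknown: bool = False,
                    require_all: bool = False) -> None:
    """Raise ValueError if `config` does not match `schema`.

    `schema` maps each known key to the type its value must have
    (a simplification assumed for this sketch).
    """
    errors = []
    for key, value in config.items():
        if key not in schema:
            # Unknown keys are errors unless explicitly allowed.
            if not allow_unknown:
                errors.append(f"unknown field: {key}")
        elif not isinstance(value, schema[key]):
            errors.append(f"{key}: expected {schema[key].__name__}")
    if require_all:
        # Every key in the schema must be present in the config.
        errors.extend(f"missing field: {key}" for key in schema if key not in config)
    if errors:
        raise ValueError("; ".join(errors))


schema = {"name": str, "steps": int}
validate_sketch({"name": "run1"}, schema)  # passes: missing keys are allowed by default
```

With require_all=True the same call would raise ValueError because "steps" is absent.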

load_setting_file(filename: Union[str, Path]) -> dict

Load data from a JSON or YAML file.

Parameters:

  • filename (str or os.PathLike) –

    The filename to load data from; its suffix should be .json, .yaml, or .yml.

Returns:

  • jdata (dict) –

    The data loaded from the file.

Raises:

  • ValueError

    If the file format is not supported.
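
The suffix-based dispatch described above can be sketched as follows. This is an assumption about how such a loader might look, not the package's code; load_setting_sketch is a hypothetical name, and the YAML branch assumes PyYAML is installed.

```python
import json
from pathlib import Path
from typing import Union


def load_setting_sketch(filename: Union[str, Path]) -> dict:
    """Load a dict from a .json, .yaml, or .yml file, chosen by suffix."""
    path = Path(filename)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix in (".yaml", ".yml"):
        # PyYAML is imported lazily so JSON-only use needs no extra package.
        import yaml

        with path.open(encoding="utf-8") as f:
            return yaml.safe_load(f)
    raise ValueError(f"Unsupported file format: {suffix!r}")
```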

load_jsonc(filename: str) -> dict

Load data from a JSON file that allows comments.
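
One way to read JSON with comments is to strip the comments before parsing. The sketch below is an illustration only, not necessarily what load_jsonc does; strip_json_comments and load_jsonc_sketch are hypothetical names, and the sketch works on a string rather than a filename. It handles // and /* */ comments and leaves string contents untouched.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // line comments and /* block */ comments outside of strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # the newline itself is kept
            continue
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_jsonc_sketch(text: str) -> dict:
    """Parse a JSON document that may contain comments."""
    return json.loads(strip_json_comments(text))
```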

unpack_dict(nested_dict: dict) -> dict

Unpack one level of nested dictionary.
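
The one-line description leaves the exact semantics open. The sketch below assumes that "unpacking one level" means lifting the keys of each inner dictionary into the top level while leaving deeper nesting intact; unpack_one_level is a hypothetical name, and the real unpack_dict may behave differently.

```python
def unpack_one_level(nested: dict) -> dict:
    """Merge the keys of first-level inner dicts into a single flat dict."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            flat.update(value)  # lift the inner keys up one level
        else:
            flat[key] = value   # non-dict values are kept unchanged
    return flat
```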

write_yaml(jdata: dict, filename: Union[str, Path])

Write data to a YAML file.

read_yaml(filename: Union[str, Path]) -> dict

Read data from a YAML file.

io

Functions:

  • combine_text_files

    Combine text files into a single file in a memory-efficient way. Reads and writes in chunks to avoid loading large files into memory.

  • download_rawtext

    Download raw text from a URL.

combine_text_files(files: list[str], output_file: str, chunk_size: int = 1024)

Combine text files into a single file in a memory-efficient way. Reads and writes in chunks to avoid loading large files into memory.

Parameters:

  • files (list[str]) –

    List of file paths to combine.

  • output_file (str) –

    Path to the output file.

  • chunk_size (int, default: 1024 ) –

    Size of each chunk in KB to read/write. Defaults to 1024 KB.
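
The chunked copy described above can be sketched as follows. This is a minimal illustration under the assumption that files are concatenated byte-for-byte in the given order; combine_files_sketch is a hypothetical name and not the package's implementation.

```python
def combine_files_sketch(files: list[str], output_file: str, chunk_size: int = 1024) -> None:
    """Append each input file to `output_file`, reading `chunk_size` KB at a time."""
    chunk_bytes = chunk_size * 1024
    with open(output_file, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                # Only one chunk is held in memory at any moment.
                while chunk := src.read(chunk_bytes):
                    out.write(chunk)
```

Binary mode is used here so that chunk boundaries never split a multi-byte character during decoding.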

download_rawtext(url: str, outfile: str = None) -> str

Download raw text from a URL.

path

Functions:

  • make_dir

    Create a directory with a backup option.

  • make_dir_ask_backup

    Make a directory and ask for backup if the directory already exists.

  • ask_yes_no

    Asks a yes/no/backup question and returns the response.

  • list_paths

    List all files/folders in given directories and their subdirectories that match the given patterns.

  • collect_files

    Collect files from a list of paths (files/folders). Will search files in folders and their subdirectories.

  • change_pathname

    Change path names.

  • remove_files

    Remove files from a given list of file paths.

  • remove_dirs

    Remove a list of directories.

  • remove_files_in_paths

    Remove files in the files list in the paths list.

  • remove_dirs_in_paths

    Remove directories in the dirs list in the paths list.

  • copy_file

    Copy a file/folder from the source path to the destination path.

  • move_file

    Move a file/folder from the source path to the destination path.

  • scan_dirs

    Check whether folders contain some files and do not contain others.

make_dir(path: str, backup: bool = True)

Create a directory with a backup option.

make_dir_ask_backup(dir_path: str)

Make a directory and ask for backup if the directory already exists.

ask_yes_no(question: str) -> str

Asks a yes/no/backup question and returns the response.

list_paths(paths: list[str], patterns: list[str], recursive=True) -> list[str]

List all files/folders in given directories and their subdirectories that match the given patterns.

Parameters:

  • paths (list[str]) –

    The list of paths to search for files/folders.

  • patterns (list[str]) –

    The list of patterns to apply to the files. Each entry can be a file extension or a glob pattern.

Returns:

  • list[str]

    A list of matching paths.

Example:
folders = ["path1", "path2", "path3"]
patterns = ["*.ext1", "*.ext2", "something*.ext3", "*folder/"]
files = list_paths(folders, patterns)

Note:
  • glob() does not list hidden files by default. To include hidden files, use glob(".*", recursive=True).
  • When using recursive=True, the pattern must include ** to search subdirectories.
    • glob("*", recursive=True) matches all files and folders in the current directory.
    • glob("*/", recursive=True) matches all folders in the current directory.
    • glob("**", recursive=True) matches all files and folders in the current directory and its subdirectories.
    • glob("**/", recursive=True) matches all folders in the current directory and its subdirectories.
    • "**/*" is equivalent to "**".
    • "**/*/" is equivalent to "**/".
  • IMPORTANT: "**/**" replicates the behavior of "**" and gives unexpected results.
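
The pattern rules in the note can be checked directly against the standard-library glob module. The sketch below builds a small temporary tree (the file and folder names are made up for the example) and compares patterns with and without "**".

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Layout: a.txt, sub/b.txt, sub/deep/c.txt
    os.makedirs(os.path.join(root, "sub", "deep"))
    for rel in ("a.txt", "sub/b.txt", "sub/deep/c.txt"):
        with open(os.path.join(root, *rel.split("/")), "w") as f:
            f.write("x")

    def rel_glob(pattern: str) -> list[str]:
        """Run a recursive glob under `root` and return sorted relative paths."""
        hits = glob.glob(os.path.join(root, pattern), recursive=True)
        return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

    top_level = rel_glob("*")       # files and folders directly under root
    top_txt = rel_glob("*.txt")     # no "**": subdirectories are not searched
    all_txt = rel_glob("**/*.txt")  # "**" descends into subdirectories
    all_dirs = rel_glob("**/")      # trailing slash: folders only

print(top_level, top_txt, all_txt, all_dirs, sep="\n")
```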

collect_files(paths: list[str], patterns: list[str]) -> list[str]

Collect files from a list of paths (files/folders). Will search files in folders and their subdirectories.

Parameters:

  • paths (list[str]) –

    The list of paths to collect files from.

  • patterns (list[str]) –

    The list of patterns to apply to the files. Each entry can be a file extension or a glob pattern.

Returns:

  • list[str]

    A list of paths to the matching files.

change_pathname(paths: list[str], old_string: str, new_string: str, replace: bool = False) -> None

Change path names.

Parameters:

  • paths (list[str]) –

    paths to the files/dirs

  • old_string (str) –

    old string in path name

  • new_string (str) –

    new string in path name

  • replace (bool, default: False ) –

    replace the old path name if the new one exists. Defaults to False.

remove_files(files: list[str]) -> None

Remove files from a given list of file paths.

Parameters:

  • files (list[str]) –

    list of file paths

remove_dirs(dirs: list[str]) -> None

Remove a list of directories.

Parameters:

  • dirs (list[str]) –

    list of directories to remove.

remove_files_in_paths(files: list, paths: list) -> None

Remove files in the files list in the paths list.

remove_dirs_in_paths(dirs: list, paths: list) -> None

Remove directories in the dirs list in the paths list.

copy_file(src_path: str, dest_path: str)

Copy a file/folder from the source path to the destination path.

move_file(src_path: str, dest_path: str)

Move a file/folder from the source path to the destination path.

scan_dirs(dirs: list[str], with_files: list[str], without_files: list[str] = []) -> list[str]

Check whether folders contain some files and do not contain others.

Parameters:

  • dirs (list[str]) –

    The paths of dirs to scan.

  • with_files (list[str]) –

    The files that should exist in the path.

  • without_files (list[str], default: [] ) –

    The files that should not exist in the path. Defaults to [].

Returns:

  • list[str]

    The paths that meet the conditions.
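
The filtering rule can be sketched as follows, assuming with_files and without_files hold exact file names rather than glob patterns (the source does not say which). scan_dirs_sketch is a hypothetical name, not the package's implementation.

```python
from pathlib import Path


def scan_dirs_sketch(dirs, with_files, without_files=()):
    """Return the dirs that contain every name in `with_files` and none in `without_files`."""
    matched = []
    for d in dirs:
        folder = Path(d)
        has_all = all((folder / name).exists() for name in with_files)
        has_none = not any((folder / name).exists() for name in without_files)
        if has_all and has_none:
            matched.append(str(folder))
    return matched
```

The sketch uses an immutable tuple as the default for without_files, which avoids the shared-state pitfall of a mutable default argument.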

pkg

Functions:

create_logger(logger_name: str = None, log_file: str = None, level: str = 'INFO', level_logfile: str = None, format_: str = 'info') -> logging.Logger

Create and configure a logger with console and optional file handlers.
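
A logger with a console handler and an optional file handler can be set up with the standard logging module roughly as below. This is a sketch under assumptions: create_logger_sketch is a hypothetical name, the format string is made up, and the format_ parameter of the real function is left out because its accepted values are not documented here.

```python
import logging
import sys
from typing import Optional


def create_logger_sketch(name: Optional[str] = None,
                         log_file: Optional[str] = None,
                         level: str = "INFO",
                         level_logfile: Optional[str] = None) -> logging.Logger:
    """Return a logger that writes to stdout and, optionally, to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let each handler apply its own threshold
    logger.handlers.clear()         # avoid duplicate output on repeated calls

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(level.upper())
    console.setFormatter(fmt)
    logger.addHandler(console)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        # Fall back to the console level when no file level is given.
        file_handler.setLevel((level_logfile or level).upper())
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
    return logger
```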

check_package(package_name: str, git_repo: str = None, auto_install: bool = False, extra_commands: list[str] = None) -> None

Check whether the required package is installed.
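
The installation check itself can be done with the standard library. The sketch below covers only the check, not the auto_install or extra_commands behavior, whose details are not documented here; is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(package_name) is not None
```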

_install_package(package_name: str, git_repo: str = None) -> None

Install the required package.

Parameters:

  • package_name (str) –

    package name

  • git_repo (str) –

    git path for the package

get_func_args(func)

Get the arguments of a function

dependency_info(modules=['numpy', 'polars', 'thutil', 'ase']) -> str

Get the dependency information

sth2sth

Functions:

file2str(file_path: Union[str, Path]) -> str

str2file(text: str, file_path: Union[str, Path]) -> None

file2list(file_path: Union[str, Path]) -> list[str]

list2file(text_list: list, file_path: Union[str, Path]) -> None

float2str(floatnum, decimals=6)

Convert a float number to a string.

REF: https://stackoverflow.com/questions/2440692/formatting-floats-without-trailing-zeros

Parameters:

  • floatnum (float) –

    float number

  • decimals (int, default: 6 ) –

    number of decimal places in the output string

Returns:

  • s ( str ) –

    string of the float number
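
Following the referenced Stack Overflow topic (formatting floats without trailing zeros), one common approach is to format with a fixed number of decimals and then strip trailing zeros. The sketch below assumes that is the intent; float2str_sketch is a hypothetical name and may differ from the package's function.

```python
def float2str_sketch(floatnum: float, decimals: int = 6) -> str:
    """Format a float with at most `decimals` decimal places and no trailing zeros."""
    text = f"{floatnum:.{decimals}f}"
    if "." in text:
        # Drop trailing zeros, then a dangling decimal point if one remains.
        text = text.rstrip("0").rstrip(".")
    return text
```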

stuff

Functions:

chunk_list(input_list: list, n: int) -> Generator

Yield successive n-sized chunks from input_list.
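
A generator of this kind is typically a short slice loop. The sketch below shows one such form (chunk_list_sketch is a hypothetical name); the last chunk is shorter when the list length is not a multiple of n.

```python
from typing import Generator


def chunk_list_sketch(input_list: list, n: int) -> Generator[list, None, None]:
    """Yield successive n-sized slices of input_list."""
    for start in range(0, len(input_list), n):
        yield input_list[start:start + n]
```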

fill_text_center(input_text='example', fill='-', max_length=60)

Create a line with centered text.

fill_text_left(input_text='example', left_margin=15, fill='-', max_length=60)

Create a line with left-aligned text.

fill_text_box(input_text='', fill=' ', sp='|', max_length=60)

Put the string at the center of | |.
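
The centered and boxed variants can be approximated with str.center. The sketch below assumes a single space of padding around the centered text and that max_length counts the box separators; center_line and box_line are hypothetical names, and the real functions may format differently.

```python
def center_line(text: str = "example", fill: str = "-", max_length: int = 60) -> str:
    """Center `text` in a line of `max_length` characters padded with `fill`."""
    return f" {text} ".center(max_length, fill)


def box_line(text: str = "", fill: str = " ", sp: str = "|", max_length: int = 60) -> str:
    """Center `text` between two `sp` separators, `max_length` characters in total."""
    inner = text.center(max_length - 2 * len(sp), fill)
    return f"{sp}{inner}{sp}"
```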