
Io read data

thmd.io.read_data

This module contains functions to read numeric data from text files in various formats.

Functions:

  • matrix_lost

    Function to read data in matrix form, in which the number of values in each line is NOT equal (missing values).

  • matrix

    Function to read data laid out as a regular matrix.

  • logMFD

    Function to read data from LogMFD calculation.

  • lammps_var

    Function to extract variable values from LAMMPS input file.

  • plumed_var

    Function to extract variable values from PLUMED input file.

  • list_matrix_in_dir

    Read data from all *.txt files in the current folder and its sub-folders.

matrix_lost(file_name: str, header_line: int = None, column_names: list[str] = None, comment: str = '#', sep: str = ' ', read_note: bool = False) -> pl.DataFrame

Function to read data in matrix form, in which the number of values in each line is NOT equal (missing values). Such files cannot be read directly by NumPy, polars, and similar libraries.

The names of columns are extracted from header_line or set by column_names. If neither column_names nor header_line is available, the default column names are: 0 1 2...

Parameters:

  • file_name (str) –

    the text file.

  • header_line (int, default: None ) –

    the line from which column names are extracted. Defaults to None.

  • column_names (list, default: None ) –

    Names of columns to extract. Defaults to None.

  • comment (str, default: '#' ) –

    comment-line mark. Defaults to "#".

  • sep (str, default: ' ' ) –

    separator. Defaults to " ".

  • read_note (bool, default: False ) –

    read 'note' column (any text beyond comment mark). Defaults to False.

Returns:

  • df ( DataFrame ) –

    polars DataFrame

Notes
  • To build two lists, it is better (and possibly faster) to run two separate list comprehensions than to return both from a single one.
  • The .strip() method removes leading and trailing whitespace from a string.
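
The parsing idea described for matrix_lost can be sketched with the standard library alone. This is an illustrative sketch, not the thmd implementation: the helper name parse_ragged_lines, the choice of None for missing values, and the conversion of every value to float are assumptions made for the example.

```python
def parse_ragged_lines(lines, comment="#", sep=" "):
    """Split lines into numeric values plus an optional trailing note.

    Hypothetical helper for illustration; short rows are padded with None.
    """
    # Skip blank lines and lines that consist only of a comment
    kept = [ln for ln in lines if ln.strip() and not ln.strip().startswith(comment)]
    # Two separate comprehensions: one for the values, one for the notes
    values = [ln.split(comment, 1)[0].strip().split(sep) for ln in kept]
    notes = [ln.split(comment, 1)[1].strip() if comment in ln else None for ln in kept]
    # Drop empty tokens produced by repeated separators
    values = [[tok for tok in row if tok] for row in values]
    # Pad every row to the width of the longest one
    width = max((len(row) for row in values), default=0)
    padded = [[float(tok) for tok in row] + [None] * (width - len(row)) for row in values]
    return padded, notes


sample = ["# x y z", "1 2 3", "4 5   # short row", "", "6 7 8"]
rows, notes = parse_ragged_lines(sample)
print(rows)
print(notes)
```

The padded rows all have the same length, so they could then be handed to a DataFrame constructor, with the notes kept as an extra column when read_note is wanted.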

matrix(file_name: str, header_line: int = None, column_names: list[str] = None, usecols: tuple[int] = None) -> pl.DataFrame

Function to read data laid out as a regular matrix. The names of columns are extracted from column_names or header_line. If neither column_names nor header_line is available, the default column names are: 0 1 2...

Parameters:

  • file_name (str) –

    the text file.

  • header_line (int, default: None ) –

    the line to extract column-names. Defaults to None.

  • column_names (list[str], default: None ) –

    Names of columns to extract. Defaults to None.

  • usecols (tuple[int], default: None ) –

    only extract some columns. Defaults to None.

Returns:

  • df ( DataFrame ) –

    polars DataFrame
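
How header_line, column_names and usecols interact can be sketched as follows. This is a standard-library approximation, not the thmd code. The function name read_regular_matrix is hypothetical. The sketch also assumes that header_line is a 0-based index, that a leading "#" on the header is stripped, and that usecols indexes the full set of columns. None of these is confirmed by the documentation above.

```python
def read_regular_matrix(text, header_line=None, column_names=None, usecols=None):
    """Parse whitespace-separated numeric text into a dict of columns.

    Hypothetical helper; header_line is treated as a 0-based line index.
    """
    lines = text.splitlines()
    if column_names is None and header_line is not None:
        # Assumption: the header may start with a comment mark, which is removed
        column_names = lines[header_line].lstrip("# ").split()
    # Keep data lines only: skip blanks, comments, and the header itself
    data = [
        ln.split()
        for i, ln in enumerate(lines)
        if ln.strip() and not ln.lstrip().startswith("#") and i != header_line
    ]
    rows = [[float(tok) for tok in row] for row in data]
    if column_names is None:
        # Default names 0, 1, 2, ... as described in the text
        column_names = [str(i) for i in range(len(rows[0]))] if rows else []
    if usecols is not None:
        rows = [[row[i] for i in usecols] for row in rows]
        column_names = [column_names[i] for i in usecols]
    return {name: [row[j] for row in rows] for j, name in enumerate(column_names)}


sample = "# time temp press\n0 300 1.0\n1 301 1.1\n"
selected = read_regular_matrix(sample, header_line=0, usecols=(0, 2))
unnamed = read_regular_matrix("0 300 1.0\n1 301 1.1\n")
print(selected)
print(unnamed)
```

A dict of equal-length columns is used here only to keep the sketch dependency-free. The documented function returns a polars DataFrame instead.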

logMFD(file_name, dim=1) -> pl.DataFrame

Function to read data from LogMFD calculation.

Parameters:

  • file_name (str) –

    the logmfd.out file.

  • dim (int, default: 1 ) –

    dimension of LogMFD calculation. Defaults to 1.

Raises:

  • Exception

    description

Returns:

  • df ( DataFrame ) –

    polars DataFrame

lammps_var(file_name, var_names=None)

Function to extract variable values from LAMMPS input file.

Parameters:

  • file_name (str) –

    the text file in LAMMPS input format.

  • var_names (list, default: None ) –

    list of variables to be extracted. Defaults to None, which means all variables are extracted.

Returns:

  • df ( DataFrame ) –

    polars DataFrame containing the variables in the LAMMPS file
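
Extracting variables from a LAMMPS input can be sketched with a regular expression. LAMMPS defines variables with lines of the form "variable name style args". The sketch below collects the name and the argument text. The function name extract_lammps_variables and the plain-dict return value are assumptions for illustration, and values are kept as strings rather than converted.

```python
import re

# Matches "variable <name> <style> <args>" with an optional trailing comment
VARIABLE_RE = re.compile(r"^\s*variable\s+(\S+)\s+(\S+)\s+(.*?)\s*(?:#.*)?$")


def extract_lammps_variables(text, var_names=None):
    """Return {name: args} for variable commands; var_names=None keeps all.

    Hypothetical helper for illustration only.
    """
    found = {}
    for line in text.splitlines():
        match = VARIABLE_RE.match(line)
        if match is None:
            continue  # not a variable command (or a commented-out line)
        name, _style, value = match.groups()
        if var_names is None or name in var_names:
            found[name] = value
    return found


sample = """\
units metal
variable temp equal 300.0  # target temperature
variable press equal -1.5
# variable old equal 1
variable name string run01
"""
all_vars = extract_lammps_variables(sample)
only_temp = extract_lammps_variables(sample, var_names=["temp"])
print(all_vars)
print(only_temp)
```

The simple comment handling would truncate a value that itself contains "#", so quoted strings with that character need more careful parsing.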

plumed_var(file_name, var_name, block_name=None)

Function to extract variable values from PLUMED input file.

Parameters:

  • file_name (str) –

    the text file in PLUMED input format.

  • var_name (str) –

    list of keywords in PLUMED, e.g. INTERVAL,...

  • block_name (str, default: None ) –

    block command in PLUMED, e.g. METAD, LOGMFD. Defaults to None.

Returns:

  • value ( float ) –

    value of plumed_var.

Refs

Include negative decimal numbers in regular expression
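
The reference above concerns matching negative decimal numbers with a regular expression. A minimal sketch of that idea applied to KEYWORD=value pairs in a PLUMED input follows. The function name find_plumed_value is hypothetical. Starting the search at the first occurrence of block_name is a simplification assumed here, not the documented behavior. Only the first number after the keyword is returned.

```python
import re

# Optional sign, optional integer part, optional decimal point, optional exponent
NUMBER = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"


def find_plumed_value(text, var_name, block_name=None):
    """Return the first number following var_name= as a float, or None.

    Hypothetical helper; block_name only moves the search start position.
    """
    start = 0
    if block_name is not None:
        block = re.search(rf"\b{re.escape(block_name)}\b", text)
        if block is None:
            return None
        start = block.start()
    match = re.search(rf"\b{re.escape(var_name)}=({NUMBER})", text[start:])
    return float(match.group(1)) if match else None


sample = """\
d1: DISTANCE ATOMS=1,2
UPPER_WALLS ARG=d1 AT=2.0 KAPPA=100.0
METAD ARG=d1 SIGMA=0.05 HEIGHT=1.2 PACE=500 INTERVAL=-0.5,3.0
"""
interval = find_plumed_value(sample, "INTERVAL", block_name="METAD")
kappa = find_plumed_value(sample, "KAPPA")
missing = find_plumed_value(sample, "BIASFACTOR")
print(interval, kappa, missing)
```

Without the leading "[-+]?" in the pattern, the lower bound of INTERVAL would be read as 0.5 instead of -0.5, which is the pitfall the reference addresses.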

list_matrix_in_dir(search_key='deform_', file_ext='.txt', read_note=False, recursive=True)

Read data from all *.txt files in the current folder and its sub-folders.

Parameters:

  • search_key (str, default: 'deform_' ) –

    a string to search for in file names.

  • file_ext (str, default: '.txt' ) –

    file extension. Defaults to '.txt'.

  • read_note (bool, default: False ) –

    read 'note' column in pl.DataFrame. Defaults to False.

  • recursive (bool, default: True ) –

    search in sub-folders. Defaults to True.

Returns:

  • ldf ( list ) –

    list of DataFrames.

  • files ( list ) –

    list of filenames.
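
The file-discovery step behind list_matrix_in_dir can be sketched with pathlib. This is an illustration under assumptions: the helper name find_data_files and the root argument are hypothetical, and search_key is treated as a substring of the file name, which the documentation above does not specify.

```python
import tempfile
from pathlib import Path


def find_data_files(root=".", search_key="deform_", file_ext=".txt", recursive=True):
    """Return sorted paths of files whose name contains search_key and ends with file_ext.

    Hypothetical helper for illustration only.
    """
    pattern = f"*{search_key}*{file_ext}"
    base = Path(root)
    # rglob descends into sub-folders, glob stays in the top folder
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(str(p) for p in matches if p.is_file())


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for rel in ("deform_a.txt", "sub/deform_b.txt", "other.txt"):
        (Path(tmp) / rel).write_text("1 2 3\n")
    all_files = [Path(f).name for f in find_data_files(tmp)]
    top_only = [Path(f).name for f in find_data_files(tmp, recursive=False)]

print(all_files)
print(top_only)
```

Each returned path could then be passed to a reader such as matrix_lost, collecting the resulting DataFrames and file names into the two lists described under Returns.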