rag
thml.rag
¶
Classes:
- RAG – Retrieval Augmented Generation (RAG) system.
Functions:
- embedding_model – Define the embedding function to use. See the list of available embedding models in LangChain.
- llm_model – Predefined LangChain-style LLM models to be used in the RAG system.
- load_document – Load documents from the given path, using LangChain's [document_loaders].
RAG(rag_path: str = '', doc_path: str = None, llm: object = None, embedding: object = None, text_splitter: object = None, db: object = None, rerank: bool = False, style: str = 'simple')
¶
Retrieval Augmented Generation (RAG) system. Supported vectorstore types: 'FAISS' or 'Chroma'. Default is 'FAISS'.
Use a reranker to improve the quality of the retrieved documents. Default is False. Note that the reranker model must be different from the embedding model.
Initialize the RAG system.
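A minimal construction sketch, assuming the module is importable as thml.rag and that the helper functions documented below return LangChain-compatible objects; paths and model names are placeholders:
```python
# Minimal sketch, not a definitive recipe: assumes the package is importable
# as `thml.rag` and that the helpers below supply compatible components.
from thml.rag import RAG, embedding_model, llm_model

embedding = embedding_model(provider='huggingface',
                            model_name='BAAI/bge-large-en-v1.5')
llm = llm_model(service='web_opengpts')

rag = RAG(doc_path='./docs',   # folder with .pdf/.docx/.txt/.md files
          llm=llm,
          embedding=embedding,
          rerank=False,        # reranker model must differ from the embedding model
          style='simple')      # 'simple', 'multi_query', or 'fusion'
```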
Methods:
- ask – Perform a question-answering task and generate an answer to the given query, with content drawn from the documents.
- ask_llm – Ask the LLM to generate an answer to the given query. It differs from the qa function, which only returns an answer if the documents contain the information; this function generates the answer from the LLM itself.
- search – Search for information in the documents. This actually performs retriever.invoke to retrieve information from the vectorstore. Refer: https://python.langchain.com/docs/use_cases/question_answering/quickstart#retrieval-and-generation-retrieve.
- set_retriever – Define parameters for the retriever. See vectorstore.as_retriever for more information.
- set_chain – Set the style of the RAG system. The style can be 'simple', 'multi_query', or 'fusion'.
Attributes:
- embedding –
- db –
- text_splitter –
- retriever –
- compressor –
- compression_retriever –
- llm –
- info –
embedding = embedding
instance-attribute
¶
db = db
instance-attribute
¶
text_splitter = text_splitter
instance-attribute
¶
retriever = self.set_retriever(search_type='similarity', search_kwargs={'k': 6})
instance-attribute
¶
compressor = reranker_model()
instance-attribute
¶
compression_retriever = ContextualCompressionRetriever(base_compressor=self.compressor, base_retriever=self.retriever)
instance-attribute
¶
llm = llm
instance-attribute
¶
info
property
¶
ask(question='what are the documents about?') -> str
¶
Perform a question-answering task and generate an answer to the given query, with content drawn from the documents.
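A usage sketch, reusing the rag instance built above:
```python
# Ask a question that should be answerable from the indexed documents.
answer = rag.ask(question='What are the documents about?')
print(answer)
```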
ask_llm(question='Who are you?') -> str
¶
Ask the LLM to generate an answer to the given query. It differs from the qa function, which only returns an answer if the documents contain the information; this function generates the answer from the LLM itself.
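For contrast with ask, a quick sketch:
```python
# Bypasses retrieval: the answer comes straight from the LLM.
answer = rag.ask_llm(question='Who are you?')
print(answer)
```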
search(question: str = 'summary') -> list[Document]
¶
Search for information in the documents. This actually performs retriever.invoke to retrieve information from the vectorstore. Refer: https://python.langchain.com/docs/use_cases/question_answering/quickstart#retrieval-and-generation-retrieve.
Parameters:
- query (str) – The query to search for.
- k (int) – The number of documents to return.
Returns: results (list[Document]): The documents that match the query.
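A retrieval sketch, using the keyword from the signature above:
```python
# Fetch matching document chunks without generating an answer.
docs = rag.search(question='summary')
for doc in docs:
    print(doc.metadata.get('source'), doc.page_content[:80])
```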
set_retriever(search_type: str = 'similarity', search_kwargs: dict = None)
¶
Define parameters for the retriever. See vectorstore.as_retriever for more information.
Ref: https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore
Parameters:
- search_type (Optional[str], default: 'similarity') – Defines the type of search that the Retriever should perform. Can be "similarity" (default), "mmr", or "similarity_score_threshold".
- search_kwargs (Optional[Dict], default: None) – Keyword arguments to pass to the search function. Can include:
  - k: Amount of documents to return (default: 4)
  - score_threshold: Minimum relevance threshold for similarity_score_threshold
  - fetch_k: Amount of documents to pass to the MMR algorithm (default: 20)
  - lambda_mult: Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum (default: 0.5)
  - filter: Filter by document metadata
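A sketch of tuning the retriever for MMR search, assuming (as the retriever attribute assignment above suggests) that set_retriever returns the retriever:
```python
# Switch to MMR search for more diverse results; the keys follow the
# LangChain vectorstore.as_retriever options listed above.
rag.retriever = rag.set_retriever(
    search_type='mmr',
    search_kwargs={'k': 6, 'fetch_k': 20, 'lambda_mult': 0.5},
)
```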
set_chain(style: str = 'simple') -> None
¶
Set the style of the RAG system. The style can be 'simple', 'multi_query', or 'fusion'.
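For example, to switch the pipeline to multi-query retrieval:
```python
# Rebuild the chain with multi-query retrieval.
rag.set_chain(style='multi_query')
```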
embedding_model(provider: str = 'huggingface', model_name: str = None, model_kwargs: dict = None) -> object
¶
Define the embedding function to use. See the list of available embedding models in LangChain.
Check the latest performance benchmarks for text embedding models at the MTEB leaderboard hosted by Hugging Face. The fields to consider are:
- Score: the score we should focus on is "average" and "retrieval average". Both are highly correlated, so focusing on either works.
- Sequence length: tells us how many tokens a model can consume and compress into a single embedding. Generally speaking, we wouldn't recommend stuffing more than a paragraph of text into a single embedding, so models supporting up to 512 tokens are usually more than enough.
- Model size: the size of a model indicates how easy it will be to run. All models near the top of MTEB are reasonably sized. One of the largest is instructor-xl (requiring 4.96GB of memory), which can still run on consumer hardware.
Note
- Embedding models may be referred to as SentenceTransformer on HF.
Some HF embedding models:
- mixedbread-ai/mxbai-embed-large-v1
- BAAI/bge-large-en-v1.5
Parameters:
- provider (str, default: 'huggingface') – The provider of the embeddings.
- model_name (str, default: None) – The name of the model to use for the embeddings.
- model_kwargs (dict, default: None) – The parameters to pass to the embedding model.
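A sketch using one of the HF models listed above; the model_kwargs keys are an assumption about what the underlying LangChain embedding class accepts:
```python
# 'device' is a common kwarg for HF sentence-transformer embeddings
# (an assumption here; check the underlying LangChain class).
embedding = embedding_model(
    provider='huggingface',
    model_name='mixedbread-ai/mxbai-embed-large-v1',
    model_kwargs={'device': 'cpu'},
)
```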
llm_model(service: str = 'web_opengpts', **kwargs: dict) -> LLM
¶
Predefined LangChain-style LLM models to be used in the RAG system.
Parameters:
- service (str, default: 'web_opengpts') – The LLM model service. Available options: 'openai', 'web_openai', 'web_opengpts', 'web_phind', 'web_llama2', 'web_bing'.
- **kwargs (dict) – The model parameters; these depend on the service.
Returns: LLM: The LangChain LLM.
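A sketch with the default service; extra kwargs depend on the chosen service and are omitted here:
```python
# Use the default web-backed service; pass service-specific kwargs as needed.
llm = llm_model(service='web_opengpts')
```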
load_document(doc_path: str = '', ext: str = None) -> list[Document]
¶
Load documents from the given path, using LangChain's [document_loaders]. Supported file types: .pdf, .docx, .txt, .md, .lnk (Windows shortcuts).
Parameters:
- doc_path (str, default: '') – The path to the folder containing the documents.
- ext (str, default: None) – The file extension of the documents to be loaded, e.g., '.pdf'. By default, loads all files in the folder.
Returns: list[Document]: A list of Document objects containing the loaded documents.
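A sketch loading only PDFs from a hypothetical folder:
```python
# Omit `ext` to load every supported file type in the folder.
docs = load_document(doc_path='./docs', ext='.pdf')
print(f'{len(docs)} documents loaded')
```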