Customizing RAG experiment settings
Last updated: Dec 12, 2024

When you build a retrieval-augmented generation solution in AutoAI, you can customize experiment settings to tailor your results.

Important: This feature is a beta release. It is not intended for production use.

If you run a RAG experiment based on default settings, the AutoAI process selects:

  • The optimization metric for ranking the RAG pipelines
  • An embeddings model for encoding input data
  • The foundation models to try, based on the available list

To exercise more control over the RAG experiment, you can customize the experiment settings. After entering the required experiment definition information, click Experiment settings to customize options before running the experiment. Settings you can review or edit fall into three categories:

  • Retrieval & generation: choose which metric to use for optimizing the RAG pattern, how to retrieve the data, and the models AutoAI can use for the experiment.
  • Indexing: choose how the data is broken down, the metric used to measure data relevancy, and which embedding model AutoAI can use for the experiment.
  • Additional information: review the watsonx.ai Runtime instance and the environment to use for the experiment.

Retrieval and generation settings

View or edit the settings that are used to generate the RAG pipelines.

Optimization metric

Choose a metric to use for optimizing and ranking the RAG pipelines.

  • Answer faithfulness measures how faithful the generated response is to the retrieved text, including how closely it aligns semantically and syntactically.
  • Answer correctness measures the correctness of the generated answer including both the relevance of the retrieved context and the quality of the generated response.
  • Context correctness measures the relevancy of the retrieved content to the original question.
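As a rough intuition for what each metric compares, the following toy sketch scores the three relationships with simple token overlap. The helper names and the overlap formula are illustrative assumptions, not the metrics AutoAI actually computes, which use model-based semantic scoring.

```python
def _tokens(text):
    # Lowercased word set; a toy stand-in for real semantic analysis.
    return set(text.lower().split())

def overlap(a, b):
    """Fraction of tokens in `a` that also appear in `b` (toy proxy)."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta) if ta else 0.0

question = "what port does the service listen on"
retrieved = "the service listens on port 8080 by default"
answer = "the service listens on port 8080"

# Answer faithfulness: compares the generated answer to the retrieved text.
faithfulness = overlap(answer, retrieved)

# Context correctness: compares the retrieved text to the original question.
context_correctness = overlap(retrieved, question)
```

Answer correctness would combine both relationships, weighing how relevant the retrieved context is together with the quality of the generated response.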

Retrieval methods

Choose the method for retrieving relevant data. Retrieval methods differ in the ways that they filter and rank documents.

  • Window retrieval method divides the indexed documents into windows, or chunks, and adds content before and after the retrieved chunk, based on what was in the original document.
  • Simple retrieval method retrieves all relevant passages from the indexed documents and ranks them according to relevancy against the question. The highest-ranked passages are used to generate the answer.

Simple retrieval can be a more efficient choice for queries against a relatively small collection of documents. Window retrieval can produce more accurate results for queries against a larger collection, because the added surrounding content gives the model more context.
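The difference between the two methods can be sketched over a plain list of chunks. The token-overlap scoring function and the sample chunks here are illustrative assumptions; AutoAI's actual retrieval runs against the vector index built during indexing.

```python
chunks = [
    "Install the agent on each node",
    "The agent listens on port 9100",
    "Metrics are scraped every 15 seconds",
    "Old metrics are pruned after 30 days",
]

def score(chunk, question):
    # Toy relevancy: shared lowercase tokens (a stand-in for vector similarity).
    return len(set(chunk.lower().split()) & set(question.lower().split()))

def simple_retrieval(question, k=2):
    # Rank every chunk by relevancy and return the top k as-is.
    return sorted(chunks, key=lambda c: score(c, question), reverse=True)[:k]

def window_retrieval(question, window=1):
    # Find the best-matching chunk, then add the surrounding chunks
    # that preceded and followed it in the original document.
    best = max(range(len(chunks)), key=lambda i: score(chunks[i], question))
    lo, hi = max(0, best - window), min(len(chunks), best + window + 1)
    return chunks[lo:hi]

question = "which port does the agent listen on"
```

For this question, simple retrieval returns the two highest-scoring chunks independently, while window retrieval returns the best chunk plus its neighbors from the source document.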

Foundation models to include

Edit the list of foundation models that AutoAI can consider for generating the RAG pipelines. For each model, you can click Model details to view or export details about the model, including a description of the intended use.

For the list of available foundation models along with descriptions, see Foundation models by task.

Max RAG patterns to complete

You can specify the number of RAG patterns to complete, up to a maximum of 20. A higher number provides more patterns to compare, but consumes more compute resources.

Indexing settings

View or edit the settings for creating the text vector database from the document collection.

Embedding models

Embedding models are used in retrieval-augmented generation solutions for encoding text data as vectors to capture the semantic meaning of natural language strings. The vectorized input data can be used to retrieve similar data from the indexed document collection to generate output text. Edit the list of embedding models that AutoAI can consider when the experiment is running.

For a list of embedding models available for use with AutoAI RAG experiments, see Supported embedding models available with watsonx.ai.
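In essence, an embedding model maps text to a vector so that nearness in vector space approximates nearness in meaning. The tiny bag-of-words embedding and cosine similarity below are illustrative assumptions standing in for a real embedding model, which produces dense vectors with hundreds of dimensions.

```python
import math
from collections import Counter

# Toy vocabulary; a real embedding model has no fixed word list.
VOCAB = ["agent", "port", "metrics", "install", "node"]

def embed(text):
    # Toy embedding: counts of vocabulary words as vector components.
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = [
    "install the agent on each node",
    "the agent listens on a port",
    "metrics are pruned after 30 days",
]
# Encoding every document up front yields the vector index.
index = [(doc, embed(doc)) for doc in docs]

def retrieve(question):
    # Encode the question with the same model, then return the nearest document.
    q = embed(question)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]
```

Because the question and the documents are encoded with the same model, similarity in the vector space can be used to pull back the passages most likely to answer the question.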

Additional information

Review the watsonx.ai Runtime instance used for this experiment and the environment definition.

Learn more

Retrieval-Augmented Generation (RAG)

Parent topic: Creating a RAG experiment
