Creating a RAG experiment (fastpath) (Beta)
Last updated: Dec 12, 2024

Create a Retrieval-augmented generation (RAG) experiment by using AutoAI. Upload a collection of documents and transform them into vectors that can be used to improve the output from a large language model. Compare optimized pipelines to select the best RAG pattern for your application.

Important: This feature is a beta release. While this feature is in beta, there is no charge for running the experiment, and no tokens are consumed. However, calls to RAG patterns and their derivatives that are done after the experiment completes consume resources at the standard rate.

Preparing your data sources

Before you create a RAG experiment, prepare your test data and evaluation data assets. The test data is the document collection that provides the context for answering prompts. The evaluation data is a JSON file with sample questions and answers that are used to measure the performance of the RAG patterns.

To prepare your data collection, follow these guidelines:

  • Supported formats for the document collection: PDF, HTML, DOCX, plain text
  • Supported format for the evaluation data file: JSON
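For example, you can scan your local document folder before you upload it to confirm that every file uses one of the supported formats. The following sketch is a minimal check that uses only the Python standard library; the folder name is a placeholder for your own document collection.

from pathlib import Path

# Extensions that correspond to the supported document formats listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".html", ".docx", ".txt"}

def check_document_collection(folder: str) -> None:
    # Flag any files that are not in a supported format for the document collection.
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            status = "ok" if path.suffix.lower() in SUPPORTED_EXTENSIONS else "unsupported format"
            print(f"{path.name}: {status}")

# Hypothetical folder name; replace it with the location of your documents.
check_document_collection("my_rag_documents")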

Template for the JSON evaluation file

The evaluation data file provides a series of sample questions and correct answers to evaluate the performance of the RAG pattern. Use this format for the JSON file:

[
    {
        "question": "<text>",
        "correct_answer": "<text>",
        "correct_answer_document_ids": [
            "<file>",
            "<file>"
        ]
    },
    {
        "question": "<text>",
        "correct_answer": "<text>",
        "correct_answer_document_ids": [
            "<file>",
            "<file>"
        ]
    },
    {
        "question": "<text>",
        "correct_answer": "<text>",
        "correct_answer_document_ids": [
            "<file>",
            "<file>"
        ]
    }
]
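Before you upload the evaluation data, you can verify that each record follows this template. The following sketch is a minimal structure check that uses only the Python standard library; the file name is a placeholder for your own evaluation file.

import json

# Hypothetical file name; replace it with your evaluation data file.
with open("evaluation_data.json", "r", encoding="utf-8") as f:
    records = json.load(f)

for i, record in enumerate(records):
    # Each record needs a question, a correct answer, and at least one document ID.
    if not isinstance(record.get("question"), str) or not record["question"]:
        print(f"Record {i}: missing or empty question")
    if not isinstance(record.get("correct_answer"), str) or not record["correct_answer"]:
        print(f"Record {i}: missing or empty correct_answer")
    ids = record.get("correct_answer_document_ids")
    if not isinstance(ids, list) or not ids:
        print(f"Record {i}: missing correct_answer_document_ids")

print(f"Checked {len(records)} evaluation records.")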

For example, the following are sample questions and answers for the pattern that is trained with the watsonx.ai Python library documentation.

[
    {
        "question": "What foundation models are available in watsonx.ai?",
        "correct_answer": "The following models are available in watsonx.ai: \nflan-t5-xl-3b\nFlan-t5-xxl-11b\nflan-ul2-20b\ngpt-neox-20b\ngranite-13b-chat-v2\ngranite-13b-chat-v1\ngranite-13b-instruct-v2\ngranite-13b-instruct-v1\nllama-2-13b-chat\nllama-2-70b-chat\nmpt-7b-instruct2\nmt0-xxl-13b\nstarcoder-15.5b",
        "correct_answer_document_ids": [
            "5B37710FE7BBD6EFB842FEB7B49B036302E18F81_0.txt"
        ]
    },
    {
        "question": "What foundation models are available on Watsonx, and which of these has IBM built?",
        "correct_answer": "The following foundation models are available on Watsonx:\n\n1. flan-t5-xl-3b\n2. flan-t5-xxl-11b\n3. flan-ul2-20b\n4. gpt-neox-20b\n5. granite-13b-chat-v2 (IBM built)\n6. granite-13b-chat-v1 (IBM built)\n7. granite-13b-instruct-v2 (IBM built)\n8. granite-13b-instruct-v1 (IBM built)\n9. llama-2-13b-chat\n10. llama-2-70b-chat\n11. mpt-7b-instruct2\n12. mt0-xxl-13b\n13. starcoder-15.5b\n\n The Granite family of foundation models, including granite-13b-chat-v2, granite-13b-chat-v1, and granite-13b-instruct-v2 has been build by IBM.",
        "correct_answer_document_ids": [
            "5B37710FE7BBD6EFB842FEB7B49B036302E18F81_0.txt",
            "B2593108FA446C4B4B0EF5ADC2CD5D9585B0B63C_0.txt"
        ]
    },
    {
        "question": "What is greedy decoding?",
        "correct_answer": "Greedy decoding produces output that closely matches the most common language in the model's pretraining data and in your prompt text, which is desirable in less creative or fact-based use cases. A weakness of greedy decoding is that it can cause repetitive loops in the generated output.",
        "correct_answer_document_ids": [
            "42AE491240EF740E6A8C5CF32B817E606F554E49_1.txt"
        ]
    },
    {
        "question": "When to tune a foundation model?",
        "correct_answer": "Tune a foundation model when you want to do the following things:\n\nReduce the cost of inferencing at scale\nGet the model's output to use a certain style or format\nImprove the model's performance by teaching the model a specialized task\nGenerate output in a reliable form in response to zero-shot prompts\"",
        "correct_answer_document_ids": [
            "FBC3C5F81D060CD996489B772ABAC886F12130A3_0.txt"
        ]
    },
    {
        "question": "What tuning parameters are available for IBM foundation models?",
        "correct_answer": "Tuning parameter values for IBM foundation models:\nInitialization method\ninitialization text\nbatch_size\naccumulate_steps\nlearning_rate\nnum_epochs\"",
        "correct_answer_document_ids": [
            "51747F17F413F1F34CFD73D170DE392D874D03DD_2.txt"
        ]
    },
    {
        "question": "How do I avoid generating personal information with foundation models?",
        "correct_answer": "To exclude personal information, try these techniques:\n- In your prompt, instruct the model to refrain from mentioning names, contact details, or personal information.\n- In your larger application, pipeline, or solution, post-process the content that is generated by the foundation model to find and remove personal information.\"",
        "correct_answer_document_ids": [
            "E59B59312D1EB3B2BA78D7E78993883BB3784C2B_4.txt"
        ]
    },
    {
        "question": "What is Watson OpenScale?",
        "correct_answer": "Watson OpenScale is a tool that helps organizations evaluate and monitor the performance of their AI models. It tracks and measures outcomes from AI models, and helps ensure that they remain fair, explainable, and compliant no matter where the models were built or are running. Watson OpenScale also detects and helps correct the drift in accuracy when an AI model is in production.",
        "correct_answer_document_ids": [
            "777F72F32FD20E96C4A5F0CCA461FE9A79334E96_0.txt"
        ]
    }
]
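Each correct_answer_document_ids entry identifies a document that contains the correct answer. The following sketch is written under the assumption that those IDs are file names from your document collection (as in the previous example); it flags any IDs that do not match an uploaded file. Both paths are placeholders.

import json
from pathlib import Path

# Hypothetical paths; replace them with your evaluation file and document folder.
records = json.loads(Path("evaluation_data.json").read_text(encoding="utf-8"))
collection_files = {p.name for p in Path("my_rag_documents").rglob("*") if p.is_file()}

for record in records:
    for doc_id in record["correct_answer_document_ids"]:
        if doc_id not in collection_files:
            print(f"Question {record['question']!r} references a missing document: {doc_id}")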

Choosing a vector store

You must provide a location for storing the vectorized documents. This vector store is used to retrieve content for the question and answer process. For details about the available database options, see Choosing a vector store.

  • The default in-memory Chroma database is a temporary vector store for running the experiment. The index does not persist beyond the experiment, so it is not a good choice for production use.
  • Connect to or set up a Milvus database if you want a permanent vector store. Use this option if you plan to deploy your RAG pattern. For details, see Setting up a watsonx.data Milvus vector store.

Important: To connect to a Milvus vector store, you can choose either the generic Milvus connector type or the watsonx.data Milvus connector.
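If you use Milvus, it can help to confirm that the database is reachable before you run the experiment. The following sketch uses the pymilvus client to open a connection and list the existing collections; the host, port, and credentials are placeholders for your own Milvus or watsonx.data Milvus instance.

from pymilvus import connections, utility

# Placeholder connection details; replace them with your Milvus instance values.
connections.connect(
    alias="default",
    host="your-milvus-host.example.com",
    port="19530",
    user="your-user",
    password="your-password",
    secure=True,
)

# List the existing collections to confirm that the connection works.
print(utility.list_collections())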


Creating the AutoAI RAG experiment

Follow these steps to define and run an experiment that searches for the optimal RAG pattern for your use case, using the default configuration settings as the fastpath.

  1. From the watsonx.ai welcome page or from the New assets page for a project, click Automatically build AI solutions.
  2. Select Build a RAG solution as the type of experiment.
  3. Upload or connect to the document collection and evaluation data. Select up to 20 document folders and files for the document collection. The evaluation data file must be a single JSON file.

    Connecting to data for a RAG experiment
  4. Choose where to store the vector-based index for the document collection.
  5. Specify the JSON file that you want to use as benchmark data to evaluate the experiment results.
  6. Click Run experiment to create the RAG pipelines by using the default settings:
    • Optimized metric: Optimizes the creation of the RAG patterns for the Answer faithfulness metric.
    • Models to consider: The default of All model types considers all available foundation models for generating the RAG patterns.

The experiment settings are configurable if you want to customize the experiment for your use case. See Customizing RAG experiment settings.

Viewing the results

Use the following tools to view progress and review the results.

  1. As the experiment runs, a progress map provides a visualization of how the pipelines are created and optimized. Hover over any node for more detail.

    Viewing the progress map for a RAG experiment in progress

  2. When the experiment completes, review the leaderboard that shows the experiment pipelines, ranked according to results for the optimized metric.

    Pipeline leaderboard for a RAG experiment

  3. Click a pipeline name to review the details. Review how the pipeline scored for the various metrics as evaluated against the sample questions and answers.

    Pipeline details for a RAG experiment

  4. When your analysis is complete, click Save to automatically generate notebook assets that you can use to test and use the RAG pattern.

    Saving a RAG pipeline

  5. Review and run the resulting notebook or notebooks to test or use your RAG pattern. For details, see Reviewing the RAG notebooks.

Parent topic: Building RAG experiments with AutoAI
