Creating a RAG experiment (fastpath) (Beta)
Last updated: Dec 12, 2024

Create a Retrieval-augmented generation (RAG) experiment by using AutoAI. Upload a collection of documents and transform them into vectors that can be used to improve the output from a large language model. Compare optimized pipelines to select the best RAG pattern for your application.

Important: This feature is a beta release. While this feature is in beta, there is no charge for running the experiment, and no tokens are consumed. However, calls to RAG patterns and their derivatives that are done after the experiment completes consume resources at the standard rate.

Preparing your data sources

Before you create a RAG experiment, prepare your test data and evaluation data assets. The test data is the document collection that provides the context for answering prompts. The evaluation data is a JSON file with sample questions and answers that are used to measure the performance of the RAG patterns.

To prepare your data collection, follow these guidelines:

  • Supported formats for the document collection: PDF, HTML, DOCX, plain text
  • Supported format for the evaluation data file: JSON
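For example, you can scan your local document folder before you upload it to confirm that every file uses one of the supported formats. The following sketch is a minimal check that uses only the Python standard library; the folder name is a placeholder for your own document collection.

from pathlib import Path

# Extensions that correspond to the supported document formats listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".html", ".docx", ".txt"}

def check_document_collection(folder: str) -> None:
    # Flag any files that are not in a supported format for the document collection.
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            status = "ok" if path.suffix.lower() in SUPPORTED_EXTENSIONS else "unsupported format"
            print(f"{path.name}: {status}")

# Hypothetical folder name; replace it with the location of your documents.
check_document_collection("my_rag_documents")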

Template for the JSON evaluation file

The evaluation data file provides a series of sample questions and correct answers to evaluate the performance of the RAG pattern. Use this format for the JSON file:

[
    {
        "question": "<text>",
        "correct_answer": "<text>",
        "correct_answer_document_ids": [
            "<file>",
            "<file>"
        ]
    },
    {
        "question": "<text>",
        "correct_answer": "<text>",
        "correct_answer_document_ids": [
            "<file>",
            "<file>"
        ]
    },
    {
        "question": "<text>",
        "correct_answer": "<text>",
        "correct_answer_document_ids": [
            "<file>",
            "<file>"
        ]
    }
]
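Before you upload the evaluation data, you can verify that each record follows this template. The following sketch is a minimal structure check that uses only the Python standard library; the file name is a placeholder for your own evaluation file.

import json

# Hypothetical file name; replace it with your evaluation data file.
with open("evaluation_data.json", "r", encoding="utf-8") as f:
    records = json.load(f)

for i, record in enumerate(records):
    # Each record needs a question, a correct answer, and at least one document ID.
    if not isinstance(record.get("question"), str) or not record["question"]:
        print(f"Record {i}: missing or empty question")
    if not isinstance(record.get("correct_answer"), str) or not record["correct_answer"]:
        print(f"Record {i}: missing or empty correct_answer")
    ids = record.get("correct_answer_document_ids")
    if not isinstance(ids, list) or not ids:
        print(f"Record {i}: missing correct_answer_document_ids")

print(f"Checked {len(records)} evaluation records.")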

For example, the following are sample questions and answers for the pattern that is trained with the watsonx.ai Python library documentation.

[
    {
        "question": "What foundation models are available in watsonx.ai?",
        "correct_answer": "The following models are available in watsonx.ai: \nflan-t5-xl-3b\nFlan-t5-xxl-11b\nflan-ul2-20b\ngpt-neox-20b\ngranite-13b-chat-v2\ngranite-13b-chat-v1\ngranite-13b-instruct-v2\ngranite-13b-instruct-v1\nllama-2-13b-chat\nllama-2-70b-chat\nmpt-7b-instruct2\nmt0-xxl-13b\nstarcoder-15.5b",
        "correct_answer_document_ids": [
            "5B37710FE7BBD6EFB842FEB7B49B036302E18F81_0.txt"
        ]
    },
    {
        "question": "What foundation models are available on Watsonx, and which of these has IBM built?",
        "correct_answer": "The following foundation models are available on Watsonx:\n\n1. flan-t5-xl-3b\n2. flan-t5-xxl-11b\n3. flan-ul2-20b\n4. gpt-neox-20b\n5. granite-13b-chat-v2 (IBM built)\n6. granite-13b-chat-v1 (IBM built)\n7. granite-13b-instruct-v2 (IBM built)\n8. granite-13b-instruct-v1 (IBM built)\n9. llama-2-13b-chat\n10. llama-2-70b-chat\n11. mpt-7b-instruct2\n12. mt0-xxl-13b\n13. starcoder-15.5b\n\n The Granite family of foundation models, including granite-13b-chat-v2, granite-13b-chat-v1, and granite-13b-instruct-v2 has been build by IBM.",
        "correct_answer_document_ids": [
            "5B37710FE7BBD6EFB842FEB7B49B036302E18F81_0.txt",
            "B2593108FA446C4B4B0EF5ADC2CD5D9585B0B63C_0.txt"
        ]
    },
    {
        "question": "What is greedy decoding?",
        "correct_answer": "Greedy decoding produces output that closely matches the most common language in the model's pretraining data and in your prompt text, which is desirable in less creative or fact-based use cases. A weakness of greedy decoding is that it can cause repetitive loops in the generated output.",
        "correct_answer_document_ids": [
            "42AE491240EF740E6A8C5CF32B817E606F554E49_1.txt"
        ]
    },
    {
        "question": "When to tune a foundation model?",
        "correct_answer": "Tune a foundation model when you want to do the following things:\n\nReduce the cost of inferencing at scale\nGet the model's output to use a certain style or format\nImprove the model's performance by teaching the model a specialized task\nGenerate output in a reliable form in response to zero-shot prompts\"",
        "correct_answer_document_ids": [
            "FBC3C5F81D060CD996489B772ABAC886F12130A3_0.txt"
        ]
    },
    {
        "question": "What tuning parameters are available for IBM foundation models?",
        "correct_answer": "Tuning parameter values for IBM foundation models:\nInitialization method\ninitialization text\nbatch_size\naccumulate_steps\nlearning_rate\nnum_epochs\"",
        "correct_answer_document_ids": [
            "51747F17F413F1F34CFD73D170DE392D874D03DD_2.txt"
        ]
    },
    {
        "question": "How do I avoid generating personal information with foundation models?",
        "correct_answer": "To exclude personal information, try these techniques:\n- In your prompt, instruct the model to refrain from mentioning names, contact details, or personal information.\n- In your larger application, pipeline, or solution, post-process the content that is generated by the foundation model to find and remove personal information.\"",
        "correct_answer_document_ids": [
            "E59B59312D1EB3B2BA78D7E78993883BB3784C2B_4.txt"
        ]
    },
    {
        "question": "What is Watson OpenScale?",
        "correct_answer": "Watson OpenScale is a tool that helps organizations evaluate and monitor the performance of their AI models. It tracks and measures outcomes from AI models, and helps ensure that they remain fair, explainable, and compliant no matter where the models were built or are running. Watson OpenScale also detects and helps correct the drift in accuracy when an AI model is in production.",
        "correct_answer_document_ids": [
            "777F72F32FD20E96C4A5F0CCA461FE9A79334E96_0.txt"
        ]
    }
]
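Each correct_answer_document_ids entry identifies a document that contains the correct answer. The following sketch is written under the assumption that those IDs are file names from your document collection (as in the previous example); it flags any IDs that do not match an uploaded file. Both paths are placeholders.

import json
from pathlib import Path

# Hypothetical paths; replace them with your evaluation file and document folder.
records = json.loads(Path("evaluation_data.json").read_text(encoding="utf-8"))
collection_files = {p.name for p in Path("my_rag_documents").rglob("*") if p.is_file()}

for record in records:
    for doc_id in record["correct_answer_document_ids"]:
        if doc_id not in collection_files:
            print(f"Question {record['question']!r} references a missing document: {doc_id}")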

Choosing a vector store

You must provide a location for storing the vectorized documents. This vector store is used to retrieve content for the question and answer process. For details about the available database options, see Choosing a vector store.

  • The default in-memory Chroma database is a temporary vector store for running the experiment. The index does not persist beyond the experiment, so it is not a good choice for production use.
  • Connect to or set up a Milvus database if you want a permanent vector store. Use this option if you plan to deploy your RAG pattern. For details, see Setting up a watsonx.data Milvus vector store.

Important: To connect to a Milvus vector store, you can choose either the generic Milvus connector type or the watsonx.data Milvus connector.
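If you use Milvus, it can help to confirm that the database is reachable before you run the experiment. The following sketch uses the pymilvus client to open a connection and list the existing collections; the host, port, and credentials are placeholders for your own Milvus or watsonx.data Milvus instance.

from pymilvus import connections, utility

# Placeholder connection details; replace them with your Milvus instance values.
connections.connect(
    alias="default",
    host="your-milvus-host.example.com",
    port="19530",
    user="your-user",
    password="your-password",
    secure=True,
)

# List the existing collections to confirm that the connection works.
print(utility.list_collections())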


Creating the AutoAI RAG experiment

Follow these steps to define and run an experiment that searches for the optimal RAG pattern for your use case, using the default configuration settings as the fastpath.

  1. From the watsonx.ai welcome page or from the New assets page for a project, click Automatically build AI solutions.
  2. Select Build a RAG solution as the type of experiment.
  3. Upload or connect to the document collection and evaluation data. Select up to 20 document folders and files for the document collection. The evaluation data file must be a single JSON file.

    Connecting to data for a RAG experiment
  4. Choose where to store the vector-based index for the document collection.
  5. Specify the JSON file that you want to use as benchmark data to evaluate the experiment results.
  6. Click Run experiment to create the RAG pipelines by using the default settings:
    • Optimized metric: Optimizes the creation of the RAG patterns for the Answer faithfulness metric.
    • Models to consider: The default of All model types considers all available foundation models for generating the RAG patterns.

The experiment settings are configurable if you want to customize the experiment for your use case. See Customizing RAG experiment settings.

Viewing the results

Use the following tools to view progress and review the results.

  1. As the experiment runs, a progress map provides a visualization of how the pipelines are created and optimized. Hover over any node for more detail.

    Viewing the progress map for a RAG experiment in progress

  2. When the experiment completes, review the leaderboard that shows the experiment pipelines, ranked according to results for the optimized metric.

    Pipeline leaderboard for a RAG experiment

  3. Click a pipeline name to review the details. Review how the pipeline scored for the various metrics as evaluated against the sample questions and answers.

    Pipeline details for a RAG experiment

  4. When your analysis is complete, click Save to automatically generate notebook assets that you can use to test and use the RAG pattern.

    Saving a RAG pipeline

  5. Review and run the resulting notebook or notebooks to test or use your RAG pattern. For details, see Reviewing the RAG notebooks.

Parent topic: Building RAG experiments with AutoAI
