Test Sets
Test Sets are collections of example data used to evaluate the performance of your RAG system. They provide a standardized set of inputs, LLM-generated responses, and, where available, expected outputs against which to assess the quality of your RAG system's responses. Each example within a Test Set is called a Test Case.
Test Sets can be created from various sources, such as production data, previous evaluations, or manually curated examples. In addition to evaluation, Test Sets can be used to assess the performance and scalability of your prompt templates across diverse example data points.
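Conceptually, each Test Case pairs an input with the LLM-generated response and, when available, a reference answer. A minimal sketch of that shape in Python (the field names here are illustrative, not the SDK's exact schema):

```python
# Illustrative shape of a Test Set: a list of Test Cases.
# Field names are assumptions for illustration; the actual SDK schema may differ.
test_set = [
    {
        "input": "What is the capital of France?",    # query sent to the RAG system
        "output": "The capital of France is Paris.",  # LLM-generated response
        "expected_output": "Paris",                   # optional reference answer
    },
    # ... one entry per Test Case
]
```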
Example
The example below is a Test Set as viewed in the RAG Workbench UI. This Test Set was created from production data. It includes input questions, generated outputs, and associated traces. Each row represents an individual Test Case.
Create a Test Set
You can create a Test Set from existing traces, implicitly when running evaluations, or manually using the SDK.
From Existing Traces
Creating a Test Set from existing traces is straightforward once you have set up tracing for your RAG system with the LastMile Tracing SDK. Follow these steps:
- Launch the RAG Workbench UI by running `rag-debug launch` in your terminal.
- Go to the 'Traces' tab, select the desired traces for your Test Set, and click 'Create Test Set'.
- Go to the 'Test Sets' tab and copy the Test Set ID for your newly created Test Set.
- Use the LastMile Eval SDK to download the Test Set and run evaluations:
```python
# Import the Test Set helpers from the LastMile Eval SDK.
# NOTE: the import path below is an assumption; check your installed SDK version.
from lastmile_eval.rag.debugger.api import download_test_set, run_and_store_evaluations

# `rouge1` is a user-defined trace-level evaluator (see the sketch after this block).
trace_level_evaluators = {
    "rouge1": rouge1,
}

# Download the Test Set using the test_set_id copied from the 'Test Sets' tab
test_set = download_test_set(
    test_set_id=test_set_id,
    lastmile_api_token=LASTMILE_API_TOKEN,
)

# Run evaluation on the Test Set and store the results
run_and_store_evaluations(
    test_set_id=test_set_id,
    project_id=project_id,
    trace_level_evaluators=trace_level_evaluators,
    dataset_level_evaluators={},
    lastmile_api_token=LASTMILE_API_TOKEN,
    evaluation_set_name="Evaluation Run 1 - Friday Test Set",
)
```
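The `rouge1` evaluator referenced above is user-defined. Below is a minimal sketch built on the `rouge-score` package; it assumes an evaluator receives lists of generated outputs and reference answers and returns one score per Test Case (check your SDK version for the exact evaluator interface it expects):

```python
from rouge_score import rouge_scorer

# ROUGE-1 scorer from the rouge-score package.
_scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

def rouge1(outputs: list[str], ground_truths: list[str]) -> list[float]:
    # Assumed interface: generated outputs and references in, F1 scores out.
    # This is an illustrative sketch, not the SDK's guaranteed contract.
    return [
        _scorer.score(target=truth, prediction=output)["rouge1"].fmeasure
        for output, truth in zip(outputs, ground_truths)
    ]
```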
The `run_and_store_evaluations` function runs your specified evaluators on the Test Set, which consists of the input questions and the corresponding generated outputs from your RAG system, since the Test Set is derived from pre-existing Traces.
You can view the evaluation results and the Test Set in the RAG Workbench UI under the "Evaluation Console" and "Test Sets" tabs, respectively.