Concepts
Tracing
Tracing is the process of logging application activity for monitoring and debugging. It consists of two critical components: "traces" and "spans".
- Spans: A span is a specific unit of work within a trace and is characterized by a start and end time. Spans may also contain optional fields such as input, output, and metadata. For example, a span for a retrieval step may have the user query as its input, the retrieved documents as its output, and the retrieval parameters as its metadata.
- Traces: A trace is a series of spans linked to a single operation, like ingestion or retrieval. For example, a user request that triggers a series of processes in your application forms a trace. Each trace is marked by a unique ID.
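To make the relationship between spans and traces concrete, here is a minimal sketch in plain Python. It does not use the RAG Workbench API: the `Span` and `Trace` classes, the `record` method, and `fake_retriever` are hypothetical names chosen for illustration only.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical structure)."""
    name: str
    start_time: float
    end_time: Optional[float] = None
    input: Any = None
    output: Any = None
    metadata: dict = field(default_factory=dict)


@dataclass
class Trace:
    """A series of spans linked to a single operation, with a unique ID."""
    operation: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    def record(self, name, fn, input_data, **metadata):
        """Run fn(input_data) and store the call as a span with start and end times."""
        span = Span(name=name, start_time=time.time(), input=input_data, metadata=metadata)
        span.output = fn(input_data)
        span.end_time = time.time()
        self.spans.append(span)
        return span.output


def fake_retriever(query):
    # Stand-in for a real retrieval step; returns fixed document IDs.
    return ["doc-1", "doc-2"]


trace = Trace(operation="retrieval")
docs = trace.record("retrieve", fake_retriever, "What is a span?", k=2)
```

Here the retrieval span carries the query as input, the documents as output, and `k` as metadata, mirroring the example in the list above.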
RAG Workbench provides auto-instrumentation for LangChain and LlamaIndex, and a wrapper for OpenAI. We auto-generate traces for code that uses these libraries so you don't have to worry about setting up tracing. Check out our RAG Workbench Cookbook for these examples.
Parameter Set
A Parameter Set is a dictionary of key parameters, such as chunk size or k, that define the behavior of your RAG system. These parameters serve as the adjustable knobs that you can tune and test to optimize your RAG system's performance. In RAG Workbench, the parameter set used for each evaluation run is clearly displayed. This allows for easy comparison of different evaluation runs with varying parameter sets, enabling you to identify the optimal set of parameters for your RAG system.
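As a rough illustration, a parameter set can be written as an ordinary dictionary, and two runs can be compared by listing the keys whose values differ. The key names (`chunk_size`, `chunk_overlap`, `k`) and the `diff_parameter_sets` helper are assumptions for this sketch, not names defined by RAG Workbench.

```python
# Two hypothetical parameter sets for two evaluation runs.
baseline = {"chunk_size": 512, "chunk_overlap": 64, "k": 4}
candidate = {"chunk_size": 256, "chunk_overlap": 64, "k": 8}


def diff_parameter_sets(a, b):
    """Return {param: (value_in_a, value_in_b)} for every parameter that differs."""
    keys = sorted(set(a) | set(b))
    return {key: (a.get(key), b.get(key)) for key in keys if a.get(key) != b.get(key)}


changed = diff_parameter_sets(baseline, candidate)
# changed == {"chunk_size": (512, 256), "k": (4, 8)}
```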
Evaluation Metrics
Evaluation Metrics (aka Evaluators) allow you to measure the quality of LLM-generated results. These functions can take in various inputs including the generated response, ground truth data, context, etc. and typically output a numeric score from 0 to 1.
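A minimal sketch of what an evaluator function might look like is shown below. The `token_recall` metric is a deliberately simple stand-in invented for this example (the share of ground-truth tokens that appear in the response); it is not one of the metrics shipped with RAG Workbench.

```python
def token_recall(response: str, ground_truth: str) -> float:
    """Fraction of distinct ground-truth tokens found in the response, from 0 to 1."""
    truth_tokens = set(ground_truth.lower().split())
    if not truth_tokens:
        return 0.0  # nothing to compare against
    response_tokens = set(response.lower().split())
    return len(truth_tokens & response_tokens) / len(truth_tokens)


full = token_recall("Paris is the capital of France", "The capital of France is Paris")
partial = token_recall("Paris France", "Paris is in France")
# full == 1.0, partial == 0.5
```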
Test Set
Test Sets are collections of data used to evaluate the performance of your RAG system. They provide a standardized set of inputs, LLM-generated outputs, and, if available, ground truth answers to assess how well your RAG system generates responses. Each example within a Test Set is called a Test Case. A Test Set must contain inputs (for example, user questions sent to your RAG system). The outputs (responses from the RAG system) and the ground truth answers are optional fields.
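The required and optional fields of a Test Case can be sketched as a small data class. The class name `RagTestCase` and its field names are assumptions made for illustration; the actual schema used by RAG Workbench may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RagTestCase:
    """One example in a test set: input is required, the rest is optional."""
    input: str
    output: Optional[str] = None
    ground_truth: Optional[str] = None


test_set = [
    RagTestCase(input="What is a span?"),
    RagTestCase(
        input="What is a trace?",
        output="A series of spans.",
        ground_truth="A series of spans linked to a single operation.",
    ),
]

# Test cases without an output still need a response from the RAG system.
needs_generation = [case for case in test_set if case.output is None]
```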
Evaluation Set
An Evaluation Set is your Test Set with evaluation metrics attached. Evaluation Sets are generated after running evaluators on a Test Set. Within the RAG Workbench UI, you can view all your Evaluation Sets in the Evaluation Console. Each row in this table represents a unique Evaluation Set and shows aggregate metrics for that set. Click into an Evaluation Set to see the individual evaluation metrics for each test case.
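The relationship between per-test-case metrics and the aggregate row can be sketched as follows. The metric names (`faithfulness`, `relevance`) are hypothetical, and averaging is only one plausible way to aggregate; the source does not state which aggregation RAG Workbench applies.

```python
from statistics import mean

# Hypothetical per-test-case scores produced by running evaluators on a test set.
per_case_scores = [
    {"input": "q1", "faithfulness": 1.0, "relevance": 0.5},
    {"input": "q2", "faithfulness": 0.5, "relevance": 1.0},
]


def aggregate(rows, metric_names):
    """Average each named metric across all test cases."""
    return {name: mean(row[name] for row in rows) for name in metric_names}


summary = aggregate(per_case_scores, ["faithfulness", "relevance"])
# summary == {"faithfulness": 0.75, "relevance": 0.75}
```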
Logging
Logging allows you to annotate a Trace with important information for troubleshooting and debugging. Your log statements are tied to a Trace which makes it easy to collect and analyze important information quickly and systematically. You can use logging to understand user behavior, streamline debugging, and identify areas of improvement for your RAG system.
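The idea of tying log statements to a trace can be illustrated with Python's standard `logging` module, which is used here as a stand-in and is not the RAG Workbench logging API. The `trace_id` value and the logger name `rag_app` are assumptions for this sketch.

```python
import io
import logging
import uuid

trace_id = uuid.uuid4().hex  # hypothetical ID of the current trace

# Write log lines to an in-memory buffer so they can be inspected afterwards.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s trace=%(trace_id)s %(message)s"))

base_logger = logging.getLogger("rag_app")
base_logger.setLevel(logging.INFO)
base_logger.addHandler(handler)
base_logger.propagate = False

# LoggerAdapter attaches the trace ID to every record it emits.
log = logging.LoggerAdapter(base_logger, {"trace_id": trace_id})
log.info("retrieved %d documents", 3)
log.warning("no documents above the similarity threshold")

lines = stream.getvalue().splitlines()
```

Because every line carries the same trace ID, log statements can later be grouped and analyzed per trace.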
User Feedback
User Feedback consists of the responses (e.g. 👍/👎) obtained from end users, highlighting the performance and areas of improvement of your application. RAG Workbench allows you to incorporate user feedback into your trace, providing valuable insights into how users perceive the performance of your RAG system.
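A minimal sketch of attaching thumbs-up/thumbs-down feedback to traces, keyed by trace ID, might look like this. The in-memory store and the `record_feedback` and `approval_rate` helpers are hypothetical and only illustrate the idea; they are not RAG Workbench functions.

```python
from collections import defaultdict

# Hypothetical in-memory store: trace ID -> list of feedback entries.
feedback_by_trace = defaultdict(list)


def record_feedback(trace_id, thumbs_up, comment=""):
    """Attach one end-user feedback response to the given trace."""
    feedback_by_trace[trace_id].append({"thumbs_up": thumbs_up, "comment": comment})


def approval_rate(store):
    """Share of thumbs-up votes across all traces, or None if there are no votes."""
    votes = [entry["thumbs_up"] for entries in store.values() for entry in entries]
    return sum(votes) / len(votes) if votes else None


record_feedback("trace-a", True)
record_feedback("trace-b", False, "Answer cited the wrong document")
record_feedback("trace-c", True)
rate = approval_rate(feedback_by_trace)
```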