# Running Evaluations
Once you have defined your testsets, applications, and evaluators, you can run evaluations using the `aevaluate()` function. This function executes your application on test data and scores the outputs using your evaluators.
## Basic Usage
The `aevaluate()` function requires three inputs:

```python
from agenta.sdk.evaluations import aevaluate

result = await aevaluate(
    testsets=[testset.id],
    applications=[my_application],
    evaluators=[my_evaluator],
)
```
**Required Parameters:**

- `testsets`: A list of testset IDs or testset data
- `applications`: A list of application functions or IDs
- `evaluators`: A list of evaluator functions or IDs
The function runs each test case through your application and evaluates the output with all specified evaluators.
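For orientation, an application and an evaluator might look like the plain functions below. This is a sketch, not Agenta's required interface: the parameter names (`input`, `output`, `expected`) are assumptions mirroring the testset columns used in this guide, and the real SDK may expect different signatures or decorators.

```python
# Hypothetical shapes for an application and an evaluator.
# The exact signatures Agenta expects may differ; these names
# simply mirror the testset columns used in this guide.

def my_application(input: str) -> str:
    # A trivial "application": map a greeting to a reply.
    replies = {"Hello": "Hi", "Goodbye": "Bye"}
    return replies.get(input, "I don't understand")

def my_evaluator(output: str, expected: str) -> float:
    # Exact-match evaluator: 1.0 if the output equals the
    # expected value, 0.0 otherwise.
    return 1.0 if output == expected else 0.0
```

With functions shaped like these, each test case supplies the application's input, and the evaluator compares the application's output against the case's expected value.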
## Passing Testsets
You can provide testsets in two ways:
**Using testset IDs:**

```python
# Create a testset first
testset = await ag.testsets.acreate(
    name="My Test Data",
    data=[
        {"input": "Hello", "expected": "Hi"},
        {"input": "Goodbye", "expected": "Bye"},
    ],
)

# Use the ID in aevaluate
result = await aevaluate(
    testsets=[testset.id],
    applications=[my_app],
    evaluators=[my_eval],
)
```
**Using inline data:**

```python
# Pass test data directly
result = await aevaluate(
    testsets=[
        [
            {"input": "Hello", "expected": "Hi"},
            {"input": "Goodbye", "expected": "Bye"},
        ]
    ],
    applications=[my_app],
    evaluators=[my_eval],
)
```
When you pass inline data, Agenta automatically creates a testset for you.
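Conceptually, the evaluation runs every test case through each application and scores the output with each evaluator. The loop below is a simplified pure-Python illustration of that idea, not Agenta's implementation, which also handles async execution, tracing, and result storage:

```python
# Illustrative only: a simplified stand-in for the loop that an
# evaluation run performs. The real aevaluate() is async and
# persists results; the dict keys here are assumptions.

def run_evaluation(testset, applications, evaluators):
    results = []
    for case in testset:
        for app in applications:
            output = app(case["input"])
            scores = {
                evaluator.__name__: evaluator(output, case["expected"])
                for evaluator in evaluators
            }
            results.append(
                {"input": case["input"], "output": output, "scores": scores}
            )
    return results

def greeting_app(text):
    return {"Hello": "Hi", "Goodbye": "Bye"}.get(text, "?")

def exact_match(output, expected):
    return 1.0 if output == expected else 0.0

testset = [
    {"input": "Hello", "expected": "Hi"},
    {"input": "Howdy", "expected": "Hi there"},
]
results = run_evaluation(testset, [greeting_app], [exact_match])
```

Here the first case scores 1.0 (exact match) and the second scores 0.0, since `greeting_app` returns the fallback `"?"` for an unknown input.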