Evaluation
Get started with the Evaluation API
Create Evaluation
POST /api/v2/serve/evaluate/evaluation
Curl example:
DBGPT_API_KEY=dbgpt
SPACE_ID={YOUR_SPACE_ID}

curl -X POST "http://localhost:5670/api/v2/serve/evaluate/evaluation" \
  -H "Authorization: Bearer $DBGPT_API_KEY" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
        "scene_key": "recall",
        "scene_value": "'$SPACE_ID'",
        "context": {"top_k": 5},
        "sys_code": "xx",
        "evaluate_metrics": ["RetrieverHitRateMetric", "RetrieverMRRMetric", "RetrieverSimilarityMetric"],
        "datasets": [{
            "query": "what awel talked about",
            "doc_name": "awel.md"
        }]
      }'
Python example:

import asyncio

from dbgpt.client import Client
from dbgpt.client.evaluation import run_evaluation
from dbgpt.serve.evaluate.api.schemas import EvaluateServeRequest

DBGPT_API_KEY = "dbgpt"

client = Client(api_key=DBGPT_API_KEY)

request = EvaluateServeRequest(
    # The scene type of the evaluation: "app" or "recall"
    scene_key="recall",
    # The scene value: an app id when scene_key is "app", a space id when scene_key is "recall"
    scene_value="147",
    context={"top_k": 5},
    evaluate_metrics=[
        "RetrieverHitRateMetric",
        "RetrieverMRRMetric",
        "RetrieverSimilarityMetric",
    ],
    datasets=[
        {
            "query": "what awel talked about",
            "doc_name": "awel.md",
        }
    ],
)

# run_evaluation is a coroutine: await it from an async context,
# or drive it with asyncio.run(...) in a plain script as shown here.
data = asyncio.run(run_evaluation(client, request=request))
Request body
An Evaluation Request Object (see The Evaluation Request Object below).
When scene_key is app, the request body looks like this:
{
    "scene_key": "app",
    "scene_value": "2c76eea2-83b6-11ef-b482-acde48001122",
    "context": {"top_k": 5, "prompt": "942acd7e33b54ce28565f89f9b278044", "model": "zhipu_proxyllm"},
    "sys_code": "xx",
    "evaluate_metrics": ["AnswerRelevancyMetric"],
    "datasets": [{
        "query": "what awel talked about",
        "doc_name": "awel.md"
    }]
}
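For reference, the same app-scene request can also be sent through the Python client. The snippet below is a minimal sketch that reuses the placeholder ids from the JSON above (app id, prompt code, and model name); substitute your own values.

import asyncio

from dbgpt.client import Client
from dbgpt.client.evaluation import run_evaluation
from dbgpt.serve.evaluate.api.schemas import EvaluateServeRequest

client = Client(api_key="dbgpt")

# App-scene evaluation: scene_value is the app id, and the context carries
# the prompt code and LLM model name (all ids below are placeholders).
request = EvaluateServeRequest(
    scene_key="app",
    scene_value="2c76eea2-83b6-11ef-b482-acde48001122",
    context={
        "top_k": 5,
        "prompt": "942acd7e33b54ce28565f89f9b278044",
        "model": "zhipu_proxyllm",
    },
    evaluate_metrics=["AnswerRelevancyMetric"],
    datasets=[
        {
            "query": "what awel talked about",
            "doc_name": "awel.md",
        }
    ],
)

data = asyncio.run(run_evaluation(client, request=request))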
When scene_key is recall, the request body looks like this:
{
    "scene_key": "recall",
    "scene_value": "2c76eea2-83b6-11ef-b482-acde48001122",
    "context": {"top_k": 5, "prompt": "942acd7e33b54ce28565f89f9b278044", "model": "zhipu_proxyllm"},
    "evaluate_metrics": ["RetrieverHitRateMetric", "RetrieverMRRMetric", "RetrieverSimilarityMetric"],
    "datasets": [{
        "query": "what awel talked about",
        "doc_name": "awel.md"
    }]
}
Response body
Returns a list of Evaluation Result objects (see The Evaluation Result below).
The Evaluation Request Object
scene_key string Required
The scene type of the evaluation; supported values are app and recall
scene_value string Required
The scene value of the evaluation: the app id (when scene_key is app) or the space id (when scene_key is recall)
context object Required
The context of the evaluation
- top_k int Required
- prompt string The prompt code
- model string The LLM model name
evaluate_metrics array Required
The evaluation metrics to run; which metrics apply depends on scene_key (a toy computation of the recall metrics follows this field list):
- AnswerRelevancyMetric: the answer relevancy metric, i.e. how relevant the generated answer is to the query (when scene_key is app)
- RetrieverHitRateMetric: hit rate calculates the fraction of queries where the correct answer is found within the top-k retrieved documents. In simpler terms, it’s about how often the system gets it right within the top few guesses. (when scene_key is recall)
- RetrieverMRRMetric: for each query, MRR evaluates the system’s accuracy by looking at the rank of the highest-placed relevant document. Specifically, it’s the average of the reciprocals of these ranks across all queries. So, if the first relevant document is the top result, the reciprocal rank is 1; if it’s second, the reciprocal rank is 1/2, and so on. (when scene_key is recall)
- RetrieverSimilarityMetric: the embedding similarity metric (when scene_key is recall)
datasets array Required
The evaluation dataset: a list of entries, each with a query and a doc_name (see the examples above)
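To make the recall metric definitions above concrete, here is a small, self-contained toy computation of hit rate and MRR. It is not DB-GPT code, just the arithmetic that the RetrieverHitRateMetric and RetrieverMRRMetric descriptions refer to, with made-up document names.

# Toy illustration of hit rate and MRR over three queries.
# Each entry lists the retrieved doc names (in top-k order) and the relevant doc.
results = [
    {"retrieved": ["awel.md", "agents.md", "rag.md"], "relevant": "awel.md"},   # rank 1
    {"retrieved": ["rag.md", "awel.md", "agents.md"], "relevant": "awel.md"},   # rank 2
    {"retrieved": ["rag.md", "agents.md", "intro.md"], "relevant": "awel.md"},  # miss
]

hits = 0
reciprocal_ranks = []
for r in results:
    if r["relevant"] in r["retrieved"]:
        hits += 1
        reciprocal_ranks.append(1 / (r["retrieved"].index(r["relevant"]) + 1))
    else:
        reciprocal_ranks.append(0.0)

hit_rate = hits / len(results)              # 2/3 ≈ 0.67
mrr = sum(reciprocal_ranks) / len(results)  # (1 + 0.5 + 0) / 3 = 0.5
print(f"hit_rate={hit_rate:.2f}, mrr={mrr:.2f}")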
The Evaluation Result
prediction string
The prediction result
contexts string
The context chunks retrieved by the RAG retriever
score float
The score of the prediction
passing bool
Whether the prediction passed the evaluation
metric_name string
The name of the metric that produced this result
prediction_cost int
The cost of producing the prediction
query string
The query that was evaluated
raw_dataset object
The raw dataset entry for this result
feedback string
Feedback from the LLM-based evaluation
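Putting it together, here is a minimal sketch for inspecting the returned results. It assumes data is the value returned by run_evaluation in the examples above and that each result exposes the fields listed here; the exact return shape (dicts vs. model objects, possibly grouped per metric) may vary by client version, so the code flattens defensively.

# Assumes `data` is the list returned by run_evaluation in the examples above.
def _as_dict(row):
    # Results may be plain dicts or pydantic-style model objects.
    return row if isinstance(row, dict) else row.dict()

rows = []
for item in data:
    # Some client versions group results per metric as nested lists.
    if isinstance(item, list):
        rows.extend(item)
    else:
        rows.append(item)

for row in map(_as_dict, rows):
    print(
        f"metric={row.get('metric_name')} "
        f"query={row.get('query')!r} "
        f"score={row.get('score')} "
        f"passing={row.get('passing')}"
    )
    if row.get("feedback"):
        print("  feedback:", row["feedback"])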