Version: v0.6.2

Evaluation

Get started with the Evaluation API

Create Evaluation

POST /api/v2/serve/evaluate/evaluation
DBGPT_API_KEY=dbgpt
SPACE_ID={YOUR_SPACE_ID}

curl -X POST "http://localhost:5670/api/v2/serve/evaluate/evaluation" \
-H "Authorization: Bearer $DBGPT_API_KEY" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"scene_key": "recall",
"scene_value":147,
"context":{"top_k":5},
"sys_code":"xx",
"evaluate_metrics":["RetrieverHitRateMetric","RetrieverMRRMetric","RetrieverSimilarityMetric"],
"datasets": [{
"query": "what awel talked about",
"doc_name":"awel.md"
}]
}'
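The same request can be assembled in Python. The following is a minimal sketch using only the standard library; the helper name `build_evaluation_request` is hypothetical, and the base URL and API key simply mirror the values in the curl example. It builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5670"  # assumption: same host as the curl example
API_KEY = "dbgpt"                   # assumption: same key as the curl example


def build_evaluation_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the evaluation endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v2/serve/evaluate/evaluation",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same body as the curl example.
payload = {
    "scene_key": "recall",
    "scene_value": 147,
    "context": {"top_k": 5},
    "sys_code": "xx",
    "evaluate_metrics": [
        "RetrieverHitRateMetric",
        "RetrieverMRRMetric",
        "RetrieverSimilarityMetric",
    ],
    "datasets": [{"query": "what awel talked about", "doc_name": "awel.md"}],
}

request = build_evaluation_request(payload)
```

To execute it against a running server, pass `request` to `urllib.request.urlopen` and decode the response body with `json.loads`.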

Request body

Request Evaluation Object

When scene_key is app, the request body should look like this:


{
"scene_key": "app",
"scene_value":"2c76eea2-83b6-11ef-b482-acde48001122",
"context":{"top_k":5, "prompt":"942acd7e33b54ce28565f89f9b278044","model":"zhipu_proxyllm"},
"sys_code":"xx",
"evaluate_metrics":["AnswerRelevancyMetric"],
"datasets": [{
"query": "what awel talked about",
"doc_name":"awel.md"
}]
}

When scene_key is recall, the request body should look like this:


{
"scene_key": "recall",
"scene_value":"2c76eea2-83b6-11ef-b482-acde48001122",
"context":{"top_k":5, "prompt":"942acd7e33b54ce28565f89f9b278044","model":"zhipu_proxyllm"},
"evaluate_metrics":["RetrieverHitRateMetric", "RetrieverMRRMetric", "RetrieverSimilarityMetric"],
"datasets": [{
"query": "what awel talked about",
"doc_name":"awel.md"
}]
}

Response body

Returns a list of Evaluation Result objects

The Evaluation Request Object


scene_key string Required

The scene type of the evaluation. Supported values: app, recall


scene_value string Required

The scene value of the evaluation: the app ID when scene_key is app, or the space ID when scene_key is recall


context object Required

The context of the evaluation

  • top_k int Required
  • prompt string The prompt code
  • model string The LLM model name

evaluate_metrics array Required

The metrics to evaluate. Supported values:

  • AnswerRelevancyMetric: the answer relevancy metric (when scene_key is app)
  • RetrieverHitRateMetric: hit rate. The fraction of queries for which the correct answer is found within the top-k retrieved documents, i.e. how often the retriever gets it right within its first few results (when scene_key is recall)
  • RetrieverMRRMetric: mean reciprocal rank (MRR). For each query, take the rank of the highest-placed relevant document and compute its reciprocal: 1 if the first relevant document is the top result, 1/2 if it is second, and so on. MRR is the average of these reciprocals across all queries (when scene_key is recall)
  • RetrieverSimilarityMetric: the embedding similarity metric (when scene_key is recall)
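To make the hit rate and MRR definitions concrete, here is a small worked sketch. It illustrates the definitions only and is not the DB-GPT implementation; the function names are hypothetical, and it assumes that a query with no relevant document in the top-k results contributes 0 to both metrics.

```python
from typing import List, Optional


def hit_rate(ranks: List[Optional[int]]) -> float:
    """Fraction of queries with a relevant document inside the top-k results.

    Each entry is the 1-based rank of the first relevant document for one
    query, or None when no relevant document was retrieved.
    """
    return sum(1 for r in ranks if r is not None) / len(ranks)


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """Average of 1/rank over all queries; misses count as 0 (assumption)."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Four illustrative queries: relevant document ranked 1st, 2nd, missing, 4th.
ranks = [1, 2, None, 4]

print(hit_rate(ranks))              # 3 hits out of 4 queries -> 0.75
print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 0 + 1/4) / 4 -> 0.4375
```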

datasets array Required

The datasets of the evaluation
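A client may want to check a request body against the field rules above before sending it. The following is a sketch under stated assumptions: the helper `validate_evaluation_request` is hypothetical, and the scene-to-metric mapping is taken from the metric list in this section rather than from the server code, which may accept more.

```python
from typing import List

# Fields marked Required in the request object description.
REQUIRED_FIELDS = ("scene_key", "scene_value", "context", "evaluate_metrics", "datasets")

# Assumption: metrics grouped by the scene_key they are documented for.
METRICS_BY_SCENE = {
    "app": {"AnswerRelevancyMetric"},
    "recall": {
        "RetrieverHitRateMetric",
        "RetrieverMRRMetric",
        "RetrieverSimilarityMetric",
    },
}


def validate_evaluation_request(payload: dict) -> List[str]:
    """Return a list of problems found in the payload (empty if none)."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in payload]

    scene = payload.get("scene_key")
    if scene is not None and scene not in METRICS_BY_SCENE:
        errors.append(f"unsupported scene_key: {scene}")

    context = payload.get("context")
    if isinstance(context, dict) and "top_k" not in context:
        errors.append("context.top_k is required")

    # Only check metrics when the scene itself is recognized.
    if scene in METRICS_BY_SCENE:
        for metric in payload.get("evaluate_metrics", []):
            if metric not in METRICS_BY_SCENE[scene]:
                errors.append(f"metric {metric} is not documented for scene_key {scene}")
    return errors
```

An empty list means the payload matches the documented shape; it does not guarantee that the server will accept it.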


The Evaluation Result


prediction string

The prediction result


contexts string

The context chunks retrieved by RAG


score float

The score of the prediction


passing bool

Whether the prediction passed


metric_name string

The metric name of the evaluation


prediction_cost int

The prediction cost of the evaluation


query string

The query of the evaluation


raw_dataset object

The raw dataset of the evaluation


feedback string

The feedback from the LLM evaluation
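On the client side, the result fields listed above can be mapped onto a typed structure. This is a minimal sketch, not part of any DB-GPT client: the `EvaluationResult` dataclass and `parse_results` helper are hypothetical names, every field is treated as optional because this page does not say which ones are always present, and the sample item contains made-up values for illustration only.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class EvaluationResult:
    prediction: Optional[str] = None
    contexts: Optional[str] = None
    score: Optional[float] = None
    passing: Optional[bool] = None
    metric_name: Optional[str] = None
    prediction_cost: Optional[int] = None
    query: Optional[str] = None
    raw_dataset: Optional[Dict[str, Any]] = None
    feedback: Optional[str] = None

    @classmethod
    def from_dict(cls, item: Dict[str, Any]) -> "EvaluationResult":
        # Ignore keys that are not documented fields instead of failing.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})


def parse_results(items: List[Dict[str, Any]]) -> List[EvaluationResult]:
    """Convert a list of result dictionaries into EvaluationResult objects."""
    return [EvaluationResult.from_dict(item) for item in items]


# Made-up sample item, for illustration only.
sample = [{
    "metric_name": "RetrieverHitRateMetric",
    "query": "what awel talked about",
    "score": 1.0,
    "passing": True,
}]
results = parse_results(sample)
print(results[0].metric_name, results[0].score)
```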