Fine-Tuning with dbgpt_hub
The DB-GPT-Hub project has released a pip package to lower the barrier to entry for Text2SQL training. In addition to fine-tuning through the scripts provided in the repository, you can also use the Python package we provide for fine-tuning.
Install
pip install dbgpt_hub
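After installing, it can help to confirm that the package is visible to the interpreter you plan to train with. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of dbgpt_hub.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


# "dbgpt_hub" is the import name used throughout this page.
if not is_installed("dbgpt_hub"):
    print("dbgpt_hub not found; run `pip install dbgpt_hub` in this environment")
```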
Show Baseline
from dbgpt_hub.baseline import show_scores
show_scores()
Fine-tuning
from dbgpt_hub.data_process import preprocess_sft_data
from dbgpt_hub.train import train_sft
from dbgpt_hub.predict import start_predict
from dbgpt_hub.eval import start_evaluate
Preprocess the data into the format used for fine-tuning.
data_folder = "dbgpt_hub/data"
data_info = [
    {
        "data_source": "spider",
        "train_file": ["train_spider.json", "train_others.json"],
        "dev_file": ["dev.json"],
        "tables_file": "tables.json",
        "db_id_name": "db_id",
        "is_multiple_turn": False,
        "train_output": "spider_train.json",
        "dev_output": "spider_dev.json",
    }
]
preprocess_sft_data(
    data_folder=data_folder,
    data_info=data_info,
)
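Before calling `preprocess_sft_data`, you may want to check that each `data_info` entry is complete and that the files it names exist. The sketch below is not part of dbgpt_hub: the helper `find_data_info_problems` is hypothetical, the required keys are simply those shown in the example above, and the directory layout `<data_folder>/<data_source>/<file>` is an assumption you should adjust to match where your dataset actually lives.

```python
import os

# Keys taken from the example data_info entry on this page.
REQUIRED_KEYS = {
    "data_source", "train_file", "dev_file", "tables_file",
    "db_id_name", "is_multiple_turn", "train_output", "dev_output",
}


def find_data_info_problems(data_folder, data_info):
    """Return a list of human-readable problems; the list is empty if none are found."""
    problems = []
    for i, entry in enumerate(data_info):
        missing = sorted(REQUIRED_KEYS - entry.keys())
        if missing:
            problems.append(f"entry {i}: missing keys {missing}")
            continue
        # Assumed layout: <data_folder>/<data_source>/<file>
        base = os.path.join(data_folder, entry["data_source"])
        files = entry["train_file"] + entry["dev_file"] + [entry["tables_file"]]
        for name in files:
            path = os.path.join(base, name)
            if not os.path.isfile(path):
                problems.append(f"entry {i}: file not found: {path}")
    return problems
```

Calling `find_data_info_problems(data_folder, data_info)` and printing the result before preprocessing lets you catch a misspelled file name or a missing download early.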