Product Introduction

Overview

Product Pricing

Benefits to Customers

Use Cases

Purchase Guide

Billing Overview

Purchase Mode

Renewal Instructions

Overdue Payment Instructions

Security Compliance

Data Security Protection Mechanism

Monitoring, Auditing, and Logging

Security Compliance Qualifications

Quick Start

Platform Usage Preparation

Operation Guide

Model Hub

Task-Based Modeling

Dev Machine

Model Management

Model Evaluation

Online Services

Resource Group Management

Managing Data Sources

Tikit

GPU Virtualization

Practical Tutorial

Deploying and Reasoning of LLM

LLM Training and Evaluation

Built-In Training Image List

Custom Training Image Specification

Angel Training Acceleration Feature Introduction

Implementing Resource Isolation Between Sub-users Based on Tags

API Documentation

History

Introduction

API Category

Making API Requests

Online Service APIs

Data Types

Error Codes

Related Agreement

Service Level Agreement

Data Processing And Security Agreement

Open-Source Software Information

Built-In Large Model Training Data Format Description

PDF

Focus Mode

Font Size

Last updated: 2025-09-26 21:12:13

Overview
Task-Based Modeling - The built-in large model fine-tuning module uses platform training templates. Users only need to prepare the training dataset for fine-tuning to launch training with one click. This document aims to detail the format requirements for supervised fine-tuning and pretraining.
Note:
The platform supports dataset files in jsonl format (each line in a jsonl file represents a group of sample data), with UTF-8 encoding only. Multiple files in the training data path will be merged for training.
Supervised Fine-Tuning (SFT) Data
The messages in the data sample describe multi-role dialogue and interaction. In multi-round dialogue, the conversation must end with the assistant. Fillable roles include:
Role (Role)
Required or Not
Description
Field
system
No
System character design
Character design info
user
Yes
User-submitted input
User input content
assistant
Yes
Model response
Model response content
Single-round dialogue dataset sample example
{"messages": [{"role": "system", "content": "You are the TIONE intelligent Q&A robot, capable of understanding user questions and providing fluent answers"}, {"role": "user", "content": "Read the following text and summarize it in one sentence.\\nOn Friday, a federation judge ruled in favor of a former University of California Los Angeles basketball star who sued to terminate NCAA control over university athletes' name, image and portrait rights."}, {"role": "assistant", "content": "The judge supported a former UCLA basketball star challenging NCAA control over athlete information."}]}
Multi-round dialogue dataset sample example
{"messages": [{"role": "system", "content": "You are the TIONE intelligent Q&A robot, capable of understanding user questions and providing fluent answers"}, {"role": "user", "content": "Read the following text and summarize it in one sentence.\\nOn Friday, a federation judge ruled in favor of a former University of California Los Angeles basketball star who sued to terminate NCAA control over university athletes' name, image and portrait rights."}, {"role": "assistant", "content": "The judge supported a former UCLA basketball star challenging NCAA control over athlete information."}, {"role": "user", "content": "Read the following text and summarize it in one sentence.\\nPresident Obama apologized on Thursday to Americans who lost their insurance plans under his federal health law, despite his repeated assurances they could keep their coverage if they wanted.\\nIn an exclusive interview with NBC News, he said: 'I'm sorry they found themselves in this situation based on my assurances.'"}, {"role": "assistant", "content": "Obama expressed regret in an NBC interview."}]}
Dataset with Thought Chain
The messages in the data sample describe multi-role dialogue and interaction. In the response with Role=assistant, the thinking content is surrounded by <think> and </think>. In multi-round dialogue, the conversation must end with the assistant. Fillable roles include:
Role (Role)
Required or Not
Description
Field
system
No
System character design
Character design info
user
Yes
User-submitted input
User input content
assistant
Yes
The model's response, with thinking content surrounded by <think> and </think>.
Model response content
Sample example
{"messages": [{"role": "system", "content": "You are the TIONE intelligent Q&A robot, capable of understanding user questions and providing fluent answers"}, {"role": "user", "content": "Read the following text and summarize it in one sentence.\\nOn Friday, a federal judge ruled in favor of a former UCLA basketball star who sued to terminate the NCAA's control over college athletes' names, images, and portrait rights."}, {"role": "assistant", "content": "<think>The user asked me to read the above text and summarize it in one sentence. I need to find the subject, object, and predicate of this event, identify suitable modifiers to limit it, and connect them to form a summarized sentence.</think>A judge supported a former UCLA basketball star challenging the NCAA's control over athletes' info."}]}
Pay attention, the thought chain requirements for the Hunyuan series models vary slightly in format, as follows:
The messages in the data sample describe multi-role dialogue and interaction. In the response with Role=assistant, the thinking content is surrounded by <think> and </think>. In multi-round dialogue, the conversation must end with the assistant. The last round of dialogue should be enclosed by <answer>\\n and \\n</answer>, with a line break \\n separating the thinking content and answer content. Fillable roles include:
Role (Role)
Required or Not
Description
Field
system
No
System character design
Character design info
user
Yes
User-submitted input
User input content
assistant
Yes
The model's response; the last round of response has thinking content surrounded by <think>\\n and \\n</think>, and the answer content is enclosed by <answer>\\n and \\n</answer>.
Model response content
Sample example
{"messages": [{"role": "system", "content": "You are the TIONE intelligent Q&A bot, capable of understanding user questions and providing fluent answers"}, {"role": "user", "content": "Read the following text and summarize it in one sentence.\\nOn Friday, a federal judge ruled in favor of a former UCLA basketball star who sued to terminate NCAA's control over college athletes' names, images, and portrait rights."}, {"role": "assistant", "content": "<think>\\nThe user asks me to read the above text and summarize it in one sentence. I need to identify the subject, object, and predicate of this event, find suitable modifiers to limit it, and connect them to form a summarized sentence.\\n</think>\\n<answer>\\nA judge supported a former UCLA basketball star challenging the NCAA's control over athlete information.\\n</answer>"}]}
Dataset with Tool Calls
The messages in the data sample describe multi-role dialogue and interaction. Tools use json format strings to define callable model tools. Please note the tool call dataset should be used for models supporting tool calls (such as Qwen3). In multi-round dialogue, the conversation must end with the assistant or tool_call. Fillable roles include:
Role (Role)
Required or Not
Description
Field
system
No
System character design
Character design info
user
Yes
User-submitted input must appear in odd-numbered entries (starting from 1) except for system character design.
User input content
assistant
Yes
Model responses must appear in even-numbered entries (starting from 1) except for system character design.
Model response content
tool_call
Yes
Tool calls must appear in even-numbered entries (starting from 1) except for system character design.
Call the tool name, parameter
tool
Yes
SDK API call results must appear in odd-numbered entries (starting from 1) except for system character design.
Call returned results
Sample example ending with assistant
{"messages": [{"role": "system", "content": "You are TIONE's intelligent Q&A bot, capable of understanding user questions and providing fluent answers"}, {"role": "user", "content": "Please answer the following question: Xiao Li is 25 this year, and Xiao Wang is 20. What is the sum of their ages"}, {"role": "tool_call", "content": "{\\"name\\":\\"calculate_age_sum\\", \\"arguments\\": {\\"age_1\\": 25, \\"age_2\\": 20}}"}, {"role": "tool", "content": "/no_think{\\"age_sum\\": 45}"}, {"role": "assistant", "content": "<think>\\n\\n</think>\\n<answer>\\nAccording to my calculation, the sum of Xiao Li and Xiao Wang's ages is 45\\n</answer>"}], "tools": "[{\\"name\\": \\"calculate_age_sum\\", \\"description\\": \\"Calculate and return the sum of two people's ages\\", \\"parameters\\": {\\"type\\": \\"object\\", \\"properties\\": {\\"age_1\\": {\\"type\\": \\"number\\", \\"description\\": \\"The first person's age\\"}, \\"age_2\\": {\\"type\\": \\"number\\", \\"description\\": \\"The second person's age\\"}}, \\"required\\": [\\"age_1\\", \\"age_2\\"]}}]"}
Sample example ending with tool_call
{"messages": [{"role": "system", "content": "You are TIONE's intelligent Q&A bot, capable of understanding user questions and providing fluent answers"}, {"role": "user", "content": "/no_think Please answer the following question: Xiao Li is 25 this year, and Xiao Wang is 20. What is the sum of their ages"}, {"role": "tool_call", "content": "<think>\\n\\n</think>\\n<answer>\\n{\\"name\\":\\"calculate_age_sum\\", \\"arguments\\": {\\"age_1\\": 25, \\"age_2\\": 20}}\\n</answer>"}], "tools": "[{\\"name\\": \\"calculate_age_sum\\", \\"description\\": \\"Calculate and return the sum of two people's ages\\", \\"parameters\\": {\\"type\\": \\"object\\", \\"properties\\": {\\"age_1\\": {\\"type\\": \\"number\\", \\"description\\": \\"The first person's age\\"}, \\"age_2\\": {\\"type\\": \\"number\\", \\"description\\": \\"The second person's age\\"}}, \\"required\\": [\\"age_1\\", \\"age_2\\"]}}]"}
Tool call content description:
Tool description tools: (Unfold results)
"tools": "[
  {
      \\"name\\": \\"calculate_age_sum\\", 
      "description": "calculate and return the sum of their ages"
      \\"parameters\\": {
          \\"type\\": \\"object\\", 
          \\"properties\\": {
              \\"age_1\\": {
                  \\"type\\": \\"number\\", 
                  "description": "the first person's age"
              },
              \\"age_2\\": {
                  \\"type\\": \\"number\\", 
                  "description": "the second person's age"
              }
          }, 
          \\"required\\": [\\"age_1\\", \\"age_2\\"]
      }
  }
]"  
In the dialogue, tool call Role=tool_call: (unfold result)
{
  "role": "tool_call", 
  "content": "{
      \\"name\\":\\"calculate_age_sum\\", 
      \\"arguments\\": {
          \\"age_1\\": 25, 
          \\"age_2\\": 20
      }
  }"
}
In the dialogue, tool call result Role=tool: (unfold result)
{
  "role": "tool", 
  "content": "{
      \\"age_sum\\": 45
  }"
}
Pretraining (PT) Data
Describe unlabeled text from the data sample's text.
Sample example
{"text": "The layout of Luzhen's taverns differs from others: each has a large L-shaped counter facing the street, with hot water ready inside to warm wine at any time. Laborers, after finishing work at noon or evening, often spend four coppers on a bowl of wine—this was twenty years ago, now it costs ten—stand by the counter, drink it hot, and rest. For one more copper, they can get a dish of salted bamboo shoots or fennel beans to go with the wine. Spending over ten coppers buys a meat dish, but most customers are short-coated laborers who rarely afford such luxury. Only those in long gowns stroll into the inner room, order wine and dishes, and sit drinking leisurely."}
{"text": "From the age of twelve, I worked as a waiter at the Xianheng Tavern by the town gate. The boss said I looked too foolish to serve long-gowned customers, so I was assigned tasks outside. The short-coated customers, though easier to deal with, often rambled on endlessly. They would insist on watching the yellow wine being ladled from the vat, checking if the pot had water at the bottom, and seeing it placed in hot water before feeling confident. Under such serious supervision, diluting the wine was nearly impossible. Therefore, after a few days, the boss said I couldn’t handle the job. Luckily, the recommender’s influence was strong, so I wasn’t dismissed but changed to the dull task of warming wine exclusively."}
Custom Format
In addition to placing data files in the training data path that meet the description format mentioned above, you can also prepare a dataset_info.json according to the LLaMA-Factory dataset format to specify the training dataset. When dataset_info.json already exists in your training data path, the platform will directly read all datasets specified in this description file and only these datasets.
Example:
Training data directory (for example, /opt/ml/input/data/train)
/opt/ml/input/data/train
├── dataset1.json
└── dataset_info.json
training data dataset1.json
[
    {
        "messages": [
            {
                "role": "system", 
                "content": "You are the TIONE intelligent Q&A robot, capable of understanding user questions and providing fluent answers"
            }, 
            {
                "role": "user", 
                "content": "Read the following text and summarize it in one sentence.\\nText: (CNN) - A federal judge ruled in favor of a former UCLA basketball star on Friday, who sued to terminate NCAA's control over college athletes' name, image, and portrait rights."
            }, 
            {
                "role": "assistant", 
                "content": "The judge supported a former UCLA basketball star who challenged NCAA's information control."
            }
        ]
    }, 
    {
        "messages": [
            {
                "role": "system", 
                "content": "You are the TIONE intelligent Q&A robot, capable of understanding user questions and providing fluent answers"
            }, 
            {
                "role": "user", 
                "content": "Read the following text and summarize it in one sentence.\\nText: President Obama apologized on Thursday to Americans who lost their insurance plans due to the federal health law he supported, despite his repeated claims that they could retain their coverage if they wished."
            },
            {
                "role": "assistant", 
                "content": "Obama expressed regret in an interview with NBC about the cancellation of the insurance plan."
            }
        ]
    }
]
﻿
Dataset description file dataset_info.json
{
    "dataset1": {
        "file_name": "dataset1.json",
        "formatting": "sharegpt",
        "columns": {
            "messages": "messages",
        },
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant"
        }
    },
}
﻿
﻿

Help and Support

Was this page helpful?

You can also Contact sales or Submit a Ticket for help.

Help us improve! Rate your documentation experience in 5 mins.

Feedback

tencent cloud

Tencent Cloud TI Platform

Built-In Large Model Training Data Format Description

Overview

Supervised Fine-Tuning (SFT) Data

Dataset with Thought Chain

Dataset with Tool Calls

Pretraining (PT) Data

Custom Format

Help and Support

Role (Role)	Required or Not	Description	Field
system	No	System character design	Character design info
user	Yes	User-submitted input	User input content
assistant	Yes	Model response	Model response content