tencent cloud

TDSQL Boundless

Creating a Partitioned Table

ダウンロード
フォーカスモード
フォントサイズ
最終更新日: 2026-05-26 17:37:40

Overview

The Partition table is the core mechanism for TDSQL Boundless to handle massive data and achieve horizontal scaling. Unlike a single table, data is distributed across multiple storage nodes in a cluster through a Partition Key. This enables the database to leverage the collective computing power and storage space of the cluster, overcoming the performance limitations of a single machine.

When Should Partitioned Tables Be Used?

Consider using a partition table if your business meets any of the following characteristics:
1. Large data volume or high-concurrency writes: The system must support extremely high write throughput (TPS), as a single machine cannot handle the load.
2. Unbounded data growth: The business is in a phase of rapid expansion, making it impossible to predict the future data ceiling.

Select a Partitioned Strategy

TDSQL Boundless supports the following partitioning policies. Selecting the correct policy is the most critical step in creating a table.
1. Partition using HASH
HASH partitioning
Use case: This is the most recommended default choice. It is suitable for scenarios where the partition key is an integer (INT / BIGINT), such as a user ID or a purely numeric order number.
Advantages
2. Partition using KEY
KEY partitioning: It is similar to HASH but supports multiple data types except BLOB / TEXT. The system uses an internal hash function to process data.
Use case: This is suitable for scenarios where the partition key is a string (VARCHAR / CHAR), such as a UUID, ID card number, or business serial number.
Advantages
3. RANGE partitioning
RANGE partitioning
Use case
Advantages: Range queries are fast, and expired historical data can be deleted in seconds using DROP PARTITION.
4. LIST partitioning
LIST partitioning
Use case
Advantages
Risk
5. Secondary Partitioning
Introduction: It further subdivides each existing primary partition. To align with the MySQL 8.0 specification, only RANGE or LIST partitioning is supported for primary partitions, and only HASH or KEY partitioning is supported for secondary partitions.
Applicable Scenarios: The data volume is extremely large and involves multi-dimensional query and management requirements. For example, you can first create a primary partition by time (RANGE) to facilitate historical data cleanup, and then create a secondary partition by user ID (HASH/KEY) to distribute hot data and improve concurrent write performance.
Advantages: It combines the management convenience of RANGE/LIST partitioning with the load balancing capability of HASH/KEY partitioning.
Risk: The total number of partitions is "primary × secondary". An excessive number of partitions will drastically increase the system's metadata management overhead, consume a large amount of memory and file handles, and slow down DDL operations. It is recommended to keep the total number of partitions per table under 8192.

Syntax and Example

Scenario A: Standard E-commerce Orders (Using HASH)

The partition key is a purely numeric ID.
CREATE TABLE `mall`.`orders` (
`order_id` bigint NOT NULL,
`user_id` bigint NOT NULL,
`amount` decimal(10,2),
`create_time` datetime DEFAULT CURRENT_TIMESTAMP,
-- The primary key must include the partition key
PRIMARY KEY (`order_id`)
)
PARTITION BY HASH(`order_id`)
PARTITIONS 64;

Scenario B: User Center (Using KEY)

The partition key is a UUID string. KEY partitioning automatically handles the hashing.
CREATE TABLE `user_center`.`user_profiles` (
`user_uuid` varchar(36) NOT NULL,
`nick_name` varchar(50),
`register_time` datetime,
PRIMARY KEY (`user_uuid`)
)
PARTITION BY KEY(`user_uuid`)
PARTITIONS 64;

Scenario C: Operation Logs (Monthly RANGE Partitioning)

It is suitable for scenarios involving time-series data such as operational logs, transaction records, and monitoring data, and where periodic time-based cleanup is required.
TDSQL Boundless currently supports the following partitioning types: RANGE, RANGE COLUMNS, LIST, LIST COLUMNS, HASH, and KEY.
Note:
For time-series scenarios, note: "automatic pre-creation of future partitions" is not supported. When creating a table, you must predefine partitions for a future period and periodically pre-create new partitions through Ops means (external scheduling tasks).

Creating a Table: Predefined RANGE Partitioning

CREATE TABLE `sys`.`audit_logs` (
`log_id` bigint NOT NULL,
`content` text,
`log_time` datetime NOT NULL,
PRIMARY KEY (`log_id`, `log_time`)
)
PARTITION BY RANGE COLUMNS(`log_time`) (
PARTITION p202601 VALUES LESS THAN ('2026-02-01'),
PARTITION p202602 VALUES LESS THAN ('2026-03-01'),
PARTITION p202603 VALUES LESS THAN ('2026-04-01')
);
Note:
It is not recommended to use PARTITION p_future VALUES LESS THAN (MAXVALUE) as a catch-all partition.
MAXVALUE is a static catch-all partition and does not automatically split on a monthly basis over time. Once data is written that exceeds the upper bound of the last partition, all new data will continuously flow into p_future, resulting in:
p_future expands indefinitely and loses the ability to be cleaned up on a monthly basis;
Subsequently, if you attempt to ADD PARTITION p202304, the operation will fail directly because MAXVALUE already occupies the end position. You must first REORGANIZE PARTITION p_future, which is a complex and costly operation.
Recommended practice: Predefine partitions for the next 3 to 6 months when creating a table, and use a scheduled task to ADD PARTITION in advance at the end of each month to pre-create the partition for the upcoming month.

Common Maintenance Operations for RANGE Partitions

Scenarios
SQL Sample
Description
Deleting expired partitions
ALTER TABLE sys.audit_logs DROP PARTITION p202601;
Cleans up expired data in seconds, with performance far superior to DELETE.
Clearing partition data
ALTER TABLE sys.audit_logs TRUNCATE PARTITION p202601;
Retains the partition structure and only clears the data.
Pre-creating partitions for the next month
ALTER TABLE sys.audit_logs ADD PARTITION (PARTITION p202604 VALUES LESS THAN ('2026-05-01'));
Can only be added at the end, and the values must be strictly increasing.
Splitting the MAXVALUE partition
ALTER TABLE sys.audit_logs REORGANIZE PARTITION p_future INTO (PARTITION p202603 VALUES LESS THAN ('2026-04-01'), PARTITION p_future VALUES LESS THAN (MAXVALUE));
A remedial measure after MAXVALUE is misused
Viewing the partition list
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS FROM information_schema.PARTITIONS WHERE TABLE_SCHEMA='sys' AND TABLE_NAME='audit_logs';
Querying the current partition status and data volume

Recommended Solution: Periodic Pre-creation of Partitions Externally

Phase
Recommended Solution
Description
Pre-creating future partitions
Executing ALTER TABLE ... ADD PARTITION via an external scheduling task.
The system does not automatically expand partitions based on time; it must be triggered periodically by an external source.

Scenario D: Distribution by Region (Using LIST)

The business is explicitly partitioned by city, and data is isolated across different cities.
CREATE TABLE `crm`.`customers` (
`cust_id` bigint NOT NULL,
`city_id` int NOT NULL,
`name` varchar(50),
PRIMARY KEY (`cust_id`, `city_id`)
)
PARTITION BY LIST(`city_id`) (
PARTITION p_bj VALUES IN (10),
PARTITION p_sh VALUES IN (20),
PARTITION p_others VALUES IN (30)
);

Scenario E: High-Volume Historical Transaction Table (Using RANGE + HASH Subpartitioning)

It is suitable for scenarios that require both periodic time-based cleanup of historical data and the ability to handle extremely high concurrent writes, necessitating further distribution of data within a single month.
The primary partition uses RANGE partitioning by month, and the secondary partition uses HASH partitioning by user ID.
CREATE TABLE `sys`.`massive_logs` (
log_id bigint NOT NULL,
user_id bigint NOT NULL,
content text,
log_time datetime NOT NULL,
PRIMARY KEY (log_id, log_time, user_id))
PARTITION BY RANGE COLUMNS(log_time)
SUBPARTITION BY HASH(user_id)
SUBPARTITIONS 4
(PARTITION p202601 VALUES LESS THAN ('2026-02-01'),
PARTITION p202602 VALUES LESS THAN ('2026-03-01'),
PARTITION p202603 VALUES LESS THAN ('2026-04-01')
);

Key Constraints

Partitioned tables have a mandatory hard constraint that must be adhered to: violating this constraint will result in table creation failure.
The primary key and unique index must include the partition key.
Error example:
-- Error: The partition key user_id is not included in the primary key.
CREATE TABLE `error_table` (
`id` bigint NOT NULL PRIMARY KEY,
`user_id` bigint NOT NULL
) PARTITION BY HASH(`user_id`);
Correct example:
-- Correct: Add user_id to the composite primary key.
CREATE TABLE `correct_table` (
`id` bigint NOT NULL,
`user_id` bigint NOT NULL,
PRIMARY KEY (`id`, `user_id`)
) PARTITION BY HASH(`user_id`);

Best Practices

1. Number of partitions
Strategy: When performing pre-sharding, do not consider only the current number of physical nodes. You must also account for the future cluster scale and the projected total data volume.
Recommended value: Set it to "maximum expected future number of nodes × 2" to ensure adequate data dispersion.
2. Avoid modifying the partition key.
Avoid modifying the partition key value with UPDATE statements, as this can cause data rows to migrate between physical nodes (via DELETE + INSERT operations), which is a high-cost operation.
3. Query Optimization
Include the partition key: When querying, try to include the partition key in the WHERE clause (for example, WHERE user_id = ?). This allows TDSQL Boundless to route directly to a specific node, resulting in extremely high performance.
Avoid full table scans: Queries without a partition key trigger broadcast scans (Scatter-Gather), which query all nodes and aggregate results, consuming significant resources.

FAQs

What Is the Difference Between HASH and KEY Partitioning?

HASH primarily handles integers, and its algorithm is simple and straightforward. KEY primarily handles strings (it also supports integers) and internally uses a complex hash function. If the ID is a string, use KEY partitioning directly.

Can Multiple Columns Be Used as Partition Keys?

Yes, for example, PARTITION BY HASH(col1 + col2) or KEY(col1, col2). However, this increases complexity. It is generally recommended to use a single high-cardinality column as the partition key.

Can Partition Keys or Partition Types Be Modified After Table Creation?

Strongly not recommended. Although syntactically supported, this operation essentially performs a full data "reshuffle", which consumes significant system resources (IO and CPU) and may cause prolonged service blockage.

Will a Partition Be Automatically Created Next Month for RANGE Partitioning with MAXVALUE as the Fallback?

No. The MAXVALUE partition is a static catch-all partition. New data will only flow into this partition and is not automatically split on a monthly basis. The TDSQL Boundless system does not currently support automatically pre-creating future partitions. You need to periodically execute ALTER TABLE ... ADD PARTITION via an external scheduled task to pre-create partitions for the next time period.

Can the Partitioning Policy of a Partition Table Be Modified?

Yes, you can remove partitioning and redefine it via ALTER TABLE ... REMOVE PARTITIONING, or directly rebuild the partitioning policy using ALTER TABLE ... PARTITION BY RANGE COLUMNS(...) (...). Note: Rebuilding partitions involves data reorganization. For large tables, evaluate the cost and perform this operation during off-peak business hours.

What Happens When Data Exceeds the Upper Bound of the Last Partition?

If a MAXVALUE catch-all partition is not defined when a table is created, a write operation will report error ER_NO_PARTITION_FOR_GIVEN_VALUE, with the message "Table has no partition for value xxx". This is expected behavior. It serves as a catch-all alarm signal for "forgetting to pre-create partitions", preventing data from silently falling into the MAXVALUE partition and causing subsequent cleanup difficulties.

Related Documentation

ヘルプとサポート

この記事はお役に立ちましたか?

フィードバック