HDFS Data Import

Last updated: 2025-03-31 14:55:26
This document describes how to import data from HDFS to Tencent Cloud TCHouse-C.

Prerequisites

1. Read permission on the HDFS data is required. See Access Control Overview for how to set permissions.
2. The HDFS instance and Tencent Cloud TCHouse-C cluster must be in the same VPC.

Directions

1. Log in to Tencent Cloud TCHouse-C and create an HDFS table.
CREATE TABLE hdfs_engine_table
(
`int_id` UInt32
)
ENGINE = HDFS('hdfs://hdfs1:9000/other_storage', 'TSV')
Reference
ENGINE = HDFS(URI, format): URI is the full URI of the file in HDFS, and format is one of the supported file formats. For the available formats, see Formats for Input and Output Data. The path part of the URI may contain glob wildcards, in which case the table is read-only.
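As a hedged illustration of the glob wildcards mentioned above, the sketch below reads several files through one read-only table. The directory some_dir, the file names, and the table name hdfs_glob_table are hypothetical and not part of this guide; {N..M} is one of the glob patterns the ClickHouse HDFS engine accepts.

```sql
-- Hypothetical layout: hdfs://hdfs1:9000/some_dir/part_1.tsv ... part_3.tsv
CREATE TABLE hdfs_glob_table
(
    `int_id` UInt32
)
ENGINE = HDFS('hdfs://hdfs1:9000/some_dir/part_{1..3}.tsv', 'TSV');

-- Because the URI contains a glob, the table is read-only:
-- it can be queried but not inserted into.
SELECT count() FROM hdfs_glob_table;
```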
2. Create a ClickHouse target table.
If your cluster has one replica:
CREATE TABLE test.test on cluster default_cluster
(
`int_id` UInt32
)
engine = MergeTree()
order by int_id;
If your cluster has two replicas:
create table test.test on cluster default_cluster
(
`int_id` UInt32
)
engine = ReplicatedMergeTree('/clickhouse/tables/test/test/{shard}', '{replica}')
order by int_id;
Create a distributed table:
create table test.test_dis on cluster default_cluster
AS test.test
engine = Distributed('default_cluster', 'test', 'test', rand());
3. Write data to the target table.
INSERT INTO test.test SELECT * FROM hdfs_engine_table;
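If the cluster has more than one shard, inserting into test.test stores all rows on the shard of the node you are connected to. A possible alternative, assuming the distributed table test.test_dis from step 2 exists, is to insert through it so that rows are spread across shards by its rand() sharding key:

```sql
-- Alternative to the INSERT into test.test: write through the distributed table.
INSERT INTO test.test_dis SELECT * FROM hdfs_engine_table;
```

Run only one of the two INSERT statements; running both would load the data twice.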
4. Query the data.
select * from test.test;
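A simple sanity check after the import, sketched under the assumption that the tables from the previous steps exist, is to compare the row count of the HDFS table with the row count seen through the distributed table:

```sql
-- Rows available in the HDFS source
SELECT count() FROM hdfs_engine_table;

-- Rows stored in the cluster, read through the distributed table
SELECT count() FROM test.test_dis;
```

The two counts should match if the import ran once and completed.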

