Tencent Cloud
Data Lake Compute

FAQs on Spark Jobs

Last updated: 2025-07-30 15:10:53

Why Are PySpark Tasks OOMKilled When Python + JVM Memory Usage Exceeds the Kubernetes Memory Request?

Issue description: During PySpark task execution, executor logs show "Kubernetes OOMKilled" because memory usage exceeds the Kubernetes limit.
Cause analysis: The pod memory requested from Kubernetes is calculated from the Spark executor memory and memoryOverheadFactor. The Python worker runs outside the JVM heap, so if the data processed by Python is skewed or a single record is too large, total memory usage can exceed the memory allocated by Kubernetes.
Solution: Add the task parameter spark.kubernetes.memoryOverheadFactor=0.8 (the default value is 0.4).
Operation steps: In the DLC console, go to Data Job (Spark Job) > Edit Jobs and add the parameter to the job configuration, as in the sketch below.
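A minimal sketch of where the parameter sits if expressed as Spark configuration; in practice you enter the key and value in the job's configuration form. The application name is a hypothetical placeholder.

from pyspark.sql import SparkSession

# spark.kubernetes.* settings must be in place before the SparkContext
# starts; in DLC they are normally entered in the job configuration form
# at submission time.
spark = (
    SparkSession.builder
    .appName("oom-tuning-example")  # placeholder name
    .config("spark.kubernetes.memoryOverheadFactor", "0.8")  # default is 0.4
    .getOrCreate()
)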


How to Automatically Add a REPARTITION Step After INSERT INTO/OVERWRITE to Reduce the Number of Small Files?

Solution: Enable auto repartitioning and configure the following parameters:
spark.sql.adaptive.enabled: true
spark.sql.adaptive.insert.repartition: true
spark.sql.adaptive.insert.repartition.forceNum: 300 (specifies the target partition count)
Directions: Configure the parameters in SparkConf in the program, or set them with SQL SET statements in the program, as in the sketch below.
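A minimal sketch of both approaches in a PySpark program; the application name is a placeholder, and the spark.sql.adaptive.insert.* keys are the DLC engine parameters listed above.

from pyspark.sql import SparkSession

# Approach 1: set the parameters in SparkConf when building the session.
spark = (
    SparkSession.builder
    .appName("insert-repartition-example")  # placeholder name
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.insert.repartition", "true")
    .config("spark.sql.adaptive.insert.repartition.forceNum", "300")
    .getOrCreate()
)

# Approach 2: set the same parameters with SQL SET statements before the write.
spark.sql("SET spark.sql.adaptive.enabled=true")
spark.sql("SET spark.sql.adaptive.insert.repartition=true")
spark.sql("SET spark.sql.adaptive.insert.repartition.forceNum=300")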
Why Do PySpark Tasks Return 503 Errors During High-Concurrency Writes to COS Buckets?

Issue description: During high-concurrency writes to COS buckets, PySpark executors report frequent 503 errors returned by COS.
Cause analysis: The parallelism of Spark writes to COS is determined by fs.cosn.trsf.fs.ofs.data.transfer.thread.count. For example, on 4,096 cores without tuning, the default concurrency is 4,096 × 32 = 131,072 threads, which creates a COS bottleneck.
Solution:
1. Create a metadata-accelerated bucket in COS to prevent rate limiting caused by frequent list and rename operations during Spark writes.
2. Adjust the bandwidth limits of the metadata-accelerated bucket in COS.
3. Add the following parameters to the task to reduce the access pressure on COS under high parallelism:
fs.cosn.trsf.fs.ofs.data.transfer.thread.count=8
fs.cosn.trsf.fs.ofs.block.max.file.cache.mb=0
spark.hadoop.fs.cosn.trsf.fs.ofs.data.transfer.thread.count=8
spark.hadoop.fs.cosn.trsf.fs.ofs.block.max.file.cache.mb=0
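A minimal sketch of applying the tuning through SparkConf in a PySpark program; the bucket path, DataFrame, and application name are hypothetical placeholders. Hadoop filesystem options set through Spark carry the spark.hadoop. prefix, as in the list above.

from pyspark.sql import SparkSession

# Apply the COS write-tuning parameters at session creation.
spark = (
    SparkSession.builder
    .appName("cos-write-tuning")  # placeholder name
    .config("spark.hadoop.fs.cosn.trsf.fs.ofs.data.transfer.thread.count", "8")
    .config("spark.hadoop.fs.cosn.trsf.fs.ofs.block.max.file.cache.mb", "0")
    .getOrCreate()
)

# Hypothetical write: the DataFrame and COS path are placeholders.
df = spark.range(1000)
df.write.mode("overwrite").parquet("cosn://examplebucket-125xxxxxxx/output/")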

What Are Common Data Governance SQL Commands?

Disable database governance SQL:
ALTER DATABASE DataLakeCatalog.demo_db
SET DBPROPERTIES (
  'dlc.ao.data.govern.inherit' = 'none',
  'dlc.ao.merge.data.enable' = 'disable',
  'dlc.ao.expired.snapshots.enable' = 'disable',
  'dlc.ao.remove.orphan.enable' = 'disable',
  'dlc.ao.merge.manifests.enable' = 'disable'
)
Enable database governance SQL:
ALTER DATABASE DataLakeCatalog.db_name
SET DBPROPERTIES (
  'dlc.ao.data.govern.inherit' = 'none',
  'dlc.ao.merge.data.enable' = 'enable',
  'dlc.ao.merge.data.engine' = 'bda-sinker',
  'dlc.ao.merge.data.min-input-files' = '10',
  'dlc.ao.merge.data.target-file-size-bytes' = '536870912',
  'dlc.ao.merge.data.interval-min' = '90',
  'dlc.ao.expired.snapshots.enable' = 'enable',
  'dlc.ao.expired.snapshots.engine' = 'bda-sinker',
  'dlc.ao.expired.snapshots.retain-last' = '5',
  'dlc.ao.expired.snapshots.before-days' = '2',
  'dlc.ao.expired.snapshots.max-concurrent-deletes' = '4',
  'dlc.ao.expired.snapshots.interval-min' = '150',
  'dlc.ao.remove.orphan.enable' = 'enable',
  'dlc.ao.remove.orphan.engine' = 'bda-sinker',
  'dlc.ao.remove.orphan.before-days' = '3',
  'dlc.ao.remove.orphan.max-concurrent-deletes' = '4',
  'dlc.ao.remove.orphan.interval-min' = '600',
  'dlc.ao.merge.manifests.enable' = 'enable',
  'dlc.ao.merge.manifests.engine' = 'bda-sinker',
  'dlc.ao.merge.manifests.interval-min' = '1440'
)
Disable table governance SQL:
ALTER TABLE `DataLakeCatalog`.`db_name`.`tb_name`
SET TBLPROPERTIES (
  'dlc.ao.data.govern.inherit' = 'none',
  'dlc.ao.merge.data.enable' = 'disable',
  'dlc.ao.expired.snapshots.enable' = 'disable',
  'dlc.ao.remove.orphan.enable' = 'disable',
  'dlc.ao.merge.manifests.enable' = 'disable'
)
Enable inherited database governance SQL:
ALTER TABLE `DataLakeCatalog`.`db_name`.`tb_name`
SET TBLPROPERTIES ('dlc.ao.data.govern.inherit' = 'default')
Enable table governance SQL:
ALTER TABLE `DataLakeCatalog`.`db_name`.`tb_name`
SET TBLPROPERTIES (
  'dlc.ao.data.govern.inherit' = 'none',
  'dlc.ao.merge.data.enable' = 'enable',
  'dlc.ao.merge.data.engine' = 'bda-sinker',
  'dlc.ao.merge.data.min-input-files' = '10',
  'dlc.ao.merge.data.target-file-size-bytes' = '536870912',
  'dlc.ao.merge.data.interval-min' = '90',
  'dlc.ao.expired.snapshots.enable' = 'enable',
  'dlc.ao.expired.snapshots.engine' = 'bda-sinker',
  'dlc.ao.expired.snapshots.retain-last' = '5',
  'dlc.ao.expired.snapshots.before-days' = '2',
  'dlc.ao.expired.snapshots.max-concurrent-deletes' = '4',
  'dlc.ao.expired.snapshots.interval-min' = '150',
  'dlc.ao.remove.orphan.enable' = 'enable',
  'dlc.ao.remove.orphan.engine' = 'bda-sinker',
  'dlc.ao.remove.orphan.before-days' = '3',
  'dlc.ao.remove.orphan.max-concurrent-deletes' = '4',
  'dlc.ao.remove.orphan.interval-min' = '600',
  'dlc.ao.merge.manifests.enable' = 'enable',
  'dlc.ao.merge.manifests.engine' = 'bda-sinker',
  'dlc.ao.merge.manifests.interval-min' = '1440'
)
Full-table merge SQL (no WHERE clause):
CALL `DataLakeCatalog`.`system`.`rewrite_data_files`(
  `table` => 'tb_name',
  `options` => map(
    'min-input-files', '10',
    'target-file-size-bytes', '536870912',
    'delete-file-threshold', '1',
    'max-concurrent-file-group-rewrites', '20'
  )
)
Incremental merge SQL (with WHERE clause):
CALL `DataLakeCatalog`.`system`.`rewrite_data_files`(
  `table` => 'tb_name',
  `options` => map(
    'min-input-files', '10',
    'target-file-size-bytes', '536870912',
    'delete-file-threshold', '1',
    'max-concurrent-file-group-rewrites', '20'
  ),
  `where` => 'field_date > "2022-01-01" and field_date <= "2023-01-01"'
)
Snapshot expiration SQL:
CALL `DataLakeCatalog`.`system`.`expire_snapshots`(
  `table` => 'tb_name',
  older_than => TIMESTAMP '2023-02-28 16:06:35.000',
  retain_last => 1,
  max_concurrent_deletes => 4,
  stream_results => true
)

How to View SQL Execution Plans and Logs?

View SQL execution plans: Use the EXPLAIN keyword in Data Explore to examine the physical execution plan. For detailed EXPLAIN usage, refer to EXPLAIN; a usage sketch follows the steps below.
View SQL execution logs:
1. SQL execution logs are displayed in the results when you run SQL in Data Explore.
2. You can also view SQL execution logs in the DLC console under Data Ops > Execution History.
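A minimal EXPLAIN sketch in PySpark, assuming a hypothetical table demo_db.demo_table; the same statement can be run directly in Data Explore.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("explain-example").getOrCreate()

# EXPLAIN returns the physical plan as a result row;
# EXPLAIN EXTENDED also includes the parsed and optimized logical plans.
spark.sql(
    "EXPLAIN SELECT id, count(*) FROM demo_db.demo_table GROUP BY id"
).show(truncate=False)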

Why Do Data Writes Fail When CAST Operations Do Not Automatically Convert Precision?

Issue description: When migrating Hive SQL to Spark SQL, the error "Cannot safely cast 'class_type': string to bigint" occurred.
Cause analysis: Starting from Spark 3.0.0, Spark SQL enforces one of three store assignment policies when performing type conversions on write:
ANSI: Prohibits Spark from performing certain unreasonable type conversions, such as string to timestamp.
LEGACY: Allows Spark to force the type conversion as long as it is a valid CAST operation.
STRICT: Prohibits Spark from performing any conversion that may lose precision.
The default policy is ANSI.
Solution: Switch to the LEGACY policy by setting spark.sql.storeAssignmentPolicy=LEGACY, as in the sketch below.
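A minimal sketch, assuming a hypothetical table demo_db.demo_table with a BIGINT column class_type; the policy is a runtime SQL configuration, so it can also be set with a SQL SET statement.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cast-policy-example").getOrCreate()

# LEGACY restores the pre-3.0 behavior of forcing the cast on write.
spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")

# Hypothetical write: under ANSI this INSERT would fail with
# "Cannot safely cast 'class_type': string to bigint"; under LEGACY
# the string value is coerced to bigint.
spark.sql("INSERT INTO demo_db.demo_table SELECT '1024' AS class_type")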

Why Does the Error 'QUERY_PROGRESS_UPDATE_ERROR(code=3060): Failed to update statement progress' Occur?

Issue description: When Spark SQL tasks are submitted in Data Explore, the system reports a "Failed to update statement progress" error during execution.
Cause analysis: When multiple Spark SQL tasks are submitted, the progress of each SQL statement must be tracked asynchronously and continuously. The asynchronous processing queue has a capacity limit, with a default value of 100 (raised to 300 for versions after January 14, 2024). If previously submitted tasks remain incomplete while newly submitted tasks exceed the queue limit, this error occurs. Such errors typically indicate that the SQL task may be a long-tail task; evaluate its resource impact on concurrent operations.
Solution: In the engine configuration, increase livy.rsc.retained-statements above its default value. Note that the engine restarts after the adjustment. Choose the value based on task concurrency; the parameter has minimal impact on the cluster. In production testing, with 100-200 concurrent SQL submissions per minute, setting it to 6000 had negligible impact.

