
Elastic MapReduce

Meson Engine

Last updated: 2025-09-26 16:41:47
Meson Engine is a high-performance vectorized query engine built into EMR Spark. It transparently accelerates Spark SQL workloads and DataFrame API calls, reducing the overall cost of workloads. Compared with open-source Spark, it delivers a 2.7x performance improvement on the TPC-DS 1TB benchmark. Meson is fully compatible with Apache Spark APIs and requires no changes to existing business code; on EMR product versions that support the Meson Engine, only a small amount of configuration is needed to enable it.

How It Works

With the widespread adoption of SSDs and significant improvements in network interface card performance, the performance bottleneck of the Spark engine has shifted from I/O, as traditionally understood, to CPU-bound computation. However, JVM-based CPU optimization schemes (such as codegen) face many constraints, for example limits on bytecode length and the number of method parameters, and developers also find it difficult to exploit some features of modern CPUs from the JVM.
The Meson Engine transforms the Spark physical plan, executes computations with a vectorized acceleration library implemented in C++, and returns results in a columnar format, improving memory and bandwidth utilization. By breaking through this bottleneck, it can effectively improve the efficiency of Spark jobs.
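The columnar format mentioned above stores all values of one column contiguously, so a scan or aggregate touches a single dense buffer instead of striding across row objects. A minimal Python sketch of the idea (illustrative only; it is not Meson's C++ implementation):

```python
# Row layout: a list of per-row records.
rows = [{"id": i, "price": float(i) * 1.5} for i in range(5)]

# Columnar layout: one contiguous list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# A column-wise aggregate scans a single dense buffer, which is what
# lets a vectorized engine use SIMD and stay cache-friendly.
total = sum(columns["price"])
print(columns["price"])  # [0.0, 1.5, 3.0, 4.5, 6.0]
print(total)             # 15.0
```
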

Usage Restrictions

The Meson Engine currently has usage scenario limits. In restricted scenarios, the Meson engine falls back to the native Spark engine for execution. Because each fallback requires converting data between columnar and row formats, too many fallbacks may make the total running time longer than that of the native Spark engine alone.
Please familiarize yourself with the main usage limits of the Meson Engine in advance:
Supports the Parquet data format. ORC support is not yet optimized. Other data formats are not supported.
ANSI mode is not supported.
Applications based on the RDD API are not supported.
Structured Streaming is not supported.
Custom Python code based on PySpark is not supported.
CacheTable with the MEMORY_ONLY storage level is not supported.
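If a job falls mostly into the restricted scenarios above, it can be switched back to the native Spark engine at the job level without removing the cluster-wide configuration. Because Meson is built on the Gluten plugin, the upstream Gluten toggle should apply; treat the parameter name below as an assumption and verify it against your EMR version:

```properties
# Disable Meson/Gluten acceleration for a single job.
# Parameter name assumed from upstream Apache Gluten; verify on your EMR version.
spark.gluten.enabled=false
```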

Applicable Scenarios

Support is provided on Spark 3.5.3 and later versions.
Note:
Storage formats, data types, operators, and functions that the Meson Engine does not support, or only partially supports, will fall back to native Spark engine execution.

Storage Format

Data storage formats supported by the Meson engine:
Supported data formats: Parquet, ORC
Supported table formats: Iceberg, Hive

Data Types

Data types supported by the Meson engine:
Byte, Short, Int, Long
Boolean
String, Binary
Decimal
Float, Double
Date, Timestamp

Operators

| Type | Supported Operators | Unsupported Operators |
|---|---|---|
| Source | FileSourceScanExec, HiveTableScanExec, BatchScanExec, InMemoryTableScanExec | - |
| Sink | DataWritingCommandExec, InsertIntoHiveTable | - |
| Common | FilterExec, ProjectExec, SortExec, UnionExec | - |
| Aggregate | HashAggregateExec | SortAggregateExec, ObjectHashAggregateExec |
| Join | BroadcastHashJoinExec, ShuffledHashJoinExec, SortMergeJoinExec, BroadcastNestedLoopJoinExec, CartesianProductExec | - |
| Window | WindowExec | WindowGroupLimitExec |
| Exchange | ShuffleExchangeExec, ReusedExchangeExec, BroadcastExchangeExec, CoalesceExec | CustomShuffleReaderExec |
| Limit | GlobalLimitExec, LocalLimitExec, TakeOrderedAndProjectExec, CollectLimitExec | - |
| Subquery | SubqueryBroadcastExec | - |
| Other | ExpandExec, GenerateExec, CollectTailExec, RangeExec | RangeExec, SampleExec |

Functions

| Type | Supported Functions |
|---|---|
| Generator Functions | explode,explode_outer,inline,inline_outer,posexplode,posexplode_outer,stack |
| Window Functions | cume_dist,dense_rank,lag,lead,nth_value,ntile,percent_rank,rank,row_number |
| Aggregate Functions | any,any_value,approx_count_distinct,approx_percentile,array_agg,avg,bit_and,bit_or,bit_xor,bool_and,bool_or,collect_list,collect_set,corr,count,count_if,covar_pop,covar_samp,every,first,first_value,grouping,grouping_id,kurtosis,last,last_value,max,max_by,mean,median,min,min_by,percentile,percentile_approx,regr_avgx,regr_avgy,regr_count,regr_intercept,regr_r2,regr_slope,regr_sxx,regr_sxy,regr_syy,skewness,some,std,stddev,stddev_pop,stddev_samp,sum,try_avg,try_sum,var_pop,var_samp,variance |
| Array Functions | array,array_append,array_compact,array_contains,array_distinct,array_except,array_insert,array_intersect,array_join,array_max,array_min,array_position,array_prepend,array_remove,array_repeat,array_union,arrays_overlap,arrays_zip,flatten,get,shuffle,slice,sort_array |
| Bitwise Functions | &,^,bit_count,bit_get,getbit,shiftright,\|,~ |
| Collection Functions | array_size,cardinality,concat,reverse,size |
| Conditional Functions | coalesce,if,ifnull,nanvl,nullif,nvl,nvl2,when |
| Conversion Functions | bigint,binary,boolean,cast,date,decimal,double,float,int,smallint,string,timestamp,tinyint |
| Date and Timestamp Functions | add_months,date_add,date_diff,date_format,date_from_unix_date,date_sub,date_trunc,dateadd,datediff,day,dayofmonth,dayofweek,dayofyear,extract,from_unixtime,from_utc_timestamp,hour,last_day,make_date,make_timestamp,make_ym_interval,minute,month,next_day,quarter,second,timestamp_micros,timestamp_millis,to_unix_timestamp,to_utc_timestamp,trunc,unix_date,unix_micros,unix_millis,unix_seconds,unix_timestamp,weekday,weekofyear,year |
| Hash Functions | crc32,hash,md5,sha,sha1,sha2,xxhash64 |
| JSON Functions | from_json,get_json_object,json_array_length,json_object_keys,json_tuple,schema_of_json,to_json |
| Lambda Functions | aggregate,array_sort,exists,filter,forall,map_filter,map_zip_with,reduce,transform,transform_keys,transform_values,zip_with |
| Map Functions | element_at,map,map_concat,map_contains_key,map_entries,map_keys,map_values,str_to_map,try_element_at |
| Mathematical Functions | %,*,+,-,/,abs,acos,acosh,asin,asinh,atan,atan2,atanh,bin,cbrt,ceil,ceiling,conv,cos,cosh,cot,csc,degrees,e,exp,expm1,factorial,floor,greatest,hex,hypot,least,log,log10,log1p,log2,mod,negative,pi,pmod,positive,pow,power,rand,random,rint,round,sec,shiftleft,sign,signum,sinh,sqrt,try_add,unhex,width_bucket |
| Misc Functions | assert_true,equal_null,spark_partition_id,uuid,version,\|\| |
| Predicate Functions | !,!=,<,<=,<=>,<>,=,==,>,>=,and,between,case,ilike,in,isnan,isnotnull,isnull,like,not,or,regexp,regexp_like |
| String Functions | ascii,base64,bit_length,btrim,char,char_length,character_length,chr,concat_ws,contains,endswith,find_in_set,format_number,format_string,initcap,instr,lcase,left,len,length,levenshtein,locate,lower,lpad,ltrim,luhn_check,mask,overlay,position,regexp_extract,regexp_extract_all,regexp_replace,repeat,replace,right,rpad,rtrim,soundex,split,split_part,startswith,substr,substring,substring_index,translate,trim,ucase,unbase64,upper |
| Struct Functions | named_struct,struct |
| URL Functions | url_decode,url_encode |

Enabling Meson Acceleration

EMR-V3.7.0

To enable this feature on an EMR-V3.7.0 cluster, use the configuration management feature in the EMR console to add the following configuration to the spark-defaults.conf configuration file:

| Parameter | Description |
|---|---|
| spark.plugins | The plugin used by Spark. Set the value to org.apache.gluten.GlutenPlugin (if spark.plugins is already configured, append org.apache.gluten.GlutenPlugin to it, using a comma "," as the separator). |
| spark.memory.offHeap.enabled | Set to true. Meson acceleration requires JVM off-heap memory. |
| spark.memory.offHeap.size | Set the off-heap memory size according to actual conditions. For details, see the recommended configurations for executors of varying specifications. |
| spark.shuffle.manager | The columnar shuffle manager used by Meson. Set the value to org.apache.spark.shuffle.sort.ColumnarShuffleManager. |
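Taken together, the parameters above correspond to a spark-defaults.conf fragment like the following. This is a sketch; the off-heap size shown matches a 4-core executor and should be adjusted to your workload:

```properties
# Enable the Meson Engine (Gluten plugin).
# Append to an existing spark.plugins value, comma-separated, if one is set.
spark.plugins=org.apache.gluten.GlutenPlugin
spark.memory.offHeap.enabled=true
# Sized for a 4-core executor; adjust per the recommendations for your specs.
spark.memory.offHeap.size=10g
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
```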
Recommended memory configurations for executors of varying specifications:

| executor-cores | spark.executor.memory | spark.memory.offHeap.size |
|---|---|---|
| 2 | 2GB | 4GB |
| 4 | 3GB | 10GB |
| 8 | 6GB | 20GB |
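The same settings can also be passed per job at submit time instead of cluster-wide. A sketch using the 4-core executor row above (the JAR name is a placeholder for your own application):

```shell
# Per-job Meson enablement; your_job.jar is a placeholder.
spark-submit \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=10g \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --executor-cores 4 \
  --executor-memory 3g \
  your_job.jar
```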

EMR-V3.6.1 (beta)

To enable this feature on an EMR-V3.6.1 cluster, use the configuration management feature in the EMR console to add the following configuration to the spark-defaults.conf configuration file:

| Parameter | Description |
|---|---|
| spark.plugins | The plugin used by Spark. Set the value to org.apache.gluten.GlutenPlugin (if spark.plugins is already configured, append org.apache.gluten.GlutenPlugin to it, using a comma "," as the separator). |
| spark.memory.offHeap.enabled | Set to true. Meson acceleration requires JVM off-heap memory. |
| spark.memory.offHeap.size | Set the off-heap memory size according to actual conditions. The initial size can be set to 1G. |
| spark.shuffle.manager | The columnar shuffle manager used by Meson. Set the value to org.apache.spark.shuffle.sort.ColumnarShuffleManager. |
| spark.driver.extraClassPath | The Gluten native JAR used by the Spark driver. The default path of the JAR is /usr/local/service/spark/gluten. |
| spark.executor.extraClassPath | The Gluten native JAR used by Spark executors. The default path of the JAR is /usr/local/service/spark/gluten. |
| spark.executorEnv.LIBHDFS3_CONF | Path of the integrated HDFS cluster configuration file. The default is /usr/local/service/hadoop/etc/hadoop/hdfs-site.xml. |
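For EMR-V3.6.1 (beta), the parameters above correspond to a spark-defaults.conf fragment along these lines. The `/*` glob on the classpath entries is an assumption for loading all JARs in the default directory; if extraClassPath values are already configured, extend them rather than overwriting:

```properties
spark.plugins=org.apache.gluten.GlutenPlugin
spark.memory.offHeap.enabled=true
# Suggested initial size; tune according to actual conditions.
spark.memory.offHeap.size=1g
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
# Default Gluten JAR location; the /* glob is an assumption to pick up all JARs.
spark.driver.extraClassPath=/usr/local/service/spark/gluten/*
spark.executor.extraClassPath=/usr/local/service/spark/gluten/*
spark.executorEnv.LIBHDFS3_CONF=/usr/local/service/hadoop/etc/hadoop/hdfs-site.xml
```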

