Release Notes 0.14.0
Since the 0.13 release, Apache Doris (incubating) has gained around 390 new features, bug fixes, performance enhancements, documentation improvements, and code refactors from 60+ contributors. We are ready to release Apache Doris (incubating) 0.14.
New Feature
Import and delete
Support deleting many rows at once through an import job, avoiding the performance degradation caused by issuing many separate deletes. For tables of the Unique Key model, support specifying a Sequence column during import: Doris uses the value of the Sequence column to determine the order of the data, so that rows are applied in the correct order during import.
[#4310] [#4256]
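To illustrate the idea behind a Sequence column (this is not Doris code), the sketch below merges rows per key and keeps the row with the largest sequence value, whatever order the rows arrive in. The tie-breaking rule, where an equal sequence value lets the later row win, is an assumption made for the example.

```python
from typing import Dict, Iterable, Tuple

# (key, sequence value, payload); the field layout is illustrative only.
Row = Tuple[str, int, str]


def apply_with_sequence(rows: Iterable[Row]) -> Dict[str, Tuple[int, str]]:
    """Keep, for each key, the row with the largest sequence value.

    Assumption: a row replaces the stored one when its sequence value is greater
    than or equal to the stored value, so ties go to the row applied later and a
    late-arriving row carrying an older sequence value is ignored.
    """
    table: Dict[str, Tuple[int, str]] = {}
    for key, seq, payload in rows:
        current = table.get(key)
        if current is None or seq >= current[0]:
            table[key] = (seq, payload)
    return table


# The update with seq=3 is imported before the one with seq=2.
arrivals = [("user1", 1, "v1"), ("user1", 3, "v3"), ("user1", 2, "v2")]
print(apply_with_sequence(arrivals))  # {'user1': (3, 'v3')}
```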
Support whole-database backup
The backup statement supports specifying what to back up (metadata and/or data). The backup and restore statements support excluding some tables, so when backing up an entire database you can skip very large, unimportant tables. An entire database can be backed up and restored without listing every table name in the backup and restore statements.
[#5314]
ODBC external table support
Support accessing external tables in MySQL, PostgreSQL, Oracle, etc. through the ODBC protocol
[#4798] [#4438] [#4559] [#4699]
Support SQL-level and Partition-level result cache
Support caching query results to speed up repeated queries, at both SQL level and Partition level [#4330]
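A partition-level cache can reuse the cached results of partitions that have not changed and recompute only the partitions that received new data. The sketch below shows that idea under stated assumptions. Each entry is keyed by (SQL text, partition name) and tagged with a partition version number. Per-partition results are merged by summing, which fits queries like sum or count. The class and its methods are hypothetical and do not describe Doris's implementation.

```python
from typing import Callable, Dict, List, Tuple


class PartitionResultCache:
    """Caches one result per (sql, partition), tagged with the partition's data version."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Tuple[int, int]] = {}  # -> (version, result)
        self.recomputed: List[str] = []  # partitions recomputed so far, for inspection

    def query(self, sql: str, partitions: Dict[str, int],
              compute: Callable[[str], int]) -> int:
        """Combine per-partition results, recomputing only partitions whose version changed.

        Assumes the result can be merged by summing (e.g. sum or count per partition).
        """
        total = 0
        for name, version in partitions.items():
            entry = self._entries.get((sql, name))
            if entry is not None and entry[0] == version:
                total += entry[1]  # hit: partition unchanged since it was cached
                continue
            result = compute(name)  # miss or stale entry: recompute this partition only
            self._entries[(sql, name)] = (version, result)
            self.recomputed.append(name)
            total += result
        return total


sales = {"p20210101": 10, "p20210102": 20}
cache = PartitionResultCache()
sql = "SELECT sum(amount) FROM sales"  # hypothetical query text used only as a cache key
print(cache.query(sql, {"p20210101": 1, "p20210102": 1}, lambda p: sales[p]))  # 30
sales["p20210102"] = 25  # new data arrives in one partition, bumping its version
print(cache.query(sql, {"p20210101": 1, "p20210102": 2}, lambda p: sales[p]))  # 35
print(cache.recomputed)  # ['p20210101', 'p20210102', 'p20210102']
```

An SQL-level cache is the coarser variant: one entry for the whole result, which is invalidated as soon as any scanned partition changes.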
Built-in functions
- Support bitmap_xor function [#5098]
- Add replace() function [#4347]
- Add the time_round function to support time alignment according to multiple time granularities [#4640]
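The semantics of two of these functions can be sketched with plain Python sets and datetimes. bitmap_xor behaves as a symmetric difference. Time alignment can be modelled as flooring a timestamp to a multiple of a period measured from an origin. This is an illustration only: floor_to_period and its explicit origin parameter are hypothetical, only the floor direction is shown, and a fixed-length period is assumed.

```python
from datetime import datetime, timedelta
from typing import Set


def bitmap_xor(a: Set[int], b: Set[int]) -> Set[int]:
    # Elements present in exactly one of the two bitmaps (symmetric difference).
    return a ^ b


def floor_to_period(ts: datetime, period: timedelta, origin: datetime) -> datetime:
    # Align ts down to the nearest boundary origin + k * period for an integer k.
    # Assumes a fixed-length period; calendar units such as months need other handling.
    k = (ts - origin) // period
    return origin + k * period


print(bitmap_xor({1, 2, 3}, {3, 4}))  # {1, 2, 4}
print(floor_to_period(datetime(2021, 1, 1, 10, 47), timedelta(minutes=15),
                      datetime(2021, 1, 1)))  # 2021-01-01 10:45:00
```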
FE interface and HTTP interface
- The new FE UI can be enabled by setting the FE configuration item enable_http_server_v2 [#4684]
- BE adds an HTTP interface that shows how all tablets of a partition are distributed across the disks of a BE [#5096]
- BE adds an HTTP interface to manually migrate a tablet to another disk on the same node [#5101]
- Support modifying FE and BE configuration items over HTTP and persisting the modifications [#4704]
Compatibility with MySQL
- Add support for the views table in the information_schema database [#4778]
- Add the table_privileges, schema_privileges and user_privileges tables to the information_schema database for compatibility with certain MySQL applications [#4899]
- Add a statistics table to the information_schema database for compatibility with some MySQL tools [#4991]
Monitoring
- BE adds tablet-level monitoring metrics, including scanned data volume and row count and written data volume and row count, to help locate hot tablets [#4428]
- BE adds metrics showing the usage of the various LRU caches [#4688]
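The new LRU cache metrics expose counters such as usage and hit rate. The sketch below shows in Python how an LRU cache can track these alongside its eviction logic. Doris's BE caches are written in C++. The metric names here are hypothetical, and capacity is counted in entries rather than bytes for simplicity.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional


class LRUCacheWithMetrics:
    """Minimal LRU cache that also counts lookups and hits (capacity in entries)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.lookups = 0
        self.hits = 0

    def get(self, key: Hashable) -> Optional[Any]:
        self.lookups += 1
        if key not in self._data:
            return None
        self.hits += 1
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def metrics(self) -> Dict[str, float]:
        # Hypothetical metric names; they are not the names Doris exports.
        return {
            "capacity": self.capacity,
            "usage": len(self._data),
            "lookups": self.lookups,
            "hits": self.hits,
            "hit_ratio": self.hits / self.lookups if self.lookups else 0.0,
        }


cache = LRUCacheWithMetrics(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # hit; "a" becomes most recently used
cache.put("c", 3)  # evicts "b"
cache.get("b")     # miss
print(cache.metrics())
# {'capacity': 2, 'usage': 2, 'lookups': 2, 'hits': 1, 'hit_ratio': 0.5}
```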
Table creation
- Add the CREATE TABLE LIKE statement to make it easy to create a copy of a table's metadata [#4705]
- Support atomic replacement of two tables through the REPLACE statement [#4669]
- Support backup, restore, load and export connecting directly to S3 without going through a Broker [#5399]
Other
- Support SET_VAR optimizer hints in SELECT statements to set session variables [#4504]
- Support repairing damaged tablets by filling them with empty tablets [#4255]
- Support Bucket Shuffle Join (when the join condition columns are a subset of the table's bucket columns, the right table is shuffled to the nodes holding the corresponding left-table data, which can significantly reduce the network overhead of a Shuffle Join and improve query speed) [#4677]
- Support cancelling import jobs in batch with the CANCEL LOAD statement [#4515]
- Add a session variable that controls whether partition columns may be NULL [#5013]
- Support the TopN aggregate function [#4803]
- Support a new data balancing logic based on the number of partitions and buckets [#5010]
- Support creating indexes on value columns of Unique Key tables [#5305]
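The routing idea behind Bucket Shuffle Join can be sketched as follows. Each right-table row is hashed with the left table's bucketing function and sent to the node that already holds the matching left-table bucket, so only the right table crosses the network. In a regular shuffle join, both sides are repartitioned. This is a sketch under assumptions. The join key is assumed to be a single column equal to the bucket column. zlib.crc32 stands in for Doris's own hash function. BUCKET_TO_NODE is a hypothetical placement map.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_BUCKETS = 4
# Hypothetical placement of the left table's buckets on BE nodes.
BUCKET_TO_NODE: Dict[int, str] = {0: "be1", 1: "be2", 2: "be3", 3: "be1"}


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in hash; Doris buckets rows with its own hash function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def route_right_rows(rows: List[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Send each right-table row to the node holding the left-table bucket for its
    join key, so that the left table's data does not move over the network."""
    per_node: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for join_key, payload in rows:
        node = BUCKET_TO_NODE[bucket_of(join_key)]
        per_node[node].append((join_key, payload))
    return dict(per_node)


right_rows = [("alice", "r1"), ("bob", "r2"), ("carol", "r3")]
for node, rows in sorted(route_right_rows(right_rows).items()):
    print(node, rows)
```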
Enhancement
Performance improvement
- Implement a new compaction selection algorithm with lower write amplification and a more reasonable compaction strategy [#4212]
- Optimize the efficiency of bit operations in variable-length encoding [#4366]
- Improve the execution efficiency of the money_format function [#4672]
- Optimize query execution plans: when the table's bucket columns are a subset of the GROUP BY columns in SQL, skip the unnecessary data shuffle step [#4482]
- Improve the efficiency of column name lookup on the BE [#4779]
- Improve the performance of the BE-side LRU cache [#4781]
- Optimize the tablet selection strategy of compaction, reducing the number of invalid selections [#4964]
- Optimize the read efficiency of Unique Key tables [#4958]
- Optimize the memory usage of LoadJob on the FE side, reducing FE memory overhead [#4993]
- Reduce the lock granularity of FE metadata from Database level to Table level, allowing finer-grained concurrent access to metadata [#3775]
- Avoid unnecessary memory copies when creating hash tables [#5301]
- Remove the path check at BE startup to speed up BE startup [#5268]
- Optimize the import performance of JSON data [#5114]
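The GROUP BY optimization relies on a simple property. If the table's bucket columns are a subset of the GROUP BY columns, all rows of one group share the same bucket-column values, so the whole group lives in a single bucket and can be aggregated where it is stored. The sketch below checks that property on sample rows. The sum-of-bytes hash is a deliberately simple stand-in, not Doris's hash function, and the column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]


def bucket_of(row: Row, bucket_cols: Sequence[str], num_buckets: int) -> int:
    # Toy stand-in hash (sum of UTF-8 bytes) so the example is easy to follow;
    # Doris uses its own hash function over the distribution columns.
    key = "\x00".join(row[c] for c in bucket_cols)
    return sum(key.encode("utf-8")) % num_buckets


def groups_are_bucket_local(rows: List[Row], bucket_cols: Sequence[str],
                            group_cols: Sequence[str], num_buckets: int) -> bool:
    """True if every GROUP BY group falls in exactly one bucket, so each bucket
    can be aggregated where it is stored without a shuffle step."""
    buckets_per_group: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)
    for row in rows:
        group = tuple(row[c] for c in group_cols)
        buckets_per_group[group].add(bucket_of(row, bucket_cols, num_buckets))
    return all(len(b) == 1 for b in buckets_per_group.values())


rows = [
    {"user_id": "u1", "city": "bj", "day": "d1"},
    {"user_id": "u2", "city": "bj", "day": "d1"},
    {"user_id": "u1", "city": "sh", "day": "d2"},
]
# Bucket column user_id is a subset of the GROUP BY columns: no shuffle needed.
print(groups_are_bucket_local(rows, ["user_id"], ["user_id", "day"], 2))  # True
# GROUP BY city alone: the "bj" group spans two buckets, so a shuffle is required.
print(groups_are_bucket_local(rows, ["user_id"], ["city"], 2))  # False
```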
Functional improvements
- SQL supports the COLLATE utf8_general_ci syntax to improve MySQL syntax compatibility [#4365]
- Improve the batch delete feature, and improve and optimize the related compaction process [#4425]
- Enhance the parse_url() function: support lowercase arguments and support parsing the port [#4429]
- When a join hint specifies the join method, Colocation Join is disabled by default [#4497]
- Dynamic partitions support hour granularity [#4514]
- The BE HTTP interface supports gzip compression [#4533]
- Optimize thread usage on the BE side [#4440]
- Optimize the checks and error messages for the rand() function during query analysis [#4439]
- Optimize compaction triggering and execution logic to better limit the resource overhead (mainly memory) of compaction and trigger compaction more reasonably [#4670]
- Support pushing LIMIT down to ODBC/MySQL external tables [#4707]
- Limit the number of tablet versions on the BE side to prevent an excessive number of data versions from causing abnormal cluster load [#4687]
- When an RPC error occurs during a query, return the specific error information quickly instead of leaving the query stuck [#4702]
- Support automatically rewriting count(distinct if(bool, bitmap, null)) to the bitmap_union_count function [#4201]
- Support the statement set sql_mode = concat(@@sql_mode, "STRICT_TRANS_TABLES") [#4359]
- Support all Stream Load features in multi-load [#4717]
- Optimize how the BE selects disks when creating tablets, using the "two random choices" algorithm to distribute tablet replicas more evenly [#4373]
- When creating a materialized view, the bitmap_union aggregation only supports integer columns, and hll_union does not support decimal columns [#4432]
- Adjust the log level of some FE logs so that log writing does not become a bottleneck [#4766]
- The DESCRIBE TABLE statement displays the definition expressions of the aggregate columns of materialized views [#4446]
- Support the convert() function [#4364]
- Support the cast(expr as signed/unsigned int) syntax for compatibility with the MySQL ecosystem
- Add more columns to the information_schema.columns table for compatibility with the MySQL ecosystem
- In Spark Load, use the yarn command line instead of the yarn-client API to kill jobs or get job status [#4383]
- Persist stale rowset metadata so that it is not lost after a BE restart [#4454]
- Return an error code in the schema change result to tell the user more clearly what went wrong [#4388]
- Optimize the rowset selection logic of some compactions to make the selection more accurate [#5152]
- Optimize the BE Page Cache by splitting pages into a data cache and an index cache [#5008]
- Improve the accuracy of functions such as variance and standard deviation on the Decimal type [#4959]
- Optimize the handling of predicates pushed down to ScanNode, avoiding repeated filtering by the same predicates at the query layer and improving query efficiency [#4999]
- Optimize predicate pushdown for Unique Key tables, supporting pushdown of conditions on non-key columns [#5022]
- Support pushing "not in" and "!=" down to the storage layer to improve query efficiency [#5207]
- Support writing multiple memtables of a tablet in parallel during import, improving import efficiency [#5163]
- Optimize ZoneMap creation: a ZoneMap is no longer created when a page contains too few rows [#5260]
- Add a histogram metric type on the BE [#5148]
- When a Parquet file fails to parse during import, include the file name in the error message [#4954]
- Optimize dynamic partition creation: creating a table now directly triggers the creation of its dynamic partitions [#5209]
- The result of SHOW BACKENDS displays the actual start time of each BE [#4872]
- Support column names starting with the @ symbol, mainly to support mapping ES tables [#5006]
- Optimize the mapping and conversion of columns declared in import statements to make usage clearer [#5140]
- Optimize the execution of colocation join so that the query plan is spread more evenly across BE nodes [#5104]
- Optimize predicate pushdown to support pushing is null and is not null down to the storage engine [#5092]
- Optimize BE node selection in bucket join [#5133]
- Support UDFs in import operations [#4863]
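The "two random choices" disk selection mentioned above can be sketched in a few lines. The method samples two distinct disks at random and places the new tablet on the less loaded one. A commonly cited benefit over always choosing the global minimum is that it needs only two load lookups and does not send every concurrent placement to the same disk. In this sketch, load is assumed to be the number of tablets on a disk, and pick_disk is a hypothetical helper, not the BE's code.

```python
import random
from typing import Dict, Optional


def pick_disk(tablet_counts: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Two random choices: sample two distinct disks and pick the less loaded one.

    Assumption: load is measured as the number of tablets already on each disk.
    """
    disks = list(tablet_counts)
    if not disks:
        raise ValueError("no disks available")
    if len(disks) == 1:
        return disks[0]
    rng = rng or random.Random()
    a, b = rng.sample(disks, 2)
    return a if tablet_counts[a] <= tablet_counts[b] else b


rng = random.Random(42)
counts = {f"disk{i}": 0 for i in range(4)}
for _ in range(1000):
    counts[pick_disk(counts, rng)] += 1  # place one new tablet
print(counts)
```

A disk that is strictly the most loaded is never chosen, because the other sampled disk always wins the comparison.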
Other
- Add support for IN predicates in the DELETE statement [#4404]
- Update the Dockerfile of the development image and add some new dependencies [#4474]
- Fix various spelling errors in the code and documentation [#4714] [#4712] [#4722] [#4723] [#4724] [#4725] [#4726] [#4727]
- Add two segment-related metrics to OlapScanNode in the query profile, showing the total number of segments and the number of filtered segments [#4348]
- Add documentation for the batch delete feature [#4435]
- Add the Spark Load syntax manual [#4463]
- Display the cumulative compaction policy name and rowset data size in the BE's /api/compaction/show API [#4466]
- Redirect the Spark Launcher log in Spark Load to a separate log file for easier viewing [#4470]
- The BE configuration item streaming_load_max_batch_size_mb is renamed to streaming_load_json_max_mb to make its meaning clearer [#4791]
- Adjust the default value of the FE configuration item thrift_client_timeout_ms to fix slow access to the information_schema database [#4808]
- The BE web page supports CPU and memory sampling of the BE process to facilitate performance debugging [#4632]
- Extend the tablet balancing class on the FE side so that more balancing logic can be added [#4771]
- Reorganize the OLAP_SCAN_NODE profile information to make the profile clearer and easier to read [#4825]
- Add BE metrics for monitoring cancelled query fragments [#4862]
- Reorganize the profile information of HASH_JOIN_NODE, CROSS_JOIN_NODE, UNION_NODE and ANALYTIC_EVAL_NODE to make the profile clearer and easier to read [#4878]
- Change the default value of query_colocate_join_memory_limit_penalty_factor to 1 so that, during colocation join, the default memory limit of an execution plan fragment matches the user setting [#4895]
- Take tablet scan frequency into account when selecting the compaction strategy on the BE side [#4837]
- Optimize how query fragments are sent by sending shared attributes fewer times, improving query plan scheduling performance [#4904]
- Improve the accuracy of load statistics for unavailable nodes when the query scheduler schedules query plans [#4914]
- Show the code version of each FE node in the result of SHOW FRONTENDS [#4943]
- Support more column type conversions, such as from CHAR to numeric types [#4938]
- Support recognizing complex types in Parquet files during import [#4968]
- Add BE metrics for the permits used and waited for in the compaction logic [#4893]
- Reduce the execution time of BE unit tests [#5131]
- Add more JVM-related metrics on the FE side [#5112]
- Add a session variable to control the timeout for a transaction to become visible in insert operations [#5170]
- Optimize how scan nodes are selected for query execution plans by considering all ScanNodes in a query [#4984]
- Add more system metrics for FE nodes [#5149]
- Unify the use of VLOG in BE code [#5264]
BugFix
- Fix a bug that could occur when replaying the Erase Table metadata operation [#5221]
- Fix a BE process crash caused by orc::TimezoneError not being caught when importing ORC files [#4350]
- Fix incorrect results of the Except operator [#4369]
- Fix queries on ES data always being routed to the same BE node [#4352]
- Fix the operation not being persisted correctly when setting a Global Variable [#4324]
- Fix a BE process crash caused by MemTracker not being constructed correctly in PushHandler [#4345]
- Fix blank lines being imported when importing JSON data [#4379]
- Fix SQL rewrite rules failing to handle count distinct correctly [#4382]
- Fix the data model type of a materialized view not being set correctly when the materialized view is created [#4375]
- Fix wrong query results of left semi/anti join [#4417]
- Prioritize the join method specified by the user [#4424]
- Fix incorrect results when a Left Join contains an inline view [#4279]
- select database() no longer returns the cluster-qualified name, and fix select user() not displaying the user IP [#4362]
- Fix show create table displaying an incorrect replica count for tables that use dynamic partitioning [#4393]
- Fix inconsistent precision of decimal, char and varchar columns between the base table and the materialized view [#4436]
- Fix a dangling pointer in PlanFragmentExecutor, and a null pointer when importing in JSON format [#4448]
- Fix some leftover tablet directories on the BE not being cleaned up [#4401]
- Fix some issues with Spark Load [#4464]
- Fix balancing of colocation tables never completing [#4471]
- Fix a possible memory copy exception in MemIndex::load_segment [#4458]
- Fix a BE crash when using the Load Error Hub feature without the WITH_MYSQL build option [#4486]
- Fix an execution error when the @@sql_mode variable is used in SQL [#4484]
- Fix Spark Load and Broker Load splitting the same column inconsistently [#4491]
- Fix BE downtime caused by querying the information_schema.columns table [#4511]
- Fix some issues with persisting rowset metadata from historical versions [#4513]
- Fix inconsistent behavior of the str_to_date() function between FE and BE [#4495]
- Fix BE downtime caused by converting some historical data during a linked schema change [#4526]
- Fix Spark Load staying in the ETL stage after an FE restart [#4528]
- Fix data becoming unreadable when a delete condition contains "\n" [#4531]
- Fix Spark Load jobs in the PENDING state not being cancellable [#4536]
- Fix inconsistent column-splitting behavior between Spark Load and other import methods [#4536]
- Fix net.sourceforge.czt.dev not being found when compiling the FE module [#4636]
- Fix statement parsing failing when a cast function appears in a case when statement [#4646]
- Fix all queries failing when the RPC of a single BE has a problem [#4651]
- Fix related import transactions not being cleaned up after a BE node goes down [#4661]
- Fix the column types of the information_schema columns table not being compatible with MySQL [#4648]
- Fix an out-of-bounds access in SQL Cache [#4641]
- Fix import throwing a null pointer exception when the table has no partitions [#4658]
- Fix an error when tools/show_segment_status accesses external tables [#4671]
- Fix the delete on clause possibly not taking effect in Routine Load [#4676]
- Fix the columns table of information_schema not displaying comments [#4683]
- Fix hidden columns (the delete flag column, etc.) possibly being lost after a schema change [#4686]
- Fix the window functions lag()/lead() reporting an error when matching the decimal type [#4666]
- Fix clients hanging under high concurrency when using the MySQL NIO Server [#4680]
- Fix tablet reports always reporting out of date [#4695]
- Fix duplicate columns in case when statements after query planning [#4693]
- Fix the rand() function generating the same random value every time [#4709]
- Fix query errors caused by incorrect column cardinality statistics [#4678]
- Fix BE downtime caused by a bug in the split_part function [#4721]
- Fix query execution errors when a SQL statement contains a constant subquery [#4719]
- Fix join query errors when a table contains the delete flag column [#4734]
- Fix syntax parsing errors when a CTE statement contains nested subqueries [#4731]
- Fix lead/lag type matching errors in window functions [#4732]
- Fix tablets not being selected correctly for compaction [#4593]
- Fix limit conditions being incorrectly pushed down to ODBC and ES external tables [#4764] [#4768]
- Fix the compaction thread stopping working [#4750]
- Fix timed-out idle connections not being killed automatically in some cases [#4774]
- Fix errors when querying tables with the delete flag column in SQL that contains a join [#4770]
- Fix the results of some time functions computed on the FE so that they are consistent with BE results [#4786]
- Fix a BE crash when displaying tablet information on the BE web page [#4775]
- Fix type conversion of time-type filter conditions so that they are correctly converted to the corresponding time type [#4806]
- Fix hidden columns being created repeatedly when creating a Rollup [#4816]
- Fix the hidden sequence column not being displayed [#4818]
- Fix incorrect query results of some union statements [#4807]
- Fix node decommission tasks failing to complete in some cases [#4804]
- Identify invalid date constants during SQL parsing to avoid scanning all partitions [#4756]
- Fix a BE crash caused by selecting tablets for compaction without holding a lock [#4829]
- Fix some front-end display issues and back-end cookie handling issues in the new UI [#4830]
- Fix a "tablet not found" error when a query contains both UNION and Colocation Join [#4842]
- Fix an import task submission failing because the task queue was full without the exception being caught correctly [#4796]
- Fix Broker Load job scheduling so that jobs are no longer left unscheduled after submission [#4869]
- Avoid forwarding commands to the Master FE before it has finished starting [#4844]
- Ignore empty Parquet and ORC files when importing to avoid read errors [#4810]
- Fix materialized view name conflicts not being checked when renaming an OLAP table [#4870]
- Fix failures when creating a logical view with complex SQL [#4840]
- Fix Routine Load failing to end a task correctly after reading empty messages when consuming Kafka data [#4861]
- Fix some column names not being recognized when using CTE syntax [#4887]
- Fix incorrect content in the columns table of the information_schema database [#4858]
- Fix BitmapValue serialization failing on the FE side when the bitmap contains only 32-bit integers [#4884]
- Fix BE disk usage calculation incorrectly including all disk space on the node that is not used by Doris, which caused miscalculation during Decommission [#4889]
- Fix an extra column possibly being added incorrectly when the SELECT list contains only constant expressions [#4901]
- Fix communication failures caused by inconsistent Thrift Server types between FE and BE [#4908]
- Ignore filter conditions that are not on columns during partition pruning [#4921]
- Fix the log directory being created incorrectly in the start_fe.sh startup script [#4929]
- Fix some NULL values not being displayed when using CTE syntax [#4932]
- Fix the Colocation Group always being in the unstable state when some BE nodes are down [#4936]
- Forbid creating tables in Segment V1 format [#4913]
- Fix incorrect handling of Bool-type conditions when Doris queries ES data [#4990]
- Fix a Tablet Shard lock problem on the BE side [#5000]
- Fix a ConcurrentModificationException that may occur on the FE when dropping a table that is being imported into [#5003]
- Fix the incorrect return type of the str_to_date function [#5004]
- Fix loss of precision for some floating-point values when importing JSON data [#4983]
- Fix incomplete query results when using Union to query multiple external tables [#5067]
- Fix incorrect query results when SQL contains multiple in conditions [#5072]
- Fix BE downtime caused by the destruction order of Profile [#5078]
- Fix a memory leak when importing JSON data [#5073]
- Fix the Colocation balancing logic using 100% CPU when there are no BE nodes [#5079]
- Fix possible BE downtime when creating a new tablet [#5089]
- Fix tablets not being cleaned up and continuing to occupy disk space because of a shared-pointer circular reference [#5100]
- Fix a BE crash when a delete condition contains an is null condition [#5109]
- Fix a problem with the Partition Cache hit strategy [#5060]
- Optimize how Spark Load reads Hive tables to avoid full scans of Hive tables [#5047]
- Add support for the Ninja build system to speed up BE compilation [#5076]
- Improve the efficiency of importing JSON data [#5055]
- Allow the FE to send heartbeats directly over the Thrift protocol, avoiding heartbeat failures caused by blocking in the HTTP communication model [#5027]
- Simplify enabling dynamic partitioning, and prohibit hourly partitioning on DATE columns [#5043]
- Support viewing Broker Load profiles on the FE web page [#5052]
- Passwords are no longer displayed in plain text when viewing Resource information [#5088]
- Add trace information for tablet creation on the BE side to help locate slow tablet creation [#5091]
- Fix possible data loss when Routine Load consumes Kafka data in some cases [#5093]
- Fix the desc statement possibly returning Malformed packet when viewing all materialized views [#5115]
- Fix a possible BE crash when the BE loads data directories at startup [#5113]
- Fix non-Master FEs repeatedly sending non-query requests to the Master FE [#5160]
- Fix an error in the partition cache hit logic [#5065]
- Fix an error when executing bucket join on an empty table [#5145]
- Fix the percentile_approx function returning wrong results [#5172]
- Fix the call order when Olap Scanner threads finish [#5111]
- Fix an error when setting the colocation property on a partitioned table with empty partitions [#5139]
- Fix an error when querying materialized views in a CTE statement [#5165]
- Fix the min/max functions not handling NULL values in string columns correctly [#5189]
- Change the string encoding in Spark-Doris-Connector to UTF-8 [#5202]
- Fix the delete column possibly being added repeatedly in Routine Load [#5222]
- Fix a bucket shuffle join bug [#5228]
- Fix ALTER ROUTINE LOAD having no effect on some parameters [#5257]
- Fix metadata signatures of different tables possibly being identical during backup and restore [#5254]
- Fix Colocate Join and Bucket Shuffle Join possibly causing data to be scanned repeatedly [#5256]
- Fix metadata errors caused by not checking the log id when the FE pushes metadata [#5219]
- Fix errors in aggregate queries when processing -0.0 [#5226]
- Fix an outer join query error [#5285]
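The -0.0 fix concerns a floating-point corner case. As general background only (this is not the code changed in #5226, and no claim is made about its root cause), the sketch shows why -0.0 needs care in aggregation. Under IEEE 754, -0.0 compares equal to 0.0 but has a different bit pattern, so grouping on raw bytes splits the two zeros into separate groups. Adding 0.0 normalizes -0.0 to +0.0 under the default rounding mode, which merges them.

```python
import struct
from collections import Counter
from typing import Iterable


def group_by_raw_bytes(values: Iterable[float]) -> Counter:
    # Keys are the IEEE 754 byte patterns, so 0.0 and -0.0 land in different groups.
    return Counter(struct.pack("<d", v) for v in values)


def group_by_normalized(values: Iterable[float]) -> Counter:
    # Adding 0.0 turns -0.0 into +0.0 (round-to-nearest), so both zeros share a group.
    return Counter(struct.pack("<d", v + 0.0) for v in values)


vals = [0.0, -0.0, 1.5]
print(0.0 == -0.0)                     # True
print(len(group_by_raw_bytes(vals)))   # 3
print(len(group_by_normalized(vals)))  # 2
```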
Other
- Add license declarations for some non-Apache-licensed code to the NOTICE file [#4831]
- Reformat the BE code with clang-format [#4965]
- Add clang-format checking and formatting scripts to unify the C++ code style of BE before submission [#4934]
- Add the AWS S3 SDK to the third-party libraries so that data in object storage can later be read directly through the SDK [#5234]
- Fix some license-related issues [#4371]:
  - The MySQL client and LZO third-party libraries are no longer enabled in the default build options. Users who need the MySQL external table feature must enable it themselves
  - Remove the js and css code from the code base and introduce it as a third-party dependency instead
  - Update the Docker development environment image to build-env-1.2
  - Change how the UnixODBC third-party library is built so that the BE process no longer depends on the system's libltdl.so at runtime
- Add a third-party UDF for more efficient set operations on orthogonal bitmap data [#4198]
- Add the UnixODBC third-party library dependency to support ODBC external tables [#4377]
API Change
- Prohibit the creation of Segment V1 tables [#4913]
- Rename the BE configuration item streaming_load_max_batch_size_mb to streaming_load_json_max_mb [#4791]
- Support column reference passing in column definitions of load statements [#5140]
- Support creating indexes on value columns of Unique Key tables [#5305]
- Support atomic replacement of two tables through the REPLACE statement [#4669]
- Support the CREATE TABLE LIKE statement [#4705]
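When upgrading, a BE configuration file that still uses the old key name needs updating. The sketch below rewrites renamed keys in a config file made of `key = value` lines and leaves comments and all other lines untouched. The line format of be.conf, the migrate_conf helper, and the sample values are assumptions for illustration. Review the migrated file by hand before deploying it.

```python
import re
from typing import Dict

# Old name -> new name, taken from this release's API changes.
RENAMED_KEYS: Dict[str, str] = {
    "streaming_load_max_batch_size_mb": "streaming_load_json_max_mb",
}

# "key = value" lines, allowing leading whitespace and spaces around "=".
_LINE = re.compile(r"^(\s*)([A-Za-z0-9_]+)(\s*=.*)$")


def migrate_conf(text: str) -> str:
    """Rename outdated keys in a key = value config file, keeping other lines as-is."""
    out = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m and m.group(2) in RENAMED_KEYS:
            line = m.group(1) + RENAMED_KEYS[m.group(2)] + m.group(3)
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if text.endswith("\n") else result


print(migrate_conf("# BE config\nstreaming_load_max_batch_size_mb = 100\nbe_port = 9060\n"))
```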
Credits
924060929 acelyc111 Astralidea benbiti blueChild caiconghui caoyang10 ccoffline coalchan Dam1029 e0c9 EmmyMiao87 gengjun-git HangyuanLiu HappenLee hffariel jollykingCN kangkaisen killxdcj lihuigang liutang123 luozenglin marising mengqinghuan morningman nimuyuhan Nivane pengxiangyu px-l qidaye sduzh Skysheepwang songchuangyuan stalary stdpain Sunt-ing vagetablechicken vergilchiu wangbo wangxiaobaidu11 weizuo93 WingsGo wutiangan wuyunfeng xinghuayu007 xinyiZzz Xpray xy720 yangzhg Youngwb yxqweasd zh0122 ZhangYu0123 zhaojintaozhao xxiao2018 bookeezhou JNSimba yuliangwan