Empty partitions are left behind after `DELETE FROM`
CrateDB version
4.7.0
CrateDB setup information
Single node; docker
Steps to Reproduce
Create table:
create table test (
ts timestamp,
ts_day generated always as date_trunc('day', ts),
value int)
partitioned by (ts_day);
Insert sample data:
insert into test (ts, value) values ('2022-02-21T00:00', 1), ('2022-02-22T00:00', 2), ('2022-02-23T00:00', 3);
Delete based on the `ts` column:
delete from test where ts <= '2022-02-22T12:00';
Expected Result
1 partition (`ts_day=1645574400000`) with 1 record
Actual Result
3 partitions:
- `ts_day=1645401600000` with zero records
- `ts_day=1645488000000` with zero records
- `ts_day=1645574400000` with 1 record
Working query
delete from test where ts_day <= '2022-02-22T12:00';
This query drops partitions as well.
I would have expected that the optimizer can also infer from the first query (with `WHERE ts...`) that the full partition can be dropped.
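As a rough illustration of why the `ts_day` filter can be resolved per partition, the sketch below converts the partition values from the report into dates and evaluates `ts_day <= '2022-02-22T12:00'` against each one. It assumes the values are epoch milliseconds in UTC and that the cutoff is interpreted as UTC; the helper name `to_utc` is made up for this example.

```python
from datetime import datetime, timezone

# Partition values from the report (assumed to be epoch milliseconds, UTC).
partition_values = [1645401600000, 1645488000000, 1645574400000]

# Cutoff from the working query (assumed to be interpreted as UTC).
cutoff = datetime(2022, 2, 22, 12, 0, tzinfo=timezone.utc)


def to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Evaluate the partition-column predicate once per partition value.
matches = {value: to_utc(value) <= cutoff for value in partition_values}

for value, hit in matches.items():
    day = to_utc(value).date().isoformat()
    print(value, day, "matches" if hit else "does not match")
```

Because the predicate only references the partition column, it evaluates to a plain true or false for every partition, which is what allows whole partitions to be dropped.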
Issue Analytics
- State:
- Created 2 years ago
- Reactions:2
- Comments:10 (5 by maintainers)
Top GitHub Comments
Thanks @mfussenegger. So, I tried the first suggestion but could not find the debug configuration `Crate`. But that’s ok, because the second option worked with one minor adjustment: we used this instead:
export JAVA_OPTS='-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=0.0.0.0:5005'
then attached. Making progress now.

When the where clause contains a column that participates in a generated column which is part of the `partition by` expression, we “translate” the filter comparison using the generating expression and add it with an `AND` to the original where filter. In the example discussed below, the filter is `x = 1`, the generated partition column is `xv AS (x + 1)`, and the translated condition is `xv = 2`, so the DELETE's filter becomes `(x = 1) AND (xv = 2)`.
Then, in `WhereClauseAnalyzer#resolvePartitions()`, we check this new where query against all partitions. In our example, the table has 3 partitions with values 2, 3 and 4 (for `xv`):
- For partition `2` the query normalizes to `x = 1 AND true` => `x = 1`
- For partition `3` the query normalizes to `x = 1 AND false` => `false`
- For partition `4` the query normalizes to `x = 1 AND false` => `false`
Therefore, we end up with a query running on partition `2` (but not deleting it, only the relevant doc) and the other 2 partitions are not visited.

If we use `OR` instead of `AND`, then:
- For partition `2` the query normalizes to `x = 1 OR true` => `true`
- For partition `3` the query normalizes to `x = 1 OR false` => `x = 1`
- For partition `4` the query normalizes to `x = 1 OR false` => `x = 1`
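The two walkthroughs above can be reproduced with a small sketch. This is not CrateDB code: `normalize` and the constants are hypothetical names, and the document-level filter is kept as an opaque string because it cannot be decided from the partition value alone.

```python
# Illustrative sketch only; none of these names exist in CrateDB.
DOC_FILTER = "x = 1"          # filter on the regular column, undecidable per partition
PARTITION_VALUES = [2, 3, 4]  # values of the partition column xv
TRANSLATED_VALUE = 2          # translated condition on the partition column: xv = 2


def normalize(partition_value, operator):
    """Simplify `DOC_FILTER <operator> (xv = 2)` for one concrete partition."""
    partition_matches = partition_value == TRANSLATED_VALUE
    if operator == "AND":
        # x = 1 AND true => x = 1 ; x = 1 AND false => false
        return DOC_FILTER if partition_matches else False
    if operator == "OR":
        # x = 1 OR true => true ; x = 1 OR false => x = 1
        return True if partition_matches else DOC_FILTER
    raise ValueError(f"unsupported operator: {operator}")


and_result = {p: normalize(p, "AND") for p in PARTITION_VALUES}
or_result = {p: normalize(p, "OR") for p in PARTITION_VALUES}

print("AND:", and_result)
print("OR: ", or_result)
```

With `AND`, only partition `2` keeps a query to run and the other two are pruned. With `OR`, partition `2` matches entirely while the other two still need the document-level filter.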
In turn, because of the `WhereClauseAnalyzer#tieBreakPartitionQueries()` code, we end up with a map with 2 entries, so we cannot optimize, and we run the whole query `(x = 1) OR (xv AS (x + 1) = 2)` on all 3 partitions.

This somehow seems to work in this case, but it’s not the correct solution, because we have lost the original query and instead need to run something more complex to match docs. Keep in mind that `WhereClauseAnalyzer` also works for selects.

Solution:
- `DELETE`: if a partition matches to `TRUE`, we can add it to the list of partitions to remove completely; if it results in a `canMatch`, we add it to a list of partitions for which we run the original query and delete docs.
- For `SELECT` statements we can use the translated query and add all partitions with `TRUE` and `canMatch` to the list, and for those run the original query to select docs.
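A minimal sketch of the proposed `DELETE` handling, assuming each partition has already been reduced to one of three outcomes: `True` (every doc matches), `False` (no doc can match), or anything else, treated here as `canMatch`. The function name `plan_delete` and the sample input are hypothetical and only illustrate the classification, not the actual `WhereClauseAnalyzer` implementation.

```python
def plan_delete(normalized):
    """Split partitions into those to drop and those needing the original query.

    `normalized` maps a partition value to True, False, or a residual filter
    (any other value), which is treated as "can match".
    """
    drop, run_query = [], []
    for partition, result in normalized.items():
        if result is True:
            drop.append(partition)       # whole partition matches: remove it
        elif result is False:
            continue                     # nothing can match: skip the partition
        else:
            run_query.append(partition)  # can match: run the original query
    return drop, run_query


# Illustrative input: one fully matching partition, two that can match.
drop, run_query = plan_delete({2: True, 3: "x = 1", 4: "x = 1"})
print("drop:", drop)
print("run original query on:", run_query)
```

For `SELECT`, the same classification would apply, except that partitions from both lists are kept and the original query is run on them.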