Running a group by query produces results with blank primary keys
CrateDB version: 2.3.11
Environment description:
- JVM version: 1.8.0_181
- Kernel: Linux 4.4.38
- Distribution: Ubuntu 16.04
- Number of nodes: 3
Problem description: Running a GROUP BY query produces results with blank primary keys. That is, I get several valid rows plus one row with a blank primary key, and that row seems to affect the values in the rest of the results.
I’ve hit this issue before, and previously running a pointless update query fixed it. I also tried changing the replica count and running an optimize query. I then checked the error log on the master node, which is full of errors (I have attached the whole log, as I don’t understand the errors well enough to know which are relevant). I also noticed that the health of the cluster keeps switching between yellow and green, with one table occasionally reporting underreplicated shards but no underreplicated records.
This issue, and issues like it, seem to occur when one or more of the cluster nodes are restarted or go down, as they did recently.
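To see which shards are behind the yellow/green flapping, a query against `sys.shards` along these lines could help. This is only a sketch: the column names (`schema_name`, `table_name`, `"primary"`, `state`, `routing_state`, `num_docs`) are taken from the `sys.shards` documentation and are assumed to be available in this CrateDB version.

```sql
-- List every shard copy of the affected table with its allocation state.
-- Column names are assumed from the sys.shards documentation.
SELECT id, "primary", state, routing_state, num_docs
FROM sys.shards
WHERE schema_name = 'armada'
  AND table_name = 'wind_turbine_data_daily'
ORDER BY id, "primary" DESC;
```

A replica that is unassigned or still recovering, or a primary and replica of the same shard reporting different `num_docs`, would be consistent with the underreplicated-shards warning described above.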
Steps to reproduce: Here’s the query I ran:
SELECT
  armada.wind_turbine_data_daily.device_uuid AS armada_wind_turbine_data_daily_device_uuid,
  sum(armada.wind_turbine_data_daily.energy) AS energy,
  avg(armada.wind_turbine_data_daily.wind_speed) AS wind_speed,
  sum(armada.wind_turbine_data_daily.availability * armada.wind_turbine_data_daily.samples) / sum(armada.wind_turbine_data_daily.samples) AS availability
FROM armada.wind_turbine_data_daily
WHERE armada.wind_turbine_data_daily.timestamp >= '2018-09-01'
  AND armada.wind_turbine_data_daily.timestamp <= '2018-09-30'
GROUP BY armada.wind_turbine_data_daily.device_uuid
ORDER BY device_uuid
LIMIT 1000;
Here’s the table schema:
CREATE TABLE IF NOT EXISTS "armada"."wind_turbine_data_daily" (
"activity" FLOAT,
"availability" FLOAT,
"created_at" TIMESTAMP,
"device_uuid" STRING,
"direction_wind" FLOAT,
"energy" FLOAT,
"energy_cumulative" FLOAT,
"interval_duration" INTEGER,
"power_active" FLOAT,
"power_active_filtered_sum" FLOAT,
"power_active_sum" FLOAT,
"power_theoretical_filtered_sum" FLOAT,
"power_theoretical_sum" FLOAT,
"rpm_generator" FLOAT,
"rpm_rotor" FLOAT,
"samples" INTEGER,
"seconds_observed" INTEGER,
"status" STRING,
"timestamp" TIMESTAMP,
"updated_at" TIMESTAMP,
"wind_speed" FLOAT,
"wind_speed_max" FLOAT,
PRIMARY KEY ("timestamp", "device_uuid")
)
CLUSTERED INTO 4 SHARDS
WITH (
"allocation.max_retries" = 5,
"blocks.metadata" = false,
"blocks.read" = false,
"blocks.read_only" = false,
"blocks.write" = false,
column_policy = 'dynamic',
"mapping.total_fields.limit" = 1000,
number_of_replicas = '0-1',
"recovery.initial_shards" = 'quorum',
refresh_interval = 1000,
"routing.allocation.enable" = 'all',
"routing.allocation.total_shards_per_node" = -1,
"translog.durability" = 'REQUEST',
"translog.flush_threshold_size" = 536870912,
"translog.sync_interval" = 5000,
"unassigned.node_left.delayed_timeout" = 60000,
"warmer.enabled" = true,
"write.wait_for_active_shards" = 'all'
)
Here’s the update query that has previously fixed issues like this:
UPDATE armada.wind_turbine_data_daily SET seconds_observed = seconds_observed;
What I did with replicas was:
alter table armada.wind_turbine_data_daily set (number_of_replicas='0');
and then
alter table armada.wind_turbine_data_daily set (number_of_replicas='0-1');
Issue Analytics
- Created 5 years ago
- Comments:9 (2 by maintainers)
Top GitHub Comments
Yeah, so the cluster machines went down, and for a time only one cluster node was up. Nothing should have been able to write to that single node, although attempts might have been made, most of which would have errored because of the state of the cluster.
Something interesting does seem to have happened - some data exists in the table I’m querying with blank primary key (device_uuid in this case) values - removing those rows fixes it, but in a very similar case in a different table there are no such blank primary key rows to delete.
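For anyone checking their own table for the same thing, the blank-key rows can be located and removed with something like the following. It is a sketch that assumes "blank" means either NULL or an empty string in `device_uuid`; the exact form of the bad values is not confirmed here.

```sql
-- Count rows whose device_uuid is NULL or an empty string
-- (assumption: these are the two forms a "blank" key can take).
SELECT count(*)
FROM armada.wind_turbine_data_daily
WHERE device_uuid IS NULL OR device_uuid = '';

-- After inspecting the rows, remove them.
DELETE FROM armada.wind_turbine_data_daily
WHERE device_uuid IS NULL OR device_uuid = '';
```

If the count comes back as zero but the GROUP BY still returns a blank key, that matches the second case described above, where there are no such rows to delete.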
I have also tried recreating the table (by renaming the table, creating a new one with only the bare minimum of the CREATE TABLE statement, and copying the data over from the renamed table), and that does not seem to have fixed the issue.