Dashboard locked if `api/v1/viz` DELETE requests are sent on a table involved in Batch SQL jobs
Context
I get 504 gateway timeouts (and a frozen dashboard) when a delete-table request is applied to a table involved in a Batch SQL operation.
Steps to Reproduce
- Have a table with ~1M rows (download one here)
- Run a batch operation, then try to delete the table. Here are the operations in Python:

```python
from carto.auth import APIKeyAuthClient
from carto.sql import BatchSQLClient

auth = APIKeyAuthClient('https://eschbacher.carto.com/',
                        'my api key')

# table with 950000 rows
table = 'batch_sql_viz_api_lock'

# update geometry from columns `lat` and `lng`
BatchSQLClient(auth).create([
    "UPDATE {table} SET the_geom = cdb_latlng(lat, lng)".format(table=table)
])

auth.send('api/v1/viz/{table}'.format(table=table),
          http_method='DELETE')
```
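A client-side workaround (a sketch only, not a fix for the server-side locking bug) is to poll the Batch SQL job until it reaches a terminal state before issuing the DELETE. `job_finished` and `TERMINAL_STATES` are hypothetical helper names introduced here; check the Batch SQL API docs for the exact set of job statuses on your version.

```python
# Sketch of a client-side workaround (assumed helpers, not part of carto-python):
# wait for the Batch SQL job to finish before deleting the table, so the
# api/v1/viz DELETE never queues behind the job's lock.
import time

# Assumed terminal Batch SQL job states; verify against the API docs.
TERMINAL_STATES = {'done', 'failed', 'canceled'}

def job_finished(job):
    """Return True once a Batch SQL job dict reports a terminal status."""
    return job.get('status') in TERMINAL_STATES

# Usage against a live account (requires `auth` and `table` from above):
# client = BatchSQLClient(auth)
# job = client.create(
#     ["UPDATE {t} SET the_geom = cdb_latlng(lat, lng)".format(t=table)])
# while not job_finished(client.read(job['job_id'])):
#     time.sleep(5)
# auth.send('api/v1/viz/{table}'.format(table=table), http_method='DELETE')
```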
Current Result
Dashboard is frozen until the Batch SQL job completes. Map and dataset pages also cannot be loaded.
Expected Result
Batch jobs and requests to delete tables should not freeze the user's account.
Browser and version
Chrome 61.0.3163.91 (Official Build) (64-bit) macOS 10.12.5
.carto file
None, but you can get a dataset to test here: https://eschbacher.carto.com/api/v2/sql?q=select+*+from+batch_sql_viz_api_lock_copy&format=csv&filename=batch_sql_viz_api_lock
Additional info
Discovered while developing cartoframes
Issue Analytics
- Created: 6 years ago
- Comments: 22 (19 by maintainers)
Top GitHub Comments
@rafatower We would not only need to apply `lock_timeout` to the delete, but also to other queries that might get locked, like other ALTERs, cartodbfication and some more. Still, it should be a reduced number, so it could be ok. Two comments:
- `SET lock_timeout` is problematic because of pgbouncer (it might get applied to other sessions). However, TIL about `SET LOCAL`, which sets it only for a transaction, which could be a good solution.
- `lock_timeout` at DB level could make sense in our case. There is a subtle trick you're missing here. We use the `postgres` user to do `DROP TABLE`, which is overridden to have `statement_timeout = 0` (and so, it never gives up). However, it does not have an override for `lock_timeout`, so if we set it at a per-DB level, it would use that.

That's more or less why I proposed setting it at database level, since it would help for all cases, including Rails, but also things like the Batch SQL API and analyses (which also skip timeouts). We have had problems with those components in the past with competing analyses or user deletion, i.e. this is not exclusive to Rails; you can trigger a similar situation just by not being careful with the Batch API. Although most cases are from Rails, since it's the main user of direct `postgres` user connections.

I agree that setting a timeout in the most problematic Rails queries is a solution to this particular case, but I still think setting a global `lock_timeout` could be beneficial for other cases we have not yet pinpointed as clearly as this one. Of course, the dashboard is still going to break if there is something locked at that point (a long transaction with an exclusive lock), so we are only talking about mitigation: trying to avoid such long locks by limiting waiting queries.
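The transaction-local timeout idea above can be sketched as follows. `with_local_lock_timeout` is a hypothetical helper (not CARTO's actual code), and the 5-second default is illustrative:

```python
# Sketch (assumed helper): wrap risky DDL in a transaction that sets
# lock_timeout only for that transaction, so it fails fast instead of queuing
# indefinitely behind a Batch SQL job's lock. SET LOCAL reverts at COMMIT or
# ROLLBACK, so it does not leak into other sessions sharing a pgbouncer
# connection the way a plain SET could.
def with_local_lock_timeout(statements, timeout_ms=5000):
    """Return a SQL script running `statements` under a transaction-local lock_timeout."""
    lines = ["BEGIN;",
             "SET LOCAL lock_timeout = '{0}ms';".format(timeout_ms)]
    lines.extend(statements)
    lines.append("COMMIT;")
    return "\n".join(lines)

# e.g. the drop issued by the `postgres` user would become:
sql = with_local_lock_timeout(["DROP TABLE batch_sql_viz_api_lock;"])
```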
In summary:
- A global `lock_timeout` could be a good idea as a safety measure.
- Use `SET LOCAL lock_timeout = 5` or something like that.

Bonus: we may want to consider setting the `lock_timeout` in the db size function in the extension. That would also have helped here (the table would be locked, but the dashboard would still work).

And fixed in production:
https://gist.github.com/rafatower/e80b159d0fd66ccd6e7d573470c18604

Obviously the `Could not delete canonical viz` error is because of the lock timeout, and it is logged as such in Rollbar: https://rollbar.com/carto/CartoDB/items/36087/