Question: best method to detect & record changes to existing data

I would like to ask the Embulk community a question. Is there a mailing list? Here is my question:

I am running a scheduled ETL task from MySQL to Redshift. I re-create the tables each time (no updates). During the task, before deleting the old data, I want to detect which cells (row + column) written by the previous ETL run now hold different data. I want to record those changes in a new table. (That will help us with our temporal prediction task.)

My question is: are there tools to do this within Embulk? If not, what’s the correct place to put such code? It doesn’t seem to fit cleanly into either the filter or the output logic. I would also like to avoid reading the input data twice, as it can be quite slow (even though it’s just millions of rows).

Any ideas? Thank you in advance…
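
To make the request above concrete, here is a minimal sketch of a cell-level diff between two snapshots of a table. It is not Embulk code and not something from the thread: the function name `diff_cells`, the `id` key column, and the sample rows are all hypothetical, and it assumes the previous snapshot fits in memory while the new rows are read only once.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

Row = Dict[str, Any]
Change = Tuple[Hashable, str, Any, Any]  # (key, column, old value, new value)


def diff_cells(old_rows: Iterable[Row], new_rows: Iterable[Row], key: str) -> List[Change]:
    """Return one record per cell whose value differs between two snapshots.

    Rows are matched on the `key` column. Rows that exist in only one
    snapshot are skipped; only cells of rows present in both are compared.
    """
    # Index the previous snapshot by key (assumed to fit in memory).
    old_by_key = {row[key]: row for row in old_rows}
    changes: List[Change] = []
    for new in new_rows:  # the new data is iterated exactly once
        old = old_by_key.get(new[key])
        if old is None:
            continue  # a newly inserted row, not a changed cell
        for column, new_value in new.items():
            if column != key and column in old and old[column] != new_value:
                changes.append((new[key], column, old[column], new_value))
    return changes


# Hypothetical sample data standing in for the previous and current loads.
previous = [
    {"id": 1, "name": "alice", "score": 10},
    {"id": 2, "name": "bob", "score": 20},
]
current = [
    {"id": 1, "name": "alice", "score": 15},
    {"id": 2, "name": "robert", "score": 20},
    {"id": 3, "name": "carol", "score": 30},
]

changes = diff_cells(previous, current, key="id")
for change in changes:
    print(change)
```

Each tuple in `changes` corresponds to one row that could be written to a separate change-log table. For data that does not fit in memory, doing the comparison inside the database is likely the better fit.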

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

bcipolli commented, Mar 29, 2017

@hiroyuki-sato Thank you for your suggestions. I hope to try these out over the weekend! I will let you know how it goes 😃

hiroyuki-sato commented, Apr 10, 2017

That’s great!!

I thought it would be better to use ETL software (e.g. Digdag) for your after_load part. I had never thought of using after_load for that case, but it may be OK.

I have some comments:

  • Maybe you can use ALTER TABLE instead of CREATE TABLE schema.table.
  • You don’t need the DROP TABLE schema.table___new__; part, because replace mode drops the target table.
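
The full after_load SQL is not shown in this excerpt, so the following is only a guess at the general shape of the approach being discussed: load the fresh data into a staging table, record changed cells with SQL, then swap the tables with a rename. It runs against an in-memory SQLite database so it is self-contained. The table names (`users`, `users_new`, `cell_changes`) and columns are hypothetical, and the null-safe `IS NOT` comparison is SQLite syntax that would need an equivalent on Redshift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, score INTEGER);
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, score INTEGER);
    CREATE TABLE cell_changes (
        row_id INTEGER, column_name TEXT, old_value TEXT, new_value TEXT
    );
    INSERT INTO users VALUES (1, 'alice', 10), (2, 'bob', 20);
    INSERT INTO users_new VALUES (1, 'alice', 15), (2, 'robert', 20), (3, 'carol', 30);
""")

# Columns to compare; a fixed, trusted list (never build SQL from user input).
COLUMNS = ["name", "score"]

for column in COLUMNS:
    # One INSERT ... SELECT per column records every cell that differs
    # between the previous table and the freshly loaded staging table.
    conn.execute(
        f"""
        INSERT INTO cell_changes (row_id, column_name, old_value, new_value)
        SELECT o.id, '{column}', CAST(o.{column} AS TEXT), CAST(n.{column} AS TEXT)
        FROM users AS o
        JOIN users_new AS n ON n.id = o.id
        WHERE o.{column} IS NOT n.{column}
        """
    )

# Swap the staging table into place with a rename instead of re-creating it.
conn.executescript("""
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")

recorded = conn.execute(
    "SELECT row_id, column_name, old_value, new_value "
    "FROM cell_changes ORDER BY row_id, column_name"
).fetchall()
for row in recorded:
    print(row)
```

Because the comparison happens inside the database, the source data is read only once, during the load into the staging table.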

Top Results From Across the Web

How to track data changes in a database table - Stack Overflow
At the basic database level you can track changes by having a separate table that gets an entry added to it via triggers...

How to track changes in SQL Server
The first available option in SQL server for tracking the changes are the After Insert, After Update and After Delete triggers.

Track Data Changes - SQL Server | Microsoft Learn
In this article ... SQL Server provides two features that track changes to data in a database: change data capture and change tracking....

Detecting changes in a SQL Server table - DBA Stack Exchange
This will not scale well. A lighter alternative to Change Data Capture is Change Tracking. It will not tell you what values changed,...

Question: Usage of Find Changes? - Boomi Community
Typically the Find Changes shape (or generically, "change data capture" approach) is sort of a last resort and is applicable if 1) the...
