Question: best method to detect & record changes to existing data
I would like to ask a question of the embulk community. Is there an email list? Here is my question:
I am running a scheduled ETL task from MySQL to Redshift. I re-create the tables each time (no updates). During the task, before the old data is deleted, I want to detect which cells (row + column) written by the previous ETL run have changed, and record those changes in a new table. (That will help us with our temporal prediction task.)
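To make "which cells changed" concrete: match each row of the new load to the same row of the previous load by a key, then compare column by column. The sketch below is a generic Python illustration, not an Embulk feature. It assumes each row has a unique key column (called `id` here, a hypothetical name) and emits one `(key, column, old_value, new_value)` record per changed cell, which is the shape of the "changes table" described above.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
# One record per changed cell: (row key, column name, old value, new value).
Change = Tuple[Any, str, Any, Any]


def diff_cells(old_rows: Iterable[Row], new_rows: Iterable[Row],
               key: str = "id") -> List[Change]:
    """Compare two snapshots cell by cell, matching rows on `key`.

    `key` is a hypothetical primary-key column. Rows present in only one
    snapshot (inserts/deletes) are skipped; they could be logged separately.
    """
    # Index the previous load by key so each new row is a dict lookup.
    previous = {row[key]: row for row in old_rows}
    changes: List[Change] = []
    for row in new_rows:
        old = previous.get(row[key])
        if old is None:
            continue  # new row: nothing earlier to compare against
        for column, new_value in row.items():
            if column == key:
                continue
            old_value = old.get(column)
            if old_value != new_value:
                changes.append((row[key], column, old_value, new_value))
    return changes


previous_load = [{"id": 1, "price": 10, "qty": 3}, {"id": 2, "price": 5, "qty": 1}]
current_load = [{"id": 1, "price": 12, "qty": 3}, {"id": 2, "price": 5, "qty": 1},
                {"id": 3, "price": 7, "qty": 9}]
print(diff_cells(previous_load, current_load))  # [(1, 'price', 10, 12)]
```

Storing changes in this long, one-row-per-cell form keeps the changes table schema fixed no matter how many columns the source tables have.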
My question is: are there tools to do this within embulk? If not, where is the right place to put such code? It doesn't seem to fit the `filter` or `output` logic cleanly. I would also like to avoid reading the input data twice, as it can be quite slow (even though it's just millions of rows).
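One way to avoid reading the source twice is to check each row against the previous load while it streams through on its way to the destination. Embulk plugins are written in Java or Ruby, so the following is not Embulk's plugin API; it is only a Python sketch of the pass-through idea. The names `record_changes`, `on_change`, and the `id` key column are hypothetical, and the previous load is assumed to be available as a key-to-row mapping (for example, read back from the destination table).

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def record_changes(rows: Iterable[Row], previous: Dict[Any, Row],
                   on_change: Callable[[Any, str, Any, Any], None],
                   key: str = "id") -> Iterator[Row]:
    """Pass rows through unchanged while reporting changed cells.

    `previous` maps key -> row from the last load, so the source stream
    itself is iterated only once.
    """
    for row in rows:
        old = previous.get(row[key])
        if old is not None:
            for column, value in row.items():
                if column != key and old.get(column) != value:
                    on_change(row[key], column, old.get(column), value)
        yield row  # downstream (the load) sees exactly the source row


changes = []
last_load = {1: {"id": 1, "status": "open"}}
source = iter([{"id": 1, "status": "closed"}, {"id": 2, "status": "open"}])
loaded = list(record_changes(source, last_load,
                             lambda *change: changes.append(change)))
print(changes)  # [(1, 'status', 'open', 'closed')]
print(len(loaded))  # 2
```

The trade-off is memory: holding millions of previous rows in a dict can be large. Keeping only a hash per row would shrink it, but then you learn that a row changed, not which cell, unless you hash each cell separately.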
Any ideas? Thank you in advance…
Issue Analytics
- Created 6 years ago
- Comments: 13 (5 by maintainers)
Top GitHub Comments
@hiroyuki-sato Thank you for your suggestions. I hope to try these out over the weekend! I will let you know how it goes 😃
That’s great!!
I think it is better to use a workflow tool (e.g. Digdag) for your `after_load` part. I've never thought of using `after_load` for that case, but it may be OK.
- Use `alter table` instead of `CREATE TABLE schema.table`.
- Regarding the `DROP TABLE schema.table___new__;` part: the replace mode drops the target table.
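If the comparison runs inside Redshift (from `after_load` or from a separate workflow step), the previous data must still exist somewhere, since replace mode drops the target table. The sketch below assumes a copy of the previous load is kept in its own table and that a four-column changes table already exists; all table and column names are hypothetical. It builds one `INSERT ... SELECT` with a `UNION ALL` branch per column, so the result has one row per changed cell.

```python
from typing import List


def cell_diff_sql(prev_table: str, new_table: str, changes_table: str,
                  key: str, columns: List[str]) -> str:
    """Build one INSERT ... SELECT recording every changed (row, column) cell.

    All names are hypothetical and inserted verbatim (no quoting/escaping),
    so only pass trusted identifiers.
    """
    branches = []
    for col in columns:
        branches.append(
            f"SELECT n.{key} AS row_key, '{col}' AS column_name, "
            # Cast so every UNION ALL branch has the same column types.
            f"CAST(o.{col} AS VARCHAR) AS old_value, "
            f"CAST(n.{col} AS VARCHAR) AS new_value "
            f"FROM {prev_table} o JOIN {new_table} n ON o.{key} = n.{key} "
            # A cell changed if the values differ or exactly one is NULL.
            f"WHERE o.{col} <> n.{col} "
            f"OR (o.{col} IS NULL AND n.{col} IS NOT NULL) "
            f"OR (o.{col} IS NOT NULL AND n.{col} IS NULL)"
        )
    return (f"INSERT INTO {changes_table} "
            "(row_key, column_name, old_value, new_value)\n"
            + "\nUNION ALL\n".join(branches) + ";")


print(cell_diff_sql("myschema.orders_prev", "myschema.orders",
                    "myschema.cell_changes", "id", ["price", "qty"]))
```

In Redshift a bare `VARCHAR` defaults to a length of 256, so long values may need an explicit length such as `VARCHAR(65535)`. Doing the diff in the warehouse also means the MySQL source is read only once.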