It is actually compatible with unique constraint

Hi, ThibTrip. The usage instructions say that pangres.upsert works by primary key, which must be set as the index of the DataFrame, and the examples also say ‘we don’t want autoincremented PK’. However, I found that it is actually compatible with a unique constraint and an autoincremented PK (at least in MySQL 5.7).

Please take a look.

-- Here the `row_id` is the auto-incremented primary key
-- `order_id` and `product_id` together make up the unique constraint
-- let's say a single order can have more than one kind of product

CREATE TABLE `order_info` (
  `row_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'auto_incremented_ID',
  `order_id` varchar(5) NOT NULL DEFAULT '-9999' COMMENT 'order_id',
  `product_id` varchar(5) NOT NULL DEFAULT '-9999' COMMENT 'product_id',
  `qty` int(11) DEFAULT NULL COMMENT 'purchase_quantity',
  `refund_qty` int(11) DEFAULT NULL COMMENT 'refund_quantity',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'last_update_time',
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'first_create_time',
  PRIMARY KEY (`row_id`),
  UNIQUE KEY `main` (`order_id`,`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Order Info'

Insert the original values first.

order_id  product_id  qty  refund_qty
A0001     PD100       10   0
A0002     PD200       20   0
A0002     PD201       22   0

import pandas as pd
import pangres

old_data = {'order_id': ['A0001', 'A0002', 'A0002'],
            'product_id': ['PD100', 'PD200', 'PD201'],
            'qty': [10, 20, 22],
            'refund_qty': [0, 0, 0]}
old_df = pd.DataFrame(old_data)
old_df = old_df.set_index(['order_id', 'product_id'])

# let's suppose engine has been defined somehow

pangres.upsert(engine=engine,
               df=old_df,
               table_name='order_info',
               if_row_exists='update')

Then we get

row_id  order_id  product_id  qty  refund_qty  update_time          create_time
1       A0001     PD100       10   0           2020-06-28 20:31:04  2020-06-28 20:31:04
2       A0002     PD200       20   0           2020-06-28 20:31:04  2020-06-28 20:31:04
3       A0002     PD201       22   0           2020-06-28 20:31:04  2020-06-28 20:31:04

Next, upsert the new df as below:

order_id  product_id  qty  refund_qty
A0001     PD100       10   0
A0002     PD200       20   0
A0002     PD201       22   2
A0003     PD300       30   0

new_data = {'order_id': ['A0001', 'A0002', 'A0002', 'A0003'],
            'product_id': ['PD100', 'PD200', 'PD201', 'PD300'],
            'qty': [10, 20, 22, 30],
            'refund_qty': [0, 0, 2, 0]}
new_df = pd.DataFrame(new_data)
new_df = new_df.set_index(['order_id', 'product_id'])

pangres.upsert(engine=engine,
               df=new_df,
               table_name='order_info',
               if_row_exists='update')

The result is completely as expected!

row_id  order_id  product_id  qty  refund_qty  update_time          create_time
1       A0001     PD100       10   0           2020-06-28 20:31:04  2020-06-28 20:31:04
2       A0002     PD200       20   0           2020-06-28 20:31:04  2020-06-28 20:31:04
3       A0002     PD201       22   2           2020-06-28 20:37:13  2020-06-28 20:31:04
4       A0003     PD300       30   0           2020-06-28 20:37:13  2020-06-28 20:37:13

The update_time field changed only in the last two records (the modified row and the newly inserted one), while the first two rows kept their original timestamps, as they should.


I would suggest adding this feature description to the README (nothing has to change in the code). I was so excited to find your repo, which solves the pandas upsert issue so nicely, but then turned sad when I read that it only supports primary keys without auto increment. Only after taking a closer look at the code and carefully running a test did I find that it actually works with auto increment and a unique constraint (at least in MySQL).
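
For readers who want to try the idea without a MySQL server, here is a minimal sketch of an upsert keyed on a unique constraint while a separate auto-assigned integer stays the primary key. It is not pangres code and not the statement pangres generates. It uses Python's built-in sqlite3 module as a stand-in, so the DDL is simplified (no timestamp columns) and the upsert syntax is SQLite's `ON CONFLICT ... DO UPDATE` (needs SQLite 3.24 or newer). The helper names `upsert_rows` and `fetch_all` are made up for this illustration.

```python
import sqlite3

# Simplified SQLite stand-in for the MySQL table in the issue (assumption:
# timestamp columns dropped, types reduced to what SQLite offers).
DDL = """
CREATE TABLE order_info (
    row_id     INTEGER PRIMARY KEY,   -- auto-assigned surrogate key
    order_id   TEXT NOT NULL,
    product_id TEXT NOT NULL,
    qty        INTEGER,
    refund_qty INTEGER,
    UNIQUE (order_id, product_id)     -- the key the upsert matches on
)
"""

# Insert a row, or update the non-key columns when the unique key already exists.
UPSERT = """
INSERT INTO order_info (order_id, product_id, qty, refund_qty)
VALUES (?, ?, ?, ?)
ON CONFLICT (order_id, product_id)
DO UPDATE SET qty = excluded.qty, refund_qty = excluded.refund_qty
"""


def upsert_rows(conn, rows):
    """Hypothetical helper: upsert every row in one transaction."""
    with conn:
        conn.executemany(UPSERT, rows)


def fetch_all(conn):
    """Hypothetical helper: return all rows ordered by the surrogate key."""
    return conn.execute(
        "SELECT row_id, order_id, product_id, qty, refund_qty "
        "FROM order_info ORDER BY row_id"
    ).fetchall()


old_rows = [("A0001", "PD100", 10, 0),
            ("A0002", "PD200", 20, 0),
            ("A0002", "PD201", 22, 0)]
new_rows = [("A0001", "PD100", 10, 0),
            ("A0002", "PD200", 20, 0),
            ("A0002", "PD201", 22, 2),   # existing key, changed refund_qty
            ("A0003", "PD300", 30, 0)]   # new key, gets a new row_id

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
upsert_rows(conn, old_rows)
upsert_rows(conn, new_rows)

for row in fetch_all(conn):
    print(row)
```

The existing rows keep their row_id because the conflict is resolved on (order_id, product_id), which is the same role the DataFrame index plays in the pangres calls above.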

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
LawrentChen commented, Jul 1, 2020

Thank you for taking my opinion into consideration. I’ve looked at both PRs and I believe they are clear enough for new users. No need to feel sorry for not asking me before merging; I am already quite satisfied with being involved 😝.

Docker and pytest are right there in my learning roadmap, and I am also considering Kubernetes. Happy to have the right targets. And I will try npdoc_to_md in the future. It seems it can be used together with documentation generators like Sphinx/Jupinx.

And after you finish, I believe this issue can be closed anytime you wish. Have a nice day! 👍

1 reaction
ThibTrip commented, Jun 30, 2020

I can’t really remember what the issue with the auto-incrementing key was, but the comment you were pointing at is merely in a module filled with examples, which is used for tests, for docs, and for users who want to try pangres quickly. I recall that for some reason testing with an auto-incrementing primary key was complicated, which is why I ended up using a VARCHAR instead. But I removed this comment anyway in my new pull request. This PR changes the documentation to indicate that we can use unique keys. Actually it does a little more than that, as I kind of got carried away (e.g. I changed the script for generating the documentation and fixed the yml file for code coverage). Maybe you want to give me your opinion on the documentation changes in the PR before I merge it?

I also added tests with unique keys in a previous PR. I forgot to ask you before merging, sorry. I hope the test is what you had in mind. I just removed the timestamp columns because they were unnecessarily complicated for testing. I did see what you mentioned about triggers, yes. Fortunately I have never needed such a use case in my work 🙈 or I would just do datetime.datetime.now().astimezone(datetime.timezone.utc) and call it a day (it did not matter much for me, but doing that server side would be more accurate and most likely better for performance).
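
As a small illustration of the client-side approach mentioned above, the expression can be wrapped in a helper that returns a timezone-aware UTC timestamp. This is only a sketch: the function name `utc_now` is made up here, and how the value would be written to a column such as update_time is left out.

```python
import datetime


def utc_now():
    """Hypothetical helper: current time as a timezone-aware UTC datetime."""
    # now() returns naive local time; astimezone() interprets it as local
    # time and converts it to the requested zone, here UTC.
    return datetime.datetime.now().astimezone(datetime.timezone.utc)


stamp = utc_now()
print(stamp.isoformat())
```

The timestamp comes from the client's clock, so clients with skewed clocks will disagree with each other, which is one reason a server-side default can be more accurate.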

For JupyterHub I suppose using docker should make the task much easier. Obviously you’ll have to learn docker but it should be worth it (plus docker is used everywhere 😐). See the docker page on the JupyterHub website (they provide the link to the docker image on docker hub with detailed instructions). As for testing, I can heavily recommend pytest. It’s very convenient and flexible. I am not sure I understand everything about parameterizing and generating tests though. I have used parameters in my other public library npdoc_to_md, and in pangres I just generate tests for each database type. I learnt a few things about pytest by looking at the tests in the pandas repo. And keep writing issues 👍. I don’t think everyone does that when they notice something’s not right.
