question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

oauth token race condition?

See original GitHub issue

I have run into a fairly interesting race condition problem using oauth tokens stored on the filesystem. At least it has the hallmarks of a race condition…

I use O365 in a (Docker) containerized application I’ve developed that manages file encryption & SFTP transfers for my company. My application runs various download & upload jobs daily via cron, and sends out status e-mails when the jobs have completed. I have wrapped O365 in a separate module I created that takes care of all its initialization and allows my other modules to easily send preset e-mails depending on the job. Each job calls its own instance of this O365 wrapper.

I have 2 jobs that run at the exact same time in the morning. The jobs take slightly different amounts of time to complete, on the order of milliseconds. When they’re done, they both call the O365 wrapper, which starts the process of initializing O365 and sending the desired e-mails. At the time of these jobs, the last token was used over 60 min prior, so O365 has to fetch new oAuth tokens thru Graph.

Here’s where things get funky. Since the calls to O365 come within a few milliseconds of one another, it’s not enough time for the first call to fetch and save down a new token and have the other call see this token before it goes and does the same thing. So both O365 calls think they need to get new tokens…which they do (successfully). Both jobs then send a POST request to the graph sendMail endpoint with my email content…and after about 20 seconds I get a 503 service unavailable error on both requests.

To make this a bit easier to understand - here is a very rough, simplified & scrubbed version of my logs, to indicate what’s happening.

2019-11-06 07:00:01,236 - job1 - INFO - Job 1 Finishes
2019-11-06 07:00:01,238 - O365.connection:_internal_request:618 - INFO - Requesting (POST) URL: https://graph.microsoft.com/v1.0/users/myemail@myemail.com/sendMail
2019-11-06 07:00:01,238 - O365.connection:_internal_request:619 - INFO - Request parameters: {...job 1 graph json...}
2019-11-06 07:00:01,238 - O365.connection:_internal_request:634 - INFO - Oauth Token is expired, fetching a new token
2019-11-06 07:00:01,246 - job2 - INFO - Job 2 Finishes
2019-11-06 07:00:01,248 - O365.connection:_internal_request:618 - INFO - Requesting (POST) URL: https://graph.microsoft.com/v1.0/users/myemail@myemail.com/sendMail
2019-11-06 07:00:01,248 - O365.connection:_internal_request:619 - INFO - Request parameters: {...job 2 graph json...}
2019-11-06 07:00:01,248 - O365.connection:_internal_request:634 - INFO - Oauth Token is expired, fetching a new token
2019-11-06 07:00:01,669 - O365.connection:_internal_request:637 - INFO - New oauth token fetched
2019-11-06 07:00:01,669 - O365.connection:_internal_request:618 - INFO - Requesting (POST) URL: https://graph.microsoft.com/v1.0/users/myemail@myemail.com/sendMail
2019-11-06 07:00:01,669 - O365.connection:_internal_request:619 - INFO - Request parameters: {...job 1 graph json...}
2019-11-06 07:00:01,734 - O365.connection:_internal_request:637 - INFO - New oauth token fetched
2019-11-06 07:00:01,734 - O365.connection:_internal_request:618 - INFO - Requesting (POST) URL: https://graph.microsoft.com/v1.0/users/myemail@myemail.com/sendMail
2019-11-06 07:00:01,735 - O365.connection:_internal_request:619 - INFO - Request parameters: {...job 2 graph json...}
2019-11-06 07:00:24,354 - O365.connection:_internal_request:662 - DEBUG - Server Error: 503 Server Error: Service Unavailable for url: https://graph.microsoft.com/v1.0/users/myemail@myemail.com/sendMail
2019-11-06 07:00:24,582 - O365.connection:_internal_request:662 - DEBUG - Server Error: 503 Server Error: Service Unavailable for url: https://graph.microsoft.com/v1.0/users/myemail@myemail.com/sendMail

I suspect the issue is coming from the first job sending a POST request to send the email while the graph server is still handling the second job’s request for a new token. The 503 code on the MS documentation indicates the service is not available OR busy. But I’m not positive. MS server is definitely not down. I wanted to see if you had encountered this or something similar before, and had any suggestions.

The easiest solution is to just have the job’s run a minute apart. Gives enough time to fetch and read new tokens. Or I thought of maybe using a token file name unique to the job? Currently, I use the same token file for everything.

Any input would be appreciated. Thanks!

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:20 (20 by maintainers)

github_iconTop GitHub Comments

1reaction
riptusk331commented, Nov 11, 2019

Pushed the discussed token.py changes to #356 and removed changes to Connection

0reactions
janscascommented, Nov 20, 2019

Great news! I’ll close this as it seems solved.

Read more comments on GitHub >

github_iconTop Results From Across the Web

Best practice/solution to handle OAUTH access token refresh ...
Multiple calls to refresh the token will result in a failure because the first request will invalidate the others aka 'race condition'.
Read more >
Stop Race Condition of Refresh Token? - Stack Overflow
A race condition occurs when two sessions share a common refresh token (like when two tabs are open in a browser and the...
Read more >
Race condition when refreshing access tokens in a web app + ...
This concept works fine most of the times. However, we have identified a race condition: Once the access token needs to be refreshed,...
Read more >
GitLab Token Unable To Refresh Due To Race Condition
There is a known race condition where if Gitlab receives a lot of webhooks at once then it will refresh the token on...
Read more >
Race Condition - HackTricks
The problem appears when you accept it and automatically sends an authorization_code to the malicious application. Then, this application abuses a Race ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found