MAJOR Perf Regression Using The Same Binaries From Ubuntu 20.04 -> 22.04


Describe the bug

I’m running the latest NuGet version, I just upgraded to dotnet 7, and all systems are 100% updated. I have a service that handles active server connections, so when it restarts to update, all of the currently connected servers reconnect. Usually, that’s somewhere between 4k and 6k connections trying to reconnect within 10-30 seconds. When a client reconnects, I need to look up its information in my CosmosDB database, so I do a simple query on an indexed unique field representing the client.

Results

Running the exact same binaries under dotnet 7:

  • On Ubuntu 20.04: the ~4k connections take about 20 seconds to fully reconnect, and my 4-core VM hovers around 20% CPU usage.
  • On Ubuntu 22.04: the ~4k connections take about 5 minutes to fully reconnect, and my 4-core VM is maxed out at around 100% CPU usage.

This is a really bad regression, and it must be related to the OS change. Again, I’m using the exact same compiled binaries of the server and the CosmosDB SDK. The two VMs are set up identically; the only difference is the OS version. I noticed this issue after upgrading to Ubuntu 22.04, when my server got really slow. To validate it, I then created a new VM using the old OS, so I’m able to switch traffic between the two.

I know that the perf regression is in CosmosDB, because the lookup call is what’s taking the majority of the time and seems to be the cause of the 100% CPU load. That seems really suspicious. I get that a large influx of lookup queries all at once might max out the DB, but I wouldn’t expect it to max out the client-side CPU; all of the pending client-side tasks should just be awaiting the DB response.

To Reproduce

Breaking down my code, it’s basically equivalent to calling this small pseudocode block in a concurrent loop about 3-4k times in 20 seconds.

    // Look up the client by its unique, indexed id.
    var container = Db.GetOctoClientMetaContainer();
    try
    {
        var it = container.GetItemQueryIterator<OctoClientMeta>($"SELECT * FROM OctoClients u WHERE u.id='{id.ToUpper()}'");
        while (it.HasMoreResults)
        {
            foreach (var client in await it.ReadNextAsync())
            {
                // The id is unique, so return the first match.
                return new UpdateResult(true, client);
            }
        }
    }
    catch (Exception e)
    {
        ...
    }
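The "concurrent loop" described above could be driven by a small harness along these lines. This is only a sketch and not code from the original issue: `ReconnectLoadSketch`, `RunAsync`, and the `CLIENT-{i}` ids are hypothetical names, and the lookup is passed in as a delegate so the pseudocode block above (or a stub) can be plugged in. It starts all lookups at once, waits for them to finish, and returns the elapsed time for the batch.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical harness (an assumption, not from the issue): fires `count`
// lookups concurrently and measures how long the whole batch takes.
public static class ReconnectLoadSketch
{
    public static async Task<TimeSpan> RunAsync(int count, Func<string, Task> lookupAsync)
    {
        var stopwatch = Stopwatch.StartNew();

        // Start every lookup without awaiting it first, the way a burst of
        // reconnecting clients would; ToArray() forces all tasks to start now.
        var tasks = Enumerable.Range(0, count)
            .Select(i => lookupAsync($"CLIENT-{i}"))
            .ToArray();

        // Wait until all lookups have completed (or one has faulted).
        await Task.WhenAll(tasks);

        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, `await ReconnectLoadSketch.RunAsync(4000, id => Task.Delay(5))` exercises the harness with a stub; swapping the stub for the real lookup and running the same binary on both OS versions would give comparable batch timings.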

Expected behavior

As on Ubuntu 20.04, this should be expensive, but it shouldn’t max out the CPU.

Actual behavior

On Ubuntu 22.04, it maxes out the CPU and takes a lot longer for all 4k lookups to complete.

Environment summary

  • SDK Version: CosmosDB 3.31.2
  • OS Version: Ubuntu 20.04 and Ubuntu 22.04

Additional context

I would love to help gather any more info that I can, perf traces or whatnot. Just let me know.

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

QuinnDamerell commented, Nov 9, 2022 (1 reaction)

They are DigitalOcean VMs, CPU-optimized, 4 cores. I’m pretty sure it’s not a problem with the VM, because I have 7+ VMs all running the same setup worldwide. None of them had a problem on 20.04, but they all have it now after the update to 22.04.

Unlike Azure, DigitalOcean also doesn’t move VMs between hosts when they are turned off or restarted, so they should be running on the same hardware before and after the 20.04 -> 22.04 move.

QuinnDamerell commented, Nov 9, 2022 (0 reactions)

Yeah, will do.

Thanks for all of the info and help. I will take my findings and traces and open a bug elsewhere. 😃
