Terrible query performance

Description

According to the repo description, this library is not a wrapper over the HTTP interface, so I expected queries over the native protocol to be faster than, for example, fetching JSON and deserializing it. Instead it is incredibly slow. My investigation shows that processing time grows steeply with the number of rows returned.

Steps to reproduce

I tried three ways to get the data: a slightly modified method from the tests (TestPerfromance), ExecuteSelectCommand from this wrapper, and a simple wrapper that executes requests over HTTP:

{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 1; i <= 16384; i *= 2)
    {
        string query = $"SELECT address from (SELECT DISTINCT  address FROM (SELECT from_address as address FROM addresses) ANY FULL OUTER JOIN (SELECT to_address as address FROM addresses) USING (address) ) LIMIT {i}";

        // 1. Native protocol, reading rows through IDataReader.ReadAll.
        // Note: the stopwatch is not restarted before this step, so from the
        // second iteration on this figure also includes the previous JSON request.
        using (var cnn = GetConnection())
        {
            var cmd = cnn.CreateCommand(query);
            var list = new List<List<object>>();
            using (var reader = cmd.ExecuteReader())
            {
                reader.ReadAll(x =>
                {
                    // Copy every column of the current row into a list.
                    var rowList = new List<object>();
                    for (var j = 0; j < x.FieldCount; j++)
                        rowList.Add(x.GetValue(j));
                    list.Add(rowList);
                });
            }
        }
        Console.WriteLine($"{i} records: {sw.ElapsedMilliseconds} ms. IDataReader.ReadAll");

        // 2. ExecuteSelectCommand from the wrapper (restart the timer before the call).
        sw.Restart();
        var nodes = _db.ExecuteSelectCommand(query);
        Console.WriteLine($"{i} records: {sw.ElapsedMilliseconds} ms. ExecuteSelectCommand");

        // 3. JSON over the HTTP interface.
        sw.Restart();
        var nodesJson = ClickhouseQueryExecutor.ExecuteQuery<List<HolderMap>>(query);
        Console.WriteLine($"{i} records: {sw.ElapsedMilliseconds} ms. JSON ");
    }
}
public static T ExecuteQuery<T>(string query) where T : class
{
    _webClient.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
    // Ask ClickHouse to return the result set as JSON.
    var data = Encoding.ASCII.GetBytes(query + " FORMAT JSON");
    var response = _webClient.UploadData("http://house.click:8123/", data);
    var responseString = JObject.Parse(Encoding.Default.GetString(response));
    // The rows live under the "data" property of the JSON response.
    var responseBody = responseString["data"];
    var results = JsonConvert.DeserializeObject<T>(responseBody.ToString());
    return results;
}
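
For comparison, a minimal sketch of timing each approach with its own fresh Stopwatch, so that no measurement can pick up time from a neighbouring step. The `TimingSketch` class and `Measure` helper are hypothetical names introduced here for illustration; they are not part of the library or of the original benchmark, and `Thread.Sleep` merely stands in for a real query.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimingSketch
{
    // Hypothetical helper: runs the action once and returns elapsed wall-clock milliseconds.
    public static long Measure(Action action)
    {
        var sw = Stopwatch.StartNew(); // fresh timer for every measurement
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Stand-in workload; in the benchmark above this would be one of the three query paths.
        long ms = Measure(() => Thread.Sleep(50));
        Console.WriteLine($"Stand-in workload: {ms} ms");
    }
}
```

Each of the three query paths could be wrapped in its own `Measure(() => ...)` call inside the loop.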

And I got the following results:
1 records: 609 ms. IDataReader.ReadAll
1 records: 425 ms. ExecuteSelectCommand
1 records: 595 ms. JSON
2 records: 1012 ms. IDataReader.ReadAll
2 records: 397 ms. ExecuteSelectCommand
2 records: 246 ms. JSON
4 records: 640 ms. IDataReader.ReadAll
4 records: 404 ms. ExecuteSelectCommand
4 records: 264 ms. JSON
8 records: 668 ms. IDataReader.ReadAll
8 records: 395 ms. ExecuteSelectCommand
8 records: 254 ms. JSON
16 records: 652 ms. IDataReader.ReadAll
16 records: 398 ms. ExecuteSelectCommand
16 records: 242 ms. JSON
32 records: 638 ms. IDataReader.ReadAll
32 records: 387 ms. ExecuteSelectCommand
32 records: 270 ms. JSON
64 records: 712 ms. IDataReader.ReadAll
64 records: 441 ms. ExecuteSelectCommand
64 records: 261 ms. JSON
128 records: 777 ms. IDataReader.ReadAll
128 records: 515 ms. ExecuteSelectCommand
128 records: 256 ms. JSON
256 records: 948 ms. IDataReader.ReadAll
256 records: 685 ms. ExecuteSelectCommand
256 records: 314 ms. JSON
512 records: 1361 ms. IDataReader.ReadAll
512 records: 1052 ms. ExecuteSelectCommand
512 records: 299 ms. JSON
1024 records: 2037 ms. IDataReader.ReadAll
1024 records: 1715 ms. ExecuteSelectCommand
1024 records: 335 ms. JSON
2048 records: 3713 ms. IDataReader.ReadAll
2048 records: 6521 ms. ExecuteSelectCommand
2048 records: 603 ms. JSON
4096 records: 6949 ms. IDataReader.ReadAll
4096 records: 6341 ms. ExecuteSelectCommand
4096 records: 793 ms. JSON
8192 records: 13315 ms. IDataReader.ReadAll
8192 records: 12631 ms. ExecuteSelectCommand
8192 records: 1143 ms. JSON
16384 records: 25422 ms. IDataReader.ReadAll
16384 records: 24835 ms. ExecuteSelectCommand
16384 records: 1286 ms. JSON

Any thoughts?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 29 (15 by maintainers)

Top GitHub Comments

1 reaction
VitaliyMF commented, Jan 10, 2019

"I just ran my benchmark on a local Ubuntu machine, connecting to the remote host, and got wonderful results"

In my tests the client is a .NET Core 2.1 app hosted on Ubuntu 16.04, so the new managed Sockets implementation is used. I was still able to reproduce significant delays.

I think the delays are caused by the TcpClient.ReceiveBufferSize value (set in ClickHouseConnection.cs), which is 1024 by default (in ClickHouseConnectionSettings.cs). I set “BufferSize=8192” in the connection string and, surprisingly, report generation time dropped to about 400-700 ms instead of 15-17 seconds!
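
As a rough illustration only, the sketch below counts how many reads it takes to drain the same payload with a 1024-byte versus an 8192-byte buffer. It uses an in-memory stream rather than a socket, so it is an analogy and not a reproduction of the driver's behaviour: real TCP throughput also depends on the OS, the network latency and the TCP window. `ReadCountSketch` and `CountReads` are hypothetical names.

```csharp
using System;
using System.IO;

public static class ReadCountSketch
{
    // Hypothetical helper: counts the Read calls needed to drain a stream
    // when at most bufferSize bytes are taken per call.
    public static int CountReads(Stream source, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        var reads = 0;
        while (source.Read(buffer, 0, buffer.Length) > 0)
            reads++;
        return reads;
    }

    public static void Main()
    {
        var payload = new byte[1024 * 1024]; // 1 MiB stand-in for a result set

        Console.WriteLine(CountReads(new MemoryStream(payload), 1024)); // prints 1024
        Console.WriteLine(CountReads(new MemoryStream(payload), 8192)); // prints 128
    }
}
```

With the larger buffer the same payload needs one eighth as many reads; over a remote connection each extra round of receiving can add latency.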

@ridicoulous could you try increasing BufferSize in your connection string and re-run the tests that show bad load times?
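
A minimal sketch of the suggested change, assuming a semicolon-separated `Key=Value` connection string. Only the `BufferSize=8192` setting comes from the comment above; the `Host`, `Port`, `Database` and `User` keys and their values are placeholders, and `WithBufferSize` is a hypothetical helper, not a library API.

```csharp
using System;

public static class BufferSizeExample
{
    // Hypothetical helper: appends a BufferSize setting to an existing connection string.
    // Assumes the string does not already contain a BufferSize entry.
    public static string WithBufferSize(string connectionString, int bufferSize)
    {
        return connectionString.TrimEnd(';') + ";BufferSize=" + bufferSize;
    }

    public static void Main()
    {
        // Placeholder connection string; substitute your own settings.
        var baseConnectionString = "Host=localhost;Port=9000;Database=default;User=default";

        Console.WriteLine(WithBufferSize(baseConnectionString, 8192));
        // prints Host=localhost;Port=9000;Database=default;User=default;BufferSize=8192
    }
}
```

The resulting string would then be passed to whatever creates the connection in your code (GetConnection() in the benchmark above).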

1 reaction
ridicoulous commented, Jan 10, 2019

@VitaliyMF @ilyabreev @killwort colleagues, the problem has definitely been located in ntdll.dll. I just ran my benchmark on a local Ubuntu machine, connecting to the remote host, and got wonderful results. I also created a post on Stack Overflow.
