
Common database operations are slow


Is your feature request related to a problem? Please describe.
Currently, Ghidra uses its own custom database, which supports only rudimentary indexing and filtering, so almost all filtering and sorting has to be done at a higher level by iterating over all possible records (see TreeTextFilter). This results in a very slow user experience for common actions, like searching for a symbol (related issue: #500).
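For illustration, the application-layer approach can be sketched like this (a toy sketch with made-up symbol names, not Ghidra's actual TreeTextFilter code): every filter invocation walks the full symbol list in the interpreter, so cost grows linearly with program size regardless of how few symbols match.

```python
# Toy sketch (not Ghidra code): filtering above the database means one
# full pass over every record per filter invocation.
symbols = [(i * 4, "FUN_%08x" % i) for i in range(50_000)]

def contains_filter(term):
    """Case-insensitive 'Contains' filter done in the application layer."""
    term = term.lower()
    # Cost is O(total symbols), not O(matches).
    return [(addr, name) for addr, name in symbols if term in name.lower()]

matches = contains_filter("00FF")
print(len(matches), "matches")
```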

I measured the time to filter symbols (in the “Symbol Tree” using the “Contains” mode) by a given string in a large project (roughly a million symbols) to be around 15 s (computer specs: Ryzen 3700X, 32 GB RAM, mid-range NVMe SSD). I then exported all symbols into a CSV file using Ghidra’s Jython:

outf = open("CSV_PATH", "w")
for s in currentProgram.getSymbolTable().getAllSymbols(False):
    # Escape embedded double quotes so names can't break the CSV quoting.
    name = s.getName().replace('"', '""')
    outf.write(str(s.getAddress().getOffset()) + ',"' + name + '"\n')
outf.close()

and imported the CSV into three SQL databases (H2, PostgreSQL and SQLite):

-- For H2:
CREATE TABLE symbols (address bigint, name text)
AS SELECT * FROM csvread('CSV_PATH');
-- For PostgreSQL:
CREATE TABLE symbols (address bigint, name text);
COPY symbols FROM 'CSV_PATH' WITH (FORMAT CSV);
-- For SQLite:
CREATE TABLE symbols (address bigint, name text);
.mode csv
.import CSV_PATH symbols

A subsequent SELECT * FROM symbols WHERE name ILIKE '%SEARCH_TERM%' or equivalent executed in ~1000ms on H2, ~250ms on PostgreSQL and ~100ms on SQLite.
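For comparison, the SQLite end of that measurement is easy to reproduce with Python's built-in sqlite3 module (the synthetic names below stand in for the exported CSV; note that SQLite's LIKE is case-insensitive for ASCII, so it plays the role of ILIKE here):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE symbols (address INTEGER, name TEXT)")
# Synthetic stand-ins for the exported symbols.
con.executemany(
    "INSERT INTO symbols VALUES (?, ?)",
    ((i * 4, "FUN_%08x" % i) for i in range(100_000)),
)

start = time.perf_counter()
# LIKE is case-insensitive for ASCII in SQLite, matching ILIKE semantics.
rows = con.execute(
    "SELECT * FROM symbols WHERE name LIKE ?", ("%00ff%",)
).fetchall()
print(len(rows), "rows in %.1f ms" % ((time.perf_counter() - start) * 1000))
```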

This isn’t meant to compare the databases, but to show that, even without adding fulltext indexes, common SQL databases might be at least 15x better than Ghidra’s current code. I’m not very familiar with Ghidra’s internals, so please point out any problems that might invalidate these results.

Describe the solution you’d like
I’d like to propose moving filtering and sorting into the database layer by adopting an existing database (a relational database seems most suitable), abstracting it under the same API used today, and adding “DB-aware” code to hotspots that are currently slow. This would have the added benefit of relying on a well-maintained open-source project instead of custom code, making maintenance easier.
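As a sketch of the proposed layering (all class and method names here are invented for illustration, not Ghidra API), the idea is to keep a table-style interface while letting filter hotspots delegate the scan to the database engine:

```python
import sqlite3

class SymbolStore:
    """Hypothetical DB-aware backend: the 'contains' filter runs in SQL
    instead of iterating records in the application layer."""

    def __init__(self):
        self.con = sqlite3.connect(":memory:")
        self.con.execute("CREATE TABLE symbols (address INTEGER, name TEXT)")

    def add(self, address, name):
        self.con.execute("INSERT INTO symbols VALUES (?, ?)", (address, name))

    def filter_contains(self, term):
        # The wildcard scan happens inside the database engine.
        return self.con.execute(
            "SELECT address, name FROM symbols WHERE name LIKE ?",
            ("%" + term + "%",),
        ).fetchall()

store = SymbolStore()
store.add(0x1000, "main")
store.add(0x2000, "FUN_00002000")
print(store.filter_contains("fun"))
```

Callers keep asking the store for "symbols whose name contains X"; only the backend changes.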

Describe alternatives you’ve considered
It is certainly possible to optimise Ghidra’s current database to reach performance comparable to existing, faster databases. I feel, however, that this would take more effort than integrating an existing database, both up front and especially in maintenance.

It’s also possible that the current performance is caused by bugs that can be fixed without such architectural changes, which would definitely be preferable.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
ghost commented, Sep 9, 2019

I’m pretty sure the custom database is not the issue here. I just did a test: retrieving 1M records and filtering them took 533 ms, which is in the ballpark of the numbers you mentioned. I suspect the problem has more to do with how we map the records to Symbol objects. We may be doing it inefficiently: we keep only the symbol record keys in memory, so every time we access a symbol field (such as its name) we may have to retrieve the symbol record again and again as we filter and sort. There is a soft cache for the symbol objects, but if you are low on memory and the garbage collector runs, those objects will be reclaimed, forcing constant trips back to the database to retrieve the information. Anyway, this is something we can look into to see what is really causing the slowness.
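The re-fetching hypothesis is easy to see in miniature (an illustrative sketch with invented names, not Ghidra's classes): if only the record key is kept, every field access is another database round trip, and the cost multiplies with each filter or sort pass.

```python
# Stand-in for the backing record store; fetch_record simulates a DB read.
RECORDS = {i: (i * 4, "FUN_%08x" % i) for i in range(10_000)}
FETCHES = {"count": 0}

def fetch_record(key):
    FETCHES["count"] += 1
    return RECORDS[key]

class LazySymbol:
    """Keeps only the record key; every name access re-fetches."""
    def __init__(self, key):
        self.key = key
    def name(self):
        return fetch_record(self.key)[1]

class CachedSymbol:
    """Fetches once, then answers from memory."""
    def __init__(self, key):
        self._name = fetch_record(key)[1]
    def name(self):
        return self._name

def contains_pass(syms, term):
    return [s for s in syms if term in s.name()]

lazy = [LazySymbol(i) for i in RECORDS]
FETCHES["count"] = 0
contains_pass(lazy, "00ff"); contains_pass(lazy, "00ff")
lazy_fetches = FETCHES["count"]    # one fetch per symbol, per pass

FETCHES["count"] = 0
cached = [CachedSymbol(i) for i in RECORDS]
contains_pass(cached, "00ff"); contains_pass(cached, "00ff")
cached_fetches = FETCHES["count"]  # one fetch per symbol, total
print(lazy_fetches, cached_fetches)
```

Two filter passes over 10,000 lazy symbols cost twice as many fetches as building the cached symbols once, no matter how many passes follow.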

0 reactions
ryanmkurtz commented, Sep 20, 2019

Since this doesn’t appear to be a DB issue, I am going to close the ticket. Speedup pertaining to the Symbol Tree/Table can be tracked in #500.

