
What if number of DBs is unknown?

See original GitHub issue

Big fan of the project. Thanks for making such a great persistence solution!

Question: Normally when you create an Environment you must specify the number of DBs eg:

final Env<ByteBuffer> env = create()
        // LMDB needs to know how large our DB might be. Over-estimating is OK.
        .setMapSize(10_485_760)
        // LMDB also needs to know how many DBs (Dbi) we want to store in this Env.
        .setMaxDbs(1)
        // dbDir is illustrative: the directory the environment is opened in.
        .open(dbDir);

We’re leaning towards a model (data is very unstructured) that calls for creating a DBI for an unknown number of entities. Entities are identified by UUIDs and there might be 1 persisted… or millions.

The simplest thing to do might be to simply call setMaxDbs(Integer.MAX_VALUE) (or some other very high number).

Is this okay?

What might be the impact on performance due to passing a very high number to maxDbs?

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
benalexau commented, Aug 19, 2017

there was concern about the cost of calculating compound keys

Don’t be. Just preallocate a re-usable key buffer and use offsets. For example, if you needed an integer prefix and then a UUID, you’d need a key buffer of 20 bytes (a 4-byte int plus a 16-byte UUID). At offset 0 you’d write the integer, and at offset 4 you’d write the UUID. This is incredibly inexpensive, as the entire 20 bytes will be in the same cache line (on that note, you should allocate a 64-byte buffer but only use the first 20 bytes), and you can use a bounds-check-disabled buffer (eg Agrona UnsafeBuffer) if you were really concerned. As an aside, Agrona has static methods that hide the cache line size (eg allocateDirectAligned can be used with org.agrona.BitUtil.CACHE_LINE_LENGTH).
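As a minimal stdlib-only sketch of that re-usable key buffer (the class and method names here are illustrative, and plain ByteBuffer is used rather than Agrona's UnsafeBuffer):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Illustrative sketch: a preallocated, re-usable compound-key buffer.
// One cache line (64 bytes) is allocated, but only the first 20 bytes
// carry data: a 4-byte int prefix followed by a 16-byte UUID.
public class CompoundKey {
    private final ByteBuffer key =
        ByteBuffer.allocateDirect(64).order(ByteOrder.BIG_ENDIAN);

    public ByteBuffer encode(int prefix, UUID id) {
        key.clear();
        key.putInt(0, prefix);                          // offset 0: int prefix
        key.putLong(4, id.getMostSignificantBits());    // offset 4: UUID high
        key.putLong(12, id.getLeastSignificantBits());  // offset 12: UUID low
        key.limit(20);  // expose only the 20 meaningful bytes to LMDB
        return key;
    }
}
```

Because big-endian byte order is used, keys sort lexicographically in the same order as their numeric prefixes, which is what LMDB's default key comparator expects.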

If your keys are somewhat complex, I use and recommend Simple Binary Encoding. SBE generates encoder and decoder flyweights that can also handle schema changes, recurring fields, user-defined types / enums etc. It shields you from needing to keep track of field offsets and imposes no overhead compared with hand-written code. SBE is overkill if you have simple needs though. We use our own DAO code generator (which uses the offset approaches for automatically-encoded primary keys and index keys) and SBE for the value encoding.

Our generated DAO pattern has found that LMDB’s transaction capabilities make it quite simple to maintain multiple index tables. A typical pattern for a generated update method is basically:

  1. Start a write transaction
  2. Build primary key buffer from the passed entity object
  3. Fetch existing record (using the primary key buffer) from the data table
  4. Generate all index keys for the existing record
  5. Drop all index records (using the index key buffers) from the index tables
  6. Write the modified record into the data table
  7. Generate new index keys for the modified record
  8. Write the index records into the index tables (the values are the PK to the data table)
  9. Commit
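The index-maintenance portion of those steps (2 through 8) can be sketched without any LMDB dependency by modelling the data and index tables as in-memory maps; everything here (class, field, and method names, and the single-field index-key function) is illustrative, and the transaction begin/commit steps are elided:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the generated-DAO update pattern: a data table
// keyed by primary key, and one index table whose values are the PK.
public class DaoUpdate {
    final Map<String, String> dataTable = new HashMap<>();   // pk -> record
    final Map<String, String> indexTable = new HashMap<>();  // indexKey -> pk

    // Derive the index key from a record (here: its first CSV field).
    static String indexKey(String rec) {
        return rec.split(",")[0];
    }

    void update(String pk, String newRecord) {
        String existing = dataTable.get(pk);        // 3. fetch existing record
        if (existing != null) {
            indexTable.remove(indexKey(existing));  // 4-5. drop old index rows
        }
        dataTable.put(pk, newRecord);               // 6. write modified record
        indexTable.put(indexKey(newRecord), pk);    // 7-8. write new index rows
    }
}
```

In the real LMDB version each map becomes a Dbi and the whole method body runs inside a single write Txn, so a crash between steps never leaves the index tables out of sync with the data table.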

Sure, that’s somewhat inefficient for writes, because often we drop an index record and then insert exactly the same index record again, but the trade-off there is that we’re using a code generator, so we want the generator to be correct and maintainable. Furthermore, to be using LMDB at all means your use case is read-optimised, and writes are slower. If you need to be write-optimised and care less about reads, you’re probably better off with a log-structured merge type system such as LevelDB or its derivatives.

I share this because it sounds like you’re trying to write a “generic” store, so having a way to differentiate your “index” keys from the “primary” keys is generally necessary. To finish off my DAO example, this shows how we do that:

  @Put(fw = InstrumentDecoder.class, pk = "global")
  void putInstrument(Txn<DirectBuffer> txn, DirectBuffer in);

  @Get(fw = InstrumentDecoder.class, key = "global")
  DirectBuffer getInstrument(Txn<DirectBuffer> txn, long global);

  @Iter(fw = InstrumentDecoder.class, key = "composite")
  DaoIterator findInstrumentByComposite(Txn<DirectBuffer> txn, KeyRangeType type,
                                        long composite);

  @Iter(fw = InstrumentDecoder.class, key = "shareClass")
  DaoIterator findInstrumentByShareClass(Txn<DirectBuffer> txn,
                                         KeyRangeType type, long shareClass);

Note the fw attribute is an SBE-generated flyweight, and the pk and key fields are found in the flyweight. Everything else is handled via runtime code generation. Our pattern is: edit an SBE XML file, run the generator, write an annotated DAO interface, then use the annotated DAO interface with its runtime implementation. It’s very rare to need to deal with LmdbJava APIs directly from domain code (the exception is our time series data, as that’s 99% of our data volume and has extremely challenging storage and performance requirements).

If you do not need transactional data sync across databases, you could also create separate Env instances in different directories

@krisskross noted the fundamental piece about a Txn cannot cross Envs. You can, if desired, also have different Envs in the same directory (although different physical data and lock files) if you use MDB_NOSUBDIR.

0 reactions
buko commented, Aug 22, 2017

@benalexau Thanks very much. You’ve given us a lot to think about. Definitely like the idea of a metamodel that links primary keys and indexes.
