
What if number of DBs is unknown?

See original GitHub issue

Big fan of the project. Thanks for making such a great persistence solution!

Question: Normally when you create an Environment you must specify the number of DBs eg:

final Env<ByteBuffer> env = create()
        // LMDB needs to know how large our DB might be. Over-estimating is OK.
        .setMapSize(10_485_760)
        // LMDB also needs to know how many DBs (Dbi) we want to store in this Env.
        .setMaxDbs(1)
        // dbDir is illustrative: the directory the environment is opened in.
        .open(dbDir);

We’re leaning towards a model (data is very unstructured) that calls for creating a DBI for an unknown number of entities. Entities are identified by UUIDs and there might be 1 persisted… or millions.

The simplest thing to do might be to simply call setMaxDbs(Integer.MAX_VALUE) (or some other very high number).

Is this okay?

What might be the impact on performance due to passing a very high number to maxDbs?

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
benalexau commented, Aug 19, 2017

there was concern about the cost of calculating compound keys

Don’t be. Just preallocate a re-usable key buffer and use offsets. For example, if you needed an integer prefix and then a UUID, you’d need a key buffer of 20 bytes (a 4-byte int plus a 16-byte UUID). At offset 0 you’d write the integer, and at offset 4 you’d write the UUID. This is incredibly inexpensive, as the entire 20 bytes will be in the same cache line (on that note, you should allocate a 64-byte buffer but only use the first 20 bytes), and you can use a bounds-check-disabled buffer (eg Agrona UnsafeBuffer) if you were really concerned. As an aside, Agrona has static methods that hide the cache line size (eg allocateDirectAligned can be used with org.agrona.BitUtil.CACHE_LINE_LENGTH).
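As a minimal stdlib-only sketch of that re-usable key buffer (the class and method names here are illustrative, and plain ByteBuffer is used rather than Agrona's UnsafeBuffer):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Illustrative sketch: a preallocated, re-usable compound-key buffer.
// One cache line (64 bytes) is allocated, but only the first 20 bytes
// carry data: a 4-byte int prefix followed by a 16-byte UUID.
public class CompoundKey {
    private final ByteBuffer key =
        ByteBuffer.allocateDirect(64).order(ByteOrder.BIG_ENDIAN);

    public ByteBuffer encode(int prefix, UUID id) {
        key.clear();
        key.putInt(0, prefix);                          // offset 0: int prefix
        key.putLong(4, id.getMostSignificantBits());    // offset 4: UUID high
        key.putLong(12, id.getLeastSignificantBits());  // offset 12: UUID low
        key.limit(20);  // expose only the 20 meaningful bytes to LMDB
        return key;
    }
}
```

Because big-endian byte order is used, keys sort lexicographically in the same order as their numeric prefixes, which is what LMDB's default key comparator expects.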

If your keys are somewhat complex, I use and recommend Simple Binary Encoding. SBE generates encoder and decoder flyweights that can also handle schema changes, recurring fields, user-defined types / enums etc. It shields you from needing to keep track of field offsets and imposes no overhead compared with hand-written code. SBE is overkill if you have simple needs though. We use our own DAO code generator (which uses the offset approaches for automatically-encoded primary keys and index keys) and SBE for the value encoding.

Our generated DAO pattern has found that LMDB’s transaction capabilities make it quite simple to maintain multiple index tables. A typical pattern for a generated update method is basically:

  1. Start a write transaction
  2. Build primary key buffer from the passed entity object
  3. Fetch existing record (using the primary key buffer) from the data table
  4. Generate all index keys for the existing record
  5. Drop all index records (using the index key buffers) from the index tables
  6. Write the modified record into the data table
  7. Generate new index keys for the modified record
  8. Write the index records into the index tables (the values are the PK to the data table)
  9. Commit
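The index-maintenance portion of those steps (2 through 8) can be sketched without any LMDB dependency by modelling the data and index tables as in-memory maps; everything here (class, field, and method names, and the single-field index-key function) is illustrative, and the transaction begin/commit steps are elided:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the generated-DAO update pattern: a data table
// keyed by primary key, and one index table whose values are the PK.
public class DaoUpdate {
    final Map<String, String> dataTable = new HashMap<>();   // pk -> record
    final Map<String, String> indexTable = new HashMap<>();  // indexKey -> pk

    // Derive the index key from a record (here: its first CSV field).
    static String indexKey(String rec) {
        return rec.split(",")[0];
    }

    void update(String pk, String newRecord) {
        String existing = dataTable.get(pk);        // 3. fetch existing record
        if (existing != null) {
            indexTable.remove(indexKey(existing));  // 4-5. drop old index rows
        }
        dataTable.put(pk, newRecord);               // 6. write modified record
        indexTable.put(indexKey(newRecord), pk);    // 7-8. write new index rows
    }
}
```

In the real LMDB version each map becomes a Dbi and the whole method body runs inside a single write Txn, so a crash between steps never leaves the index tables out of sync with the data table.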

Sure, that’s somewhat inefficient for writes, because often we drop an index record and then insert exactly the same index record again, but the trade-off there is that we’re using a code generator, so we want the generator to be correct and maintainable. Furthermore, to be using LMDB at all means your use case is read-optimised, and writes are slower. If you need to be write-optimised and care less about reads, you’re probably better off with a log-structured merge type system such as LevelDB or its derivatives.

I share this because it sounds like you’re trying to write a “generic” store, so having a way to differentiate your “index” keys from the “primary” keys is generally necessary. To finish off my DAO example, this shows how we do that:

  @Put(fw = InstrumentDecoder.class, pk = "global")
  void putInstrument(Txn<DirectBuffer> txn, DirectBuffer in);

  @Get(fw = InstrumentDecoder.class, key = "global")
  DirectBuffer getInstrument(Txn<DirectBuffer> txn, long global);

  @Iter(fw = InstrumentDecoder.class, key = "composite")
  DaoIterator findInstrumentByComposite(Txn<DirectBuffer> txn, KeyRangeType type,
                                        long composite);

  @Iter(fw = InstrumentDecoder.class, key = "shareClass")
  DaoIterator findInstrumentByShareClass(Txn<DirectBuffer> txn,
                                         KeyRangeType type, long shareClass);

Note the fw attribute is an SBE-generated flyweight, and the pk and key fields are found in the flyweight. Everything else is handled via runtime code generation. Our pattern is: edit an SBE XML file, run the generator, write an annotated DAO interface, then use the annotated DAO interface with its runtime implementation. It’s very rare to need to deal with LmdbJava APIs directly from domain code (the exception is our time series data, as that’s 99% of our data volume and has extremely challenging storage and performance requirements).

If you do not need transactional data sync across databases, you could also create separate Env instances in different directories

@krisskross noted the fundamental piece about a Txn cannot cross Envs. You can, if desired, also have different Envs in the same directory (although different physical data and lock files) if you use MDB_NOSUBDIR.

0 reactions
buko commented, Aug 22, 2017

@benalexau Thanks very much. You’ve given us a lot to think about. Definitely like the idea of a metamodel that links primary keys and indexes.
