Write Performance

Hello all,

I fear that I'm not using the library correctly. Based on a benchmark review, I should be executing more writes per second.

I'm currently writing at 40,000 vals/sec.

I have tried batching the data and using the Agrona DirectBuffers, but I see no performance increase.

Below is the code I'm using for the performance test.

import org.lmdbjava.Cursor;
import org.lmdbjava.Dbi;
import org.lmdbjava.DbiFlags;
import org.lmdbjava.Env;
import org.lmdbjava.EnvFlags;
import org.lmdbjava.Txn;

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;

import static java.nio.charset.StandardCharsets.UTF_8;

public class EfficentTest {
    public static void main(String[] args) throws IOException
    {
        final File path = new File("/foo");

        if (!path.mkdirs() && !path.exists())
        {
            throw new IOException("Unable to create: " + path);
        }

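        // EnvFlags.MDB_NOSYNC: don't flush system buffers to disk on commit (speed over durability)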
        Env<ByteBuffer> env = Env.create()
                .setMapSize(1L << 31)
                .setMaxDbs(2)
                .open(path, EnvFlags.MDB_NOSYNC);

        final Dbi<ByteBuffer> names = env.openDbi("names", DbiFlags.MDB_CREATE);

        final ByteBuffer key = ByteBuffer.allocateDirect(4);
        final ByteBuffer val = ByteBuffer.allocateDirect(1024);

        final long t0 = System.currentTimeMillis();
        long tn = t0;
        for (int i = 0; i < 1000000; i++) {
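            // opens, commits and closes a full write transaction for every single record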
            try (Txn<ByteBuffer> txn = env.txnWrite()) {
                final Cursor<ByteBuffer> c = names.openCursor(txn);
                key.putInt(0, i);
                val.put("Hello world".getBytes(UTF_8)).flip();
                c.put(key,val);
                txn.commit();
            }

            if (i % 1000 == 0)
            {
                long taken = System.currentTimeMillis() - tn;
                System.out.printf("Inserted: %d rows at %,2f vals/sec%n", i, (1000 * 1000D) / taken);
                tn = System.currentTimeMillis();
            }
        }
        final long t1 = System.currentTimeMillis();
        System.out.printf("Time to load db: %,dms%n", (t1 - t0));
    }
}

Any help would be much appreciated.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
benalexau commented, Dec 19, 2018

> @benalexau I somehow have never managed to be able to commit and reuse the same Transaction… I always had to create a new one…

A transaction can only be reused if it’s a read-only transaction. A read-write transaction can only be committed once. You need to decide whether your use case can be modelled as a single read-write transaction (for performance) or you need to use lots of read-write transactions.
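
For the read-only case, the same handle can be recycled via Txn.reset() and Txn.renew(). A minimal sketch, assuming the same env, names and key as in the test code below:

// Minimal sketch: recycling a read-only transaction with reset()/renew().
// Assumes the `env`, `names` and `key` set up in the test code below.
try (Txn<ByteBuffer> rtx = env.txnRead()) {
  final ByteBuffer v1 = names.get(rtx, key); // read under the first snapshot
  rtx.reset();  // release the snapshot but keep the handle and reader slot
  // ... a writer can commit here without this reader pinning old pages ...
  rtx.renew();  // re-arm the handle against a fresh snapshot
  final ByteBuffer v2 = names.get(rtx, key); // read again via the same Txn object
}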

In most of my LMDB workloads I have a single read-write transaction that runs for long periods (eg in some cases 24 hours as that’s how often the underlying input files roll over). This minimises wasted space (as a read transaction concurrent with a write transaction will cause file growth) while maximising throughput. It’s not for everyone though. My usage works because the underlying input files can be used to regenerate the LMDB database from scratch when something goes wrong (OS crashes, Java-side bugs, invalid input files etc) and there is only a single JVM application accessing each file (so the DAO or service holds onto the single read-write Txn instance and routes all read and write operations through it).
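
A rough sketch of that shape (NamesDao is a hypothetical class for illustration, not something from this issue):

// Hypothetical sketch: a DAO holding a single long-lived read-write Txn and
// routing all read and write operations through it (single JVM, single writer thread).
final class NamesDao implements AutoCloseable {
  private final Dbi<ByteBuffer> dbi;
  private final Txn<ByteBuffer> txn;

  NamesDao(final Env<ByteBuffer> env, final Dbi<ByteBuffer> dbi) {
    this.dbi = dbi;
    this.txn = env.txnWrite(); // held until the underlying input files roll over
  }

  void put(final ByteBuffer key, final ByteBuffer val) {
    dbi.put(txn, key, val);
  }

  ByteBuffer get(final ByteBuffer key) {
    return dbi.get(txn, key); // a read-write Txn can serve reads too
  }

  @Override
  public void close() { // invoked at rollover; the whole period is one commit
    txn.commit();
    txn.close();
  }
}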

What is best depends on each application and the trade-offs you’re willing to live with. If using LMDB as a system of record you’ll probably want to use individual transactions per logical change. But if using LMDB as a very high performance durable sorted map, chances are you’ll use fewer transactions and instead accept a recovery strategy (rebuild from what your data input is, discard data since the last commit etc).

1 reaction
benalexau commented, Dec 7, 2018

@scottazord I ran your code locally and made some slight adaptations:

package org.lmdbjava;

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;

import java.nio.ByteOrder;
import static java.nio.charset.StandardCharsets.UTF_8;
import org.junit.Test;

public class EfficentTest {

  @Test
  public void test() throws IOException {
    final File path = new File("/tmp/foo");

    if (!path.mkdirs() && !path.exists()) {
      throw new IOException("Unable to create: " + path);
    }

    Env<ByteBuffer> env = Env.create()
        .setMapSize(1L << 31)
        .setMaxDbs(2)
        .open(path, EnvFlags.MDB_NOSYNC);

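    // MDB_INTEGERKEY: keys are binary integers in native byte order,
    // hence key.order(LITTLE_ENDIAN) below (on x86)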
    final Dbi<ByteBuffer> names = env.openDbi("names", DbiFlags.MDB_CREATE, DbiFlags.MDB_INTEGERKEY);

    final ByteBuffer key = ByteBuffer.allocateDirect(4);
    key.order(ByteOrder.LITTLE_ENDIAN);
    final ByteBuffer val = ByteBuffer.allocateDirect(1024);

    final int report = 100_000;
    final int records = 1_000_000;
    final long t0 = System.currentTimeMillis();
    long tn = t0;
    for (int i = 0; i < records; i++) {
      try (Txn<ByteBuffer> txn = env.txnWrite()) {
        key.putInt(0, i);
        val.put("Hello world".getBytes(UTF_8)).flip();
        names.put(txn, key, val);
        txn.commit();
      }

      if (i % report == 0) {
        long taken = System.currentTimeMillis() - tn;
        double tps = report / ((double) taken/1_000);
        System.out.printf("%d Inserted: %d rows at %,2f vals/sec%n", taken, i, tps);
        tn = System.currentTimeMillis();
      }
    }
    final long t1 = System.currentTimeMillis();
    final double tps = records / ((double) (t1 - t0) / 1000);
    System.out.printf("Time to load db: %,dms (%,2f tps)  %n", (t1 - t0), tps);
  }
}

This gives:

Running org.lmdbjava.EfficentTest
0 Inserted: 0 rows at Infinity vals/sec
595 Inserted: 100000 rows at 168,067.226891 vals/sec
574 Inserted: 200000 rows at 174,216.027875 vals/sec
555 Inserted: 300000 rows at 180,180.180180 vals/sec
530 Inserted: 400000 rows at 188,679.245283 vals/sec
547 Inserted: 500000 rows at 182,815.356490 vals/sec
540 Inserted: 600000 rows at 185,185.185185 vals/sec
580 Inserted: 700000 rows at 172,413.793103 vals/sec
585 Inserted: 800000 rows at 170,940.170940 vals/sec
574 Inserted: 900000 rows at 174,216.027875 vals/sec
Time to load db: 5,645ms (177,147.918512 tps)

It’s always best to call Txn.commit() as infrequently as possible. So let’s make some minimal changes to see the impact of doing that:

package org.lmdbjava;

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;

import java.nio.ByteOrder;
import static java.nio.charset.StandardCharsets.UTF_8;
import org.junit.Test;

public class EfficentTest {

  @Test
  public void test() throws IOException {
    final File path = new File("/tmp/foo");

    if (!path.mkdirs() && !path.exists()) {
      throw new IOException("Unable to create: " + path);
    }

    Env<ByteBuffer> env = Env.create()
        .setMapSize(1L << 31)
        .setMaxDbs(2)
        .open(path, EnvFlags.MDB_NOSYNC);

    final Dbi<ByteBuffer> names = env.openDbi("names", DbiFlags.MDB_CREATE,
                                              DbiFlags.MDB_INTEGERKEY);

    final ByteBuffer key = ByteBuffer.allocateDirect(4);
    key.order(ByteOrder.LITTLE_ENDIAN);
    final ByteBuffer val = ByteBuffer.allocateDirect(1024);

    final int report = 100_000;
    final int records = 1_000_000;
    final long t0 = System.currentTimeMillis();
    long tn = t0;

    try (Txn<ByteBuffer> txn = env.txnWrite()) {
      for (int i = 0; i < records; i++) {
        key.putInt(0, i);
        val.put("Hello world".getBytes(UTF_8)).flip();
        names.put(txn, key, val);

        if (i % report == 0) {
          long taken = System.currentTimeMillis() - tn;
          double tps = report / ((double) taken / 1_000);
          System.out.printf("%d Inserted: %d rows at %,2f vals/sec%n", taken, i,
                            tps);
          tn = System.currentTimeMillis();
        }
        
        if (i + 1 == records) {
          txn.commit();
          System.out.println("Committed");
        }
      }
    }
    final long t1 = System.currentTimeMillis();
    final double tps = records / ((double) (t1 - t0) / 1000);
    System.out.printf("Time to load db: %,dms (%,2f tps)  %n", (t1 - t0), tps);
  }
}

This gives:

Running org.lmdbjava.EfficentTest
0 Inserted: 0 rows at Infinity vals/sec
63 Inserted: 100000 rows at 1,587,301.587302 vals/sec
36 Inserted: 200000 rows at 2,777,777.777778 vals/sec
36 Inserted: 300000 rows at 2,777,777.777778 vals/sec
36 Inserted: 400000 rows at 2,777,777.777778 vals/sec
33 Inserted: 500000 rows at 3,030,303.030303 vals/sec
33 Inserted: 600000 rows at 3,030,303.030303 vals/sec
34 Inserted: 700000 rows at 2,941,176.470588 vals/sec
35 Inserted: 800000 rows at 2,857,142.857143 vals/sec
48 Inserted: 900000 rows at 2,083,333.333333 vals/sec
Committed
Time to load db: 404ms (2,475,247.524752 tps) 

Both tests resulted in the same sized database directory.

Of course, a suitable batch size depends on your use case. But you gain more than an order of magnitude in throughput in this simple test (177K tps to 2.5M tps).
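
If a single huge transaction is not an option, a middle ground is to commit every N records. A sketch of that loop shape (the batch size here is purely illustrative):

// Sketch: commit every `batch` records, rather than per record or once at the end.
final int batch = 10_000; // illustrative; tune against your workload
Txn<ByteBuffer> txn = env.txnWrite();
try {
  for (int i = 0; i < records; i++) {
    key.putInt(0, i);
    val.put("Hello world".getBytes(UTF_8)).flip();
    names.put(txn, key, val);
    if ((i + 1) % batch == 0) {
      txn.commit();         // close out this batch
      txn = env.txnWrite(); // and start the next one
    }
  }
  txn.commit(); // commit any final partial batch
} finally {
  txn.close();
}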

As an aside, a test like this one lets the LMDB C library detect that you are inserting keys in increasing order, so it applies some page-size and B+ tree optimisations. When writing benchmarks, please be sure to test on a dataset that is representative of your particular use case. To illustrate how much this changes the results: if I start the keys at 1,000,000 and decrement to 0, throughput drops to 2.0M TPS and the final database is 50% larger. Again, just be sure to test with data representative of your use case.
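
Relatedly, if your keys really do arrive in strictly ascending order you can say so explicitly with PutFlags.MDB_APPEND (org.lmdbjava.PutFlags), which appends to the rightmost leaf rather than searching the tree for each insert position; an out-of-order key makes the put fail with MDB_KEYEXIST. A sketch:

// Sketch: strictly ascending keys inserted with MDB_APPEND, which skips the
// per-insert B+ tree positioning. Out-of-order keys would fail the put.
try (Txn<ByteBuffer> txn = env.txnWrite()) {
  for (int i = 0; i < records; i++) {
    key.putInt(0, i); // ascending, consistent with MDB_INTEGERKEY ordering
    val.put("Hello world".getBytes(UTF_8)).flip();
    names.put(txn, key, val, PutFlags.MDB_APPEND);
  }
  txn.commit();
}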
