Question: does python-rocksdb support importing external SST files, e.g. for bulk loads?


To optimize for large initial bulk loads, this RocksDB blog post recommends creating the SST files externally (e.g. from a big-data pipeline like Spark/MapReduce) and then importing them into your DB: http://rocksdb.org/blog/2017/02/17/bulkoad-ingest-sst-file.html

Options options;
SstFileWriter sst_file_writer(EnvOptions(), options, options.comparator);
Status s = sst_file_writer.Open(file_path);
assert(s.ok());

// Insert rows into the SST file, note that inserted keys must be 
// strictly increasing (based on options.comparator)
for (...) {
  s = sst_file_writer.Add(key, value);
  assert(s.ok());
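}

// Finalize the SST file; it must be finished before it can be ingested
s = sst_file_writer.Finish();
assert(s.ok());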

// Ingest the external SST file into the DB
s = db_->IngestExternalFile({"/home/usr/file1.sst"}, IngestExternalFileOptions());
assert(s.ok());

The post refers to the C++ RocksDB API, e.g. db_->IngestExternalFile().

Does python-rocksdb support this kind of “ingest external SST files” behavior (e.g. db_->IngestExternalFile())? I didn’t see this function listed in python-rocksdb. Thanks!
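The C++ comment above notes that keys written to an SST file must be strictly increasing. As a minimal sketch of how a pipeline might prepare a batch for that constraint, the helper below sorts pairs by key and collapses duplicate keys. It assumes the default bytewise comparator (which orders keys the same way Python compares `bytes`), that the batch fits in memory, and a "last value wins" policy for duplicates; the function name `sorted_unique_pairs` is hypothetical and not part of any RocksDB binding.

```python
from typing import Dict, Iterable, List, Tuple


def sorted_unique_pairs(pairs: Iterable[Tuple[bytes, bytes]]) -> List[Tuple[bytes, bytes]]:
    """Return pairs sorted by key, keeping the last value seen for each key."""
    latest: Dict[bytes, bytes] = {}
    for key, value in pairs:
        latest[key] = value  # a later duplicate overwrites the earlier value
    # Python compares bytes lexicographically by unsigned byte value.
    # Assumption: this matches the comparator the SST writer is configured with.
    return sorted(latest.items())


rows = [(b"b", b"2"), (b"a", b"1"), (b"b", b"3")]
prepared = sorted_unique_pairs(rows)
```

With a custom comparator, the sort key would have to follow that comparator's ordering instead.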

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7

Top GitHub Comments

5 reactions
Congyuwang commented, Dec 15, 2022

This library has SstFileWriter and ingestExternalFile implemented: https://github.com/Congyuwang/RocksDict.

Install it with pip install rocksdict. Pre-built wheels are provided, so there is no need to compile.

Bulk write demo:

from rocksdict import Rdict, Options, SstFileWriter
import random

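# generate random 200-byte keys (random.randbytes needs Python 3.9+);
# each list is sorted because SST keys must be written in increasing order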
rand_bytes1 = [random.randbytes(200) for _ in range(100000)]
rand_bytes1.sort()
rand_bytes2 = [random.randbytes(200) for _ in range(100000)]
rand_bytes2.sort()

# write to file1.sst
writer = SstFileWriter(options=Options(raw_mode=True))
writer.open("file1.sst")
for k, v in zip(rand_bytes1, rand_bytes1):
    writer[k] = v

writer.finish()

# write to file2.sst
writer = SstFileWriter(options=Options(raw_mode=True))
writer.open("file2.sst")
for k, v in zip(rand_bytes2, rand_bytes2):
    writer[k] = v

writer.finish()

# Create a new Rdict in raw mode (same option the SST files were written with)
d = Rdict("tmp", options=Options(raw_mode=True))
d.ingest_external_file(["file1.sst", "file2.sst"])
d.close()

# reopen, check if all key-values are there
d = Rdict("tmp", options=Options(raw_mode=True))
for k in rand_bytes2 + rand_bytes1:
    assert d[k] == k

d.close()

# delete tmp
Rdict.destroy("tmp")
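The demo above sorts each list before writing because the writer expects keys in increasing order. A minimal sketch of a guard around that step follows: it checks the whole batch before opening the file, so a badly ordered batch never leaves a half-written SST behind. The helper `write_sorted_sst` and the `RecordingWriter` stand-in are hypothetical names for illustration. The helper only assumes a writer object with `open(path)`, item assignment, and `finish()`, which is how `SstFileWriter` is used in the demo, and it assumes bytewise key ordering.

```python
from typing import Sequence, Tuple


def write_sorted_sst(writer, path: str, pairs: Sequence[Tuple[bytes, bytes]]) -> int:
    """Validate key order, then write all pairs through `writer`.

    `writer` is assumed to expose open(path), item assignment, and finish().
    Returns the number of pairs written.
    """
    # Check the whole batch first, before anything is written to disk.
    for (prev_key, _), (key, _) in zip(pairs, pairs[1:]):
        if key <= prev_key:
            raise ValueError(
                f"keys not strictly increasing: {prev_key!r} then {key!r}"
            )

    writer.open(path)
    for key, value in pairs:
        writer[key] = value
    writer.finish()
    return len(pairs)


class RecordingWriter:
    """Hypothetical in-memory stand-in so the sketch runs without rocksdict."""

    def __init__(self):
        self.path = None
        self.items = []
        self.finished = False

    def open(self, path):
        self.path = path

    def __setitem__(self, key, value):
        self.items.append((key, value))

    def finish(self):
        self.finished = True


demo_writer = RecordingWriter()
written = write_sorted_sst(demo_writer, "file1.sst", [(b"a", b"1"), (b"b", b"2")])
```

With rocksdict installed, the stand-in would presumably be replaced by `SstFileWriter(options=Options(raw_mode=True))` as in the demo.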

0 reactions
Congyuwang commented, Dec 15, 2022

Yeah, that’s right. Let me fix it.


