On The Fly Encryption Feature Proposal

Feature Proposal

This document proposes an On The Fly Encryption feature that allows OpenSearch to encrypt search indices at the Directory level, using a different encryption key per index.

Why we need it

Enterprise customers require additional controls over the data they store in multi-tenanted cloud services. Data encryption with a customer-provided key is one of the features these customers ask for. This feature allows customers to manage their own master key and then give a cloud service access to encrypt or decrypt the customer’s data with derived data keys. A customer can revoke the master key in case of a security incident, making their data non-decryptable.

This feature enables better data isolation in a multi-tenanted service, allows for a better audit trail, and adds a layer of security.

OpenSearch does not provide a fine-grained, multi-tenanted encryption solution yet. Encryption is either enabled for the whole cluster or for a data node, or fully disabled. When we use a search index per tenant, there is no way to configure encryption per index, and having a separate OpenSearch cluster per tenant is too expensive.

Proposal

The proposal is to implement a new Lucene Directory that encrypts or decrypts shard data on the fly. We can use the existing settings.store.type configuration to enable encryption when we create an index. For example:

{
  "settings": {
    "store": {
      "type": "cryptofs"
    }
  }
}

In this case, cryptofs becomes a new store type, and OpenSearch will use CryptoDirectory for this specific store type.
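
As a rough sketch of how the new store type could be wired in, the following hypothetical plugin registers a factory for cryptofs using OpenSearch’s IndexStorePlugin extension point. The class names CryptoDirectoryPlugin and CryptoDirectoryFactory are illustrative only, and the factory below returns a plain FSDirectory as a placeholder rather than the proposed CryptoDirectory:

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.opensearch.index.IndexSettings;
import org.opensearch.index.shard.ShardPath;
import org.opensearch.plugins.IndexStorePlugin;
import org.opensearch.plugins.Plugin;

// Hypothetical wiring: map the "cryptofs" store type to a directory factory.
public class CryptoDirectoryPlugin extends Plugin implements IndexStorePlugin {

    @Override
    public Map<String, DirectoryFactory> getDirectoryFactories() {
        return Map.of("cryptofs", new CryptoDirectoryFactory());
    }

    static class CryptoDirectoryFactory implements DirectoryFactory {
        @Override
        public Directory newDirectory(IndexSettings indexSettings, ShardPath shardPath) throws IOException {
            // Placeholder: the real factory would return a CryptoDirectory that
            // encrypts on write and decrypts on read.
            return FSDirectory.open(shardPath.resolveIndex());
        }
    }
}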

Potentially, we could implement CryptoDirectory as a simple FilterDirectory to leverage the existing IndexInput and IndexOutput classes; however, this approach would not allow us to leverage buffered reads and writes. Lucene issues frequent single-byte read and write calls, so it is better to read from and write into an encrypted buffer instead of decrypting and encrypting single bytes every time.
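
To illustrate the buffering argument, here is a minimal, hypothetical reader that decrypts a whole chunk once and then serves Lucene-style single-byte reads from the plain buffer. It uses AES/CTR only as an example and is not the proposed IndexInput implementation:

import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration: decrypt a chunk once, then serve single-byte reads
// from the plain-text buffer instead of running the cipher for every readByte() call.
final class BufferedDecryptingReader {

    private final byte[] buffer; // one decrypted chunk of the file
    private int pos;

    BufferedDecryptingReader(byte[] encryptedChunk, SecretKeySpec key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        this.buffer = cipher.doFinal(encryptedChunk); // single bulk decryption
    }

    byte readByte() {
        return buffer[pos++]; // plain array access, no crypto per byte
    }
}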

We propose to override Lucene’s IndexInput and IndexOutput with new encrypting implementations to leverage the existing IO buffer optimization. CryptoDirectory will extend FSDirectory and will instantiate the overridden versions of these inputs and outputs.

Also, the IndexInput and IndexOutput classes provide access to the underlying IO streams, which allows us to leverage existing, optimized stream encryption libraries.

Encryption

The concrete encryption algorithm can be made configurable, but it is critical to use no-padding algorithms to preserve Lucene’s random IO access.
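
As a small, self-contained illustration of why a no-padding stream mode preserves random access, the sketch below decrypts an arbitrary byte range without reading any earlier bytes. It uses AES/CTR/NoPadding purely as an example (the concrete algorithm remains configurable), and all class and method names are hypothetical:

import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CtrRandomAccessDemo {

    static final int AES_BLOCK = 16;

    // Build the counter-mode IV for the block containing byte offset `pos`:
    // treat the 16-byte IV as a big-endian integer and add the block index.
    static byte[] ivForOffset(byte[] baseIv, long pos) {
        BigInteger counter = new BigInteger(1, baseIv).add(BigInteger.valueOf(pos / AES_BLOCK));
        byte[] raw = counter.toByteArray();
        byte[] iv = new byte[AES_BLOCK];
        int copy = Math.min(raw.length, AES_BLOCK);
        System.arraycopy(raw, raw.length - copy, iv, AES_BLOCK - copy, copy);
        return iv;
    }

    // Decrypt `len` bytes of ciphertext starting at absolute byte offset `pos`,
    // without touching any earlier bytes of the file.
    static byte[] decryptAt(SecretKeySpec key, byte[] baseIv, byte[] cipherText, long pos, int len)
            throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(ivForOffset(baseIv, pos)));
        int skip = (int) (pos % AES_BLOCK);        // position inside the first block
        byte[] padded = new byte[skip + len];
        System.arraycopy(cipherText, (int) pos, padded, skip, len);
        byte[] out = cipher.doFinal(padded);       // first `skip` bytes are discarded below
        return Arrays.copyOfRange(out, skip, skip + len);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        byte[] keyBytes = new byte[32];
        byte[] iv = new byte[AES_BLOCK];
        rnd.nextBytes(keyBytes);
        rnd.nextBytes(iv);
        SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");

        byte[] plain = "Lucene issues many small, position-based reads over segment files.".getBytes();
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] cipherText = enc.doFinal(plain);

        // Read bytes 20..44 directly, much like an IndexInput.seek()/readBytes() would.
        System.out.println(new String(decryptAt(key, iv, cipherText, 20, 25)));
    }
}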

The concrete crypto provider will also be configurable. Crypto providers like Amazon Corretto, SunJCE, or Bouncy Castle come with their own tradeoffs, and consumers of this On The Fly Encryption feature should be able to make a decision based on their specific performance, FIPS compliance, or runtime environment requirements.

{
  "settings": {
    ...
    "encryption": {
      "algorithm": "AES/GCM/NoPadding",
      "provider": "SunJCE",
      ...
    }
  }
}
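
A minimal sketch of how these two settings could be mapped onto the JCE, assuming the chosen provider is already registered with java.security.Security (for example, Bouncy Castle via Security.addProvider); the helper class name is hypothetical:

import javax.crypto.Cipher;

// Hypothetical helper: resolve the configured algorithm and provider into a JCE Cipher.
public class CipherFactory {

    public static Cipher fromSettings(String algorithm, String provider) throws Exception {
        // e.g. algorithm = "AES/GCM/NoPadding", provider = "SunJCE"
        return Cipher.getInstance(algorithm, provider);
    }
}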

Key management

Each index shard will require one or multiple data keys to encrypt data. We can start with only one data key per shard to simplify key management, but this solution can evolve; for example, OpenSearch could generate new data keys according to time-based or usage-based criteria.

All shard data keys will be derived from one master key defined at the index level. When OpenSearch creates a new index, CryptoDirectoryFactory will reach out to a Key Management Service (KMS) to generate a data key pair. The encrypted version of the data key can be persisted in a key file inside the shard data folder itself. Any encryption or decryption operation requires the plain-text version of the key, so CryptoDirectory will need to call the KMS service to decrypt the encrypted data key. It will cache this plain-text key in a short-lived cache for performance reasons.
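
A rough sketch of this data-key lifecycle, assuming the AWS SDK v2 KMS client; the class name, the data.key file name, and the cache TTL are illustrative only and not part of the proposal:

import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kms.KmsClient;
import software.amazon.awssdk.services.kms.model.DataKeySpec;
import software.amazon.awssdk.services.kms.model.DecryptRequest;
import software.amazon.awssdk.services.kms.model.GenerateDataKeyRequest;
import software.amazon.awssdk.services.kms.model.GenerateDataKeyResponse;

// Hypothetical helper showing the data-key lifecycle described above.
public class ShardDataKeys {

    private final KmsClient kms = KmsClient.create();
    private final String masterKeyArn;
    private final Duration cacheTtl = Duration.ofMinutes(10); // short-lived plain-text key cache

    private byte[] cachedPlaintextKey;
    private Instant cachedAt;

    public ShardDataKeys(String masterKeyArn) {
        this.masterKeyArn = masterKeyArn;
    }

    // Called once when the shard is created: generate a data key under the customer's
    // master key and persist only the *encrypted* copy next to the shard data.
    public void createKeyFile(Path shardDir) throws Exception {
        GenerateDataKeyResponse resp = kms.generateDataKey(GenerateDataKeyRequest.builder()
                .keyId(masterKeyArn)
                .keySpec(DataKeySpec.AES_256)
                .build());
        Files.write(shardDir.resolve("data.key"), resp.ciphertextBlob().asByteArray());
        cache(resp.plaintext().asByteArray());
    }

    // Called when the directory needs the key: decrypt the persisted data key via KMS,
    // keeping the plain text only in the short-lived in-memory cache.
    public synchronized byte[] plaintextKey(Path shardDir) throws Exception {
        if (cachedPlaintextKey != null && Instant.now().isBefore(cachedAt.plus(cacheTtl))) {
            return cachedPlaintextKey;
        }
        byte[] encrypted = Files.readAllBytes(shardDir.resolve("data.key"));
        byte[] plain = kms.decrypt(DecryptRequest.builder()
                .keyId(masterKeyArn)
                .ciphertextBlob(SdkBytes.fromByteArray(encrypted))
                .build()).plaintext().asByteArray();
        cache(plain);
        return plain;
    }

    private void cache(byte[] key) {
        cachedPlaintextKey = key;
        cachedAt = Instant.now();
    }
}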

Here is how we can configure a KMS when we create an index:

{
    "settings": {
        "store": {
            "type": "cryptofs"
        },
        "encryption": {
            "kms_type": "aws_kms",
            "master_key": "arn:aws:kms:us-west-2:111122223333:key/943842d0-f961-4322-aff5-e9581e7271b7"
        }
    }
}

This configuration can support multiple KMS vendors if required.

Key revocation and restoration

When a customer revokes access to a master key, OpenSearch can no longer decrypt the encrypted data keys. It will still be able to decrypt data with a cached plain-text key until the key cache expires, but after that any requests will start failing. OpenSearch will require a special error code to convey this error to consumers.

Any background operations like merge or refresh will also start failing; they will require special handling to avoid data corruption.

Key restoration will require no specific logic. Once the customer restores key access, OpenSearch can immediately use the master key again to decrypt data keys.

Key rotation and re-encryption

This proposal does not cover managed key rotation and re-encryption. OpenSearch re-indexing satisfies both of these requirements during the initial implementation phase.

Audit trail

Customers will be interested in monitoring how OpenSearch uses their encryption keys. Any KMS requests will be logged automatically on the customer’s KMS side. However, when OpenSearch uses these data keys to encrypt or decrypt data, no logs will be produced.

Performance

Encryption comes with a performance cost. The actual performance degradation will depend on the request type and on the encryption algorithm. For example, according to our initial performance benchmarking, the overhead on ingestion and simple queries is lower than on complex queries with functions and aggregates.

Concrete acceptable performance degradation numbers are still TBD.

Shipment options

We would like this feature to be available in the managed AWS OpenSearch service. We can either ship this feature as a community plugin or implement it inside OpenSearch itself.

Top GitHub Comments

willyborankin commented, Jun 19, 2022

@dblock @willyborankin thank you for your questions and feedback. I’m replying to both of you because there is a certain overlap in the snapshot-related functionality that both of you brought up.

Snapshots

Snapshots are currently out of the scope of this proposal. When OpenSearch creates a snapshot, it will decrypt an index and store the index data in plain text. Decryption happens automatically because OpenSearch creates a Directory based on the index store.type. Snapshots may or may not have different encryption requirements, and snapshot encryption might be solved using different tools and technologies, e.g. storing a tenant’s index snapshots in S3 buckets encrypted with SSE-KMS, using the plugin referenced by @willyborankin, or by any other means. Some use cases also require no snapshots at all. I’d prefer not to bloat this proposal with snapshot encryption.

I agree they need to be independent; some customers could use encrypted file systems and store encrypted snapshots in clouds, using their own keys or built-in functionality provided by the clouds.

Shard merge

OpenSearch uses the same Directory approach to merge shards. In this case, it reads segment data, decrypts it, and encrypts it again when it writes a merged shard. We have not observed any merge-related issues when we built a POC of this. @willyborankin please let me know if you have any specific use cases in mind, and we can double-check them.

Thank you for your explanation, now it is clear.

Change/roll a master key

Key rotation and data re-encryption are outside of this proposal’s scope. Key management becomes complex very quickly. We can achieve both key rotation and re-encryption by re-indexing an index into a new index that uses a new key and then swapping these indices. Potentially, we can evolve this solution in the future to support key rotation, but not re-encryption. Key rotation would simply mean data key re-encryption with a new master key, but data re-encryption would be risky and error-prone; it is still safer to re-index. I also propose to distinguish between data loss and crypto shredding. If a customer revokes a master key on purpose, OpenSearch cannot decrypt data anymore; it is crypto-shredded on purpose. If a customer rotates a master key, the old master key might still be usable for decryption for some time, during which we can schedule reindexing. If reindexing fails within that time, then it is a data loss.

Got it.

How will this solution work with big shards of size > 64 GB?

An Initialization Vector (IV) must not be used to encrypt more than 64 GB of data; encrypting more data with the same IV makes the key vulnerable. We propose to have a separate IV per segment, not per shard. This means we might have problems with segment files that are larger than 64 GB. There are multiple ways to fix it:

  • “Chunk” big files internally: generate a new IV every 64 GB and store it inside the segment file, at the beginning of each “chunk”. This will require careful IV-aware positioning when we read from that file (see the layout sketch after this list).

I especially asked this question due to a problem I thought existed for the merging procedure. But I’m for switching the IV every 64 GB, instead of limiting files to 64 GB. Such a problem exists for the encrypted repository plugin, which I’m going to fix soon.

  • Limit segment size for encrypted indices.

Besides that, our proposal should have no issues with such big shards.
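
For illustration only, here is a hypothetical layout helper for the “chunked IV” option above: each 64 GB chunk of payload is preceded by its own IV, and reads translate a logical position into the physical position plus the IV to load. The class name and the 16-byte IV size are assumptions, not part of the proposal:

// Rough sketch of IV-aware positioning for files chunked every 64 GB.
public final class ChunkedIvLayout {

    static final long CHUNK_DATA_BYTES = 64L * 1024 * 1024 * 1024; // 64 GB of payload per IV
    static final int IV_BYTES = 16;                                // IV stored at the head of each chunk

    // Which chunk (and therefore which IV) covers this logical byte?
    static long chunkIndex(long logicalPos) {
        return logicalPos / CHUNK_DATA_BYTES;
    }

    // Physical position in the on-disk file, accounting for one IV header per chunk.
    static long physicalPos(long logicalPos) {
        return (chunkIndex(logicalPos) + 1) * IV_BYTES + logicalPos;
    }

    // Physical position of the IV that must be loaded before decrypting `logicalPos`.
    static long ivPos(long logicalPos) {
        return chunkIndex(logicalPos) * (CHUNK_DATA_BYTES + IV_BYTES);
    }
}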

How will it work for the remote segments? Rotation of the master key will lead to partial data loss

We don’t cover master key rotation yet, so this should not be an issue.

It affects the size of index data on disk and in memory as well, since encrypted data is larger than non-encrypted data

Encryption adds almost no overhead to the persisted data. The overhead will be: the data key or keys, an IV per file, and custom Lucene headers and footers per file. Yes, we will have to pay a CPU and memory price for this kind of index encryption, and we will need to account for that when we do sizing estimates.

Got it. Thank you for your explanation.

willyborankin commented, Jun 16, 2022

The idea is good, but (IMHO):

  • When Lucene starts a merge of segments for a shard, it will break encryption, which leads to data loss
  • When you change/roll a master key, it will partially break your encryption; in-memory data will most probably be OK, but on-disk data will not, which leads to partial data loss
  • If you send this data to a snapshot, switching the key will leave your snapshot useless, which leads to data loss
  • How will this solution work with big shards of size > 64 GB?
  • How will it work for the remote segments? Rotation of the master key will lead to partial data loss

It affects the size of index data on disk and in memory as well, since encrypted data is larger than non-encrypted data

For snapshot encryption, we already introduced a plugin here: https://github.com/aiven/encrypted-repository-opensearch and it was added as a community plugin here: https://github.com/opensearch-project/project-website/pull/812
