
Support storage of binary data outside the relational database

See original GitHub issue

Is your feature request related to a problem? Please describe.
I want to store lots of DocumentReference FHIR resources with corresponding Attachments, which might be large PDFs.

I don’t want to store large binary attachments inside my relational database, as it is not optimized for such a use case. Instead, I would like to use a dedicated object storage system (e.g. AWS S3 or Google Cloud Storage) for such data.

Describe the solution you’d like
I would like some pluggable configuration for large binary resource storage, separate from the relational database, with implementations for popular cloud infrastructure providers.
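
As a rough illustration of what such a pluggable storage layer could look like, here is a minimal sketch. It is not part of the IBM FHIR Server: the interface and class names are hypothetical, and the S3-backed implementation assumes the AWS SDK for Java v2 is on the classpath.

```java
// Hypothetical SPI for storing large binary payloads outside the relational database.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

/** Stores large binary payloads under an opaque key, outside the relational database. */
interface BinaryPayloadStore {
    void write(String key, byte[] content, String contentType);
    byte[] read(String key);
}

/** Object-storage implementation; GCS, Azure Blob, etc. would be parallel implementations. */
class S3BinaryPayloadStore implements BinaryPayloadStore {
    private final S3Client s3 = S3Client.create(); // region/credentials from the environment
    private final String bucket;

    S3BinaryPayloadStore(String bucket) {
        this.bucket = bucket;
    }

    @Override
    public void write(String key, byte[] content, String contentType) {
        s3.putObject(PutObjectRequest.builder()
                .bucket(bucket).key(key).contentType(contentType).build(),
                RequestBody.fromBytes(content));
    }

    @Override
    public byte[] read(String key) {
        return s3.getObjectAsBytes(GetObjectRequest.builder()
                .bucket(bucket).key(key).build())
                .asByteArray();
    }
}
```

With something like this, the relational database would hold only the payload key (plus perhaps a hash and size), while the attachment content itself lives in the object store.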

Describe alternatives you’ve considered
Some database backends like PostgreSQL support choosing particular storage devices (tablespaces) for particular tables, so it might be possible to provision a different class of storage for such large binary objects, separate from the rest of the relational data. PostgreSQL also supports large object storage (BLOBs).
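
Where the database administrator does permit it, that approach might look something like the sketch below (plain JDBC issuing the DDL; the tablespace name, filesystem path, connection details, and table layout are all hypothetical):

```java
// Hypothetical sketch: keep only the attachment payloads on a dedicated tablespace,
// leaving the rest of the schema on the default storage.
// Requires the PostgreSQL JDBC driver on the classpath and superuser privileges.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

class AttachmentTablespaceSetup {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/fhirdb", "fhiradmin", "change-me");
             Statement s = c.createStatement()) {
            // Tablespace backed by a separate (e.g. cheaper/larger) volume; the path is
            // illustrative and must already exist on the database host.
            s.execute("CREATE TABLESPACE attachment_space LOCATION '/mnt/blob_volume/pgdata'");
            // Only the payload table is placed on that tablespace; BYTEA is used here,
            // pg_largeobject would be the other built-in option mentioned above.
            s.execute("CREATE TABLE attachment_payload ("
                    + " resource_id  VARCHAR(64) PRIMARY KEY,"
                    + " content_type VARCHAR(255) NOT NULL,"
                    + " payload      BYTEA NOT NULL"
                    + ") TABLESPACE attachment_space");
        }
    }
}
```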

In some circumstances it is not possible for the application developer to provision such tablespaces e.g. when using managed databases or when the database administrators do not permit such configuration.

Acceptance Criteria

  1. GIVEN an IBM FHIR server configured to store binary data in blob storage
     WHEN a large attachment resource is received
     THEN the binary data is stored in the blob storage
     AND the attachment can be read back from the FHIR server with a GET request

Additional context
The FHIR Attachment datatype provides two alternative mechanisms for transferring the binary content: it can either be inline, base64-encoded in the data field, or referenced externally via the url field (see the illustrative fragments after the list below). Both present some challenges:

  • Large attachments transmitted via the data field may take significant time to serialize, deserialize, and transfer over a network connection, which can be undesirable when processing a large data feed.
  • Attachments transmitted via the url field must be fetched via some separate, non-FHIR mechanism, which must also be authorized separately. One possible mechanism is the use of ‘signed’ URLs that allow time-limited access by the recipient of the URL.
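
For illustration, these are the two wire-level shapes of an Attachment in FHIR R4 JSON, shown as plain Java string constants (text blocks, Java 15+); the content type, base64 payload, and presigned object-store URL are made-up examples:

```java
// Illustrative only: the same Attachment carried inline vs. by reference.
class AttachmentShapes {
    // Inline: the base64-encoded content travels in the data field.
    static final String INLINE = """
        { "contentType": "application/pdf",
          "data": "JVBERi0xLjcKJeLjz9..." }
        """;

    // External: only a reference travels; the content is fetched from the url field,
    // e.g. a time-limited presigned object-store URL (hypothetical).
    static final String EXTERNAL = """
        { "contentType": "application/pdf",
          "url": "https://example-bucket.s3.amazonaws.com/reports/doc-123.pdf?X-Amz-Signature=..." }
        """;
}
```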

Some alternative FHIR server implementations, e.g. AidBox, already support this kind of mechanism.

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
punktilious commented, Aug 13, 2021

Some of this has been prototyped already in one of my development branches, but that was to offload the storage of our entire data blob, whereas the requirements expressed above are a little more nuanced. It’s certainly possible, but the question is what can be done within the constraints of the FHIR spec (and how to handle scenarios such as bulk export, etc.).

1 reaction
lmsurpre commented, Aug 13, 2021

I like this line of thinking. Interceptors definitely ARE able to modify the resources on the way in (via the beforeX methods in the interceptor), which I think was always the design but wasn’t working right in our R4 server implementation until https://github.com/IBM/FHIR/issues/2369.

I believe that interceptors are also able to modify resources on the way out, so I think it should be possible to replace the URLs on the way out (using the afterX methods in the same interceptor) with temporary/presigned ones. Please let us know if you give this a try and it isn’t working as expected.

Finally, as an aside, we do have some config+code for generating presigned temporary URLs in our bulk data export feature. For example, in https://github.com/IBM/FHIR/blob/main/operation/fhir-operation-bulkdata/src/main/java/com/ibm/fhir/operation/bulkdata/model/url/DownloadUrl.java#L108. It might be possible to refactor some of that to be common and/or just use it as an example.
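
For readers unfamiliar with presigned URLs, the following is a minimal sketch of generating a time-limited GET URL with the AWS SDK for Java v2; it is not the IBM FHIR bulk-data code linked above, and the bucket, key, and duration are illustrative:

```java
// Sketch: produce a presigned, time-limited GET URL that could be placed in Attachment.url.
import java.net.URL;
import java.time.Duration;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

class PresignedUrlExample {
    static URL presign(String bucket, String key) {
        try (S3Presigner presigner = S3Presigner.create()) {
            GetObjectPresignRequest request = GetObjectPresignRequest.builder()
                    .signatureDuration(Duration.ofMinutes(15)) // time-limited access window
                    .getObjectRequest(GetObjectRequest.builder()
                            .bucket(bucket).key(key).build())
                    .build();
            // The returned URL can be handed to the client in place of the raw storage location.
            return presigner.presignGetObject(request).url();
        }
    }
}
```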


