Support storage of binary data outside the relational database
Is your feature request related to a problem? Please describe.
I want to store a large number of DocumentReference FHIR resources whose Attachments may be large PDFs.
I don’t want to store large binary attachments inside my relational database, as it is not optimized for that use case. Instead, I would like to use a dedicated object storage system (e.g. AWS S3 or Google Cloud Storage) for such data.
Describe the solution you’d like
I would like some pluggable configuration for large binary resource storage, separate from the relational database, with implementations for popular cloud infrastructure providers.
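As a rough illustration of what "pluggable" could mean here, the sketch below defines a small storage interface and selects an implementation from a configuration value. Everything in it is hypothetical: `BinaryStore`, `InMemoryBinaryStore`, `BinaryStores`, and the `"memory"` type name are invented for this example and are not part of the IBM FHIR Server. A cloud-backed implementation would call the provider's SDK where the in-memory map is used.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical extension point: one implementation per storage backend. */
interface BinaryStore {
    /** Stores the bytes and returns an opaque key for later retrieval. */
    String put(byte[] content);

    /** Returns the stored bytes, or empty if the key is unknown. */
    Optional<byte[]> get(String key);
}

/** In-memory stand-in; an object-storage implementation would call a vendor SDK here. */
final class InMemoryBinaryStore implements BinaryStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    @Override
    public String put(byte[] content) {
        String key = UUID.randomUUID().toString();
        blobs.put(key, content.clone()); // defensive copy
        return key;
    }

    @Override
    public Optional<byte[]> get(String key) {
        byte[] stored = blobs.get(key);
        return stored == null ? Optional.empty() : Optional.of(stored.clone());
    }
}

/** Picks an implementation from a configuration value (type names are illustrative). */
final class BinaryStores {
    private BinaryStores() {}

    static BinaryStore forType(String type) {
        switch (type) {
            case "memory":
                return new InMemoryBinaryStore();
            default:
                throw new IllegalArgumentException("Unsupported binary store type: " + type);
        }
    }
}
```

Keeping the interface this narrow (store bytes, get bytes back by key) is one way to let the relational database hold only the key while the payload lives elsewhere.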
Describe alternatives you’ve considered
Some database backends, such as PostgreSQL, support choosing particular storage devices for particular tables, so it might be possible to provision a different class of storage for large binary objects, separate from the rest of the relational data. PostgreSQL also supports large object storage (BLOB).
In some circumstances the application developer cannot provision such tablespaces, e.g. when using managed databases or when the database administrators do not permit such configuration.
Acceptance Criteria
- GIVEN an IBM FHIR server configured to store binary data in blob storage. WHEN a large attachment resource is received. THEN the binary data is stored in the blob storage. AND the attachment can be read back from the FHIR server with a GET request.
Additional context
The FHIR Attachment data type provides two alternative mechanisms for transferring binary data: it can be encoded inline in the data field, or stored externally and referenced via the url field. Both present some challenges:
- Large attachments transmitted via the data field may take some time to serialize, deserialize, and transfer over a network connection. This can be undesirable when processing a large data feed.
- Attachments transmitted via the url field must be fetched separately via some non-FHIR mechanism, which must be authorized separately. One possible mechanism is the use of ‘signed’ URLs to allow time-limited access by the recipient of the URL.
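To make the ‘signed’ URL idea concrete, here is a minimal sketch of a time-limited URL protected by an HMAC-SHA256 signature. It is a generic illustration only: the `SignedUrls` class, the `expires` and `sig` query parameters, and the message layout are assumptions made for this example. Cloud providers use their own presigning schemes, which this does not reproduce.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper that issues and checks time-limited URLs. */
final class SignedUrls {
    private final byte[] secret;

    SignedUrls(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Appends the expiry time and an HMAC computed over the path and expiry. */
    String sign(String path, long expiresEpochSeconds) {
        return path + "?expires=" + expiresEpochSeconds
                + "&sig=" + hmac(path, expiresEpochSeconds);
    }

    /** Accepts the request only if it has not expired and the signature matches. */
    boolean verify(String path, long expiresEpochSeconds, String sig, long nowEpochSeconds) {
        if (nowEpochSeconds > expiresEpochSeconds) {
            return false;
        }
        byte[] expected = hmac(path, expiresEpochSeconds).getBytes(StandardCharsets.US_ASCII);
        // MessageDigest.isEqual compares in constant time.
        return MessageDigest.isEqual(expected, sig.getBytes(StandardCharsets.US_ASCII));
    }

    private String hmac(String path, long expiresEpochSeconds) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal(
                    (path + "\n" + expiresEpochSeconds).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }
}
```

Because the expiry is part of the signed message, a recipient cannot extend the lifetime of a URL by editing the `expires` parameter.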
Some alternative FHIR server implementations, e.g. AidBox, already support this kind of mechanism.
Issue Analytics
- Created 2 years ago
- Comments:6 (5 by maintainers)

Some of this has been prototyped already in one of my development branches, but that was to offload the storage of our entire data blob, whereas the requirements expressed above are a little more nuanced. It’s certainly possible, but the question is what can be done within the constraints of the FHIR spec (and how to handle scenarios such as bulk export etc).
I like this line of thinking. Interceptors definitely ARE able to modify the resources on the way in (via the beforeX methods in the interceptor), which I think was always the design but wasn’t working right in our R4 server implementation until https://github.com/IBM/FHIR/issues/2369.
I believe that interceptors are also able to modify resources on the way out, so I think it should be possible to replace the URLs on the way out (using the afterX methods in the same interceptor) with temporary/presigned ones. Please let us know if you give this a try and it isn’t working as expected.
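The in/out rewriting described above could look roughly like the following. This is a self-contained sketch and does not use the real interceptor interface: `SimpleAttachment`, `AttachmentOffloadInterceptor`, the `x-blob:` reference scheme, and the `files.example.org` host are all invented for illustration, and the method names only mirror the beforeX/afterX idea. A real implementation would work on the server's own resource model and sign the outgoing URL.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Simplified stand-in for a FHIR Attachment: either inline data or a url. */
final class SimpleAttachment {
    final String contentType;
    final String base64Data; // null when the content is stored externally
    final String url;        // null when the content is inline

    SimpleAttachment(String contentType, String base64Data, String url) {
        this.contentType = contentType;
        this.base64Data = base64Data;
        this.url = url;
    }
}

/** Hypothetical interceptor that offloads inline data on write and rewrites URLs on read. */
final class AttachmentOffloadInterceptor {
    private static final String SCHEME = "x-blob:"; // made-up internal reference scheme
    private final Map<String, byte[]> blobStore = new HashMap<>(); // stand-in for object storage

    /** On the way in: move inline data to the blob store and keep only a reference. */
    SimpleAttachment beforeCreate(SimpleAttachment incoming) {
        if (incoming.base64Data == null) {
            return incoming; // nothing inline to offload
        }
        String key = UUID.randomUUID().toString();
        blobStore.put(key, Base64.getDecoder().decode(incoming.base64Data));
        return new SimpleAttachment(incoming.contentType, null, SCHEME + key);
    }

    /** On the way out: replace the internal reference with a temporary URL. */
    SimpleAttachment afterRead(SimpleAttachment stored, long expiresEpochSeconds) {
        if (stored.url == null || !stored.url.startsWith(SCHEME)) {
            return stored; // not one of our offloaded attachments
        }
        String key = stored.url.substring(SCHEME.length());
        // Signature omitted; a real URL would also carry one.
        String temporaryUrl = "https://files.example.org/" + key + "?expires=" + expiresEpochSeconds;
        return new SimpleAttachment(stored.contentType, null, temporaryUrl);
    }

    /** Returns the offloaded bytes for a key, or null if the key is unknown. */
    byte[] load(String key) {
        return blobStore.get(key);
    }
}
```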
Finally, as an aside, we do have some config and code for generating presigned temporary URLs in our bulk data export feature; for example, see https://github.com/IBM/FHIR/blob/main/operation/fhir-operation-bulkdata/src/main/java/com/ibm/fhir/operation/bulkdata/model/url/DownloadUrl.java#L108. It might be possible to refactor some of that into common code, or just use it as an example.