
A SPARQL LOAD with a presigned URL doesn't work

See original GitHub issue

Version

4.6.1

What happened?

I tried to run a SPARQL LOAD with a MinIO presigned URL. Instead of loading the remote data, it failed with an error and a stack trace reporting: Failed to determine the content type.

Looking into the problem, the cause seems to be the implementation of org.apache.jena.util.FileUtils#getFilenameExt. This method makes assumptions that may hold for a plain file path. However, a SPARQL LOAD passes a URL through the same method, and a URL (especially a presigned URL) does not necessarily end with the file extension.

As a quick test, I overrode the method so that it checks the filename parameter for a question mark and, if one is found, determines the file extension from the URL path instead:

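import java.net.MalformedURLException;
import java.net.URL;
import org.apache.commons.io.FilenameUtils;

// Added at the start of the overridden getFilenameExt(String filename):
if (filename.contains("?")) {
    try {
        // Parse the LOAD target as a URL so the query string is separated from the path
        URL url = new URL(filename);
        String path = url.getPath(); // e.g. "/.../s3/test.ttl" without "?X-Amz-..."

        return FilenameUtils.getExtension(path);
    } catch (MalformedURLException e) {
        // Not a URL: fall through to the original file-path logic below
        e.printStackTrace();
    }
}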

With this hack, plus a fallback to the original code, the SPARQL LOAD works as expected.
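The difference between checking the whole string and checking only the URL path can be shown in a self-contained sketch. It assumes a simplified "text after the last dot that follows the last slash" rule as a stand-in for a file-path style extension check; naiveExt and pathExt are hypothetical helpers, not Jena code, and the sketch swaps in java.net.URI for URL plus Commons IO so it needs no extra dependencies. The sample URL is a shortened, made-up stand-in for the real presigned URL.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical sketch, not Jena API: whole-string vs. path-only extension checks.
public class ExtensionSketch {

    // Simplified file-path rule (assumed stand-in): text after the last '.'
    // that comes after the last '/' or '\', lowercased; "" if there is none.
    static String naiveExt(String name) {
        int slash = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        int dot = name.lastIndexOf('.');
        return dot > slash ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    // URL-aware variant: for an absolute URI, apply the same rule to the
    // path component only, so "?query" and "#fragment" are ignored.
    static String pathExt(String name) {
        try {
            URI uri = new URI(name);
            if (uri.getScheme() != null && uri.getRawPath() != null) {
                return naiveExt(uri.getRawPath());
            }
        } catch (URISyntaxException e) {
            // Not a parseable URI (e.g. a Windows path): use the file-path rule.
        }
        return naiveExt(name);
    }

    public static void main(String[] args) {
        String presigned = "http://minio.example:9000/bucket/s3/test.ttl"
                + "?X-Amz-Security-Token=eyJhbGci.eyJhY2Nl.-xyMI3oA&X-Amz-Signature=746bfd";
        System.out.println(naiveExt(presigned));      // -xymi3oa&x-amz-signature=746bfd
        System.out.println(pathExt(presigned));       // ttl
        System.out.println(pathExt("data/file.TTL")); // ttl
    }
}
```

Under the simplified rule, the whole-string check picks up the last dot in the query string. In the real URL from the log, that dot sits inside the X-Amz-Security-Token value (a JWT, which contains dots), so the result is a fragment of the token rather than "ttl". Restricting the check to the path avoids this while leaving plain file names handled as before.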

Relevant output and stacktrace

[2022-12-12 10:42:18] INFO  Fuseki          :: [5] POST http://fuseki.localhost/example-data-product.example-data-product.sparql/update
[2022-12-12 10:42:18] WARN  Fuseki          :: [5] ActionErrorException with cause
org.apache.jena.fuseki.servlets.ActionErrorException: Failed to LOAD 'http://minio-service.minio-dev.svc.cluster.local:9000/example-data-product.ddt.tst.212765240740/example-data-product/s3/test.ttl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=DZOTE2HTDZ0N2O3R252P%2F20221212%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221212T104141Z&X-Amz-Expires=604800&X-Amz-Security-Token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJEWk9URTJIVERaME4yTzNSMjUyUCIsImV4cCI6MTY3MDg4NDg4NCwicGFyZW50IjoibWluaW9hZG1pbiJ9.-xyMI3oAnQ82xBW4j2vyCHdvzUC33pKIR_YsRO9am6KDus9qisodrCVqHOR9Xc4D4h539MSKDfdqyv70DKFYbg&X-Amz-SignedHeaders=host&versionId=null&X-Amz-Signature=746bfd4562f9fd7cac122dc3e201eea3cdfb208c671a6139dc642719ead1af64' :: Failed to determine the content type: (URI=http://minio-service.minio-dev.svc.cluster.local:9000/example-data-product.ddt.tst.212765240740/example-data-product/s3/test.ttl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=DZOTE2HTDZ0N2O3R252P%2F20221212%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221212T104141Z&X-Amz-Expires=604800&X-Amz-Security-Token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJEWk9URTJIVERaME4yTzNSMjUyUCIsImV4cCI6MTY3MDg4NDg4NCwicGFyZW50IjoibWluaW9hZG1pbiJ9.-xyMI3oAnQ82xBW4j2vyCHdvzUC33pKIR_YsRO9am6KDus9qisodrCVqHOR9Xc4D4h539MSKDfdqyv70DKFYbg&X-Amz-SignedHeaders=host&versionId=null&X-Amz-Signature=746bfd4562f9fd7cac122dc3e201eea3cdfb208c671a6139dc642719ead1af64 : stream=application/octet-stream)
	at org.apache.jena.fuseki.servlets.ServletOps.errorOccurred(ServletOps.java:275) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.fuseki.servlets.SPARQL_Update.execute(SPARQL_Update.java:259) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.fuseki.servlets.SPARQL_Update.executeForm(SPARQL_Update.java:207) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.fuseki.servlets.SPARQL_Update.execute(SPARQL_Update.java:110) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.fuseki.servlets.SPARQL_Update.execPost(SPARQL_Update.java:91) ~[fuseki-server.jar:4.6.1]
...

Caused by: org.apache.jena.riot.RiotException: Failed to determine the content type: (URI=http://minio-service.minio-dev.svc.cluster.local:9000/example-data-product.ddt.tst.212765240740/example-data-product/s3/test.ttl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=DZOTE2HTDZ0N2O3R252P%2F20221212%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221212T104141Z&X-Amz-Expires=604800&X-Amz-Security-Token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJEWk9URTJIVERaME4yTzNSMjUyUCIsImV4cCI6MTY3MDg4NDg4NCwicGFyZW50IjoibWluaW9hZG1pbiJ9.-xyMI3oAnQ82xBW4j2vyCHdvzUC33pKIR_YsRO9am6KDus9qisodrCVqHOR9Xc4D4h539MSKDfdqyv70DKFYbg&X-Amz-SignedHeaders=host&versionId=null&X-Amz-Signature=746bfd4562f9fd7cac122dc3e201eea3cdfb208c671a6139dc642719ead1af64 : stream=application/octet-stream)
	at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:380) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.riot.RDFParser.parse(RDFParser.java:360) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:568) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.riot.RDFDataMgr.parseFromURI(RDFDataMgr.java:737) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:464) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:441) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:421) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.sparql.modify.UpdateEngineWorker.lambda$visit$2(UpdateEngineWorker.java:172) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.sparql.modify.UpdateEngineWorker.executeOperation(UpdateEngineWorker.java:550) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.sparql.modify.UpdateEngineWorker.visit(UpdateEngineWorker.java:157) ~[fuseki-server.jar:4.6.1]
	at org.apache.jena.sparql.modify.request.UpdateLoad.visit(UpdateLoad.java:65) ~[fuseki-server.jar:4.6.1]
...
[2022-12-12 10:42:18] INFO  Fuseki          :: [5] 500 Server Error (106 ms)

Are you interested in making a pull request?

Maybe

Issue Analytics

  • State: open
  • Created 9 months ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
afs commented, Dec 13, 2022

I’ve shortened the stack trace.

The URI is:

URI=http://minio-service.minio-dev.svc.cluster.local:9000/example-data-product.ddt.tst.212765240740/example-data-product/s3/test.ttl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=DZOTE2HTDZ0N2O3R252P%2F20221212%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221212T104141Z&X-Amz-Expires=604800&X-Amz-Security-Token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJEWk9URTJIVERaME4yTzNSMjUyUCIsImV4cCI6MTY3MDg4NDg4NCwicGFyZW50IjoibWluaW9hZG1pbiJ9.-xyMI3oAnQ82xBW4j2vyCHdvzUC33pKIR_YsRO9am6KDus9qisodrCVqHOR9Xc4D4h539MSKDfdqyv70DKFYbg&X-Amz-SignedHeaders=host&versionId=null&X-Amz-Signature=746bfd4562f9fd7cac122dc3e201eea3cdfb208c671a6139dc642719ead1af64 : stream=application/octet-stream)

In essence:

http://...host.../...path.../s3/test.ttl?...lots of query string ...
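That structure can be checked with java.net.URI. This is a minimal sketch using a shortened, hypothetical host and query string (not the real presigned URL); UriParts and pathOf are illustrative names. It shows that the path component still ends in test.ttl while the full string does not.

```java
import java.net.URI;

// Hypothetical helper: splits a LOAD target into host, port, path and query.
public class UriParts {

    // Path component only, with the "?..." query string removed.
    static String pathOf(String target) {
        return URI.create(target).getRawPath();
    }

    public static void main(String[] args) {
        String target = "http://minio.example:9000/bucket/s3/test.ttl"
                + "?X-Amz-Expires=604800&X-Amz-Signature=746bfd";
        URI uri = URI.create(target);
        System.out.println(uri.getHost());           // minio.example
        System.out.println(uri.getPort());           // 9000
        System.out.println(pathOf(target));          // /bucket/s3/test.ttl
        System.out.println(uri.getRawQuery());       // X-Amz-Expires=604800&X-Amz-Signature=746bfd
        System.out.println(target.endsWith(".ttl")); // false
    }
}
```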
0 reactions
afs commented, Dec 13, 2022

Full stacktrace: stacktrace.txt

