asyncBatchAnnotateFiles output filename gets "output-x-to-x.json" appended
Environment details
- OS: Windows 10
- Node.js version: v9.2.0
- npm version: 6.5.0
- @google-cloud/vision version: ^0.23.0
Steps to reproduce
```javascript
// Setup assumed by the report but not shown in it:
const vision = require('@google-cloud/vision');
const client = new vision.ImageAnnotatorClient();
const bucketName = 'my-bucket'; // hypothetical bucket name

function processFilename(fileName) {
  // Path to PDF file within bucket
  // const gcsSourceUri = `gs://${bucketName}/pdfs/${fileName}`;
  const gcsSourceUri = `gs://${bucketName}/${fileName}`;
  const gcsDestinationUri = `gs://${bucketName}/${fileName}.json`;

  const inputConfig = {
    // Supported mime_types are: 'application/pdf' and 'image/tiff'
    mimeType: 'application/pdf',
    gcsSource: {
      uri: gcsSourceUri,
    },
  };
  const outputConfig = {
    gcsDestination: {
      uri: gcsDestinationUri,
    },
  };
  // const features = [{ type: 'DOCUMENT_TEXT_DETECTION', model: 'builtin/latest' }];
  const features = [{ type: 'DOCUMENT_TEXT_DETECTION' }];
  const request = {
    requests: [
      {
        inputConfig: inputConfig,
        features: features,
        outputConfig: outputConfig,
      },
    ],
  };

  client
    .asyncBatchAnnotateFiles(request)
    .then(results => {
      const operation = results[0];
      // Get a Promise representation of the final result of the job
      return operation.promise();
    })
    .then(filesResponse => {
      // console.log(JSON.stringify(filesResponse));
      const destinationUri =
        filesResponse[0].responses[0].outputConfig.gcsDestination.uri;
      console.log('Json saved to: ' + destinationUri);
      // console.log(filesResponse[0].responses);
    })
    .catch(error => {
      // Logs failures from both the request and the long-running operation
      console.log(error);
    });
}
```
For example, with the input filename aabb.pdf, the output is written to aabb.pdf.jsonoutput-1-to-1.json (when the PDF contains one page).
Thanks!
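The example above suggests that the API treats the destination URI as a prefix and appends its own "output-<first>-to-<last>.json" suffix directly to it, with no separator. The sketch below predicts the resulting object names from a page count. It assumes that observed naming pattern and the default of 20 page responses per output file (the documented default for OutputConfig.batch_size); `expectedOutputNames` is a hypothetical helper, not part of @google-cloud/vision.

```javascript
// Hypothetical helper: predicts the output object names for one PDF.
// ASSUMPTIONS: the "output-<first>-to-<last>.json" suffix observed in this
// report, and a default batch size of 20 page responses per JSON file.
function expectedOutputNames(destinationUri, pageCount, batchSize = 20) {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError('pageCount must be a positive integer');
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const names = [];
  for (let first = 1; first <= pageCount; first += batchSize) {
    // The last page in this shard is capped at the total page count.
    const last = Math.min(first + batchSize - 1, pageCount);
    // The suffix is appended with no separator, which is why
    // "aabb.pdf.json" turned into "aabb.pdf.jsonoutput-1-to-1.json".
    names.push(`${destinationUri}output-${first}-to-${last}.json`);
  }
  return names;
}

// Usage: the one-page example from the report.
console.log(expectedOutputNames('gs://my-bucket/aabb.pdf.json', 1));
// [ 'gs://my-bucket/aabb.pdf.jsonoutput-1-to-1.json' ]
```

Under these assumptions, a 45-page PDF would be split into output-1-to-20.json, output-21-to-40.json and output-41-to-45.json. The list the API actually writes should still be checked in the bucket.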
Issue analytics
- Created 5 years ago
- Comments: 5 (3 by maintainers)
Ah. Good catch, let me look into that. I wonder if you have to specify the batch_size as 1 in the OutputConfig to use a filename.
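In the Node.js request, the batch_size field of OutputConfig is written in camelCase as `batchSize`, next to `gcsDestination`. Below is a minimal sketch of an outputConfig that sets it explicitly. `buildOutputConfig` is a hypothetical helper, and the [1, 100] range check reflects the API reference as I understand it, so treat it as an assumption.

Note that batchSize only caps how many page responses go into each JSON file. With batchSize 1, a multi-page PDF would produce one file per page. Given the one-page example in the report, the "output-x-to-x.json" suffix is probably still appended even then.

```javascript
// Hypothetical helper: builds an OutputConfig with batchSize set explicitly.
// batchSize is the camelCase form of OutputConfig.batch_size (max page
// responses per output JSON file). ASSUMPTION: valid range is [1, 100].
function buildOutputConfig(destinationUri, batchSize = 1) {
  if (!destinationUri.startsWith('gs://')) {
    throw new Error('destinationUri must be a gs:// URI');
  }
  if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > 100) {
    throw new RangeError('batchSize must be an integer in [1, 100]');
  }
  return {
    gcsDestination: { uri: destinationUri },
    batchSize: batchSize,
  };
}

// Drop-in replacement for the outputConfig object in processFilename:
// const outputConfig = buildOutputConfig(gcsDestinationUri, 1);
```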
Yeah, because it may end up splitting your output into multiple files due to size, so you'll have to check what's created. To simplify that check, it's recommended to use a prefix such as
gs://${bucketName}/prefix/
so that you know only your output will be there. Less searching that way.

@nnegrey just a heads up, the docs still state that
GcsDestination can represent a single file.

Also, I'm not sure if this is the best place to ask, but what would be the best way to determine the final output location of the JSON file(s)? The AsyncBatchAnnotateFilesResponse only seems to contain the OutputConfig you pass to asyncBatchAnnotateFiles (so just gs://${bucketName}/, not gs://${bucketName}/output-x-to-x.json). Is scanning the bucket with the Storage API's Bucket.getFiles method the recommended way?
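The response only echoes the configured URI, so one approach, in the spirit of the prefix advice above, is to list the objects under that prefix with Bucket.getFiles. The sketch below does that. It takes a `bucket` argument that is expected to be a Bucket from @google-cloud/storage, and it uses only `getFiles({ prefix })`, which resolves to an array whose first element is the list of File objects. It then orders the shards by first page number.

`listOutputFiles`, `sortOutputNames` and the shard regex are hypothetical names based on the "output-<first>-to-<last>.json" pattern observed in this issue, not a documented contract.

```javascript
// ASSUMPTION: shard names end with the observed "output-<first>-to-<last>.json".
const SHARD_PATTERN = /output-(\d+)-to-(\d+)\.json$/;

// Keep only names that look like output shards and order them by first page.
// GCS lists names lexicographically, which would put "output-21-..." before
// "output-3-...", so sort numerically instead.
function sortOutputNames(names) {
  return names
    .map(name => ({ name, match: SHARD_PATTERN.exec(name) }))
    .filter(entry => entry.match !== null)
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map(entry => entry.name);
}

// `bucket`: a Bucket from @google-cloud/storage (only getFiles is used).
// `prefix`: the object-name prefix, i.e. the destination URI without
// "gs://<bucket>/", for example "aabb.pdf.json" or "prefix/".
async function listOutputFiles(bucket, prefix) {
  // getFiles resolves to [files, ...]; the first element is the File list.
  const [files] = await bucket.getFiles({ prefix });
  return sortOutputNames(files.map(file => file.name));
}
```

With the real client, `bucket` would come from `new Storage().bucket(bucketName)` after `const { Storage } = require('@google-cloud/storage');`. Using a unique prefix per input file keeps unrelated objects out of the listing.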