asyncBatchAnnotateFiles filename output concatenates "output-x-to-x.json"

Environment details

  • OS: Windows 10
  • Node.js version: v9.2.0
  • npm version: 6.5.0
  • @google-cloud/vision version: ^0.23.0

Steps to reproduce

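const vision = require('@google-cloud/vision');

// Assumed setup, not shown in the original report: a Vision client and a bucket name.
const client = new vision.ImageAnnotatorClient();
const bucketName = 'my-bucket'; // hypothetical placeholder

function processFilename(fileName) { // fileName: path to the PDF file within the bucket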

//  const gcsSourceUri = `gs://${bucketName}/pdfs/${fileName}`;
let gcsSourceUri = `gs://${bucketName}/${fileName}`;
let gcsDestinationUri = `gs://${bucketName}/${fileName}.json`;

let inputConfig = {
    // Supported mime_types are: 'application/pdf' and 'image/tiff'
    mimeType: 'application/pdf',
    gcsSource: {
        uri: gcsSourceUri,
    },
};
let outputConfig = {
    gcsDestination: {
        uri: gcsDestinationUri,
    },
};
//    let features = [{ type: 'DOCUMENT_TEXT_DETECTION', model: "builtin/latest" }];
let features = [{ type: 'DOCUMENT_TEXT_DETECTION' }];
let request = {
    requests: [{
        inputConfig: inputConfig,
        features: features,
        outputConfig: outputConfig,
    }, ],
};

client
    .asyncBatchAnnotateFiles(request)
    .then(results => {
        const operation = results[0];
        // Get a Promise representation of the final result of the job
        operation
            .promise()
            .then(filesResponse => {

                //                    console.log(JSON.stringify(filesResponse));

                let destinationUri = filesResponse[0].responses[0].outputConfig.gcsDestination.uri;
                console.log('Json saved to: ' + destinationUri);

                //          console.log(filesResponse[0].responses);
            })
            .catch(function(error) {
                console.log(error);
            });
    })
    .catch(function(error) {
        console.log(error);
    });

}

For example, if the input filename is aabb.pdf, the output is written to aabb.pdf.jsonoutput-1-to-1.json (when the PDF contains one page).
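
The observed name is consistent with the service treating `gcsDestination.uri` as a prefix and appending `output-<first>-to-<last>.json` to it. Below is a minimal sketch of that naming, inferred only from the example in this report. The helper name `expectedOutputUri` and the bucket name `my-bucket` are hypothetical and not part of the library.

```javascript
// Hypothetical helper: reproduces the naming seen in this report, where the
// destination URI is used as a prefix and the page range is appended to it.
function expectedOutputUri(destinationPrefix, firstPage, lastPage) {
  return `${destinationPrefix}output-${firstPage}-to-${lastPage}.json`;
}

// A prefix without a trailing slash is concatenated directly with the
// generated name, which matches the behavior described above.
console.log(expectedOutputUri('gs://my-bucket/aabb.pdf.json', 1, 1));

// A prefix ending in "/" keeps the generated name as a separate path segment.
console.log(expectedOutputUri('gs://my-bucket/aabb.pdf/', 1, 1));
```

Seen this way, the `.json` suffix in the destination URI does not name a file; it simply becomes part of the prefix.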

Thanks!

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
nnegrey commented, Jan 18, 2019

Ah. Good catch, let me look into that. I wonder if you have to specify the batch_size as 1 in the OutputConfig to use a filename.

Yeah, because it may end up splitting your output into multiple files due to size, so you’ll have to check what’s created. To simplify that check, it’s recommended to use a prefix such as gs://${bucketName}/prefix/ so that you know only your output will be there. Less searching that way.
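
As a rough sketch of that suggestion, the destination can be built as a folder-like prefix per input file. The helper name `buildOutputConfig` and the `ocr-output` folder are hypothetical choices for illustration. `batchSize` is assumed here to be the Node.js spelling of the `batch_size` field mentioned above; whether setting it to 1 changes the output naming is not confirmed in this thread.

```javascript
// Hypothetical helper: builds an outputConfig whose destination is a
// folder-like prefix, so all generated JSON files for one PDF land together.
function buildOutputConfig(bucketName, fileName, batchSize) {
  const outputConfig = {
    gcsDestination: {
      // The trailing slash keeps the generated "output-x-to-x.json" name
      // separate from the prefix. "ocr-output" is an arbitrary folder name.
      uri: `gs://${bucketName}/ocr-output/${fileName}/`,
    },
  };
  if (batchSize !== undefined) {
    // Assumed camelCase form of the batch_size field of OutputConfig.
    outputConfig.batchSize = batchSize;
  }
  return outputConfig;
}

console.log(JSON.stringify(buildOutputConfig('my-bucket', 'aabb.pdf', 1), null, 2));
```

The returned object would take the place of `outputConfig` in the request from the steps to reproduce.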

0 reactions
mickdekkers commented, Jan 18, 2019

@nnegrey just a heads up, the docs here still state that GcsDestination can represent a single file.

Also, I’m not sure if this is the best place to ask, but what would be the best way to determine the final output location of the JSON file(s)? The AsyncBatchAnnotateFilesResponse only seems to contain the OutputConfig you pass to asyncBatchAnnotateFiles (so just the gs://${bucketName}/, no gs://${bucketName}/output-x-to-x.json). Is scanning the bucket with the Storage API’s Bucket.getFiles method the recommended way?
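
The thread does not confirm a recommended approach, but the bucket scan described in the question could look roughly like the sketch below. The helper names `parseGcsUri` and `listOutputFiles` are hypothetical. The `bucket` argument is assumed to behave like a `@google-cloud/storage` `Bucket`, whose `getFiles({prefix})` resolves to an array with the file list as its first element. The file name pattern is taken from the example earlier in this issue.

```javascript
// Hypothetical helper: split "gs://bucket/some/prefix/" into bucket and prefix.
function parseGcsUri(uri) {
  const match = /^gs:\/\/([^/]+)\/?(.*)$/.exec(uri);
  if (!match) {
    throw new Error(`Not a gs:// URI: ${uri}`);
  }
  return { bucketName: match[1], prefix: match[2] };
}

// Hypothetical helper: list objects under the prefix and keep only names that
// end in "output-<n>-to-<m>.json", the pattern observed in this report.
async function listOutputFiles(bucket, prefix) {
  const [files] = await bucket.getFiles({ prefix });
  return files
    .map(file => file.name)
    .filter(name => /output-\d+-to-\d+\.json$/.test(name));
}
```

With the real client, this would be called along the lines of `listOutputFiles(new Storage().bucket(bucketName), prefix)` after `const {Storage} = require('@google-cloud/storage')`, using the bucket name and prefix obtained from `parseGcsUri(destinationUri)`.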
