
asyncBatchAnnotateFiles filename output concatenates "output-x-to-x.json"


Environment details

OS: Windows 10
Node.js version: v9.2.0
npm version: 6.5.0
@google-cloud/vision version: ^0.23.0

Steps to reproduce

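// Assumed setup (not shown in the original report): a Vision client and a bucket name.
const vision = require('@google-cloud/vision').v1;
const client = new vision.ImageAnnotatorClient();
const bucketName = 'my-bucket'; // placeholder, replace with your own bucket

function processFilename(fileName) { // Path to PDF file within bucket

//  const gcsSourceUri = `gs://${bucketName}/pdfs/${fileName}`;
let gcsSourceUri = `gs://${bucketName}/${fileName}`;
// This URI is what ends up as the start of the output object name.
let gcsDestinationUri = `gs://${bucketName}/${fileName}.json`;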

let inputConfig = {
    // Supported mime_types are: 'application/pdf' and 'image/tiff'
    mimeType: 'application/pdf',
    gcsSource: {
        uri: gcsSourceUri,
    },
};
let outputConfig = {
    gcsDestination: {
        uri: gcsDestinationUri,
    },
};
//    let features = [{ type: 'DOCUMENT_TEXT_DETECTION', model: "builtin/latest" }];
let features = [{ type: 'DOCUMENT_TEXT_DETECTION' }];
let request = {
    requests: [{
        inputConfig: inputConfig,
        features: features,
        outputConfig: outputConfig,
    }, ],
};

client
    .asyncBatchAnnotateFiles(request)
    .then(results => {
        const operation = results[0];
        // Get a Promise representation of the final result of the job
        operation
            .promise()
            .then(filesResponse => {

                //                    console.log(JSON.stringify(filesResponse));

                let destinationUri = filesResponse[0].responses[0].outputConfig.gcsDestination.uri;
                console.log('Json saved to: ' + destinationUri);

                //          console.log(filesResponse[0].responses);
            })
            .catch(function(error) {
                console.log(error);
            });
    })
    .catch(function(error) {
        console.log(error);
    });

}

For example, if the input filename is aabb.pdf and the PDF contains 1 page, the output will be: aabb.pdf.jsonoutput-1-to-1.json

Thanks!


Top GitHub Comments

nnegrey commented, Jan 18, 2019

Ah. Good catch, let me look into that. I wonder if you have to specify a batch_size of 1 in the OutputConfig to use a filename.

Yeah, because the output may end up split across multiple files due to size, you’ll have to check what was created. To simplify that check, it’s recommended to use a prefix such as gs://${bucketName}/prefix/ so that you know only your output will be there. Less searching that way.
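
To illustrate the prefix behaviour described in this comment, here is a minimal sketch. It assumes the service appends names of the form output-<first>-to-<last>.json to whatever destination URI it is given; that pattern is inferred from the filename reported in this issue, not from a documented guarantee. The helper names destinationPrefix and expectedOutputUris are hypothetical, and the pages-per-file value is passed in explicitly rather than assumed.

```javascript
// Sketch only: the "output-<first>-to-<last>.json" pattern is inferred from
// the filename reported in this issue.

// Build a destination URI that ends in "/" so the appended names land
// under their own prefix instead of being glued to a filename.
function destinationPrefix(bucketName, fileName) {
  return `gs://${bucketName}/${fileName}/`;
}

// Predict the output object URIs for a document with `pageCount` pages
// when each output file holds at most `batchSize` pages.
function expectedOutputUris(destinationUri, pageCount, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const uris = [];
  for (let first = 1; first <= pageCount; first += batchSize) {
    const last = Math.min(first + batchSize - 1, pageCount);
    uris.push(`${destinationUri}output-${first}-to-${last}.json`);
  }
  return uris;
}

// Hypothetical bucket and file names, for illustration only.
const prefix = destinationPrefix('my-bucket', 'aabb.pdf');
console.log(expectedOutputUris(prefix, 5, 2));
```

With a destination that lacks the trailing slash, such as gs://my-bucket/aabb.pdf.json, the same helper reproduces the concatenated name from the report, which is why a prefix ending in "/" is easier to work with.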

mickdekkers commented, Jan 18, 2019

@nnegrey just a heads up, the docs here still state that GcsDestination can represent a single file.

Also, I’m not sure if this is the best place to ask, but what would be the best way to determine the final output location of the JSON file(s)? The AsyncBatchAnnotateFilesResponse only seems to contain the OutputConfig you pass to asyncBatchAnnotateFiles (so just the gs://${bucketName}/, no gs://${bucketName}/output-x-to-x.json). Is scanning the bucket with the Storage API’s Bucket.getFiles method the recommended way?
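
The thread does not confirm a recommended approach, but the bucket scan asked about here could look roughly like the sketch below. It assumes the `bucket` argument behaves like a @google-cloud/storage Bucket, whose getFiles({ prefix }) resolves to an array with the file list as its first element. parseGcsUri and listOutputFiles are hypothetical helper names, and the filename pattern is again taken from the output reported in this issue.

```javascript
// Split "gs://bucket/some/prefix" into its bucket name and object prefix.
function parseGcsUri(uri) {
  const match = /^gs:\/\/([^/]+)\/?(.*)$/.exec(uri);
  if (!match) {
    throw new Error(`Not a gs:// URI: ${uri}`);
  }
  return { bucketName: match[1], prefix: match[2] };
}

// List object names under `prefix` that look like Vision output files.
// `bucket` is assumed to expose getFiles({ prefix }) like a Storage Bucket.
async function listOutputFiles(bucket, prefix) {
  const [files] = await bucket.getFiles({ prefix });
  return files
    .map(file => file.name)
    .filter(name => /output-\d+-to-\d+\.json$/.test(name))
    .sort(); // lexicographic order, not necessarily page order
}

// Possible usage with the real client (assumes @google-cloud/storage is installed):
//   const { Storage } = require('@google-cloud/storage');
//   const { bucketName, prefix } = parseGcsUri(destinationUri);
//   const names = await listOutputFiles(new Storage().bucket(bucketName), prefix);
```

Passing the bucket in as a parameter keeps the helper easy to exercise with a stub, and the destinationUri in the usage comment is the value already logged by the code in the report.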