Feature request: Option to pass file metadata on file stream itself


Request doesn’t currently have any logic to do transfer-encoding: chunked. It currently tries to always send a content-length header, but form-data cannot just pull that number out of thin air without processing the whole stream.

Ordinarily, this would be solved by passing the knownLength file option, either to form-data directly or to request. However, this is not possible when using libraries that themselves use request, such as box-node-sdk.

Currently, I am working around this by taking advantage of two form-data behaviors: its automatic detection of content-length headers on IncomingMessage instances, and its use of the name prop (presumably from Browser File objects) to derive the mime-type. This also means I'm having the client-side code send the file meta alongside the file content itself.

myStream.httpVersion = '1.0'; // value technically doesn't matter, but might as well use a valid value.
myStream.headers = { 'content-length': myStreamKnownLength };
myStream.name = myStreamFileName;

However, it would be nice if there were a more sanctioned way to do this, something like:

myStream._formDataOptions = {
  filename: myStreamFileName,
  knownLength: myStreamKnownLength,
};
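
To make the proposal a bit more concrete, here is a rough sketch of how a consumer such as form-data could pick up options attached to the stream. Everything in it is hypothetical: `_formDataOptions` is only the property name suggested above, and `resolveAppendOptions` is an illustrative helper that does not exist in form-data or request.

```javascript
'use strict';

const { PassThrough } = require('stream');

// Hypothetical helper: merge options attached to the stream itself with
// options passed explicitly by the caller. Explicit options win, so callers
// that already pass knownLength directly would see no change in behavior.
function resolveAppendOptions(value, explicitOptions = {}) {
  const attached =
    value !== null && typeof value === 'object' && value._formDataOptions
      ? value._formDataOptions
      : {};
  return { ...attached, ...explicitOptions };
}

// Usage: the metadata travels with the stream through any wrapping library.
const myStream = new PassThrough();
myStream._formDataOptions = { filename: 'report.pdf', knownLength: 1024 };

const fromStream = resolveAppendOptions(myStream);
const overridden = resolveAppendOptions(myStream, { knownLength: 2048 });
console.log(fromStream, overridden);
```

Letting explicit options take precedence is just one possible design choice; the point is only that the stream becomes the carrier when nothing else can be passed.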

Aside: While box-node-sdk lets you pass additional base options to request, I'm not sure every library that uses request does this, so even if request did support transfer-encoding: chunked, I'm not certain the end user would always be able to specify that. Being able to pass the file meta straight on the file stream without resorting to tricking form-data would ensure some sort of escape hatch for those end users.

I’m open to doing up a PR if the idea presented here is acceptable.

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
joedski commented, Jun 11, 2017

I probably should have provided a concrete example of my issue!

My primary concern is not with passing data to form-data itself, since I could just pass knownLength directly and be done with it, nor with passing data to request since that lets you also specify the options to pass to form-data including knownLength. Rather, it is with libraries that do not expose the object handed to request at all.

For instance, when using the box-node-sdk, you can pass some additional options to it to configure all requests it makes using request, but you can’t configure individual requests, and you can’t pass different options for requests which upload versus those which don’t, and you don’t directly create anything for the request body itself.

Basically, when uploading a file, all you have access to is this:

const BoxSDK = require('box-node-sdk');
const sdk = new BoxSDK({ clientID: CLIENT_ID, clientSecret: CLIENT_SECRET });
const boxClient = sdk.getBasicClient(USER_ACCESS_TOKEN);
// no way to pass in file meta here.
boxClient.files.uploadFile(parentFolder.id, fileName, fileStream, callback);

That goes through a couple of layers until it finally reaches the internal wrapper around request, but the user of box-node-sdk never gets to see any of that, let alone the point where request builds the body using form-data.

This means that while something like the following works:

// fileName is passed as the second argument, as in the signature above.
boxClient.files.uploadFile(parentFolder.id, fileName, fs.createReadStream(somePath), callback);
boxClient.files.uploadFile(parentFolder.id, fileName, request(otherResource), callback);

This does not:

const { PassThrough } = require('stream');

// The read stream's metadata stays on the read stream. There is nothing on the PassThrough stream.
boxClient.files.uploadFile(parentFolder.id, fileName, fs.createReadStream(somePath).pipe(new PassThrough()), callback);
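
The loss of metadata is easy to check with Node's standard library alone. The snippet below is only an illustration: it inspects the `path` property as one example of what a consumer might look at on a file read stream, and it uses the Node binary (`process.execPath`) purely because that file is guaranteed to exist.

```javascript
'use strict';

const fs = require('fs');
const { PassThrough } = require('stream');

// Any existing file works here; nothing from it is actually consumed.
const source = fs.createReadStream(process.execPath);
const piped = source.pipe(new PassThrough());

// fs.ReadStream exposes the file path, which a consumer could stat for a size.
const sourceHasPath = typeof source.path === 'string';
// pipe() returns the destination stream, which knows nothing about the file.
const pipedHasPath = 'path' in piped;

console.log({ sourceHasPath, pipedHasPath });

// Clean up so the process does not keep the file handle open.
source.destroy();
piped.destroy();
```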

Or, more specific to my case, using busboy to handle multipart/form-data requests from the client:

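const Busboy = require('busboy');

// koa 2 router handler.
async function handleUpload(ctx) {
  // busboy expects the request headers under the `headers` option.
  const busboy = new Busboy({ headers: ctx.req.headers });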
  const uploadPromises = [];

  busboy.on('file', (fieldname, file, filename) => {
    uploadPromises.push(new Promise((resolve, reject) => {
      // `file` is a stream.
      // `boxClient` here is attached to the current request context.
      // Arguments: parent folder id, file name, stream, callback.
      ctx.boxClient.files.uploadFile('0', filename, file, (error, uploadList) => {
        if (error) reject(error);
        else resolve(uploadList);
      });
    }));
  });

  const busboyDone = new Promise((resolve, reject) => {
    // busboy is a Writable stream, so it signals completion with 'finish'.
    busboy.once('finish', resolve);
    busboy.once('error', reject);
  });

  ctx.req.pipe(busboy);

  try {
    await busboyDone;
    await Promise.all(uploadPromises);
    ctx.status = 200;
    ctx.body = ':)';
  }
  catch (error) {
    console.error(error);

    ctx.status = 500;
    ctx.body = ':(';
  }
}

In this case, the file will fail to upload because the file stream created by busboy carries no information that form-data can use to determine a knownLength. The boxClient from box-node-sdk builds a request object to pass to request, including the formData option, which causes request to create the body using form-data. request then asks form-data how large the body is going to be, because it wants to set a content-length header. As the file stream created by busboy has no metadata on it, neither in the style of a ReadStream nor of an IncomingMessage, form-data reports a size that is too small. request sets the content-length header to that smaller size, so the request body is truncated and the request is closed prematurely.
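
The undercounting can be illustrated with a deliberately simplified model. This is not form-data's actual implementation: `detectLength` and `declaredLength` are hypothetical names, multipart boundaries and part headers are ignored, and only properties set synchronously on the value are inspected.

```javascript
'use strict';

const { PassThrough } = require('stream');

// Hypothetical, simplified size detection for a single part.
function detectLength(value, options = {}) {
  if (typeof options.knownLength === 'number') return options.knownLength;
  if (typeof value === 'string') return Buffer.byteLength(value);
  if (Buffer.isBuffer(value)) return value.length;
  // IncomingMessage style: httpVersion plus a content-length header.
  if (value && value.httpVersion && value.headers &&
      value.headers['content-length'] !== undefined) {
    return Number(value.headers['content-length']);
  }
  return undefined; // size unknown
}

// Parts of unknown size contribute nothing, so the total undercounts the body.
function declaredLength(parts) {
  return parts.reduce(
    (sum, part) => sum + (detectLength(part.value, part.options) || 0),
    0
  );
}

const actualFileSize = 5000;
const bareStream = new PassThrough(); // stands in for a busboy file stream

// Same kind of stream, dressed up with the fake IncomingMessage properties.
const disguised = new PassThrough();
disguised.httpVersion = '1.0';
disguised.headers = { 'content-length': actualFileSize };

const undercounted = declaredLength([{ value: 'field value' }, { value: bareStream }]);
const withKnownLength = declaredLength([
  { value: 'field value' },
  { value: bareStream, options: { knownLength: actualFileSize } },
]);
const withFakeHeaders = declaredLength([{ value: 'field value' }, { value: disguised }]);

console.log({ undercounted, withKnownLength, withFakeHeaders });
```

In this model the bare stream adds nothing to the declared length, while either an explicit knownLength or the fake headers brings the total back in line with the real body size.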

In my specific case, I got around this by making the client send file metadata before the file itself and then using the above trickery to make form-data pick up on that metadata:

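const Busboy = require('busboy');

// koa 2 router handler.
async function handleUpload(ctx) {
  // busboy expects the request headers under the `headers` option.
  const busboy = new Busboy({ headers: ctx.req.headers });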
  const uploadPromises = [];
  const fileMetas = {};

  // Check fields for any named `(fileFieldname).meta`
  busboy.on('field', (fieldname, val) => {
    if (/\.meta$/.test(fieldname)) {
      const fileFieldname = fieldname.replace(/\.meta$/, '');
      // size, name, type from the browser File object.
      fileMetas[fileFieldname] = JSON.parse(val);
    }
  });

  busboy.on('file', (fieldname, file, filename) => {
    uploadPromises.push(new Promise((resolve, reject) => {
      // Correlate data from `(fileFieldname).meta` with `(fileFieldname)`

      if (!fileMetas[fieldname]) {
        // no meta?  dump file.
        file.resume();
        return reject(new Error(`No file meta for ${fieldname}`));
      }

      // fake it til ya make it.
      file.httpVersion = '1.0';
      file.headers = { 'content-length': fileMetas[fieldname].size };
      file.name = filename;

      // `file` is a stream.
      // `boxClient` here is attached to the current request context.
      // Arguments: parent folder id, file name, stream, callback.
      ctx.boxClient.files.uploadFile('0', filename, file, (error, uploadList) => {
        if (error) reject(error);
        else resolve(uploadList);
      });
    }));
  });

  const busboyDone = new Promise((resolve, reject) => {
    // busboy is a Writable stream, so it signals completion with 'finish'.
    busboy.once('finish', resolve);
    busboy.once('error', reject);
  });

  ctx.req.pipe(busboy);

  try {
    await busboyDone;
    await Promise.all(uploadPromises);
    ctx.status = 200;
    ctx.body = ':)';
  }
  catch (error) {
    console.error(error);

    ctx.status = 500;
    ctx.body = ':(';
  }
}

In the interest of full disclosure, the Box folks have recently added a chunking file upload method intended for very large files or unreliable networks, although they do this by directly breaking things into chunks within their client so that separate parts may be retried. So far as I know, this doesn’t have anything to do with transfer-encoding: chunked. Also not sure about the memory use, though it’s probably not bad. I may try converting the server to that if I have time next sprint.

It also does not address my concerns with libraries other than box-node-sdk which themselves wrap around request but may not allow any way to specify the size of a file. Obviously the Correct Way would be to get them to let the library user specify the form data options for each file, but getting that done in a timely or consistent manner isn't always possible.

Hope that explains the sort of situations I’m concerned with!

0 reactions
fomojola commented, Jul 17, 2017

As an extra vote for this: just spent 2 hours trying to figure out https://github.com/request/request/issues/2499. Currently form-data is broken for anything other than the 3 supported stream types (fs read streams, request streams, and http IncomingMessage) when really all it needs to know is the file size. I ended up doing something similar to what's described here: adding fake httpVersion and content-length headers, and that worked. A standard way to specify the size for arbitrary stream-compatible objects would really be the way to go.
