Possible regression in loading CSVs when in Node.js environment
Hi folks (cc @jwoLondon) 👋
You probably know that Vega and Vega-Lite are used in litvis; I’m opening this issue as a result of investigating https://github.com/gicentre/litvis/issues/27.
It seems that loading CSV files referenced in Vega specs does not work in the Node.js environment. This was first observed after upgrading to Vega 5.0 and is possibly related to the new data-loading approach. Here is an MWE (not litvis-specific):
```sh
mkdir /tmp/vega-csv-fetching-mwe
cd /tmp/vega-csv-fetching-mwe
yarn add vega vega-lite
cat <<"EOF" > mwe.js
const { parse, View } = require("vega");
const { compile } = require("vega-lite");

const generateSpec = (dataFormat) =>
  compile({
    $schema: "https://vega.github.io/schema/vega-lite/v3.json",
    data: {
      url: `https://gicentre.github.io/data/bicycleHiresLondon.${dataFormat}`,
    },
    encoding: {
      x: { field: "Month", type: "temporal" },
      y: { field: "NumberOfHires", type: "quantitative" },
    },
    mark: "circle",
  }).spec;

(async () => {
  for (const dataFormat of ["csv", "json"]) {
    const spec = generateSpec(dataFormat);
    const view = new View(parse(spec), { renderer: "none" }).initialize();
    console.log(`\nVega spec for ${dataFormat}\n=====`);
    console.log(JSON.stringify(spec));
    console.log(`\nResult`);
    console.log(await view.toSVG());
  }
})();
EOF
node mwe.js
```
Output:

```
Vega spec for csv
=====
{...}

Result
<Rather small SVG with no data shown>

Vega spec for json
=====
{...}

Result
<An SVG with data points, much longer than the previous one>
```
In the expected output, both SVGs are of the same length.
Versions used in the MWE:
vega@5.3.5
vega-lite@3.2.0
Copying the Vega specs from the standard output into the Vega Editor produces the same correct chart for both CSV and JSON. This suggests that the problem has to do with CSV fetching or parsing outside the browser environment. Setting `logLevel: vega.Info` did not help – no issues were revealed.
What are your thoughts?
Issue Analytics
- Created: 4 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
@domoritz @kanitw @arvind: You might find this one particularly amusing and/or infuriating! 😅
This is possibly the strangest bug I’ve seen in a while. I first tried to replicate everything on my own using vega-loader. No problems. However, I had only tried CSV loading with either no parsing or complete auto-parsing. Both worked. Or, I should say, they appeared to work…
I then tried the specification above (as compiled from Vega-Lite), and indeed it failed. In particular, the data parsing specified by Vega-Lite (`{"Month": "date"}`) was not being properly applied. The dates were not being parsed, and so the subsequent `null`/`NaN` filter inserted by Vega-Lite suppressed all the values, and we ended up with an empty data set.

So now things start to get weird. Why is the parsing step failing? The input appears correct (a loaded, but not yet parsed, CSV string). However, upon closer inspection, the output has not one but two keys for the ‘Month’ field. If you take the first object in the parsed data, here’s what you get:
Wat?! It’s as if the first key has a hidden empty string inside it… what could this be?

What is Unicode character 65279? ZERO WIDTH NO-BREAK SPACE.
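A quick check of what 65279 actually is — hexadecimal makes it recognizable:

```javascript
// 65279 in hexadecimal:
console.log((65279).toString(16));   // "feff"
// …and indeed U+FEFF is the character hiding at the start of the key.
console.log("\uFEFF".charCodeAt(0)); // 65279
```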
What in the world is that doing here? Well, I did notice that the input CSV file has not just line breaks but carriage returns. That by itself should not be an issue (one would hope), but it did get me thinking that maybe something fishy is going on within the CSV file itself…

Test 1: If I download the file through my browser, open and re-save it in my text editor (VS Code), and then try to load it locally, it works fine. OK.

Test 2: If I instead download the file directly via

```sh
curl https://gicentre.github.io/data/bicycleHiresLondon.csv > bicycleHiresLondon.csv
```

and try to load it locally, it breaks as before. Not OK! But we learn something important: this shows that the problem is not in our Node-based `fetch` polyfill, as we now see that Node’s `fs` module exhibits the same behavior when loading directly from the local file system.

My conclusion? The file format is probably not acceptable to Node. So let’s get a hexdump. Here’s what we see:
Hmm, what is that `ef bb bf` at the beginning of the byte stream? A little Googling and Wikipedia come to the rescue: https://en.wikipedia.org/wiki/Byte_order_mark

Here is a particularly telling passage from the Wikipedia article:
So it appears that the software being used to generate this CSV produces output that breaks other tools. The simplest solution: generate the CSV files through some other means.

But why does this work online? Perhaps the browser’s loading mechanism handles the BOM (or strips it) for us, whereas Node.js does not.