Inconsistency in lat_longs.tsv and ordering.tsv data?
I am checking the file and see that only Asia has a region definition, on this line:
https://github.com/nextstrain/ncov/blob/master/config/lat_longs.tsv#L1596
Other continents do not have this. They only have country and division entries.
grep -n 'region' lat_longs.tsv
1607:region Asia 30.451098 86.654576
grep -n 'Europe\|North America' lat_longs.tsv
1061:division Europe 49.646237 10.799454
1290:division North America 28.2367447 -97.738017
1601:country Europe 49.646237 10.799454
1602:country North America 28.2367447 -97.738017
Whereas in ordering.tsv:
grep -n 'region' ordering.tsv
1652:region Asia
1653:region Oceania
1654:region Africa
1655:region Europe
1656:region South America
1657:region North America
I can fix this but need to understand the intention first. By the way, my guess is that some samples do not have higher-resolution data, so continents are treated as countries. But seeing those entries caused confusion and doubts about data integrity.
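For anyone who wants to reproduce the comparison without grep, here is a minimal sketch in Python. It assumes both files are tab-separated, with the geographic type in the first column and the name in the second, which is what the grep output above suggests but is not confirmed here. The helper names read_pairs and missing_coordinates are made up for illustration, and the sample rows are copied from the grep output rather than read from the real files.

```python
import csv
import io


def read_pairs(handle):
    """Collect (type, name) pairs from a tab-separated file handle.

    Assumes column 1 is the geographic type and column 2 is the name.
    """
    pairs = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, short rows, and comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        pairs.add((row[0].strip(), row[1].strip()))
    return pairs


def missing_coordinates(ordering_handle, lat_longs_handle, geo_type="region"):
    """List names of the given type that are in ordering but not in lat_longs."""
    ordering = read_pairs(ordering_handle)
    lat_longs = read_pairs(lat_longs_handle)
    return sorted(
        name
        for kind, name in ordering
        if kind == geo_type and (kind, name) not in lat_longs
    )


# Sample rows taken from the grep output in this issue (not the full files).
ordering_sample = io.StringIO(
    "region\tAsia\n"
    "region\tOceania\n"
    "region\tAfrica\n"
    "region\tEurope\n"
    "region\tSouth America\n"
    "region\tNorth America\n"
)
lat_longs_sample = io.StringIO(
    "region\tAsia\t30.451098\t86.654576\n"
    "division\tEurope\t49.646237\t10.799454\n"
    "division\tNorth America\t28.2367447\t-97.738017\n"
    "country\tEurope\t49.646237\t10.799454\n"
    "country\tNorth America\t28.2367447\t-97.738017\n"
)

missing = missing_coordinates(ordering_sample, lat_longs_sample)
print(missing)
```

To run it against the real files, replace the io.StringIO samples with open("ordering.tsv", newline="") and open("lat_longs.tsv", newline="").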
Issue Analytics
- State:
- Created 3 years ago
- Comments:10 (6 by maintainers)
Wonderful, I’m glad this helped! Yes, there are lots of different data-protection restrictions, so sometimes this is not possible. If you ever end up conglomerating on a larger scale (New Zealand ended up creating much larger areas over which they cluster cases, for the same reason), feel free to let us know, and we can incorporate those too! (Of course, if they’re made up, you’ll have to tell us the lat-longs!)
Sorry, I think I haven’t explained very well! So yes, we do take location (usually city) and division-level (county/state/canton) data. However, for all the Denmark samples we have, we have no information at these levels, only country. This is why we don’t have any Danish locations or divisions, and why we wouldn’t add these at this time. We only include locations/divisions/countries where we have that data attached to some kind of sequence. Since the Denmark sequences only have ‘Denmark’, we only have ‘Denmark’ included.
If we get data which has lower-level Denmark data, then we will be happy to include the corresponding information for these! However, until we do, we don’t add data which isn’t attached to a sequence. Of course, we’d love it if the Danish sequences were updated with more geographic info (if allowed), and we would then be very happy to add this to our lat_longs & orderings files! Until then, though, you are best to cat as you are doing.

Yes - and sorry if this isn’t clear either! You should add additional locations and divisions to orderings.tsv. These are loosely in geographical order, so that locations within a country are near others within that same country, etc. This is the file that’s used to generate colors_global.tsv, so you don’t need to add anything to that file: if it’s in orderings.tsv, it’ll end up in there too.

I hope that helps!