Inconsistency in lat_longs.tsv and ordering.tsv data?

I was checking the file and saw that only Asia has a region definition, on this line: https://github.com/nextstrain/ncov/blob/master/config/lat_longs.tsv#L1596 The other continents do not have one. They appear only as country and division entries.

grep -n 'region' lat_longs.tsv
1607:region     Asia    30.451098       86.654576

grep -n 'Europe\|North America' lat_longs.tsv
1061:division   Europe  49.646237       10.799454
1290:division   North America   28.2367447      -97.738017
1601:country    Europe  49.646237       10.799454
1602:country    North America   28.2367447      -97.738017

Whereas in ordering.tsv:

grep -n 'region' ordering.tsv
1652:region     Asia
1653:region     Oceania
1654:region     Africa
1655:region     Europe
1656:region     South America
1657:region     North America

I can fix this, but I need to understand the intention first. My guess is that some samples do not have higher-resolution data, so continents are recorded as countries. Still, these entries caused confusion and doubts about data integrity when I saw them.
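The comparison done with grep above can also be scripted. The following is a minimal sketch, assuming both files are tab-separated with the geographic level in the first column and the name in the second, as the grep output suggests. The function names are hypothetical and are not part of the ncov repository.

```python
def read_entries(lines):
    """Yield (geo_type, name) pairs from tab-separated lines.

    Blank lines, '#' comment lines, and lines with fewer than two
    columns are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        yield fields[0].strip(), fields[1].strip()


def missing_from_lat_longs(ordering_lines, lat_longs_lines, geo_type="region"):
    """Return names listed under geo_type in ordering but not in lat_longs."""
    have = {name for t, name in read_entries(lat_longs_lines) if t == geo_type}
    wanted = [name for t, name in read_entries(ordering_lines) if t == geo_type]
    return [name for name in wanted if name not in have]


# Usage sketch (the paths are assumptions about where the files live):
# with open("ordering.tsv") as o, open("lat_longs.tsv") as l:
#     print(missing_from_lat_longs(o, l))
```

With the entries quoted in this issue, the check would report every region except Asia as missing from lat_longs.tsv.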

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
emmahodcroft commented, Jun 25, 2020

Wonderful, I’m glad this helped! Yes, there are lots of different restrictions on data protection, so sometimes this is not possible. If you ever end up conglomerating on a larger scale (New Zealand ended up creating much larger areas over which they cluster cases, for the same reason), feel free to let us know, and we can incorporate those too! (Of course, if they’re made up, you’ll have to tell us the lat-longs!)

1 reaction
emmahodcroft commented, Jun 25, 2020

> @emmahodcroft I thought about updating the upstream ncov repo with Denmark locations because I could see there are locations for cities around the world. It’s only cities and municipalities of Denmark, not higher resolution than that. Are these still not suitable for upstream ncov? Furthermore, we also submitted data to GISAID. So you (your team) will deal with Denmark data anyway? Or maybe I am still missing something?

Sorry, I think I haven’t explained very well! So yes, we do take location (city usually) and division-level (county/state/canton) data. However, for all the Denmark samples we have, we have no information at these levels - only country. So this is why we don’t have any Danish locations or divisions, and why we wouldn’t add these at this time. We only include locations/divisions/countries where we have that data attached to some kind of sequence. Since for the Denmark sequences they only have ‘Denmark’ - then we only have ‘Denmark’ included.

If we get data which has lower-level Denmark data, then we will be happy to include the corresponding information for these! However, until we do, we don’t add data which isn’t attached to a sequence. Of course, we’d love if the Danish sequences were updated with more geographic info (if allowed) and would then be very happy to add this to our lat_longs & orderings files! Until then, though, you are best to cat as you are doing.
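For readers who keep their own location file alongside the upstream one, plain concatenation can leave duplicate rows. The following is a minimal sketch of a merge that keeps one row per (type, name) pair and lets the local file win on conflicts. It assumes four tab-separated columns (type, name, latitude, longitude); the function name and the example rows in the usage note are hypothetical, and whether local rows should override upstream ones is an assumption, not something stated in this thread.

```python
def merge_lat_longs(upstream_lines, local_lines):
    """Merge two lat_longs-style files, one row per (type, name) key.

    Rows from local_lines replace upstream rows with the same key.
    Output order follows first appearance of each key.
    """
    merged = {}
    for line in list(upstream_lines) + list(local_lines):
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, comments, and malformed rows.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        key = (fields[0].strip(), fields[1].strip())
        merged[key] = "\t".join(f.strip() for f in fields[:4])
    return list(merged.values())
```

For example, merging an upstream row `country<TAB>Exampleland<TAB>1.0<TAB>2.0` with a local file that adds `location<TAB>Example City<TAB>1.5<TAB>2.5` yields both rows, with each key appearing exactly once.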

> However, I am still wondering whether I should add something besides lat_longs data. For example, a location is mentioned in three files:

Yes - and sorry if this isn’t clear either! You should add additional locations and divisions to ordering.tsv. These are loosely in geographical order, so that locations within a country are near others within that same country, and so on. This is the file that’s used to generate colors_global.tsv, so you don’t need to add anything to that file: if it’s in ordering.tsv, it’ll end up in there too.

I hope that helps!
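Since the ordering file is kept loosely in geographical order, new entries are best placed next to an existing neighbour rather than appended at the end. The following is a minimal sketch of that idea, assuming two tab-separated columns (type, name). The function name and the anchor-based approach are hypothetical and are not part of the ncov tooling.

```python
def add_to_ordering(ordering_lines, anchor, new_entries):
    """Insert new (type, name) entries directly after the anchor entry.

    Entries already present in the file are not added again.
    Raises ValueError if the anchor entry is not found.
    """
    rows = [line.rstrip("\n") for line in ordering_lines]

    def parse(row):
        fields = row.split("\t")
        if len(fields) < 2:
            return None
        return fields[0].strip(), fields[1].strip()

    existing = {parse(row) for row in rows}
    to_add = [f"{t}\t{n}" for t, n in new_entries if (t, n) not in existing]
    for i, row in enumerate(rows):
        if parse(row) == anchor:
            return rows[: i + 1] + to_add + rows[i + 1 :]
    raise ValueError(f"anchor {anchor!r} not found in ordering")
```

Choosing the anchor by hand keeps the decision about what counts as a geographical neighbour with the person editing the file.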
