Dataset Information:

The HC4 collection, which has been accepted at ECIR 2022.

<brief description>

Links to Resources:

https://github.com/hltcoe/HC4/tree/main/resouces/hc4

Dataset ID(s) & supported entities:

  • Dataset ID: hc4/{language id: zh, fa, ru}/{train, dev, test}
  • Each dataset will have {Chinese, Farsi, Russian} documents, English queries (title/description/narrative), an English report associated with each topic, and qrels.
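As a rough illustration of the ID pattern above, the sketch below expands the template into the nine concrete IDs. The function name `hc4_dataset_ids` is hypothetical and not part of ir_datasets; only the `hc4/{language}/{split}` pattern comes from this issue.

```python
from itertools import product

# Language IDs and splits exactly as listed in the proposed dataset ID template.
LANGS = ("zh", "fa", "ru")
SPLITS = ("train", "dev", "test")


def hc4_dataset_ids():
    """Expand hc4/{language}/{split} into every concrete dataset ID."""
    return [f"hc4/{lang}/{split}" for lang, split in product(LANGS, SPLITS)]


dataset_ids = hc4_dataset_ids()
for dataset_id in dataset_ids:
    print(dataset_id)
```

If the dataset is merged under these IDs, each string would presumably be what gets passed to `ir_datasets.load(...)`.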

Checklist

Mark each task once completed. All should be checked prior to merging a new dataset.

  • Dataset definition (in ir_datasets/datasets/[topid].py)
  • Tests (in tests/integration/[topid].py)
  • Metadata generated (using ir_datasets generate_metadata command, should appear in ir_datasets/etc/metadata.json)
  • Documentation (in ir_datasets/etc/[topid].yaml)
  • Downloadable content (in ir_datasets/etc/downloads.json)
    • Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
    • Any small public files from NIST (or other potentially troublesome files) mirrored in https://github.com/seanmacavaney/irds-mirror/. Mirrored status properly reflected in downloads.json.

Additional comments/concerns/ideas/etc.

The document IDs, qrels, and topics will be distributed through a public GitHub repository. Users will need to download the actual documents from Common Crawl. A script for downloading and validating the documents will be provided along with the doc IDs.

The structure will be very similar to the future NeuCLIR collection. Whether these two collections will be distributed through the same repository is TBD.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction
seanmacavaney commented, Jan 26, 2022

Yeah, that’s why I was saying in this edge case, the user would have to take the intersection of qids in hc4/fa/train and hc4/zh/train – leaving just Topic 3. They’d also have to filter the qrels, true.
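A minimal sketch of that edge case, using toy data rather than the real collection: the query IDs, doc IDs, and the `Qrel` tuple below are made up for illustration (the field names only loosely mirror the usual query_id/doc_id/relevance layout), and `shared_query_ids` and `filter_qrels` are hypothetical helpers, not ir_datasets functions.

```python
from collections import namedtuple

# Hypothetical stand-in for a qrels record; not the actual ir_datasets type.
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


def shared_query_ids(*query_id_lists):
    """Return the query IDs that appear in every given list."""
    sets = [set(ids) for ids in query_id_lists]
    return set.intersection(*sets) if sets else set()


def filter_qrels(qrels, keep_ids):
    """Keep only the judgments whose query ID is in keep_ids."""
    return [q for q in qrels if q.query_id in keep_ids]


# Toy example: only Topic 3 exists in both languages.
fa_train_qids = ["1", "3"]
zh_train_qids = ["2", "3"]
shared = shared_query_ids(fa_train_qids, zh_train_qids)

fa_qrels = [Qrel("1", "fa-doc-a", 1), Qrel("3", "fa-doc-b", 3)]
shared_fa_qrels = filter_qrels(fa_qrels, shared)
```

The same filtering would have to be repeated for each language's qrels before combining results across languages.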

1 reaction
eugene-yang commented, Jan 7, 2022

Thanks, Sean. This makes sense. The top level should be just a placeholder. At this point, I don’t think combining all three languages makes sense. Some topics do span languages (i.e. same title and description), but their narratives are different, so they should be considered different queries.

Reports range from 1 to 5 paragraphs. Conceptually, they are written by the analysts prior to the search to reflect some background of the information need.
