Language translator node and multiple language docs support

Is your feature request related to a problem? Please describe. In one of the projects my friend and I are working on, most of the textual data is in English and the rest is in other languages such as Hindi and Tamil. So we have two use cases:

  1. How to support documents in multiple languages, both at insertion and at search time?
  2. How to query a single-language data source (say, English) with natural-language queries in a different language, and present the results in the language of the query?

Describe the solution you’d like I am thinking of a generic translator node that supports multilingual translation models. Its input and output languages would be configurable (the input language could also be auto-detected). For example:

query (in hindi) --> translator(hindi to english) --> search_pipeline (QA | FAQ | Generator | Summarizer)
                                                                                                    | 
                                                                                                    V
                                                  translated docs    <---         translator(english to hindi) 

This would solve (2), but I am not sure about (1), i.e. how to handle search over multilingual docs.
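
The flow in the diagram can be sketched roughly as follows. This is only an illustration of the idea, not Haystack's API: `TranslationWrapper`, `toy_translate` and `toy_search` are hypothetical names, the translator is a lookup table standing in for a real translation model, and the search function is a naive keyword match standing in for a real search pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signatures, chosen for this sketch only.
TranslateFn = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
SearchFn = Callable[[str], List[str]]         # query in pipeline language -> results


@dataclass
class TranslationWrapper:
    """Translate the query in, run the search, translate the results back out."""

    translate: TranslateFn
    search: SearchFn
    pipeline_lang: str = "en"  # language the documents and the pipeline use

    def run(self, query: str, query_lang: str) -> List[str]:
        # No translation needed when the query is already in the pipeline language.
        if query_lang == self.pipeline_lang:
            return self.search(query)
        pivot_query = self.translate(query, query_lang, self.pipeline_lang)
        results = self.search(pivot_query)
        return [self.translate(r, self.pipeline_lang, query_lang) for r in results]


# --- Toy stand-ins so the sketch runs without any model or document store ---
HINDI_QUERY = "भारत की राजधानी क्या है?"
ENGLISH_QUERY = "What is the capital of India?"
ENGLISH_ANSWER = "New Delhi is the capital of India."
HINDI_ANSWER = "नई दिल्ली भारत की राजधानी है।"

_TOY_TABLE = {
    ("hi", "en", HINDI_QUERY): ENGLISH_QUERY,
    ("en", "hi", ENGLISH_ANSWER): HINDI_ANSWER,
}

_DOCS = [ENGLISH_ANSWER, "Chennai is the capital of Tamil Nadu."]


def toy_translate(text: str, src: str, tgt: str) -> str:
    # A real node would call a translation model here.
    return _TOY_TABLE[(src, tgt, text)]


def _terms(text: str) -> set:
    return set(text.lower().strip("?.").split())


def toy_search(query: str) -> List[str]:
    # Return the single document sharing the most words with the query.
    best = max(_DOCS, key=lambda d: len(_terms(d) & _terms(query)))
    return [best]


if __name__ == "__main__":
    wrapper = TranslationWrapper(translate=toy_translate, search=toy_search)
    print(wrapper.run(HINDI_QUERY, "hi"))
```

Because the translator and the search step are passed in as plain callables, the same wrapper could sit in front of any of the pipeline types named in the diagram. It does nothing for use case (1): documents stored in several languages would still need either translation at insertion time or a multilingual retriever.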

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
tholor commented, Jan 19, 2021

I think cross-lingual QA will be an important feature for Haystack in the future, and it would be great to write queries in your native language -> find answers in docs of a different language -> show / summarize results in your native language again. So far I have seen both approaches (translation-based pipelines and end-to-end multilingual models), but I don’t yet have a clear picture of which would be best to have in Haystack. I think we will need to review a few more research articles to understand the current capabilities and limitations. My gut feeling is that a translator node is a practical but only short-term solution, and that eventually a single model will handle the translation internally.

1 reaction
Utomo88 commented, Jan 2, 2021

Can we learn from this paper? "Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval"

https://arxiv.org/pdf/2012.14094.pdf
