Language translator node and multiple language docs support
Is your feature request related to a problem? Please describe.
In one of the projects my friend and I are working on, most of the textual data is in English and the rest is in other languages such as Hindi and Tamil. So we have two use cases:
- How do we support documents in multiple languages, both at insertion and at search time?
- How do we query a single-language data source (say, English) with natural-language queries in a different language, and present the results in the language of the query?
Describe the solution you’d like
I am thinking of a generic translator node that supports a multilingual translation model. It could have configurable input and output languages (the input language could also be auto-detected). For example:
query (in Hindi) --> translator(Hindi to English) --> search_pipeline (QA | FAQ | Generator | Summarizer)
                                                             |
                                                             V
                                        translated docs <--- translator(English to Hindi)
This would solve (2), but I am not sure how to handle (1), i.e. search over multilingual docs.
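The flow in the diagram can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: `TranslatedSearch`, `make_dict_translator`, and `toy_search` are hypothetical names, not Haystack APIs, and the dictionary lookup stands in for a real translation model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (assumptions, not a real library API):
# a translator maps (text, source_lang, target_lang) -> text,
# a search pipeline maps a query string -> list of result strings.
Translator = Callable[[str, str, str], str]
SearchPipeline = Callable[[str], List[str]]


def make_dict_translator(table: Dict[Tuple[str, str, str], str]) -> Translator:
    """Toy stand-in for a translation model: exact-phrase lookup."""
    def translate(text: str, src: str, tgt: str) -> str:
        if src == tgt:
            return text  # nothing to translate
        # Fall back to the untranslated text when the phrase is unknown.
        return table.get((text, src, tgt), text)
    return translate


@dataclass
class TranslatedSearch:
    """Wraps a search pipeline with query and result translation."""
    translate: Translator
    search: SearchPipeline
    pivot_lang: str = "en"  # language of the indexed documents

    def run(self, query: str, query_lang: str) -> List[str]:
        # 1. Translate the query into the language of the documents.
        pivot_query = self.translate(query, query_lang, self.pivot_lang)
        # 2. Run the unchanged search pipeline (QA, FAQ, ...).
        results = self.search(pivot_query)
        # 3. Translate each result back into the language of the query.
        return [self.translate(r, self.pivot_lang, query_lang) for r in results]


def toy_search(query: str) -> List[str]:
    """Toy English-only pipeline: answers one hard-coded question."""
    knowledge = {"What is the capital of India?": ["New Delhi"]}
    return knowledge.get(query, [])


translator = make_dict_translator({
    ("भारत की राजधानी क्या है?", "hi", "en"): "What is the capital of India?",
    ("New Delhi", "en", "hi"): "नई दिल्ली",
})
wrapper = TranslatedSearch(translate=translator, search=toy_search)
answers = wrapper.run("भारत की राजधानी क्या है?", "hi")
print(answers)
```

The search pipeline itself stays untouched, which is why this covers use case (2) only; documents stored in several languages would still need separate handling at insertion time.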
Issue Analytics
- Created 3 years ago
- Reactions:1
- Comments:6 (5 by maintainers)
I think cross-lingual QA will be an important feature for Haystack in the future, and it would be great to write queries in your native language -> find answers in docs of a different language -> show / summarize the results in your native language again. So far I have seen both approaches (translation-based and end-to-end multilingual models), but I haven’t yet formed a clear picture of which would be best to have in Haystack. I think we will need to review a few more research articles to understand the current capabilities and limitations. My gut feeling is that a translator node is a practical but only short-term solution, and that eventually the translation will be handled internally by the model itself.
Can we learn from this? Pivot Through English: Reliably Answering Multilingual Questions without Document Retrieval
https://arxiv.org/pdf/2012.14094.pdf