[Search] Improve and normalize the search data model
Things to keep in mind:
- Normalize text input fields: `text`, `inputs`, `words` must be normalized and use a common pattern for all tasks
- Several es analyzers for text fields: `standard` and `whitespace` (?) for fine tuning searches. Default as `standard`
- What about text fields in metadata? For now, only terms queries are supported. This means that metadata fields with large content cannot be queried with full text search.
- Each created index should contain mapping info only for its own fields. A text classification index should not include mapping info for tokens or text predicted (text2text).
- Review filter fields and align with UI names (if any)
- What about nested fields, like token or metrics info for token classification, or label and its score for text classification? By default, the query string DSL does not support nested queries, but it would be nice to include some minimal support for that kind of query.
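
To make the analyzer and per-task mapping points above more concrete, here is a minimal sketch of how per-task index mappings could be built. The field names, the `TASK_FIELDS` table and the `build_mapping` helper are all hypothetical and not taken from the Rubrix codebase; only the Elasticsearch mapping keys (`type`, `analyzer`, `fields`, `properties`, `nested`) are standard.

```python
from typing import Any, Dict


def text_field() -> Dict[str, Any]:
    """Text field analyzed with `standard` by default, plus a
    `whitespace` sub-field for fine-tuned searches (assumed layout)."""
    return {
        "type": "text",
        "analyzer": "standard",
        "fields": {
            "whitespace": {"type": "text", "analyzer": "whitespace"},
        },
    }


# Hypothetical per-task field sets; names are illustrative only.
TASK_FIELDS: Dict[str, Dict[str, Any]] = {
    "text_classification": {
        "text": text_field(),
        # label + score pairs kept as nested objects
        "predicted": {
            "type": "nested",
            "properties": {
                "label": {"type": "keyword"},
                "score": {"type": "float"},
            },
        },
    },
    "token_classification": {
        "text": text_field(),
        "tokens": {
            "type": "nested",
            "properties": {
                "value": {"type": "keyword"},
                "tag": {"type": "keyword"},
            },
        },
    },
    "text2text": {
        "text": text_field(),
        "text_predicted": text_field(),
    },
}


def build_mapping(task: str) -> Dict[str, Any]:
    """Return an index body containing only the fields of one task."""
    return {"mappings": {"properties": dict(TASK_FIELDS[task])}}
```

With this layout, `build_mapping("text_classification")` carries no `tokens` or `text_predicted` entries, so each index only describes its own fields.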
@dvsrepo @dcfidalgo Anything to include here?
Tasks
To get this work done, we need to tackle the following tasks (which will be created as separate issues and linked here):
- [Datasets] Avoid using global template for all indices
- [Datasets] Dataset migration mechanisms for each release
- [Datasets] New es document model per task with backward compatibility fields
- [Datasets] Apply migration to new es doc model
- [Datasets] Build searches and aggregations using new doc model
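
As a rough illustration of the last task, and of the minimal nested-query support mentioned earlier, a search could combine a `query_string` clause with a `nested` filter. This is only a sketch: the `search_body` helper and the field names (`text`, `predicted`, `label`) are assumptions, not the actual Rubrix document model.

```python
from typing import Any, Dict


def search_body(
    text_query: str, nested_path: str, nested_field: str, value: str
) -> Dict[str, Any]:
    """Full-text search on `text`, filtered by one nested sub-field."""
    return {
        "query": {
            "bool": {
                # free-text part, parsed by the query string DSL
                "must": [
                    {"query_string": {"query": text_query, "default_field": "text"}}
                ],
                # nested part, which query_string alone cannot express
                "filter": [
                    {
                        "nested": {
                            "path": nested_path,
                            "query": {
                                "term": {f"{nested_path}.{nested_field}": value}
                            },
                        }
                    }
                ],
            }
        }
    }


body = search_body("refund AND delay", "predicted", "label", "complaint")
```

The resulting `body` dictionary could be passed as the request body of a search call; keeping the nested clause in `filter` leaves relevance scoring to the text part.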
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (11 by maintainers)
Not really. The only “problem” is that you cannot select which predicted sentence to search in. It will search in all of them. But I think we can assume that.
Note: PR recognai/rubrix#1018 introduces breaking changes for versions <0.9.0, so we cannot include those changes until v0.11.0, in order to keep compatibility with at least the 2 versions prior to the release.