Normalizers, Analyzers, etc aren't copied from fields when cloning Index
Maybe related to #957?
My definitions of the 'base' Index, along with all filters, analyzers, and normalizers:
from elasticsearch_dsl import Index, normalizer, analyzer, char_filter, token_filter

autocomplete_filter = token_filter(
    'autocomplete', 'edgeNGram',
    min_gram=1,
    max_gram=20)

# raw string so that \W reaches Elasticsearch unchanged
remove_leading_non_alphanum_char_filter = char_filter(
    'remove_leading_non_alphanum', 'pattern_replace',
    pattern=r"^(\W|_)+",
    replacement="")

sorting_normalizer = normalizer(
    'sorting',
    filter=["lowercase", "asciifolding"])

default_analyzer = analyzer(
    'default',
    tokenizer="standard",
    filter=["standard", "lowercase", "asciifolding", "stop", "snowball"],
    char_filter=["html_strip"])

nostop_analyzer = analyzer(
    'nostop',
    tokenizer="standard",
    filter=["standard", "lowercase", "asciifolding"],
    stopwords=[],
    char_filter=["html_strip", remove_leading_non_alphanum_char_filter])

autocomplete_analyzer = analyzer(
    'autocomplete',
    tokenizer="standard",
    filter=["standard", "lowercase", "asciifolding", autocomplete_filter],
    stopwords=[],
    char_filter=["html_strip", remove_leading_non_alphanum_char_filter])

base_index = Index('base')
base_index.analyzer(default_analyzer)
A simple sample Index and Document that will generate the error that follows:
from elasticsearch_dsl import Document, field

# this clone is NOT an issue, the default analyzer settings etc all transfer correctly
i = base_index.clone('user-related')

@i.document
class UserDoc(Document):
    name = field.Keyword(
        normalizer=sorting_normalizer,
        fields={"autocomplete": field.Text(
            analyzer=autocomplete_analyzer,
            search_analyzer=nostop_analyzer)
        })
i.to_dict() correctly yields the following:
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "fields": {
            "autocomplete": {
              "analyzer": "autocomplete",
              "search_analyzer": "nostop",
              "type": "text"
            }
          },
          "normalizer": "sorting",
          "type": "keyword"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "stopwords": [],
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "autocomplete"
          ],
          "char_filter": [
            "html_strip",
            "remove_leading_non_alphanum"
          ]
        },
        "default": {
          "tokenizer": "standard",
          "type": "custom",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "stop",
            "snowball"
          ],
          "char_filter": [
            "html_strip"
          ]
        },
        "nostop": {
          "type": "custom",
          "tokenizer": "standard",
          "stopwords": [],
          "filter": [
            "standard",
            "lowercase",
            "asciifolding"
          ],
          "char_filter": [
            "html_strip",
            "remove_leading_non_alphanum"
          ]
        }
      },
      "normalizer": {
        "sorting": {
          "type": "custom",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "filter": {
        "autocomplete": {
          "min_gram": 1,
          "max_gram": 20,
          "type": "edgeNGram"
        }
      },
      "char_filter": {
        "remove_leading_non_alphanum": {
          "replacement": "",
          "type": "pattern_replace",
          "pattern": "^(\\W|_)+"
        }
      }
    }
  }
}
But after cloning, the important elements are (incorrectly, I think?) missing; i.clone('user-related-20180905').to_dict() outputs the following (note it does keep the default analyzer I added to base_index):
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "keyword",
          "normalizer": "sorting",
          "fields": {
            "autocomplete": {
              "type": "text",
              "search_analyzer": "nostop",
              "analyzer": "autocomplete"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "stop",
            "snowball"
          ],
          "char_filter": [
            "html_strip"
          ]
        }
      }
    }
  }
}
To further illustrate, this is the output from calling .create() on a cloned index:
PUT http://localhost:9200/user-related-20180905-064032 [status:400 request:0.013s]
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/vagrant/esdocs/contrib/esdjango/run.py", line 25, in <module>
run(DjangoController)
File "/vagrant/esdocs/utils.py", line 101, in run
controller.run_operation(cmd_parser=parser, **options)
File "/vagrant/esdocs/controller.py", line 85, in run_operation
getattr(self, "index_{}".format(action))(**options)
File "/vagrant/esdocs/controller.py", line 122, in index_rebuild
self._index_create(index, name, set_alias=False)
File "/vagrant/esdocs/controller.py", line 169, in _index_create
index.create(using=self.using)
File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch_dsl/index.py", line 220, in create
self._get_connection(using).indices.create(index=self._name, body=self.to_dict(), **kwargs)
File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/client/indices.py", line 88, in create
params=params, body=body)
File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/transport.py", line 318, in perform_request
status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/connection/http_urllib3.py", line 186, in perform_request
self._raise_error(response.status, raw_data)
File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'analyzer [nostop] not found for field [autocomplete]')
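The 400 above can be caught before the request is sent by comparing the names a mapping references against the names its settings define. The following is a minimal sketch of such a check, assuming the typed mapping layout ("mappings" -> doc type -> "properties") seen in the to_dict() output. The helper name missing_analysis_refs is hypothetical and not part of elasticsearch_dsl; it only inspects a plain dict. It will also report built-in analyzers (for example "standard") as missing, since those are never listed under settings.

```python
def missing_analysis_refs(index_body):
    """Return sorted (kind, name) pairs that the mappings reference
    but that settings.analysis does not define."""
    analysis = index_body.get("settings", {}).get("analysis", {})
    defined = {
        "analyzer": set(analysis.get("analyzer", {})),
        "normalizer": set(analysis.get("normalizer", {})),
    }
    # Mapping key -> settings section the referenced name must resolve against.
    ref_keys = {
        "analyzer": "analyzer",
        "search_analyzer": "analyzer",
        "normalizer": "normalizer",
    }
    missing = set()

    def walk(field_map):
        for field in field_map.values():
            for key, kind in ref_keys.items():
                name = field.get(key)
                if isinstance(name, str) and name not in defined[kind]:
                    missing.add((kind, name))
            # Recurse into multi-fields and object/nested sub-properties.
            walk(field.get("fields", {}))
            walk(field.get("properties", {}))

    for doc_type in index_body.get("mappings", {}).values():
        walk(doc_type.get("properties", {}))
    return sorted(missing)


# Trimmed-down version of the cloned index body from this report.
cloned_body = {
    "mappings": {"doc": {"properties": {"name": {
        "type": "keyword",
        "normalizer": "sorting",
        "fields": {"autocomplete": {
            "type": "text",
            "search_analyzer": "nostop",
            "analyzer": "autocomplete",
        }},
    }}}},
    "settings": {"analysis": {"analyzer": {"default": {
        "type": "custom",
        "tokenizer": "standard",
    }}}},
}

print(missing_analysis_refs(cloned_body))
# [('analyzer', 'autocomplete'), ('analyzer', 'nostop'), ('normalizer', 'sorting')]
```

Running a check like this on index.to_dict() before index.create() would turn the server-side mapper_parsing_exception into a local, more descriptive failure.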
Issue Analytics
- State:
- Created 5 years ago
- Comments:11 (4 by maintainers)
Top GitHub Comments
@i.document is about assigning the Document to the Index, not the other way around. We should certainly make it clearer in the docs. It is only useful when working with an Index object and, with types going away, there is no real reason to do that unless you are doing something very specific.
For your use case, please, again, look at the example with alias migration, that is what I would recommend - no ambiguity, no extra decorators, works out of the box with the decorators without the need to redeploy your application when a new index is introduced. Settings are governed by the template and can be changed any time as well. If there is something missing from that example, please let me know.
Thank you!
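For readers unfamiliar with the alias approach recommended above, here is a rough sketch of its two moving parts: a timestamped concrete index name, and a single _aliases request that repoints the alias. The names ALIAS, PATTERN, next_index_name and alias_swap_body are assumptions made for illustration, not taken from the thread, and the functions only build plain values; nothing here talks to a cluster.

```python
from datetime import datetime

ALIAS = "user-related"   # hypothetical alias the application reads and writes through
PATTERN = ALIAS + "-*"   # hypothetical pattern matching every versioned index


def next_index_name(now=None):
    """Build a timestamped concrete index name, e.g. 'user-related-20180905-064032'."""
    now = now or datetime.now()
    return PATTERN.replace("*", now.strftime("%Y%m%d-%H%M%S"))


def alias_swap_body(new_index):
    """Body for the Elasticsearch _aliases API: detach ALIAS from all
    versioned indices and attach it to new_index in one request."""
    return {
        "actions": [
            {"remove": {"alias": ALIAS, "index": PATTERN}},
            {"add": {"alias": ALIAS, "index": new_index}},
        ]
    }


name = next_index_name(datetime(2018, 9, 5, 6, 40, 32))
print(name)  # user-related-20180905-064032
print(alias_swap_body(name))
```

Because application code only ever refers to the alias, introducing a new concrete index does not require changing or redeploying the code that uses the Document class.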
Thanks @HonzaKral. Again, I understand what you’re saying (although I’ll admit I’d glossed over the ability to pass in an index on Document operations).
That said - and ignoring my use case for now - is it reasonable to expect someone who uses the @<index>.document decorator to have to keep that index around to do operations elsewhere in code?
Where the same code without the decorator is more intuitive and less fragile:
If you still disagree, then by all means, please close this ticket. I may not understand why it is this way, but I’m willing to accept that it’s like that for a specific reason (although I would obviously still like to understand why, and why my suggestion isn’t valid - still, it’s your project, it’s of course your call).
The user must pick between:
Document class Index
and convenience of ignoring what the index name is throughout application code
Honestly, I won’t be too upset if you just close this ticket without replying. You’ve put up with a lot of my pushback. Cheers - and thanks for all the effort on the Python bindings!