
Normalizers, Analyzers, etc aren't copied from fields when cloning Index

See original GitHub issue

Maybe related to #957?

My definitions of the ‘base’ Index along with all filters, analyzers, normalizers:

from elasticsearch_dsl import Index, normalizer, analyzer, char_filter, token_filter

autocomplete_filter = token_filter(
    'autocomplete', 'edgeNGram',
    min_gram=1,
    max_gram=20)

remove_leading_non_alphanum_char_filter = char_filter(
    'remove_leading_non_alphanum', 'pattern_replace',
    pattern=r"^(\W|_)+",  # raw string, so '\W' is not treated as a Python escape
    replacement="")

sorting_normalizer = normalizer(
    'sorting',
    filter=["lowercase", "asciifolding"])

default_analyzer = analyzer(
    'default',
    tokenizer="standard",
    filter=["standard", "lowercase", "asciifolding", "stop", "snowball"],
    char_filter=["html_strip"])

nostop_analyzer = analyzer(
    'nostop',
    tokenizer="standard",
    filter=["standard", "lowercase", "asciifolding"],
    stopwords=[],
    char_filter=["html_strip", remove_leading_non_alphanum_char_filter])

autocomplete_analyzer = analyzer(
    'autocomplete',
    tokenizer="standard",
    filter=["standard", "lowercase", "asciifolding", autocomplete_filter],
    stopwords=[],
    char_filter=["html_strip", remove_leading_non_alphanum_char_filter])

base_index = Index('base')
base_index.analyzer(default_analyzer)

A simple sample Index and Document that will generate the error that follows:

from elasticsearch_dsl import Document, field

# this clone is NOT an issue; the default analyzer settings etc. all transfer correctly
i = base_index.clone('user-related')

@i.document
class UserDoc(Document):
    name = field.Keyword(
        normalizer=sorting_normalizer,
        fields={"autocomplete": field.Text(
            analyzer=autocomplete_analyzer,
            search_analyzer=nostop_analyzer)
        })

i.to_dict() correctly yields the following:

{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "fields": {
            "autocomplete": {
              "analyzer": "autocomplete",
              "search_analyzer": "nostop",
              "type": "text"
            }
          },
          "normalizer": "sorting",
          "type": "keyword"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "stopwords": [],
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "autocomplete"
          ],
          "char_filter": [
            "html_strip",
            "remove_leading_non_alphanum"
          ]
        },
        "default": {
          "tokenizer": "standard",
          "type": "custom",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "stop",
            "snowball"
          ],
          "char_filter": [
            "html_strip"
          ]
        },
        "nostop": {
          "type": "custom",
          "tokenizer": "standard",
          "stopwords": [],
          "filter": [
            "standard",
            "lowercase",
            "asciifolding"
          ],
          "char_filter": [
            "html_strip",
            "remove_leading_non_alphanum"
          ]
        }
      },
      "normalizer": {
        "sorting": {
          "type": "custom",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "filter": {
        "autocomplete": {
          "min_gram": 1,
          "max_gram": 20,
          "type": "edgeNGram"
        }
      },
      "char_filter": {
        "remove_leading_non_alphanum": {
          "replacement": "",
          "type": "pattern_replace",
          "pattern": "^(\\W|_)+"
        }
      }
    }
  }
}

But after cloning, the analysis settings are (incorrectly, I think?) missing. i.clone('user-related-20180905').to_dict() outputs the following (note that it does keep the default analyzer I added to base_index):

{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "keyword",
          "normalizer": "sorting",
          "fields": {
            "autocomplete": {
              "type": "text",
              "search_analyzer": "nostop",
              "analyzer": "autocomplete"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "asciifolding",
            "stop",
            "snowball"
          ],
          "char_filter": [
            "html_strip"
          ]
        }
      }
    }
  }
}

To further illustrate, this is the output from calling .create() on the cloned index:

PUT http://localhost:9200/user-related-20180905-064032 [status:400 request:0.013s]
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/vagrant/esdocs/contrib/esdjango/run.py", line 25, in <module>
    run(DjangoController)
  File "/vagrant/esdocs/utils.py", line 101, in run
    controller.run_operation(cmd_parser=parser, **options)
  File "/vagrant/esdocs/controller.py", line 85, in run_operation
    getattr(self, "index_{}".format(action))(**options)
  File "/vagrant/esdocs/controller.py", line 122, in index_rebuild
    self._index_create(index, name, set_alias=False)
  File "/vagrant/esdocs/controller.py", line 169, in _index_create
    index.create(using=self.using)
  File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch_dsl/index.py", line 220, in create
    self._get_connection(using).indices.create(index=self._name, body=self.to_dict(), **kwargs)
  File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/client/indices.py", line 88, in create
    params=params, body=body)
  File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/connection/http_urllib3.py", line 186, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/ubuntu/.virtualenvs/myproject/lib/python3.5/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'analyzer [nostop] not found for field [autocomplete]')

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
honzakral commented, Sep 17, 2018

@i.document is about assigning the Document to the Index, not the other way around. We should certainly make this clearer in the docs. It is only useful when working with an Index object and, with types going away, there is no real reason to do that unless you are doing something very specific.

For your use case, please look again at the alias migration example; that is what I would recommend: no ambiguity, no extra decorators, and it works out of the box without the need to redeploy your application when a new index is introduced. Settings are governed by the template and can be changed at any time as well. If there is something missing from that example, please let me know.

Thank you!
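The alias migration pattern recommended above can be sketched roughly as follows. This is loosely modeled on the alias_migration example in the elasticsearch-dsl repository; ALIAS, PATTERN, and new_index_name are illustrative names, and es is assumed to be an elasticsearch-py client (templates registering the mappings/settings are assumed to exist separately):

```python
from datetime import datetime

ALIAS = "user-related"      # illustrative: the alias your Documents point at
PATTERN = ALIAS + "-*"      # concrete indices match this pattern

def new_index_name(now=None):
    # e.g. 'user-related-20180905064032'
    now = now or datetime.now()
    return "{}-{}".format(ALIAS, now.strftime("%Y%m%d%H%M%S"))

def migrate(es, move_data=True):
    """Create a fresh concrete index and atomically flip the alias to it.

    `es` is an elasticsearch-py client; an index template registered
    elsewhere is assumed to supply the mappings and analysis settings.
    """
    next_index = new_index_name()
    es.indices.create(index=next_index)
    if move_data:
        es.reindex(body={"source": {"index": ALIAS},
                         "dest": {"index": next_index}})
    # atomic swap: remove the alias from all old indices, add it to the new one
    es.indices.update_aliases(body={"actions": [
        {"remove": {"alias": ALIAS, "index": PATTERN}},
        {"add": {"alias": ALIAS, "index": next_index}},
    ]})
```

Documents then declare name = ALIAS in their class Index, so application code never needs to know which concrete index is live.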

0 reactions
jaddison commented, Sep 15, 2018

Thanks @HonzaKral. Again, I understand what you’re saying (although I’ll admit I’d glossed over the ability to pass in an index on Document operations).

That said - and ignoring my use case for now - is it reasonable to expect someone who uses the @<index>.document decorator to have to keep that index around to do operations elsewhere in code?

# in app/search_base.py
base_index = Index('base')
base_index.settings(...)

------------------

# in users/search.py:
from app.search_base import base_index
user_index = base_index.clone('user-related')
user_index.settings(...)  # user specific index settings

@user_index.document
class UserDoc(Document):
    id = field.Long()

------------------

# in users/views.py
from .search import user_index, UserDoc

def user_list(request):
  users = UserDoc.search(index=user_index._name).query(...).execute()
  # OR hardcoding
  users = UserDoc.search(index='user-related').query(...).execute()


def user_details(request, pk):
  users = UserDoc.get(pk, index=user_index._name)
  # OR hardcoding
  users = UserDoc.get(pk, index='user-related')

Where the same code without the decorator is more intuitive and less fragile:

# in search.py:
class UserDoc(Document):
    id = field.Long()

    class Index:
        name = 'user-related'

------------------

# in views.py
from .search import UserDoc

def user_list(request):
  users = UserDoc.search().query(...).execute()

def user_details(request, pk):
  users = UserDoc.get(pk)

If you still disagree, then by all means, please close this ticket. I may not understand why it is this way, but I’m willing to accept that it’s like that for a specific reason (although I would obviously still like to understand why, and why my suggestion isn’t valid - still, it’s your project, it’s of course your call).

The user must pick between:

  • having a base index (no settings repetition, etc) but repetition of the index name everywhere
  • copying the common index settings etc. into every Document's class Index, for the convenience of ignoring the index name throughout application code

Honestly, I won’t be too upset if you just close this ticket without replying. You’ve put up with a lot of my pushback. Cheers - and thanks for all the effort on the Python bindings!
