Improve usage of bulk with Document
I know this has been brought up before (#843) but I think it could use a re-visit. The suggested approach of using to_dict is starting to become less adequate as more features are added to Document.update, Document.delete, and Document.save. Using bulk with to_dict requires re-implementing things like optimistic concurrency since that's a feature layered on top of the result from to_dict.
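To make the re-implementation burden concrete, here is a minimal sketch of what callers currently have to bolt on themselves. It works on a plain dict standing in for the result of to_dict(include_meta=True); the function name add_concurrency_control is hypothetical, and the if_seq_no / if_primary_term key names are an assumption about what the bulk helper in your client version accepts, so check that before relying on it.

```python
def add_concurrency_control(action, seq_no, primary_term):
    """Return a copy of a bulk action dict with optimistic-concurrency metadata.

    `action` is assumed to have the shape produced by to_dict(include_meta=True).
    The `if_seq_no` / `if_primary_term` keys are an assumption, not a confirmed
    part of the bulk helper's contract for every client version.
    """
    # Both values must be supplied together, or neither.
    if (seq_no is None) != (primary_term is None):
        raise ValueError("seq_no and primary_term must be given together")
    guarded = dict(action)  # shallow copy, the input dict is left untouched
    if seq_no is not None:
        guarded["if_seq_no"] = seq_no
        guarded["if_primary_term"] = primary_term
    return guarded


# Stand-in for article.to_dict(include_meta=True)
action = {"_index": "articles", "_id": "42", "_source": {"title": "Hello"}}
guarded = add_concurrency_control(action, seq_no=7, primary_term=1)
```

Every project that wants concurrency checks in bulk ends up writing some variant of this, which is the duplication this issue is about.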
Perhaps there is an API which can be created which pulls out the logic from Document.update, Document.delete, and Document.save so that it can be reused for bulk?
Just brainstorming, but perhaps something like a new to_action method?
class Article(Document):
    def to_action(self, action, using=None, index=None, **kwargs):
        """
        Create action for document. For keyword arguments for each action,
        see the ``update``, ``delete``, and ``save`` methods.

        :arg action: action to create, one of 'index', 'create', 'delete' or 'update'
        :arg index: elasticsearch index to use, if the ``Document`` is
            associated with an index this can be omitted.
        :arg using: connection alias to use, defaults to ``'default'``
        """

helpers.bulk(es, (article.to_action('index') for article in articles))
Another option could be to have different methods for each action, which would allow the different arguments for each action to be more clearly broken out.
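As a rough illustration of the per-action variant, here is a sketch using plain functions that build action dicts in the format the bulk helpers document (_op_type, _index, _id, _source, and doc for updates). The function names and signatures are hypothetical; real methods on Document would take the index and id from the instance rather than as arguments.

```python
def index_action(index, doc_id, source, op_type="index"):
    """Build an 'index' or 'create' action; the full document goes in _source."""
    if op_type not in ("index", "create"):
        raise ValueError("op_type must be 'index' or 'create'")
    return {"_op_type": op_type, "_index": index, "_id": doc_id, "_source": source}


def delete_action(index, doc_id):
    """Build a 'delete' action; it carries metadata only, no document body."""
    return {"_op_type": "delete", "_index": index, "_id": doc_id}


def update_action(index, doc_id, fields, doc_as_upsert=False):
    """Build a partial 'update' action; only the changed fields go in 'doc'."""
    action = {"_op_type": "update", "_index": index, "_id": doc_id, "doc": fields}
    if doc_as_upsert:
        action["doc_as_upsert"] = True
    return action


actions = [
    index_action("articles", "1", {"title": "Hello"}),
    update_action("articles", "2", {"title": "Edited"}),
    delete_action("articles", "3"),
]
```

Splitting by action makes it obvious that, for example, delete takes no document body while update takes only the changed fields.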
Yet another option would be to create a kind of ActionBuffer class which could be provided to the individual Document.update, Document.delete, and Document.save methods and then retrieved with ActionBuffer.get_actions(). However, now that these methods return values, this style of API wouldn't work as well since the return value would have to be something like 'pending'.
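For what it's worth, a buffer like that might look roughly like the sketch below. Only the ActionBuffer name and get_actions() come from the idea above; the add method and the 'pending' sentinel are hypothetical, and nothing here is wired into Document.

```python
class ActionBuffer:
    """Collects bulk actions instead of sending them one at a time (sketch only)."""

    PENDING = "pending"  # hypothetical placeholder for a result not yet known

    def __init__(self):
        self._actions = []

    def add(self, action):
        # A save()/update()/delete() call handed this buffer would queue its
        # action here and could only report that the outcome is still pending.
        self._actions.append(action)
        return self.PENDING

    def get_actions(self):
        # Hand back everything queued so far and reset the buffer.
        actions, self._actions = self._actions, []
        return actions


buffer = ActionBuffer()
result = buffer.add({"_op_type": "delete", "_index": "articles", "_id": "3"})
queued = buffer.get_actions()
```

The awkward part shows up in result: the caller gets a placeholder instead of the real outcome, which only exists after the bulk request has run.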
@HonzaKral, any thoughts on an API to make it easier to use Document with bulk?
Issue Analytics
- Created 4 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Not completely related, but I think it would save people some research if there were a bulk helper in elasticsearch-dsl itself to handle Document instances, just so we don't need to import the low-level library just for this. Maybe something like this in a helpers file would be useful for more people? Thanks.
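A helper along those lines could be as small as the sketch below. This is not the snippet from the comment above, just an illustration: document_actions is a hypothetical name, and FakeArticle is a stand-in that mimics the to_dict(include_meta=True) shape so the example runs without a cluster. With real Document instances the generator would be passed to elasticsearch.helpers.bulk(client, actions).

```python
def document_actions(documents, op_type="index"):
    """Yield one bulk action per document, built from to_dict(include_meta=True)."""
    if op_type not in ("index", "create"):
        raise ValueError("this sketch only handles 'index' and 'create'")
    for doc in documents:
        action = doc.to_dict(include_meta=True)
        action["_op_type"] = op_type
        yield action


class FakeArticle:
    """Stand-in for a Document subclass; not part of elasticsearch-dsl."""

    def __init__(self, doc_id, title):
        self.doc_id = doc_id
        self.title = title

    def to_dict(self, include_meta=False):
        source = {"title": self.title}
        if not include_meta:
            return source
        return {"_index": "articles", "_id": self.doc_id, "_source": source}


articles = [FakeArticle("1", "First"), FakeArticle("2", "Second")]
actions = list(document_actions(articles))
```

Because it is a generator, large collections are streamed to the bulk helper rather than built up in memory first.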
Actually, to_dict(True) should include the optimistic concurrency control. That is a very good point, as that is its purpose. Thanks for bringing that up; it completely slipped my mind when adding the new code to save(). Would that help?
There was a big refactoring that happened because of the doc type removal and the move to indices. It should now definitely be more stable, as there are no outstanding issues.