Improve usage of bulk with Document

I know this has been brought up before (#843), but I think it deserves a revisit. The suggested approach of using to_dict is becoming less adequate as more features are added to Document.update, Document.delete, and Document.save. Using bulk with to_dict requires re-implementing things like optimistic concurrency, since that feature is layered on top of the result from to_dict.

Perhaps there is an API which can be created which pulls out the logic from Document.update, Document.delete, and Document.save so that it can be reused for bulk?

Just brainstorming, but perhaps something like a new to_action method?

from elasticsearch import helpers
from elasticsearch_dsl import Document

class Article(Document):
    def to_action(self, action, using=None, index=None, **kwargs):
        """
        Create action for document. For keyword arguments for each action,
        see the ``update``, ``delete``, and ``save`` methods.

        :arg action: action to create, one of 'index', 'create', 'delete' or 'update'
        :arg index: elasticsearch index to use, if the ``Document`` is
            associated with an index this can be omitted.
        :arg using: connection alias to use, defaults to ``'default'``
        """

# ``es`` is an Elasticsearch client and ``articles`` an iterable of Article instances
helpers.bulk(es, (article.to_action('index') for article in articles))
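
To make the proposal more concrete, here is a minimal sketch of the shaping such a method might do. It is not elasticsearch-dsl code: the standalone `to_action` function is hypothetical, and a plain dict stands in for the result of `Document.to_dict(include_meta=True)` so the example runs without a cluster. It assumes the action layout accepted by `elasticsearch.helpers.bulk`, where `_op_type` selects the operation and updates carry a partial document under `doc`.

```python
VALID_ACTIONS = {"index", "create", "delete", "update"}


def to_action(doc_dict, action="index"):
    """Shape a ``to_dict(include_meta=True)`` result into a bulk action.

    Hypothetical helper for illustration; not part of elasticsearch-dsl.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")

    # Copy the metadata keys (``_index``, ``_id``, ...) and drop the body.
    result = {k: v for k, v in doc_dict.items() if k != "_source"}
    result["_op_type"] = action
    source = doc_dict.get("_source", {})

    if action == "update":
        # Bulk updates carry a partial document under ``doc``.
        result["doc"] = source
    elif action != "delete":
        # ``index`` and ``create`` send the full document body.
        result["_source"] = source
    # ``delete`` needs only the metadata, so nothing else is added.
    return result


# A plain dict standing in for ``article.to_dict(include_meta=True)``.
article = {"_index": "blog", "_id": "42", "_source": {"title": "Hello"}}
index_action = to_action(article, "index")
update_action = to_action(article, "update")
delete_action = to_action(article, "delete")
```

Each result could then be fed to `helpers.bulk` through a generator. Per-action options such as retries or concurrency checks are left out here, which is exactly the logic the issue asks to have shared with `save`, `update`, and `delete`.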

Another option could be to have different methods for each action, which would allow the different arguments for each action to be more clearly broken out.

Yet another option would be to create a kind of ActionBuffer class which could be provided to the individual Document.update, Document.delete, and Document.save methods and then retrieved with ActionBuffer.get_actions(). However, now that these methods return values, this style of API wouldn’t work as well since the return value would have to be something like ‘pending’.
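
A rough sketch of the buffer idea, to show where the return-value problem comes from. Everything here is hypothetical: `ActionBuffer`, its `add` method, and the standalone `save` function are stand-ins invented for illustration, and plain dicts replace real `Document` instances.

```python
class ActionBuffer:
    """Collect bulk actions instead of sending them right away.

    Hypothetical class; nothing like it exists in elasticsearch-dsl.
    """

    def __init__(self):
        self._actions = []

    def add(self, action):
        # Record the action and report that nothing has been sent yet.
        self._actions.append(action)
        return "pending"

    def get_actions(self):
        # Hand back the queued actions and reset the buffer.
        actions, self._actions = self._actions, []
        return actions


def save(doc_dict, buffer=None):
    """Stand-in for ``Document.save`` that can defer to a buffer."""
    action = dict(doc_dict, _op_type="index")
    if buffer is not None:
        return buffer.add(action)
    raise NotImplementedError("a real save would call the cluster here")


buffer = ActionBuffer()
status = save({"_index": "blog", "_id": "1", "_source": {"title": "A"}}, buffer)
queued = buffer.get_actions()
```

The caller gets `"pending"` back rather than the usual result of a save, because the outcome is unknown until the queued actions are actually sent in a bulk request.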

@HonzaKral, any thoughts on an API to make it easier to use Document with bulk?

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

dpasqualin commented, Dec 10, 2019 (2 reactions)

Not completely related, but I think it would save people some research if there were a bulk helper in elasticsearch-dsl itself to handle Document instances, so that we don’t need to import the low-level library just for this. Maybe something like this in a helpers file would be useful to more people?

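from elasticsearch.helpers import bulk as _bulk

def bulk(client, actions, stats_only=False, *args, **kwargs):
    # Options consumed here are passed to Document.to_dict, not to the low-level helper
    include_meta = kwargs.pop("include_meta", False)
    skip_empty = kwargs.pop("skip_empty", True)
    actions = (i.to_dict(include_meta=include_meta, skip_empty=skip_empty) for i in actions)
    # Unpack the remaining arguments instead of passing the tuple and dict as two positionals
    return _bulk(client, actions, stats_only, *args, **kwargs)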

Thanks.

honzakral commented, May 20, 2019 (1 reaction)

> To currently replicate the behavior of Document.save using just Document.to_dict(True) you need to also handle the meta fields, and now optimistic concurrency as well.

Actually, to_dict(True) should include the optimistic concurrency control. That is a very good point, as that is its purpose. Thanks for bringing it up; it completely slipped my mind when adding the new code to save(). Would that help?
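
As an illustration of what that could look like, the sketch below adds `if_seq_no` and `if_primary_term` (the conditional-write parameters Elasticsearch uses for optimistic concurrency control) to an action when the document's metadata carries `seq_no` and `primary_term`. The helper name `with_concurrency_control` is hypothetical and plain dicts stand in for `Document.meta` and for the `to_dict(True)` result. Whether the bulk helper forwards these keys depends on the installed elasticsearch-py version, so treat that part as an assumption to verify.

```python
def with_concurrency_control(action, meta):
    """Return a copy of ``action`` with conditional-write fields added.

    Hypothetical helper; ``meta`` is a plain dict standing in for
    ``Document.meta``.
    """
    guarded = dict(action)
    # Only add the condition when both values are known.
    if "seq_no" in meta and "primary_term" in meta:
        guarded["if_seq_no"] = meta["seq_no"]
        guarded["if_primary_term"] = meta["primary_term"]
    return guarded


action = {
    "_op_type": "index",
    "_index": "blog",
    "_id": "42",
    "_source": {"title": "Hello"},
}
guarded = with_concurrency_control(action, {"seq_no": 7, "primary_term": 1})
unguarded = with_concurrency_control(action, {})
```

A document that was never fetched has no sequence number, so its action passes through unchanged and the write is unconditional.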

> you can’t easily use new versions of elasticsearch-dsl-py as that project chose to use inheritance, making it fragile when changes to elasticsearch-dsl-py happen.

There was a big refactoring because of the doc type removal and the move to indices. It should definitely be more stable now, as there are no outstanding issues.
