Is there a way to perform batch operations across databases and containers (Cosmos DB)?

Query/Question: I am looking to perform operations across databases and containers to process a large data dump. Here is the situation:

  1. I receive a large data dump (millions of records) that I import into a database/container (say a) that I own.
  2. I need to read the records one by one, and for each record in the feed I need to:
    • Check for a value from the record in another container (say b) in another database
    • If a match is found, read the matching record from container b
    • Create a new document in a new container in database a with values

As you can see, the whole flow above is one operation per record. Since we have a huge data dump, I am looking for the most efficient way of handling this.
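The per-record flow described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the `FeedRecord` and `OutputDocument` classes, their fields, and the in-memory `Map` standing in for container b are all hypothetical, and no Cosmos DB SDK calls are made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class UnitOfWorkSketch {

    // Hypothetical shape of a record imported into container a.
    static final class FeedRecord {
        final String id;
        final String lookupValue;
        FeedRecord(String id, String lookupValue) {
            this.id = id;
            this.lookupValue = lookupValue;
        }
    }

    // Hypothetical shape of the document written to the new container.
    static final class OutputDocument {
        final String id;
        final String payload;
        OutputDocument(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // One unit of work: look the record's value up in "container b"
    // (a Map here, as an assumption) and build the output document on a match.
    static Optional<OutputDocument> process(FeedRecord rec, Map<String, String> containerB) {
        String matchedPayload = containerB.get(rec.lookupValue);
        if (matchedPayload == null) {
            return Optional.empty(); // no match, nothing to create
        }
        return Optional.of(new OutputDocument(rec.id, matchedPayload));
    }

    // Applies the unit of work to every record and collects the documents to create.
    static List<OutputDocument> processAll(List<FeedRecord> feed, Map<String, String> containerB) {
        List<OutputDocument> out = new ArrayList<>();
        for (FeedRecord rec : feed) {
            process(rec, containerB).ifPresent(out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> containerB = new HashMap<>();
        containerB.put("k1", "payload-1");
        List<FeedRecord> feed = new ArrayList<>();
        feed.add(new FeedRecord("r1", "k1"));
        feed.add(new FeedRecord("r2", "unknown"));
        for (OutputDocument doc : processAll(feed, containerB)) {
            System.out.println(doc.id + " -> " + doc.payload);
        }
    }
}
```

Keeping the unit of work as a pure function of the record and the lookup source makes it easy to run it from whichever driver ends up feeding the records.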

Why is this not a Bug or a feature Request? I am not sure if this is feasible and/or whether other methods exist within the SDK.

Setup (please complete the following information if applicable):

  • OS: PCF deployment
  • IDE: IntelliJ
  • Library/Libraries: Any Java library, preferably spring-data-cosmos

Information Checklist: Kindly make sure that you have added all the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Query Added
  • Setup information Added

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
TheovanKraay commented, Aug 18, 2022

@bhattacharyyasom The change feed processor is definitely a good approach for this. The Spark connector is a great option, as Kushagra mentioned, but if you find that working with DataFrames does not give you the level of programmability you need for the “unit of work” you outlined above (or you simply prefer plain Java), then I recommend using the change feed processor with multiple delegates to process the change feed from “container a” in parallel, custom code in each delegate to handle the matching logic against container b, and the bulk API to saturate throughput when writing back to container a. Hope it helps.
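The shape of that recommendation (parallel delegates, matching logic per delegate, one bulk write per batch) can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `run` and `partition` helpers, the `Map` standing in for container b, and the `bulkWriter` callback standing in for a bulk write are hypothetical. In the Azure Cosmos DB Java SDK the corresponding pieces would presumably be a processor built with `ChangeFeedProcessorBuilder` and `CosmosAsyncContainer.executeBulkOperations`, neither of which is called here, and the real change feed processor distributes work by lease rather than by fixed-size batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class ParallelDelegatesSketch {

    // Splits a list into consecutive batches of at most batchSize items.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    // Each batch is handled by one worker ("delegate"): match against container b,
    // then hand all matches to bulkWriter in a single call. Returns the number of
    // documents written. bulkWriter must be safe to call from several threads.
    static int run(List<String> feedKeys, Map<String, String> containerB,
                   int batchSize, int workers, Consumer<List<String>> bulkWriter) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> batch : partition(feedKeys, batchSize)) {
                futures.add(pool.submit(() -> {
                    List<String> matched = new ArrayList<>();
                    for (String key : batch) {
                        String payload = containerB.get(key); // read-only lookup
                        if (payload != null) {
                            matched.add(key + ":" + payload);
                        }
                    }
                    if (!matched.isEmpty()) {
                        bulkWriter.accept(matched); // one write call per batch
                    }
                    return matched.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> containerB = Map.of("a", "1", "c", "3");
        List<String> written = Collections.synchronizedList(new ArrayList<>());
        int count = run(List.of("a", "b", "c", "d"), containerB, 2, 2, written::addAll);
        System.out.println(count + " documents written: " + written);
    }
}
```

Grouping writes per batch rather than per record is the part that matters for throughput; the batch size and worker count are tuning knobs, not recommended values.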

1 reaction
kushagraThapar commented, Aug 17, 2022

@bhattacharyyasom Change feed processor support is not present in spring-data-cosmos. It is worth looking into our Spark connector for Cosmos DB, which supports heavy data loading, computation, and processing. Our Spark connector supports change feed as well. You can find information on it here:

https://docs.microsoft.com/en-us/azure/cosmos-db/sql/sql-api-sdk-java-spark-v3

Top Results From Across the Web

Transactional batch operations in Azure Cosmos DB using the ...
Learn how to use TransactionalBatch in the Azure Cosmos DB .NET or Java SDK to perform a group of point operations that either...

How to do bulk and transactional batch operations ... - YouTube
Matías Quaranta shows Donovan Brown how to do bulk operations with the Azure Cosmos DB .NET SDK to maximize throughput, and how to...

Move multiple documents in bulk with the Azure Cosmos DB ...
The easiest way to learn how to perform a bulk operation is to attempt to push many documents to an Azure Cosmos DB...

Azure Cosmos DB service quotas - GitHub
Azure Cosmos DB supports CRUD and query operations against resources like containers, items, and databases. It also supports transactional batch requests ...

Uses of Package com.azure.cosmos - NET
Represents a batch of operations against items with the same PartitionKey in a container that will be performed in a transactional manner at...
