Stateless Ingestion causes previous data to be overwritten.
Describe the bug
I am using this example to tag columns in a table. One issue I noticed is the graph.get_aspect_v2 step, where the client always has to make a GET request to the server first to obtain all existing tags, append the tag if it is new, and then emit the merged result back to DataHub.
I find this design a little odd: the client side has to know what all the tags are, while the server side is completely stateless.
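For concreteness, here is a minimal sketch of that read-modify-write pattern with the Python SDK, loosely following the documented column-tagging example; the server URL, dataset URN, column, and tag names are placeholder assumptions, and the add_column_tag helper is illustrative rather than part of the SDK:

```python
from datahub.emitter.mce_builder import make_tag_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    EditableSchemaFieldInfoClass,
    EditableSchemaMetadataClass,
    GlobalTagsClass,
    TagAssociationClass,
)


def add_column_tag(graph: DataHubGraph, dataset_urn: str, column: str, tag_urn: str) -> None:
    """Illustrative read-modify-write: fetch the aspect, append the tag, emit it back."""
    # GET: fetch the current editableSchemaMetadata aspect (None if unset).
    current = graph.get_aspect_v2(
        entity_urn=dataset_urn,
        aspect="editableSchemaMetadata",
        aspect_type=EditableSchemaMetadataClass,
    )
    if current is None:
        current = EditableSchemaMetadataClass(editableSchemaFieldInfo=[])

    # MODIFY: find (or create) the entry for this column and append the tag if missing.
    field_info = next(
        (fi for fi in current.editableSchemaFieldInfo if fi.fieldPath == column),
        None,
    )
    if field_info is None:
        field_info = EditableSchemaFieldInfoClass(fieldPath=column)
        current.editableSchemaFieldInfo.append(field_info)
    if field_info.globalTags is None:
        field_info.globalTags = GlobalTagsClass(tags=[])
    if all(assoc.tag != tag_urn for assoc in field_info.globalTags.tags):
        field_info.globalTags.tags.append(TagAssociationClass(tag=tag_urn))

    # WRITE: UPSERT the merged aspect; this replaces the stored aspect wholesale.
    graph.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="editableSchemaMetadata",
            aspect=current,
        )
    )


# Placeholder connection details and identifiers.
graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
add_column_tag(
    graph,
    dataset_urn="urn:li:dataset:(urn:li:dataPlatform:hive,my_db.my_table,PROD)",
    column="my_column",
    tag_urn=make_tag_urn("my-tag"),
)
```

Note that the final emit is still a plain UPSERT, so whatever the client read and merged is exactly what ends up stored.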
I attempted to bypass fetching the aspect and instead constructed a MetadataChangeProposalWrapper with GlobalTagsClass(tags=[tag_association_to_add]), regardless of the current state. I noticed that this removes all the other tags. I was expecting this to append only the tag I am attempting to add, not remove the other tags.
Is this intended by design? Is there a way to change it, via a flag or some other way of submitting? One big issue here is the race condition: if I am submitting these changes through Kafka events (or even in a synchronous, parallel way) and there happen to be multiple MCPWs for the same column, other tags could be lost.
To Reproduce
An attempt to make a stateless metadata change would look like this:
```java
public static MetadataChangeProposalWrapper createTagChange(String assetUrn, String column, String tagUrn)
        throws URISyntaxException {
    // Build a GlobalTags aspect containing only the single tag to add.
    TagAssociation tagAssociation = new TagAssociation().setTag(TagUrn.createFromString(tagUrn));
    GlobalTags globalTags = new GlobalTags().setTags(new TagAssociationArray(tagAssociation));
    EditableSchemaFieldInfo editableSchemaFieldInfo =
            new EditableSchemaFieldInfo().setFieldPath(column).setGlobalTags(globalTags);
    EditableSchemaMetadata editableSchemaMetadata = new EditableSchemaMetadata()
            .setEditableSchemaFieldInfo(new EditableSchemaFieldInfoArray(editableSchemaFieldInfo))
            .setCreated(createCurrentAuditStamp());
    // UPSERT the aspect without reading the existing state first.
    return MetadataChangeProposalWrapper.builder()
            .entityType("dataset").entityUrn(assetUrn).upsert().aspect(editableSchemaMetadata).build();
}
```
Then simply emitting this with `new KafkaEmitter(config).emit(createTagChange(urn, column, myTag))` results in the existing tags being removed.
Expected behavior
(In the above example) if I have 3 tags associated with a column and then add a 4th tag, myTag, I would expect to end up with 4 tags, including myTag. However, with the existing behavior, the 3 existing tags are removed and only myTag remains.
Top GitHub Comments
This is the current expected behavior. All MetadataChangeProposals currently come across with ChangeType.UPSERT, indicating that the change is intended to be a full replacement of the aspect. We are actively working on PATCH change-type semantics and will be rolling them out to different aspects once the behavior is supported.
The current way to get around this is to do a Read -> Modify -> Write.
@sarpk By Read -> Modify -> Write, I mean that within your application, using the SDK to send MCPs, you would perform a GET -> some code -> POST. As you pointed out, this does require the application to perform these operations synchronously, as there is no locking in this scenario. With PATCH, the operation will all be done in a single atomic DB transaction.
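To make the "no locking" caveat concrete, here is a minimal sketch that serializes those GET -> some code -> POST cycles behind a process-local lock, reusing the illustrative add_column_tag helper from the earlier sketch; the lock and helper names are assumptions for illustration, and a lock like this only guards a single process, so concurrent writers elsewhere can still overwrite each other until PATCH lands:

```python
import threading

# Illustrative process-local guard: one read-modify-write cycle at a time.
# This does NOT protect against other processes or services upserting the
# same aspect concurrently; only server-side PATCH semantics make the
# update truly atomic.
_aspect_lock = threading.Lock()


def add_column_tag_serialized(graph, dataset_urn, column, tag_urn):
    # GET -> modify -> POST, serialized within this process.
    with _aspect_lock:
        add_column_tag(graph, dataset_urn, column, tag_urn)
```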
@HunterEl We don’t currently have it tracked on the OSS roadmap as the effort was requested by a customer and not the community, but keep an eye out for a PR coming in the next couple weeks or so for the initial work on the OSS side 😄