
[schema registry] Avsc details leak through prototypes given to deserialized objects


Forking a distinct issue out of #10950.

Decide whether it is OK that deserialized objects are given a prototype by avsc, causing a slight observable difference between the plain data going in and the data coming back.

I debugged through this and put a repro of the underlying cause below.

Summary: avsc gives objects it deserializes a prototype that adds 7 inherited enumerable properties to the result: clone, compare, isValid, toBuffer, toString, wrap, and wrapped. This is what causes chai to throw on deepEqual comparison, since it considers all enumerable properties, including functions, AFAICT. The workaround pending investigation, JSON.stringify(left) === JSON.stringify(right), holds because the extra enumerable properties are functions, which JSON.stringify skips.
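A minimal plain-JS sketch of the mechanism (no avsc involved; the prototype and method name below are stand-ins for avsc's record prototype): inherited enumerable function properties show up in for...in and in chai's deepEqual, but JSON.stringify serializes only own properties and skips functions, which is why the stringify workaround holds.

```javascript
// Stand-in for avsc's record prototype: one enumerable method is
// enough to show the effect.
const recordProto = {};
Object.defineProperty(recordProto, "toBuffer", {
  value: function () {},
  enumerable: true, // avsc's record methods are enumerable like this
});

const plain = { firstName: "Nick" };
// Stand-in for a decoded record: same own data, avsc-like prototype.
const decoded = Object.assign(Object.create(recordProto), plain);

// for...in walks own properties and then the prototype chain.
const keys = [];
for (const key in decoded) keys.push(key);
console.log(keys.join(",")); // firstName,toBuffer

// JSON.stringify only serializes own properties and drops functions,
// so the round-tripped value compares equal as a string.
console.log(JSON.stringify(plain) === JSON.stringify(decoded)); // true
```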

So now the question remains: is this OK? I am slightly concerned that consumers of schema-registry-avro would take a dependency on these functions, which might prevent us from replacing the avro serializer with another implementation due to the risk of breaking such uses. But maybe that is too paranoid. @xirzec Thoughts?

So far it appears to be non-trivial to strip these properties out (or make them non-enumerable, if we prefer that), as the return value can be a graph of objects, each of which would have them. I’m looking into whether the resolver argument to fromBuffer might allow me to do this cleanly.
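One possible direction, assuming we post-process the decoded value ourselves (this helper is hypothetical, not part of avsc or of the serializer): recursively copy the decoded graph into plain objects, which drops the avsc prototype at every level.

```javascript
// Hypothetical helper (not part of avsc): recursively copy a decoded
// value into plain objects, dropping avsc's record prototype and the
// inherited clone/compare/isValid/... methods along with it.
// Only handles plain objects, arrays, and primitives; bytes/fixed
// (Buffers), maps, and logical types would need extra cases.
function toPlain(value) {
  if (Array.isArray(value)) {
    return value.map(toPlain);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    // Object.keys sees only own enumerable properties, so the
    // inherited record methods are not copied over.
    for (const key of Object.keys(value)) {
      out[key] = toPlain(value[key]);
    }
    return out;
  }
  return value;
}
```

Every object in the result would then have Object.prototype as its prototype, so for...in and deepEqual see only the data fields.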

Repro of underlying cause

import * as avro from "avsc";
import { assert } from "chai";

const schema: avro.schema.RecordType = {
  type: "record",
  name: "User",
  namespace: "com.azure.schemaregistry.samples",
  fields: [
    { name: "firstName", type: "string" },
    { name: "lastName", type: "string" },
  ],
};

const type = avro.Type.forSchema(schema);
const value = { firstName: "Nick", lastName: "Guerrera" };
const serialized = type.toBuffer(value);
const deserialized = type.fromBuffer(serialized);

// Enumerate the properties of the original value and of the
// round-tripped result to see the extra inherited members.
for (const key in value) {
    console.log(key);
}
console.log("");
for (const key in deserialized) {
    console.log(key);
}

assert.deepEqual(value, deserialized);
> node .\index.js
firstName
lastName

firstName
lastName
clone
compare
isValid
toBuffer
toString
wrap
wrapped

C:\Temp\rp\node_modules\chai\lib\chai\assertion.js:141
      throw new AssertionError(msg, {
      ^
AssertionError: expected { Object (firstName, lastName) } to deeply equal { Object (firstName, lastName) }
    at Object.<anonymous> (C:\Temp\rp\index.js:50:15)
    at Module._compile (internal/modules/cjs/loader.js:1075:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1096:10)
    at Module.load (internal/modules/cjs/loader.js:940:32)
    at Function.Module._load (internal/modules/cjs/loader.js:781:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:72:12)
    at internal/main/run_main_module.js:17:47 {
  showDiff: true,
  actual: { firstName: 'Nick', lastName: 'Guerrera' },
  expected: User { firstName: 'Nick', lastName: 'Guerrera' }
}


Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:8 (6 by maintainers)

Top GitHub Comments

2 reactions
mtth commented, Oct 14, 2020

Happy to help. Thanks for the kind words 😃.

2 reactions
mtth commented, Oct 11, 2020

Hi there. If you update avsc to 5.5.1, you can use an option to omit these methods:

const avroType = avro.Type.forSchema(JSON.parse(schema), {omitRecordMethods: true});

Note that decoded record values will still have a named constructor for performance reasons. If you’d like to hide this constructor as well, you can do so with a type hook, for example to copy the values into plain objects:

/** Minimal logical type which returns decoded records as plain objects. */
class PlainRecordType extends avro.types.LogicalType {
  _fromValue(obj) { return {...obj}; }
  _toValue(obj) { return obj; }

  /** Returns a type hook wrapping all records and errors with this logical type. */
  static createHook() {
    const visited = new Set();
    return (schema, opts) => {
      const {name, type} = schema;
      if ((type !== 'record' && type !== 'error') || visited.has(name)) {
        return; // Fall back to default processing
      }
      visited.add(name);
      return new PlainRecordType(schema, opts);
    };
  }
}

Sample usage:

const plainType = avro.Type.forSchema({
  type: 'record',
  name: 'Person',
  fields: [{name: 'name', type: 'string'}],
}, {typeHook: PlainRecordType.createHook()});

const buf = plainType.toBuffer({name: 'Ann'});
const val = plainType.fromBuffer(buf);
console.log(val.constructor.name); // Object

You can tweak the logical type’s implementation above to match the API you settle on: _fromValue generates the decoded values exposed to users, _toValue determines the data you’d like to accept when encoding. The logical type documentation has more information, including a couple examples.
