
`value.connect.schema` is ignored when DelimitedRowFilter is in use


Hi, I came across this useful Kafka connector after being disappointed by the Azure Blob Storage Source Connector, which doesn't support the CSV file format. I love your fine-grained design, which gives more control over listing, reading, uniquely identifying files, and so on.

For my use case, I need to use DelimitedRowFilter with my pre-defined Avro schema, but that currently seems impossible. DelimitedRowFilter requires at least one of extractColumnName, autoGenerateColumnNames, or columns to be configured in order to derive the schema.

org.apache.kafka.common.config.ConfigException: At least one of those parameters should be configured [autoGenerateColumnNames,extractColumnName,columns]
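
For reference, the only built-in way to satisfy that check appears to be pointing the filter at the header row that the reader already captures, rather than at an external schema. A minimal sketch, assuming skip.headers exposes the skipped line in a field named headers, as the File Pulse documentation describes for extractColumnName:

    "filters.CsvRowParser.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
    "filters.CsvRowParser.extractColumnName": "headers",
    "filters.CsvRowParser.separator": ",",
    "filters.CsvRowParser.trimColumn": "true"

With that approach the schema is derived from the file itself, and every column presumably ends up typed as a plain string, so a pre-defined Avro schema is still not honoured.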

However, what I actually expect is for it to pick up my value.connect.schema instead of deriving the schema from the CSV column headers.

I also noticed that merge.value.connect.schemas was introduced recently and merged into master. According to its docstring, it can probably resolve my issue. Is there any plan to release v2.4.0?

Version I am using: 2.0.0

Configuration properties:

{
    "connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
    "tasks.max": "1",
    "offset.attributes.string": "uri",
    "tasks.file.status.storage.bootstrap.servers": "kafka:29092",
    "tasks.file.status.storage.topic.partitions": "1",
    "topic": "azure-blob-poc",
    "fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",

    "fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.AzureBlobStorageFileSystemListing",
    "azure.storage.connection.string":  "<redacted>",
    "azure.storage.account.name": "<redacted>",
    "azure.storage.account.key": "<redacted>",
    "azure.storage.container.name": "file-pulse-poc",

    "fs.listing.interval.ms": "10000",
    "fs.listing.filters": "io.streamthoughts.kafka.connect.filepulse.fs.filter.RegexFileListFilter",
    "file.filter.regex.pattern": ".*csv$",

    "tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.AzureBlobStorageRowFileInputReader",
    "skip.headers": "1",

    "filters": "CsvRowParser,KeyExtractor",
    "filters.CsvRowParser.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
    "filters.CsvRowParser.autoGenerateColumnNames": "false",
    "filters.CsvRowParser.separator": ",",
    "filters.CsvRowParser.trimColumn": "true",
    "filters.KeyExtractor.type": "io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter",
    "filters.KeyExtractor.field": "$key",
    "filters.KeyExtractor.value": "$value.ISINCode",
    "value.connect.schema": "<big-chunk-of-avro-json-string>"
}

Thanks! And once again, great work!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
fhussonnois commented, Oct 21, 2021

Hi @alankan-finocomp, Connect FilePulse 2.4 is now available on confluent-hub; you should consider using that new version to test the new merge.value.connect.schemas property.

Please consider closing this issue if your problem is solved! Thanks
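
For anyone following along, the suggested test would presumably amount to keeping the existing schema property and adding the new flag next to it. A sketch against 2.4, assuming merge.value.connect.schemas is a connector-level boolean that merges the schema derived from the rows with the one supplied in value.connect.schema:

    "value.connect.schema": "<big-chunk-of-avro-json-string>",
    "merge.value.connect.schemas": "true"

The rest of the configuration from the original post would stay as-is, apart from upgrading the connector to 2.4.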

0 reactions
alankan-finocomp commented, Jan 4, 2022

Sorry, I overlooked the message. Happy to have it closed!


