`value.connect.schema` is ignored when DelimitedRowFilter is in use
Hi, I came across this useful Kafka connector after being disappointed by the Azure Blob Storage Source Connector, which doesn't support the CSV file format. I love your fine-grained design, which provides more control over listing, reading, uniquely identifying files, and so on.
For my use case, I need to use DelimitedRowFilter with my pre-defined Avro schema, but it seems impossible currently. DelimitedRowFilter requires either extractColumnName, autoGenerateColumnNames, or columns to be configured in order to derive the schema:
org.apache.kafka.common.config.ConfigException: At least one of those parameters should be configured [autoGenerateColumnNames,extractColumnName,columns]
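For example (an illustrative sketch, not part of the original report): on 2.0.0 the filter can only build its schema from one of those three parameters, e.g. by letting it auto-generate generic column names:

"filters.CsvRowParser.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
"filters.CsvRowParser.autoGenerateColumnNames": "true"

or by listing the columns explicitly via the columns property (see the DelimitedRowFilter documentation for the exact name/type syntax). None of these options picks up an externally supplied schema.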
However, what I actually expect is for it to pick up my value.connect.schema instead of deriving the schema from the CSV column headers. I also noticed that merge.value.connect.schemas was recently introduced and merged into master; according to its docstring, it can probably resolve my issue. Is there any plan to release v2.4.0?
Version I am using: 2.0.0
Configuration properties:
{
"connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
"tasks.max": "1",
"offset.attributes.string": "uri",
"tasks.file.status.storage.bootstrap.servers": "kafka:29092",
"tasks.file.status.storage.topic.partitions": "1",
"topic": "azure-blob-poc",
"fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",
"fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.AzureBlobStorageFileSystemListing",
"azure.storage.connection.string": "<redacted>",
"azure.storage.account.name": "<redacted>",
"azure.storage.account.key": "<redacted>",
"azure.storage.container.name": "file-pulse-poc",
"fs.listing.interval.ms": "10000",
"fs.listing.filters": "io.streamthoughts.kafka.connect.filepulse.fs.filter.RegexFileListFilter",
"file.filter.regex.pattern": ".*csv$",
"tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.AzureBlobStorageRowFileInputReader",
"skip.headers": "1",
"filters": "CsvRowParser,KeyExtractor",
"filters.CsvRowParser.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
"filters.CsvRowParser.autoGenerateColumnNames": "false",
"filters.CsvRowParser.separator": ",",
"filters.CsvRowParser.trimColumn": "true",
"filters.KeyExtractor.type": "io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter",
"filters.KeyExtractor.field": "$key",
"filters.KeyExtractor.value": "$value.ISINCode",
"value.connect.schema": "<big-chunk-of-avro-json-string>"
}
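As a possible stop-gap on 2.0.0 (an illustrative tweak, not something tried in this report): since skip.headers is already set to 1, the filter could take its column names from the skipped header line:

"filters.CsvRowParser.extractColumnName": "headers"

Columns derived this way are all typed as strings, so this avoids the ConfigException but still does not apply the Avro schema from value.connect.schema.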
Thanks! And once again, great work!
Issue Analytics
- Created 2 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Hi @alankan-finocomp, Connect FilePulse 2.4 is now available on confluent-hub; you should consider using that new version to test the new merge.value.connect.schemas property. Please consider closing this issue if your problem is solved! Thanks
Sorry, I overlooked the message. Happy to have it closed!
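For reference, a minimal sketch of the configuration change suggested above (illustrative; it assumes merge.value.connect.schemas is a boolean flag and keeps the redacted schema string from the original config):

"value.connect.schema": "<big-chunk-of-avro-json-string>",
"merge.value.connect.schemas": "true"

With FilePulse 2.4+, the intent described in this thread is that the schema produced by DelimitedRowFilter is merged with the one provided in value.connect.schema instead of the latter being ignored.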