`value.connect.schema` is ignored when DelimitedRowFilter is in use
Hi, I came across this useful Kafka connector after being disappointed by the Azure Blob Storage Source Connector, which doesn't support the CSV file format. I love your fine-grained design, which provides more control over listing, reading, uniquely identifying files, and so on.
For my use case, I need to use DelimitedRowFilter with my pre-defined Avro schema, but it seems impossible currently. DelimitedRowFilter requires either extractColumnName, autoGenerateColumnNames, or columns to be configured in order to derive the schema:
org.apache.kafka.common.config.ConfigException: At least one of those parameters should be configured [autoGenerateColumnNames,extractColumnName,columns]
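For example (an illustrative sketch, not part of the original report): on 2.0.0 the filter can only build its schema from one of those three parameters, e.g. by letting it auto-generate generic column names:

"filters.CsvRowParser.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
"filters.CsvRowParser.autoGenerateColumnNames": "true"

or by listing the columns explicitly via the columns property (see the DelimitedRowFilter documentation for the exact name/type syntax). None of these options picks up an externally supplied schema.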
However, what I actually expect is for it to pick up my value.connect.schema instead of deriving the schema from the CSV column headers. I also noticed that merge.value.connect.schemas was recently introduced and merged into master; according to its docstring, it can probably resolve my issue. Is there any plan to release v2.4.0?
Version I am using: 2.0.0
Configuration properties:
{
"connector.class": "io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector",
"tasks.max": "1",
"offset.attributes.string": "uri",
"tasks.file.status.storage.bootstrap.servers": "kafka:29092",
"tasks.file.status.storage.topic.partitions": "1",
"topic": "azure-blob-poc",
"fs.cleanup.policy.class": "io.streamthoughts.kafka.connect.filepulse.fs.clean.LogCleanupPolicy",
"fs.listing.class": "io.streamthoughts.kafka.connect.filepulse.fs.AzureBlobStorageFileSystemListing",
"azure.storage.connection.string": "<redacted>",
"azure.storage.account.name": "<redacted>",
"azure.storage.account.key": "<redacted>",
"azure.storage.container.name": "file-pulse-poc",
"fs.listing.interval.ms": "10000",
"fs.listing.filters": "io.streamthoughts.kafka.connect.filepulse.fs.filter.RegexFileListFilter",
"file.filter.regex.pattern": ".*csv$",
"tasks.reader.class": "io.streamthoughts.kafka.connect.filepulse.fs.reader.AzureBlobStorageRowFileInputReader",
"skip.headers": "1",
"filters": "CsvRowParser,KeyExtractor",
"filters.CsvRowParser.type": "io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter",
"filters.CsvRowParser.autoGenerateColumnNames": "false",
"filters.CsvRowParser.separator": ",",
"filters.CsvRowParser.trimColumn": "true",
"filters.KeyExtractor.type": "io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter",
"filters.KeyExtractor.field": "$key",
"filters.KeyExtractor.value": "$value.ISINCode",
"value.connect.schema": "<big-chunk-of-avro-json-string>"
}
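As a possible stop-gap on 2.0.0 (an illustrative tweak, not something tried in this report): since skip.headers is already set to 1, the filter could take its column names from the skipped header line:

"filters.CsvRowParser.extractColumnName": "headers"

Columns derived this way are all typed as strings, so this avoids the ConfigException but still does not apply the Avro schema from value.connect.schema.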
Thanks! And once again, great work!
Issue Analytics
- Created 2 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Hi @alankan-finocomp, Connect FilePulse 2.4 is now available on confluent-hub; you should consider using that new version to test the new merge.value.connect.schemas property. Please consider closing this issue if your problem is solved! Thanks
Sorry, I overlooked the message. Happy to have it closed!
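For reference, a minimal sketch of the configuration change suggested above (illustrative; it assumes merge.value.connect.schemas is a boolean flag and keeps the redacted schema string from the original config):

"value.connect.schema": "<big-chunk-of-avro-json-string>",
"merge.value.connect.schemas": "true"

With FilePulse 2.4+, the intent described in this thread is that the schema produced by DelimitedRowFilter is merged with the one provided in value.connect.schema instead of the latter being ignored.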