
Issue with CSVSchemaGenerator

See original GitHub issue

Hi - I tried to deploy the CSV SpoolDir connector with automatic schema generation in version 2.0.54. Does anybody have an idea what might be going wrong here?

[2020-12-09 15:09:05,589] INFO [Worker clientId=connect-1, groupId=connect-cluster] Starting connectors and tasks using config offset 2384 (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2020-12-09 15:09:05,589] INFO [Worker clientId=connect-1, groupId=connect-cluster] Starting connector csvimport-ps60 (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2020-12-09 15:09:05,590] INFO ConnectorConfig values:
        config.action.reload = restart
        connector.class = com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector
        errors.log.enable = false
        errors.log.include.messages = false
        errors.retry.delay.max.ms = 60000
        errors.retry.timeout = 0
        errors.tolerance = none
        header.converter = null
        key.converter = class io.confluent.connect.avro.AvroConverter
        name = csvimport-ps60
        tasks.max = 1
        transforms = []
        value.converter = class io.confluent.connect.avro.AvroConverter
 (org.apache.kafka.connect.runtime.ConnectorConfig)
[2020-12-09 15:09:05,590] INFO EnrichedConnectorConfig values:
        config.action.reload = restart
        connector.class = com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector
        errors.log.enable = false
        errors.log.include.messages = false
        errors.retry.delay.max.ms = 60000
        errors.retry.timeout = 0
        errors.tolerance = none
        header.converter = null
        key.converter = class io.confluent.connect.avro.AvroConverter
        name = csvimport-test
        tasks.max = 1
        transforms = []
        value.converter = class io.confluent.connect.avro.AvroConverter
 (org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig)
[2020-12-09 15:09:05,590] INFO Creating connector csvimport-test of type com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector (org.apache.kafka.connect.runtime.Worker)
[2020-12-09 15:09:05,591] INFO Instantiated connector csvimport-test with version 0.0.0.0 of type class com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector (org.apache.kafka.connect.runtime.Worker)
[2020-12-09 15:09:05,591] INFO SpoolDirCsvSourceConnectorConfig values:
        batch.size = 1000
        cleanup.policy = MOVE
        csv.case.sensitive.field.names = false
        csv.escape.char = 92
        csv.file.charset = UTF-8
        csv.first.row.as.header = true
        csv.ignore.leading.whitespace = true
        csv.ignore.quotations = false
        csv.keep.carriage.return = false
        csv.null.field.indicator = NEITHER
        csv.quote.char = 34
        csv.rfc.4180.parser.enabled = false
        csv.separator.char = 44
        csv.skip.lines = 0
        csv.strict.quotes = false
        csv.verify.reader = true
        empty.poll.wait.ms = 500
        error.path = /tmp/fail
        file.buffer.size.bytes = 131072
        file.minimum.age.ms = 0
        files.sort.attributes = [NameAsc]
        finished.path = /tmp/dest
        halt.on.error = true
        input.file.pattern = (.*?).input
        input.path = /tmp/src
        key.schema =
        metadata.field = metadata
        metadata.location = HEADERS
        parser.timestamp.date.formats = [yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
        parser.timestamp.timezone = UTC
        processing.file.extension = .PROCESSING
        schema.generation.enabled = true
        schema.generation.key.fields = []
        schema.generation.key.name = com.github.jcustenborder.kafka.connect.model.Key
        schema.generation.value.name = com.github.jcustenborder.kafka.connect.model.Value
        task.count = 1
        task.index = 0
        task.partitioner = ByName
        timestamp.field =
        timestamp.mode = PROCESS_TIME
        topic = test
        value.schema =
 (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnectorConfig)
[2020-12-09 15:09:05,592] INFO SpoolDirCsvSourceConnectorConfig values:
        batch.size = 1000
        cleanup.policy = MOVE
        csv.case.sensitive.field.names = false
        csv.escape.char = 92
        csv.file.charset = UTF-8
        csv.first.row.as.header = true
        csv.ignore.leading.whitespace = true
        csv.ignore.quotations = false
        csv.keep.carriage.return = false
        csv.null.field.indicator = NEITHER
        csv.quote.char = 34
        csv.rfc.4180.parser.enabled = false
        csv.separator.char = 44
        csv.skip.lines = 0
        csv.strict.quotes = false
        csv.verify.reader = true
        empty.poll.wait.ms = 500
        error.path = /tmp/fail
        file.buffer.size.bytes = 131072
        file.minimum.age.ms = 0
        files.sort.attributes = [NameAsc]
        finished.path = /tmp/dest
        halt.on.error = true
        input.file.pattern = (.*?).input
        input.path = /tmp/src
        key.schema =
        metadata.field = metadata
        metadata.location = HEADERS
        parser.timestamp.date.formats = [yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
        parser.timestamp.timezone = UTC
        processing.file.extension = .PROCESSING
        schema.generation.enabled = true
        schema.generation.key.fields = []
        schema.generation.key.name = com.github.jcustenborder.kafka.connect.model.Key
        schema.generation.value.name = com.github.jcustenborder.kafka.connect.model.Value
        task.count = 1
        task.index = 0
        task.partitioner = ByName
        timestamp.field =
        timestamp.mode = PROCESS_TIME
        topic = test
        value.schema =
 (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnectorConfig)
[2020-12-09 15:09:05,593] INFO Key or Value schema was not defined. Running schema generator. (com.github.jcustenborder.kafka.connect.spooldir.AbstractSpoolDirSourceConnector)
[2020-12-09 15:09:05,593] ERROR WorkerConnector{id=csvimport-test} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector)
java.lang.NoClassDefFoundError: Could not initialize class com.github.jcustenborder.kafka.connect.spooldir.CsvSchemaGenerator
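
A note on this error: "Could not initialize class" (rather than a plain class-not-found message) means the JVM already attempted to run CsvSchemaGenerator's static initializer and that attempt failed, which usually points to a missing or mismatched dependency on the plugin path rather than the class itself being absent. One way to check what the plugin actually shipped with is to list its lib directory; the path below is an assumption based on where confluent-hub normally installs components on Confluent images, so adjust it to your setup:

    # Path is an assumption: confluent-hub typically installs components under
    # /usr/share/confluent-hub-components on Confluent images; adjust as needed.
    ls /usr/share/confluent-hub-components/jcustenborder-kafka-connect-spooldir/lib | grep -i jackson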

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
fjahangiri commented, Dec 28, 2020

Yes. By the way, it works fine with 2.0.46 and other versions. With schema generation I get java.lang.NoClassDefFoundError: Could not initialize class com.github.jcustenborder.kafka.connect.spooldir.CsvSchemaGenerator, and with a defined schema I get this exception instead: java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonKey
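
For what it's worth, com.fasterxml.jackson.annotation.JsonKey only exists in jackson-annotations 2.10 and later, so the second error suggests an older jackson-annotations jar is being resolved ahead of the plugin's own copy. A quick way to see which copies the worker can find (the paths are assumptions for the cp-kafka-connect image):

    # Hypothetical paths for a Confluent image; adjust to your install. A
    # pre-2.10 jackson-annotations jar shadowing the plugin's copy would
    # explain the missing JsonKey class.
    find /usr/share/confluent-hub-components /usr/share/java -name 'jackson-annotations*.jar' 2>/dev/null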

1 reaction
fjahangiri commented, Dec 28, 2020

Hi, I have the same issue in version 2.0.54. I use the confluentinc/cp-kafka-connect:5.5.1 image and install the connector with confluent-hub: confluent-hub install --no-prompt jcustenborder/kafka-connect-spooldir:latest
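
Since 2.0.46 is reported to work above, one low-risk workaround is to pin that version instead of installing :latest (standard confluent-hub owner/name:version syntax):

    # Pin the known-good plugin version rather than :latest.
    confluent-hub install --no-prompt jcustenborder/kafka-connect-spooldir:2.0.46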

Read more comments on GitHub >

