Custom DateTimeFormat "yyyy-MM-dd HH:mm:ss" in the mapping cannot be parsed

I’m getting this error when parsing a date-time value:

15/12/15 16:02:40 ERROR Executor: Exception in task 4.0 in stage 4.0 (TID 10)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot invoke method public org.joda.time.DateTime org.joda.time.format.DateTimeFormatter.parseDateTime(java.lang.String)
    at org.elasticsearch.hadoop.util.ReflectionUtils.invoke(ReflectionUtils.java:93)
    at org.elasticsearch.hadoop.util.DateUtils$JodaTime.parseDate(DateUtils.java:105)
    at org.elasticsearch.hadoop.util.DateUtils.parseDate(DateUtils.java:122)
    at org.elasticsearch.spark.serialization.ScalaValueReader.createDate(ScalaValueReader.scala:134)
    at org.elasticsearch.spark.serialization.ScalaValueReader.parseDate(ScalaValueReader.scala:125)
    at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:118)
    at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:118)
    at org.elasticsearch.spark.serialization.ScalaValueReader.checkNull(ScalaValueReader.scala:70)
    at org.elasticsearch.spark.serialization.ScalaValueReader.date(ScalaValueReader.scala:118)
    at org.elasticsearch.spark.serialization.ScalaValueReader.readValue(ScalaValueReader.scala:58)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.readValue(ScalaEsRowValueReader.scala:27)
    at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:604)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:592)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:660)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:587)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:382)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:317)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:212)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:186)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:408)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.elasticsearch.hadoop.util.ReflectionUtils.invoke(ReflectionUtils.java:91)
    ... 33 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "1990-10-11 00:00:00" is malformed at " 00:00:00"
    at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:873)
    ... 38 more

I followed the Elasticsearch documentation, which says custom date-time formats are supported: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html

The mapping for the field in the type is as follows:

"ACCOUNTS_Date_Created": {
  "type": "date",
  "format": "yyyy-MM-dd HH:mm:ss"
}

I’m reading the type as follows:

val accounts = EsSparkSQL.esDF(sqlContext, "myindex/ACCOUNTS")
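
The last "Caused by" line shows the parser giving up at the space between the date and the time, which suggests an ISO-style parser (one that expects a "T" separator) is being applied instead of the format from the mapping. The sketch below reproduces that behavior in isolation. It is only an illustration: it uses java.time from the JDK as a stand-in for Joda-Time, and the object name DateFormatCheck is hypothetical.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical, standalone check; it does not touch Elasticsearch or Spark.
object DateFormatCheck {
  // The value from the stack trace above.
  val sample = "1990-10-11 00:00:00"

  // ISO local date-time expects "1990-10-11T00:00:00", so the space is rejected.
  val isoAttempt: Try[LocalDateTime] =
    Try(LocalDateTime.parse(sample, DateTimeFormatter.ISO_LOCAL_DATE_TIME))

  // The pattern declared in the mapping accepts the same value.
  val mappingPattern: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  val customAttempt: Try[LocalDateTime] =
    Try(LocalDateTime.parse(sample, mappingPattern))

  def main(args: Array[String]): Unit = {
    println(s"ISO parser:     $isoAttempt")
    println(s"Custom pattern: $customAttempt")
  }
}
```

If the ISO attempt fails and the custom pattern succeeds, the stored value is fine and the problem lies in which format the reader applies.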

Issue Analytics

Created 8 years ago
Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction

nsphung commented, Jan 28, 2016

@sophiewachs and @ogirardot found a neat way to handle our date format, "format": "basic_date_time", with "elasticsearch-spark" % "2.2.0-rc1".

First, declare a custom ScalaRowValueReader:

import org.elasticsearch.hadoop.serialization.Parser
import org.elasticsearch.spark.sql.ScalaRowValueReader
import org.joda.time.format.ISODateTimeFormat

/**
 * Custom reader that behaves like the default elasticsearch-spark reader, except for the "date" field.
 * The elasticsearch-spark reader does not take the ES mapping into account, in particular the date format:
 * all dates are treated as "date_optional_time".
 */
class SpecificBasicDateTimeReader extends ScalaRowValueReader {
  override def date(value: String, parser: Parser): AnyRef = {
    parser.currentName() match {
      case "date" =>
        new java.sql.Timestamp(ISODateTimeFormat.basicDateTime().parseDateTime(value).getMillis).asInstanceOf[AnyRef]
      case _ =>
        super.date(value, parser)
    }
  }
}

Then register it via ConfigurationOptions.ES_SERIALIZATION_READER_VALUE_CLASS:

import org.elasticsearch.hadoop.cfg.ConfigurationOptions
import org.elasticsearch.spark.sql.EsSparkSQL

val df = EsSparkSQL.esDF(sqlContext,
  Map(
    ConfigurationOptions.ES_RESOURCE_READ -> "index/doc",
    ConfigurationOptions.ES_INDEX_READ_MISSING_AS_EMPTY -> "true",
    "es.read.field.include" -> "meta,id,date,source",
    // Tell the connector to use the custom value reader declared above
    ConfigurationOptions.ES_SERIALIZATION_READER_VALUE_CLASS -> classOf[SpecificBasicDateTimeReader].getCanonicalName
  ))

This works for us. I hope it helps.
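
For the format in the original question, "yyyy-MM-dd HH:mm:ss" on the field ACCOUNTS_Date_Created, the same approach could probably be adapted along the lines of the sketch below. It is untested against a live cluster. The names AccountsDateFormat and AccountsDateReader are hypothetical, and reading the values as UTC is an assumption that should be checked against how the data was indexed.

```scala
import java.sql.Timestamp

import org.elasticsearch.hadoop.serialization.Parser
import org.elasticsearch.spark.sql.ScalaRowValueReader
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}

// Hypothetical helper, not part of elasticsearch-hadoop.
object AccountsDateFormat {
  // Assumption: the stored values carry no zone and should be read as UTC.
  val formatter: DateTimeFormatter =
    DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").withZoneUTC()

  // Parses a value such as "1990-10-11 00:00:00" into a SQL timestamp.
  def toTimestamp(value: String): Timestamp =
    new Timestamp(formatter.parseDateTime(value).getMillis)
}

// Hypothetical reader: custom parsing for one field, default behavior otherwise.
class AccountsDateReader extends ScalaRowValueReader {
  override def date(value: String, parser: Parser): AnyRef =
    parser.currentName() match {
      case "ACCOUNTS_Date_Created" => AccountsDateFormat.toTimestamp(value)
      case _                       => super.date(value, parser)
    }
}
```

It would be registered the same way, by passing classOf[AccountsDateReader].getName as the value of ConfigurationOptions.ES_SERIALIZATION_READER_VALUE_CLASS when reading "myindex/ACCOUNTS".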

0 reactions

Heltman commented, Sep 11, 2019

it’s so bad