Custom DateTimeFormat "yyyy-MM-dd HH:mm:ss" in the mapping cannot be parsed


I’m getting this error parsing date time:

15/12/15 16:02:40 ERROR Executor: Exception in task 4.0 in stage 4.0 (TID 10)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot invoke method public org.joda.time.DateTime org.joda.time.format.DateTimeFormatter.parseDateTime(java.lang.String)
    at org.elasticsearch.hadoop.util.ReflectionUtils.invoke(ReflectionUtils.java:93)
    at org.elasticsearch.hadoop.util.DateUtils$JodaTime.parseDate(DateUtils.java:105)
    at org.elasticsearch.hadoop.util.DateUtils.parseDate(DateUtils.java:122)
    at org.elasticsearch.spark.serialization.ScalaValueReader.createDate(ScalaValueReader.scala:134)
    at org.elasticsearch.spark.serialization.ScalaValueReader.parseDate(ScalaValueReader.scala:125)
    at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:118)
    at org.elasticsearch.spark.serialization.ScalaValueReader$$anonfun$date$1.apply(ScalaValueReader.scala:118)
    at org.elasticsearch.spark.serialization.ScalaValueReader.checkNull(ScalaValueReader.scala:70)
    at org.elasticsearch.spark.serialization.ScalaValueReader.date(ScalaValueReader.scala:118)
    at org.elasticsearch.spark.serialization.ScalaValueReader.readValue(ScalaValueReader.scala:58)
    at org.elasticsearch.spark.sql.ScalaRowValueReader.readValue(ScalaEsRowValueReader.scala:27)
    at org.elasticsearch.hadoop.serialization.ScrollReader.parseValue(ScrollReader.java:604)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:592)
    at org.elasticsearch.hadoop.serialization.ScrollReader.map(ScrollReader.java:660)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:587)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHitAsMap(ScrollReader.java:382)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:317)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:212)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:186)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:408)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:86)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:43)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.elasticsearch.hadoop.util.ReflectionUtils.invoke(ReflectionUtils.java:91)
    ... 33 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "1990-10-11 00:00:00" is malformed at " 00:00:00"
    at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:873)
    ... 38 more

I followed the Elasticsearch documentation, which says that custom date-time formats are supported: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html

The mapping for the field in the type is as follows:

"ACCOUNTS_Date_Created": {
  "type": "date",
  "format": "yyyy-MM-dd HH:mm:ss"
}

I’m reading the type as follows:

val accounts = EsSparkSQL.esDF(sqlContext, "myindex/ACCOUNTS")

Issue Analytics

  • State: closed
  • Created: 8 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

nsphung commented, Jan 28, 2016 (1 reaction)

@sophiewachs & @ogirardot found a neat way for us to work with our date format "format": "basic_date_time" using "elasticsearch-spark" % "2.2.0-rc1".

First, you declare a custom ScalaRowValueReader:

import org.elasticsearch.hadoop.serialization.Parser
import org.elasticsearch.spark.sql.ScalaRowValueReader
import org.joda.time.format.ISODateTimeFormat

/**
 * Custom reader that behaves like the default elasticsearch-spark reader, except for the field named "date".
 * The elasticsearch-spark reader does not take the ES mapping into account, in particular the date format:
 * all dates are treated as "date_optional_time".
 */
class SpecificBasicDateTimeReader extends ScalaRowValueReader {
  override def date(value: String, parser: Parser): AnyRef = {
    parser.currentName() match {
      // Our "date" field is mapped as basic_date_time, so parse it with the matching Joda formatter
      case "date" =>
        new java.sql.Timestamp(ISODateTimeFormat.basicDateTime().parseDateTime(value).getMillis)
      // Every other date field falls back to the default parsing
      case _ =>
        super.date(value, parser)
    }
  }
}

Then you just have to register it via ConfigurationOptions.ES_SERIALIZATION_READER_VALUE_CLASS:

    import org.elasticsearch.hadoop.cfg.ConfigurationOptions
    import org.elasticsearch.spark.sql.EsSparkSQL

    val df = EsSparkSQL.esDF(sqlContext,
      Map(
        ConfigurationOptions.ES_RESOURCE_READ -> "index/doc",
        ConfigurationOptions.ES_INDEX_READ_MISSING_AS_EMPTY -> "true",
        "es.read.field.include" -> "meta,id,date,source",
        // Option to specify a serialization reader
        ConfigurationOptions.ES_SERIALIZATION_READER_VALUE_CLASS -> classOf[SpecificBasicDateTimeReader].getCanonicalName
      ))

This works for us. I hope it helps.
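
For the format in the original question ("yyyy-MM-dd HH:mm:ss"), the parsing step could look roughly like the sketch below. It is only an illustration under some assumptions: it uses java.time from the JDK instead of Joda-Time so that it runs without extra dependencies, it assumes the stored values are in UTC, and the names DateFormatSketch, parsesAsIso and toTimestamp are made up for this example. It shows that an ISO-style parser, which expects a 'T' between date and time, rejects the space-separated value from the stack trace, while a parser built from the mapping's pattern accepts it.

```scala
import java.sql.Timestamp
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateFormatSketch {
  // Pattern copied from the mapping in the question.
  val MappingFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // ISO-style parsing expects a 'T' between date and time,
  // so a space-separated value fails to parse.
  def parsesAsIso(value: String): Boolean =
    Try(LocalDateTime.parse(value, DateTimeFormatter.ISO_LOCAL_DATE_TIME)).isSuccess

  // Hypothetical helper: parse with the mapping's pattern.
  // Assumption: the stored values are in UTC.
  def toTimestamp(value: String): Timestamp =
    Timestamp.from(LocalDateTime.parse(value, MappingFormat).toInstant(ZoneOffset.UTC))
}
```

In a custom reader like the one above, the branch for the affected field (here it would be "ACCOUNTS_Date_Created") could return something like DateFormatSketch.toTimestamp(value) and leave all other fields to super.date.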

Heltman commented, Sep 11, 2019 (0 reactions)

it’s so bad

