Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

OOM errors in streaming pipeline

See original GitHub issue

Scio version: 0.4.7 / 0.5.0-beta1. The following Scio pipeline throws out-of-memory errors. Traffic volume is about 65K messages/s; increasing the VM size or count only delays the occurrence of the OOM errors.

     c.customInput("Read From Pubsub", PubsubIO
      .readMessages()
      .withIdAttribute("id")
      .withTimestampAttribute("ts")
      .fromTopic("topic"))
      .withFixedWindows(Duration.standardSeconds(2))
      .filter(e => {
        Try {
          e.getName != null
        }.getOrElse(false)
      })
      .map { e =>
        val name = e.getName.toString
        (name, 1L)
      }
      .sumByKey
      .withWindow
      .map(transformNameStreamsWithWindow)


    def transformNameStreamsWithWindow(nameStreams: ((String, Long), IntervalWindow)):
        (ByteString, Iterable[Mutation]) = {
      val ((name, streams), window) = nameStreams

      // Row key: "<name>:<window end millis>"
      val windowMillis = window.end().getMillis
      val key = s"$name:$windowMillis"
      val mutation = Mutations.newSetCell(
        familyName = FAMILY_NAME,
        columnQualifier = COLUMN_QUALIFIER,
        value = ByteString.copyFrom(Longs.toByteArray(streams))
      )

      (ByteString.copyFromUtf8(key), Iterable(mutation))
    }
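
The snippet ends at the (ByteString, Iterable[Mutation]) pairs; the issue does not show the sink, but with Scio's Bigtable support the final step presumably looks something like the sketch below, where mutations stands for the collection produced above and projectId, instanceId and tableId are placeholders, not from the original issue.

    import com.spotify.scio.bigtable._

    // Hypothetical sink for the (rowKey, mutations) pairs produced above;
    // every identifier here is a placeholder, not from the original issue.
    mutations.saveAsBigtable(projectId, instanceId, tableId)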

The equivalent Java/Apache Beam version works fine:

    final PubsubIO.Read<PubsubMessage> pubsubRead = PubsubIO
        .readMessages()
        .withIdAttribute("id")
        .withTimestampAttribute("ts")
        .fromTopic("topic");

    pipeline.apply("Read from", pubsubRead)
        .apply("Window Fixed",
               Window.into(FixedWindows.of(Duration.standardSeconds(2))))
        .apply("Names", ParDo.of(new GetNames()))
        .apply(Sum.<String>longsPerKey())
        .apply(ParDo.of(new CreateMutation()));

    private static class GetNames extends DoFn<EndSong, KV<String, Long>> {

      private static final long serialVersionUID = 1;

      @ProcessElement
      public void processElement(ProcessContext c) {
        final EndSong e = c.element();
        if (e.getName() != null) {
          final KV<String, Long> kv = KV.of(e.getName().toString(), 1L);
          c.output(kv);
        }
      }
    }

    private static class CreateMutation
        extends DoFn<KV<String, Long>, KV<ByteString, Iterable<Mutation>>> {

      private static final long serialVersionUID = 1;

      @ProcessElement
      public void process(ProcessContext c, BoundedWindow window) {
        final long millis = window.maxTimestamp().getMillis();
        final String key = c.element().getKey() + ":" + millis;
        final Long value = c.element().getValue();
        final Mutation mutation = Mutations.newSetCell(
            FAMILY_NAME,
            COLUMN_QUALIFIER,
            ByteString.copyFrom(Longs.toByteArray(value)));
        c.output(KV.<ByteString, Iterable<Mutation>>of(
            ByteString.copyFromUtf8(key),
            ImmutableList.<Mutation>of(mutation)));
      }
    }

To get a heap dump, run the pipeline on Dataflow with the --dumpHeapOnOOM flag. You may have to run it a few times before a dump is captured.
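
The flag can also be set programmatically through Beam's Dataflow options; a minimal sketch, assuming the usual args-based launch (args is whatever your main method receives):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions
    import org.apache.beam.sdk.options.PipelineOptionsFactory

    // Equivalent to passing --dumpHeapOnOOM=true on the command line.
    val options = PipelineOptionsFactory
      .fromArgs(args: _*)
      .as(classOf[DataflowPipelineDebugOptions])
    options.setDumpHeapOnOOM(true)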

Looking at the logs for the heap dump location:

    jsonPayload: {
      job: "2018-02-20_15_01_56-4938495852979082999"
      logger: "com.google.cloud.dataflow.worker.StreamingDataflowWorker"
      message: "Execution of work for S0 for key cf5e9028c0d6445eb96d6c86bd3f71f6 failed with out-of-memory. Will not retry locally. Heap dump written to '/var/log/dataflow/heap_dump.hprof'."
      stage: "S0"
      thread: "316"
      work: "cf5e9028c0d6445eb96d6c86bd3f71f6-6846990601398222167"
      worker: "newrealease-test-02201501-d21b-harness-fbw5"
    }

The heap dump seems to show a lot of KryoState objects.

[Screenshot: heap dump showing many KryoState instances]
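
KryoState appears to come from Scio's Kryo-based fallback coder (KryoAtomicCoder), which is used for types without a native Beam coder, such as the (String, Long) tuples above. One partial mitigation is to register the hot classes with Kryo via Scio's @KryoRegistrar hook; the sketch below uses a hypothetical MyRecord class, and it shrinks serialized state rather than fixing the accumulation itself.

    import com.esotericsoftware.kryo.Kryo
    import com.spotify.scio.coders.KryoRegistrar
    import com.twitter.chill.IKryoRegistrar

    // Hypothetical record type standing in for the pipeline's element class.
    case class MyRecord(name: String, count: Long)

    // Scio picks up registrars annotated with @KryoRegistrar
    // (the class name must end in "KryoRegistrar").
    @KryoRegistrar
    class MyKryoRegistrar extends IKryoRegistrar {
      override def apply(k: Kryo): Unit =
        k.register(classOf[MyRecord])
    }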

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

1 reaction
bsmithgall commented, May 14, 2018

We’re using beam pipelines to handle our streaming cases at the moment. I can bump the version and try a scio pipe and let you know what happens.

0 reactions
regadas commented, May 14, 2018

Closing this now since it was fixed in #1143.
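
If you are hitting this, the practical fix is to upgrade to a Scio release that includes #1143. A minimal build.sbt sketch; the version shown is illustrative, so check the release notes for the first release containing the fix:

    // Bump scio-core to a release that includes #1143 (version below is illustrative).
    libraryDependencies += "com.spotify" %% "scio-core" % "0.5.1"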

Read more comments on GitHub >

Top Results From Across the Web

Troubleshoot Dataflow out of memory errors - Google Cloud
This page provides information about memory usage in Dataflow pipelines and steps for investigating and resolving issues with Dataflow out of memory (OOM)...
Read more >
Troubleshoot out-of-memory errors • Palantir
Troubleshoot out-of-memory (OOM) errors. Out-of-memory errors can show up in a job in a few ways: Seeing “Job aborted due to stage failure”...
Read more >
What is the standard way to handle exception of OOM in ...
The only likely cause of OOM errors is that over time your application uses more and more keyed state and timers.
Read more >
How we find and fix OOM and memory leaks in Java Services
OOM errors represent the first category of memory issues. ... Unclosed streams and connections represent another cause for memory leaks.
Read more >
How to prevent OOMs while streaming data to GCS via ...
A common problem I see in streaming data to GCS is out of memory (OOM) errors. Don't panic, you are not alone, it...
Read more >
