Have a recommended way of handling DNS resolution in Dataflow workers
This issue has popped up many times:
- User code runs in a JVM inside a Docker container on Dataflow worker instances.
- The container has the standard Google DNS entries in /etc/resolv.conf.
- Thus the user cannot perform lookups against custom DNS servers, e.g. on an internal or VPN network.
To work around this we probably need:
- Some way to override /etc/resolv.conf by adding custom entries. But I read that this file might be read-only in the container?
- Perform the override on worker startup (we can't customize workers right now), or maybe in @Setup in an artificial DoFn.
- Maybe manipulate the JVM DNS cache so it resolves correctly (https://github.com/alibaba/java-dns-cache-manipulator/).
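One JVM-level knob relevant to the last two bullets is the resolver cache TTL: the JVM can cache successful lookups for a very long time, so even a corrected /etc/resolv.conf would not take effect for names that were already resolved. A minimal sketch, assuming the standard JVM security properties; the TTL values and class name here are illustrative, and in a pipeline this would run from a DoFn's @Setup method:

```java
import java.security.Security;

public class DnsCacheTuning {
    // In a Beam pipeline this would run once per worker, e.g. from a
    // DoFn's @Setup method, before any lookups are performed.
    public static void main(String[] args) {
        // Re-resolve successful lookups after 30s instead of caching them
        // for the JVM default (which can be effectively forever).
        Security.setProperty("networkaddress.cache.ttl", "30");
        // Do not cache failed lookups at all.
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Note this only shortens caching; it does not by itself point the resolver at a different DNS server.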
Issue Analytics
- Created 6 years ago
- Comments: 8 (7 by maintainers)
For future reference, you can work around this by using Google Cloud DNS forward zones to point to your internal DNS infrastructure.
Dataflow workers query 169.254.169.254 internally, which resolves using Cloud DNS. If you set up a forwarding zone in Cloud DNS that points to your own DNS server (you can create private forwarding zones that point to internal IPs on your VPC; just make sure to whitelist DNS traffic from 35.199.192.0/19), then you can resolve your own queries. The nice thing about it is that you can manage DNS settings centrally (no need to change code) for everything on your VPC.
Note: NS records in private forwarding zones are ignored; this can be a bit confusing if you're trying to debug things 😃
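As a rough sketch, the setup described above might look like this with the gcloud CLI. The zone name, DNS domain, VPC name, and the 10.0.0.5 resolver address are all placeholders for your own environment:

```shell
# Create a private forwarding zone that sends queries for corp.example.com
# to an internal resolver reachable from the VPC (placeholder IP).
gcloud dns managed-zones create corp-forward \
    --dns-name="corp.example.com." \
    --description="Forward corp lookups to internal DNS" \
    --visibility=private \
    --networks=my-vpc \
    --forwarding-targets=10.0.0.5

# Allow Cloud DNS forwarding traffic (35.199.192.0/19) to reach the
# internal resolver on port 53.
gcloud compute firewall-rules create allow-clouddns-forwarding \
    --network=my-vpc \
    --allow=tcp:53,udp:53 \
    --source-ranges=35.199.192.0/19
```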
We’re doing this internally now. Closing this.