
kafka-python does not send messages from an Airflow DAG

See original GitHub issue

Hi all. This is the kafka-docker compose file:

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper:3.4.6
    ports:
     - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    container_name: kafka_container
    ports:
        - "9092:9092"                               # expose port
    environment:
        KAFKA_ADVERTISED_HOST_NAME: kafka           # specify the docker host IP at which other containers can reach the broker
        KAFKA_CREATE_TOPICS: 'SomeTopic:1:1,SomeTopic2:1:1'
        KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181     # specify where the broker can reach Zookeeper
        KAFKA_LISTENERS: PLAINTEXT://:9092          # the list of addresses on which the Kafka broker will listen on for incoming connections.
        KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092  # Kafka sends the value of this variable to clients during their connection. After receiving that value, the clients use it for sending/consuming records to/from the Kafka broker.
    volumes:
        - /var/run/docker.sock:/var/run/docker.sock
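
A quick way to check that the advertised listener actually resolves from a client container is a minimal kafka-python sketch like the one below; it assumes the client container is attached to the kafka_default network and that kafka:9092 matches KAFKA_ADVERTISED_LISTENERS above.

from kafka import KafkaConsumer

# Connect to the advertised listener and list the topics the broker reports.
# If this raises NoBrokersAvailable, the container cannot resolve/reach
# kafka:9092 over the kafka_default network.
consumer = KafkaConsumer(bootstrap_servers='kafka:9092')
print(consumer.topics())
consumer.close()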

This is the Airflow container's compose file:

version: '3.0'
services:
    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
        networks:
            - network-mongodb-service_some-nw

        logging:
            options:
                max-size: 10m
                max-file: "3"

    webserver:
        build:
            context: .
        image: puckel/docker-airflow:1.10.9
        restart: always
        depends_on:
            - postgres
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
            - whatever=${somevar}
        logging:
            options:
                max-size: 10m
                max-file: "3"
        volumes:
            - ./dags:/usr/local/airflow/dags
            - ./requirements.txt:/requirements.txt
        networks:
            - network-mongodb-service_some-nw
            - kafka_default
        external_links:
            - somelink:${somelink}
            - kafka_container:kafka
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3

networks:
    network-mongodb-service_some-nw:
        external: true
    kafka_default:
        external: true

Here is how it works if I just connect via docker exec -it containerId bash:

>>> from kafka import KafkaProducer
>>> kafka_host = 'kafka:9092'
>>> producer = KafkaProducer(bootstrap_servers=kafka_host, acks='all', retries=0)
>>> producer.send(self.where_to_send, self.encoded_payload)

This works fine, no problems at all. But if I add the identical code to the Airflow DAG, this is all the output I get:

Sending (key=None value=b'{"topic": "topic", "msg": "message", "domainName": "somedomain.com"}' headers=[]) to TopicPartition(topic='sometopic', partition=0)
Allocating a new 16384 byte message buffer for TopicPartition(topic='Webtraffic', partition=0)
Waking up the sender since TopicPartition(topic='Webtraffic', partition=0) is either full or getting a new batch
[2020-03-19 15:01:03,994] {{conn.py:378}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('192.168.32.2', 9092)]>: connecting to kafka:9092 [('192.168.32.2', 9092) IPv4]
[2020-03-19 15:01:03,995] {{conn.py:1195}} INFO - Probing node bootstrap-0 broker version
[2020-03-19 15:01:03,995] {{conn.py:407}} INFO - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connecting> [IPv4 ('192.168.32.2', 9092)]>: Connection complete.
[2020-03-19 15:01:04,102] {{conn.py:1257}} INFO - Broker version identified as 1.0.0
[2020-03-19 15:01:04,103] {{conn.py:1259}} INFO - Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
[2020-03-19 15:01:04,113] {{parser.py:139}} DEBUG - Received correlation id: 3
[2020-03-19 15:01:04,115] {{parser.py:166}} DEBUG - Processing response MetadataResponse_v1
[2020-03-19 15:01:04,121] {{conn.py:1071}} DEBUG - <BrokerConnection node_id=bootstrap-0 host=kafka:9092 <connected> [IPv4 ('192.168.32.2', 9092)]> Response 3 (10.233640670776367 ms): MetadataResponse_v1(brokers=[(node_id=1001, host='kafka', port=9092, rack=None)], controller_id=1001, topics=[(error_code=0, topic='Social', is_internal=False, partitions=[(error_code=0, partition=0, leader=1001, replicas=[1001], isr=[1001])]), (error_code=0, topic='Webtraffic', is_internal=False, partitions=[(error_code=0, partition=0, leader=1001, replicas=[1001], isr=[1001])]), (error_code=0, topic='__consumer_offsets', is_internal=True, partitions=[(error_code=0, partition=0, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=10, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=20, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=40, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=30, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=9, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=39, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=11, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=31, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=13, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=18, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=22, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=8, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=32, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=43, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=29, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=34, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=1, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=6, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=41, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=27, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=48, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=5, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=15, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=35, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=25, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=46, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=26, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=36, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=44, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=16, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=37, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=17, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=45, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=3, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=4, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=24, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=38, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=33, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=23, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=28, leader=1001, 
replicas=[1001], isr=[1001]), (error_code=0, partition=2, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=12, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=19, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=14, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=47, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=49, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=42, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=7, leader=1001, replicas=[1001], isr=[1001]), (error_code=0, partition=21, leader=1001, replicas=[1001], isr=[1001])])])
[2020-03-19 15:01:04,132] {{cluster.py:325}} DEBUG - Updated cluster metadata to ClusterMetadata(brokers: 1, topics: 3, groups: 0)

And then nothing happens. The message is not sent, there are no errors, nothing. Maybe you have some fresh ideas about what I've done wrong?
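
One thing worth noting when reading output like this: kafka-python's send() is asynchronous and only enqueues the record, so a task can finish before the batch ever reaches the broker. Below is a minimal debugging sketch (not from the original issue) that makes delivery visible; the topic and payload are placeholders, and the broker address is the same kafka:9092 used above.

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers='kafka:9092', acks='all', retries=0)
future = producer.send('Webtraffic', b'{"msg": "load_webtraffic"}')
try:
    # send() only enqueues the record; block until the broker acknowledges it
    metadata = future.get(timeout=10)
    print('delivered to', metadata.topic, metadata.partition, metadata.offset)
except KafkaError as exc:
    print('delivery failed:', exc)
finally:
    # force any remaining buffered records out before the process exits
    producer.flush()

Blocking on the returned future (or calling flush()) turns a silently dropped batch into either an explicit error or a confirmed topic/partition/offset.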

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
tvoinarovskyi commented, Mar 20, 2020

Oh, and just a side note:

    'start_date': datetime(date.today().year, date.today().month, date.today().day),

Set the start date to a specific, static date; you will have a bunch of problems with the Admin interface if it is not static. You already set catchup=False, so it will not backfill DAG runs that are missed.
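
For illustration only, here is a minimal sketch of what a static start_date could look like in an Airflow 1.10-style DAG; the dag_id, schedule, and date below are made-up placeholders, not values from the thread.

from datetime import datetime

from airflow import DAG

default_args = {
    'owner': 'airflow',
    # a fixed, static start date instead of date.today()
    'start_date': datetime(2020, 3, 1),
}

dag = DAG(
    dag_id='social_daily_load',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False,  # as in the original DAG: do not backfill missed runs
)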

1 reaction
Sabutobi commented, Mar 20, 2020

@tvoinarovskyi thanks a lot for helping me. I've found a solution. I don't know if it is perfect, but: in the DAG execution function I was only sending the message from a "global" producer. Instead, I now initialize the producer inside the DAG task function's scope:

import json

from kafka import KafkaProducer

kafka_host = 'kafka:9092'


def social_daily_load():
    # creating the producer inside the task function was the solution for me
    producer = KafkaProducer(bootstrap_servers=kafka_host, acks='all', retries=0)
    payload = {
        'topic': 'Webtraffic',
        'msg': 'load_webtraffic',
        'domainName': 'cleaf.it',
    }
    where_to_send = payload['topic']
    encoded_payload = json.dumps(payload).encode('utf-8')
    producer.send(where_to_send, encoded_payload)
    print('message sent')

The remaining problem: for every message I initialize a new KafkaProducer, and there can be performance issues with that. Thanks once more, @tvoinarovskyi, for the fresh ideas.
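
One possible way around that performance concern, sketched here as an assumption rather than something from the thread: create the producer lazily once per worker process, reuse it across task runs, and flush() after each send so delivery still completes before the task returns.

import json

from kafka import KafkaProducer

kafka_host = 'kafka:9092'
_producer = None


def get_producer():
    # create one producer per worker process and reuse it across task runs
    global _producer
    if _producer is None:
        _producer = KafkaProducer(bootstrap_servers=kafka_host, acks='all', retries=0)
    return _producer


def send_payload(payload):
    producer = get_producer()
    producer.send(payload['topic'], json.dumps(payload).encode('utf-8'))
    producer.flush()  # block until the buffered record is actually delivered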

