Fail to download task log if there are Chinese characters in dag_id

Apache Airflow version

main (development)

What happened

If the dag_id of a DAG contains Chinese characters, downloading the logs of any task that belongs to that DAG leads to an ‘Internal Server Error’ page.

What you expected to happen

Here is the webserver log related to the bug, as produced in standalone mode:

webserver | [2022-01-26 18:29:15 +0800] [48511] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
webserver | Traceback (most recent call last):
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
webserver |     self.handle_request(listener, req, client, addr)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
webserver |     resp.write(item)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
webserver |     self.send_headers()
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
webserver |     util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
webserver |     return value.encode(encoding)
webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
webserver | 127.0.0.1 - - [26/Jan/2022:18:29:15 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
webserver | [2022-01-26 18:29:21 +0800] [48508] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
webserver | Traceback (most recent call last):
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
webserver |     self.handle_request(listener, req, client, addr)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
webserver |     resp.write(item)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
webserver |     self.send_headers()
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
webserver |     util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
webserver |     return value.encode(encoding)
webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
webserver | 127.0.0.1 - - [26/Jan/2022:18:29:21 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
triggerer | [2022-01-26 18:29:43,927] {triggerer_job.py:250} INFO - 0 triggers currently running
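
The UnicodeEncodeError in the traceback can be reproduced outside gunicorn and Airflow. The following is a minimal sketch, assuming the response carried a header that embeds the raw dag_id. The header name and value used here are hypothetical and are not taken from the log; the helper `try_encode` is likewise only for illustration.

```python
# Hypothetical header line containing the raw dag_id. The actual header
# Airflow produced is not shown in the log, so this value is an assumption.
header_str = "Content-Disposition: attachment; filename=测试.log\r\n"


def try_encode(value, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# latin-1 can only represent code points below 256, so the two Chinese
# characters cannot be encoded and the call fails.
latin1_result = try_encode(header_str, "latin-1")

# utf-8 can represent every code point, so the same string encodes fine.
utf8_result = try_encode(header_str, "utf-8")

print(type(latin1_result).__name__)
print(type(utf8_result).__name__)
```

The error object reports the span of the offending characters in its `start` and `end` attributes, which is where the "position 161-162" in the traceback comes from: two consecutive characters in the header string that fall outside the latin-1 range.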

How to reproduce

  • I’ve tested Airflow v2.2.0 with the Celery executor, the Airflow dev version in standalone mode, and Airflow v1.10.12 with the Celery executor. The bug exists in all three versions.
  • To reproduce, create a DAG whose dag_id contains Chinese characters, such as ‘测试’. After triggering the DAG, try to download the log file of any of its tasks from the tree view or graph view page. You will be redirected to an ‘Internal Server Error’ page.

Operating System

macOS Catalina, CentOS 7

Versions of Apache Airflow Providers

No response

Deployment

Other

Deployment details

No response

Anything else

  • Following the error log produced by the webserver, I checked /opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py line 322 and saw util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
  • After changing latin-1 to utf-8, the bug was fixed. The whole function is shown below; the commented-out line is the one I added.
```python
def send_headers(self):
    if self.headers_sent:
        return
    tosend = self.default_headers()
    tosend.extend(["%s: %s\r\n" % (k, v) for k, v in self.headers])

    header_str = "%s\r\n" % "".join(tosend)
    util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
    # util.write(self.sock, util.to_bytestring(header_str, "utf-8"))
    self.headers_sent = True
```
    
  • However, gunicorn/http/wsgi.py is not part of the Airflow code, and I haven’t figured out how to fix this without changing that script. Is there a better way to fix it?
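
One general way to avoid patching gunicorn is to keep non-ASCII characters out of the header value in the first place. The sketch below is an assumption about where the problem could be addressed, not Airflow's actual fix: it assumes the failing header is a Content-Disposition carrying a filename derived from dag_id, and it uses the `filename*` parameter defined in RFC 5987 / RFC 6266 to percent-encode the name as UTF-8. The function name `content_disposition` is hypothetical.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an ASCII-only Content-Disposition value for a download.

    Non-ASCII names are percent-encoded as UTF-8 and sent through the
    RFC 5987 `filename*` parameter, so the header survives latin-1 encoding.
    This sketch does not escape quotes or backslashes in ASCII names.
    """
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        # safe="" percent-encodes everything except unreserved characters.
        return "attachment; filename*=UTF-8''" + quote(filename, safe="")
    return f'attachment; filename="{filename}"'


value = content_disposition("测试.log")
print(value)  # attachment; filename*=UTF-8''%E6%B5%8B%E8%AF%95.log

# The resulting value contains only ASCII, so latin-1 encoding succeeds.
encoded = value.encode("latin-1")
```

With this approach the server-side header stays within latin-1, and the percent-encoded form matches the dag_id encoding already visible in the request URL in the log.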

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
eladkal commented, Feb 19, 2022

I guess this might change in the future. There is a good discussion in https://github.com/apache/airflow/issues/18010#issuecomment-912820115. The idea of separating the id from the display name in the UI will probably happen in a future release.

0 reactions
potiuk commented, Jun 13, 2022

It’s not supported currently, but if you want to make all the changes and open a PR to make it possible in the future, I think that would be awesome @ramwin
