MySqlToHiveOperator: "Invalid path" error while loading the extracted CSV into Hive
Apache Airflow version: 2.0
Kubernetes version (if you are using kubernetes) (use kubectl version):
Environment:
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened: I am trying to load data from MySQL into Hive with the following code:
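from airflow.providers.apache.hive.transfers.mysql_to_hive import MySqlToHiveOperator
from airflow.utils.dates import days_ago

# `dag` is the DAG object defined elsewhere in hello2.py
t2 = MySqlToHiveOperator(
    task_id='mysql_to_hive',
    sql='select caseNo as case_num from t_ca_detected_case',
    hive_table='dsj.casenum_temp',
    create=True,
    recreate=True,  # drop and recreate the target table on every run
    delimiter=',',
    mysql_conn_id='mysql_dsj',
    hive_cli_conn_id='hive_cli_default',
    start_date=days_ago(2),
    owner='airflow',
    dag=dag,
)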
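To make the failing step easier to follow, here is a minimal sketch of the two phases the log below reports ("Dumping MySQL query results to local file", then "Loading file into Hive"). It is not Airflow's implementation: `dump_rows_to_local_file` and `build_load_hql` are hypothetical helpers, and the hard-coded rows stand in for the MySQL cursor. What it illustrates is that the CSV lands in a temporary file on the machine running the task, and only its path is passed to Hive in a statement shaped like the one logged at hive.py:459.

```python
import csv
import os
import tempfile
from typing import Iterable, Sequence


def dump_rows_to_local_file(rows: Iterable[Sequence[object]], delimiter: str = ",") -> str:
    """Write rows to a temp file on the local (worker) filesystem.

    Hypothetical stand-in for the operator's dump phase; `rows` plays the
    role of the MySQL query results.
    """
    # delete=False keeps the file after the handle is closed
    with tempfile.NamedTemporaryFile("w", newline="", delete=False) as f:
        writer = csv.writer(f, delimiter=delimiter)
        for row in rows:
            writer.writerow(row)
        return f.name


def build_load_hql(filepath: str, table: str, overwrite: bool = True) -> str:
    """Build a LOAD statement with the same shape as the one in the task log."""
    hql = f"LOAD DATA LOCAL INPATH '{filepath}' "
    if overwrite:
        hql += "OVERWRITE "
    hql += f"INTO TABLE {table} "
    return hql + ";"


path = dump_rows_to_local_file([("A001",), ("A002",)])
print(build_load_hql(path, "dsj.casenum_temp"))
print(os.path.exists(path))  # True: the file exists on this machine
os.remove(path)
```

Only the path string crosses over to Hive; whether Hive can open that path depends on where the statement is executed.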
[2020-12-30 21:53:21,631] {taskinstance.py:1038} INFO - Executing <Task(MySqlToHiveOperator): mysql_to_hive> on 2020-12-30T13:53:20.354276+00:00
[2020-12-30 21:53:21,638] {standard_task_runner.py:51} INFO - Started process 54185 to run task
[2020-12-30 21:53:21,643] {standard_task_runner.py:75} INFO - Running: ['airflow', 'tasks', 'run', 'hello2', 'mysql_to_hive', '2020-12-30T13:53:20.354276+00:00', '--job-id', '44', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/test/hello2.py', '--cfg-path', '/tmp/tmp1qiqqk_2']
[2020-12-30 21:53:21,647] {standard_task_runner.py:76} INFO - Job 44: Subtask mysql_to_hive
[2020-12-30 21:53:21,696] {logging_mixin.py:103} INFO - Running <TaskInstance: hello2.mysql_to_hive 2020-12-30T13:53:20.354276+00:00 [running]> on host serv98.
[2020-12-30 21:53:21,744] {taskinstance.py:1232} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=hello2
AIRFLOW_CTX_TASK_ID=mysql_to_hive
AIRFLOW_CTX_EXECUTION_DATE=2020-12-30T13:53:20.354276+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2020-12-30T13:53:20.354276+00:00
[2020-12-30 21:53:21,763] {base.py:74} INFO - Using connection to: id: hive_cli_default. Host: 10.2.20.11, Port: 21066, Schema: default, Login: airflow, Password: XXXXXXXX, extra: XXXXXXXX
[2020-12-30 21:53:21,765] {mysql_to_hive.py:141} INFO - Dumping MySQL query results to local file
[2020-12-30 21:53:21,776] {base.py:74} INFO - Using connection to: id: mysql_dsj. Host: 192.168.2.178, Port: 3306, Schema: djzs_db, Login: root, Password: XXXXXXXX, extra: None
[2020-12-30 21:53:21,798] {mysql_to_hive.py:161} INFO - Loading file into Hive
[2020-12-30 21:53:21,798] {hive.py:445} INFO - DROP TABLE IF EXISTS dsj.casenum_temp;
CREATE TABLE IF NOT EXISTS dsj.casenum_temp (
case_num
STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS textfile
;
[2020-12-30 21:53:21,799] {hive.py:248} INFO - beeline -u "jdbc:hive2://10.2.20.11:21066/default" -n airflow -p Ygnet123# -hiveconf airflow.ctx.dag_id=hello2 -hiveconf airflow.ctx.task_id=mysql_to_hive -hiveconf airflow.ctx.execution_date=2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_run_id=manual__2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_owner=airflow -hiveconf airflow.ctx.dag_email= -f /tmp/airflow_hiveop_ka2jjicc/tmpnyz766bk
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: Class path contains multiple SLF4J bindings.
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[2020-12-30 21:53:24,692] {hive.py:260} INFO - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
[2020-12-30 21:53:26,776] {hive.py:260} INFO - Connecting to jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:27,079] {hive.py:260} INFO - Connected to: Apache Hive (version 3.1.0)
[2020-12-30 21:53:27,080] {hive.py:260} INFO - Driver: Hive JDBC (version 3.1.0)
[2020-12-30 21:53:27,080] {hive.py:260} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
[2020-12-30 21:53:27,188] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598): USE default–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,188] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,188] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:27,188] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598); Time taken: 0.003 seconds
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598): USE default–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598); Time taken: 0.004 seconds
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,190] {hive.py:260} INFO - No rows affected (0.079 seconds)
[2020-12-30 21:53:27,364] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d): DROP TABLE IF EXISTS dsj.casenum_temp–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d); Time taken: 0.01 seconds
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d): DROP TABLE IF EXISTS dsj.casenum_temp–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d); Time taken: 0.149 seconds
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,365] {hive.py:260} INFO - No rows affected (0.165 seconds)
[2020-12-30 21:53:27,408] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > INFO : Compiling command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c): CREATE TABLE IF NOT EXISTS dsj.casenum_temp (
[2020-12-30 21:53:27,408] {hive.py:260} INFO - case_num
STRING)
[2020-12-30 21:53:27,408] {hive.py:260} INFO - ROW FORMAT DELIMITED
[2020-12-30 21:53:27,408] {hive.py:260} INFO - FIELDS TERMINATED BY ','
[2020-12-30 21:53:27,408] {hive.py:260} INFO - STORED AS textfile–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c); Time taken: 0.004 seconds
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c): CREATE TABLE IF NOT EXISTS dsj.casenum_temp (
[2020-12-30 21:53:27,409] {hive.py:260} INFO - case_num
STRING)
[2020-12-30 21:53:27,409] {hive.py:260} INFO - ROW FORMAT DELIMITED
[2020-12-30 21:53:27,409] {hive.py:260} INFO - FIELDS TERMINATED BY ','
[2020-12-30 21:53:27,409] {hive.py:260} INFO - STORED AS textfile–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c); Time taken: 0.028 seconds
[2020-12-30 21:53:27,410] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:27,410] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,410] {hive.py:260} INFO - No rows affected (0.041 seconds)
[2020-12-30 21:53:27,418] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> 0: jdbc:hive2://10.2.20.11:21066/default> Closing: 0: jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:27,445] {hive.py:459} INFO - LOAD DATA LOCAL INPATH '/tmp/tmpscnxvwz8' OVERWRITE INTO TABLE dsj.casenum_temp ;
[2020-12-30 21:53:27,447] {hive.py:248} INFO - beeline -u "jdbc:hive2://10.2.20.11:21066/default" -n airflow -p Ygnet123# -hiveconf airflow.ctx.dag_id=hello2 -hiveconf airflow.ctx.task_id=mysql_to_hive -hiveconf airflow.ctx.execution_date=2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_run_id=manual__2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_owner=airflow -hiveconf airflow.ctx.dag_email= -f /tmp/airflow_hiveop_hpm6bvo4/tmpuszfib5a
[2020-12-30 21:53:30,344] {hive.py:260} INFO - SLF4J: Class path contains multiple SLF4J bindings.
[2020-12-30 21:53:30,345] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:30,345] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:30,345] {hive.py:260} INFO - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[2020-12-30 21:53:30,349] {hive.py:260} INFO - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
[2020-12-30 21:53:32,333] {hive.py:260} INFO - Connecting to jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:32,673] {hive.py:260} INFO - Connected to: Apache Hive (version 3.1.0)
[2020-12-30 21:53:32,674] {hive.py:260} INFO - Driver: Hive JDBC (version 3.1.0)
[2020-12-30 21:53:32,674] {hive.py:260} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
[2020-12-30 21:53:32,785] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.003 seconds
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.004 seconds
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:32,787] {hive.py:260} INFO - No rows affected (0.082 seconds)
[2020-12-30 21:53:32,859] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpscnxvwz8'': No files matching path file:/tmp/tmpscnxvwz8 (state=42000,code=40000)
[2020-12-30 21:53:32,862] {hive.py:260} INFO - Closing: 0: jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:32,905] {taskinstance.py:1396} ERROR - SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://10.2.20.11:21066/default
Connected to: Apache Hive (version 3.1.0)
Driver: Hive JDBC (version 3.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.003 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.004 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.082 seconds)
0: jdbc:hive2://10.2.20.11:21066/default> Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpscnxvwz8'': No files matching path file:/tmp/tmpscnxvwz8 (state=42000,code=40000)
Closing: 0: jdbc:hive2://10.2.20.11:21066/default
What you expected to happen: The MySQL table should be loaded into Hive. The extracted file should be found at the expected path, but Hive reports that no file exists there.
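A likely explanation, worth confirming against the Hive documentation for this deployment: the hook sends the statement through beeline to HiveServer2 at 10.2.20.11, and with HiveServer2 the LOCAL in LOAD DATA LOCAL INPATH refers to the filesystem of the HiveServer2 host, not the machine running beeline. The CSV was written to /tmp on the Airflow worker (serv98), so HiveServer2 finds nothing at file:/tmp/tmpscnxvwz8. The -f script under /tmp/airflow_hiveop_* is different: beeline reads it on the client, which is why the DROP/CREATE step succeeds. The sketch below turns that reasoning into a rough pre-flight check. `jdbc_host` and `local_inpath_reachable` are hypothetical helpers, not Airflow API, and comparing host strings is only a heuristic (an IP address and a hostname for the same machine will not match).

```python
import socket
from typing import Iterable
from urllib.parse import urlparse


def jdbc_host(jdbc_url: str) -> str:
    """Return the host of a URL such as 'jdbc:hive2://10.2.20.11:21066/default'."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    # urlparse does not know the 'jdbc:' prefix, so parse what follows it
    host = urlparse(jdbc_url[len("jdbc:"):]).hostname
    if host is None:
        raise ValueError(f"no host in {jdbc_url!r}")
    return host


def local_inpath_reachable(jdbc_url: str, worker_names: Iterable[str]) -> bool:
    """Heuristic, assuming HiveServer2 resolves LOAD DATA LOCAL INPATH on its
    own host: a file written by the worker is visible only if HiveServer2
    runs on the worker itself."""
    names = {n.lower() for n in worker_names} | {"localhost", "127.0.0.1"}
    return jdbc_host(jdbc_url) in names


worker = [socket.gethostname(), "serv98"]
print(local_inpath_reachable("jdbc:hive2://10.2.20.11:21066/default", worker))
```

If the check comes back False, the file has to end up somewhere HiveServer2 can read, for example by copying it to HDFS and issuing LOAD DATA INPATH without LOCAL, or the task has to run on the HiveServer2 host.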
How to reproduce it:
Anything else we need to know:
@vikramkoka @eladkal, could you please assign this to me? I would like to work on it.
I am seeing the same issue when using HiveCliHook.load_df():
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/airflow_hiveop_twe9yc4g/tmpervj2lgz'': No files matching path file:/tmp/airflow_hiveop_twe9yc4g/tmpervj2lgz (state=42000,code=40000)
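The path in this error has the same shape as the -f script paths in the task log: a tmp* file inside an /tmp/airflow_hiveop_* directory. A minimal sketch of that staging pattern, assuming load_df writes the DataFrame into a NamedTemporaryFile created inside TemporaryDirectory(prefix='airflow_hiveop_') and then hands the file name to load_file (worth checking against hive.py in the installed provider). Here stdlib csv stands in for DataFrame.to_csv, and `write_staged_csv` and its `load` callback are hypothetical names. Besides the path shape, it shows that the staged file exists only on the worker and only while the load call runs, so a remote HiveServer2 cannot read it through LOAD DATA LOCAL INPATH either.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, Sequence


def write_staged_csv(
    rows: Iterable[Sequence[object]],
    load: Callable[[str], None],
    delimiter: str = ",",
) -> str:
    """Stage rows in /tmp/airflow_hiveop_*/tmp* and call `load` with the path.

    `load` is a hypothetical stand-in for the hook's load_file(); the staged
    file and its directory are deleted as soon as this function returns.
    """
    with tempfile.TemporaryDirectory(prefix="airflow_hiveop_") as tmp_dir:
        with tempfile.NamedTemporaryFile("w", dir=tmp_dir, newline="") as f:
            csv.writer(f, delimiter=delimiter).writerows(rows)
            f.flush()  # make sure the rows are on disk before loading
            load(f.name)
            return f.name


seen = []
staged = write_staged_csv([("A001",)], seen.append)
print(seen[0] == staged, os.path.exists(staged))  # True False
```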