Change of behavior of IncludeFiles on AWS Batch between versions 2.0.3 and 2.2.2
Hi,
I am trying to upgrade to Metaflow’s latest version, but the code that was running fine under 2.0.3
is now breaking.
Here’s a minimal reproducible example of my problem.
I have these files in a directory.
$ tree
.
├── pipeline.py
├── query_one.sql
├── query_two.sql
└── sql_list.json
sql_list.json contains a list of SQL files that I want to load dynamically into my flow. Usually, this is a very long list.
# sql_list.json
{
    "queries": [
        {"name": "query_one", "full_path": "query_one.sql"},
        {"name": "query_two", "full_path": "query_two.sql"}
    ]
}
I am running python3 pipeline.py run --with batch:
# pipeline.py
import json
from pathlib import Path

from metaflow import FlowSpec, step, IncludeFile


class Flow(FlowSpec):

    @step
    def start(self):
        print(self.query_one)
        print(self.query_two)
        self.next(self.end)

    @step
    def end(self):
        ...


if __name__ == "__main__":

    def include_files(flow):
        # Attach one IncludeFile parameter per query listed in sql_list.json.
        file_list_path = "sql_list.json"
        if Path(file_list_path).exists():
            with open(file_list_path) as f:
                content = json.load(f)
            for query in content["queries"]:
                name = query["name"]
                path = query["full_path"]
                setattr(flow, name, IncludeFile(name, default=path, help=""))
        return flow

    Flow = include_files(Flow)
    Flow()
2.0.3
With version 2.0.3, I would get the desired output. That is, Metaflow would print the contents of query_one.sql and query_two.sql.
2020-08-25 14:58:10.738 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Task is starting (status SUBMITTED)...
2020-08-25 14:58:11.831 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Task is starting (status RUNNABLE)...
2020-08-25 14:58:12.973 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Task is starting (status STARTING)...
2020-08-25 14:58:14.101 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Task is starting (status RUNNING)...
2020-08-25 14:58:20.147 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Setting up task environment.
2020-08-25 14:58:28.542 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Downloading code package.
2020-08-25 14:58:28.543 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Code package downloaded.
2020-08-25 14:58:29.761 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] Task is starting.
2020-08-25 14:58:29.761 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] SELECT 1
2020-08-25 14:58:29.762 [1598360287065934/start/1 (pid 14046)] [1e05c59b-6007-42b9-906f-039a65c4f6b0] SELECT 2
2.2.2
With version 2.2.2, instead of the contents of the files listed in sql_list.json, I get some kind of reference to where they are stored in S3.
2020-08-25 14:59:29.440 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] Task is starting (status SUBMITTED)...
2020-08-25 14:59:31.657 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] Task is starting (status STARTING)...
2020-08-25 14:59:35.102 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] Task is starting (status RUNNING)...
2020-08-25 14:59:42.265 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] Setting up task environment.
2020-08-25 14:59:49.440 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] Downloading code package.
2020-08-25 14:59:49.440 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] Code package downloaded.
2020-08-25 14:59:54.380 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] Task is starting.
2020-08-25 14:59:54.380 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] {"type": "uploader-v1", "url": "s3://i-penp-b-s3-eu-west-1/data/Flow/b381cee440378c2591cd8955a04e24b7b21642b2", "is_text": true, "encoding": null}
2020-08-25 14:59:54.380 [1598360366549490/start/1 (pid 19014)] [4d6263d9-81a7-417c-a618-d990edbed596] {"type": "uploader-v1", "url": "s3://i-penp-b-s3-eu-west-1/data/Flow/12c85e48727ade1837c6356bac6f0a71d6d3a7b3", "is_text": true, "encoding": null}
Is there a way to get my code working again?
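A possible stopgap, assuming the parameter value arrives exactly as the uploader-v1 JSON shown in the log above and that Metaflow's S3 client accepts absolute s3:// URLs inside the Batch task (both are assumptions, not confirmed in this thread), is a small helper that resolves the reference manually:

import json

from metaflow import S3


def resolve_include(value):
    # Hypothetical helper: return the text behind an IncludeFile value,
    # whether it is the raw file body (2.0.3 behavior) or an
    # "uploader-v1" reference like the one printed above (2.2.2 behavior).
    try:
        ref = json.loads(value)
    except (TypeError, ValueError):
        return value  # already the file contents
    if isinstance(ref, dict) and ref.get("type") == "uploader-v1":
        # Assumes S3() with no arguments accepts full s3:// URLs and that
        # the task's IAM role can read the Metaflow datastore bucket.
        with S3() as s3:
            return s3.get(ref["url"]).text
    return value

Inside the step, print(resolve_include(self.query_one)) would then behave the same under both versions.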
Let me look into this. Yes, we changed the behavior of IncludeFile specifically to be able to support AWS Step Functions. It should not have broken things like this, though, so I will take a look.
Cool, @romain-intel. I actually went ahead and included the SQL files with --package-suffixes='.json,.sql', so I could get rid of the IncludeFile parameters altogether. It's great, though, to now have the environment decorator in my toolbelt. Thanks!
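For completeness, a minimal sketch of that workaround, assuming the .sql and .json files sit next to the flow script and are shipped in the code package via --package-suffixes='.json,.sql' (file and flow names here are illustrative):

# packaged_flow.py (hypothetical) -- read the query files from the code
# package at runtime instead of declaring IncludeFile parameters.
# Run with: python3 packaged_flow.py --package-suffixes='.json,.sql' run --with batch
import json
from pathlib import Path

from metaflow import FlowSpec, step


class PackagedFlow(FlowSpec):

    @step
    def start(self):
        # On AWS Batch the code package is extracted into the working
        # directory, so the packaged files can be opened directly.
        manifest = json.loads(Path("sql_list.json").read_text())
        self.queries = {
            q["name"]: Path(q["full_path"]).read_text()
            for q in manifest["queries"]
        }
        for name, sql in self.queries.items():
            print(name, "->", sql)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    PackagedFlow()

Since the files travel with the code package, no dynamic setattr of IncludeFile parameters is needed.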