Overall task input size
TL;DR: A new function to return the total size of all inputs to a task, e.g.:
task foo {
  File input1
  StructType input2
  Array[File] input3
  Float size_sum = total_input_size("GB")
  command { ... }
  runtime { ... }
  output { ... }
}
Detail:
It’s fairly common to want to base the disk/memory/cpu for a task on the size of its input files. Where this gets tricky is that it’s (a) tedious and (b) error-prone to have to include every input file individually in this calculation. Especially after a refactor, it’s easy to forget to include one of the files in the sum, or to miss one because it’s nested in an object/struct.
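To make the pain point concrete, here is a minimal sketch of the manual pattern described above, written against WDL 1.0. The struct member `index`, the headroom factor, the docker image, and the Cromwell-style `disks` attribute are assumptions for illustration only and are not part of the original issue.

```wdl
version 1.0

# Hypothetical definition of StructType: one File nested inside a struct.
struct StructType {
  String name
  File index
}

task foo_manual {
  input {
    File input1
    StructType input2
    Array[File] input3
  }

  # Every File has to be listed by hand, including ones nested in structs.
  # Dropping a term here (e.g. after a refactor) silently under-sizes the task.
  Float size_sum = size(input1, "GB") + size(input2.index, "GB") + size(input3, "GB")

  # Round up and add headroom; the factor 2 and the +10 are arbitrary choices.
  Int disk_gb = ceil(size_sum * 2) + 10

  command <<<
    echo "would process ~{input1}"
  >>>

  runtime {
    docker: "ubuntu:20.04"
    # Cromwell-style attribute; other engines may expect a different key.
    disks: "local-disk " + disk_gb + " HDD"
  }

  output {
    String msg = read_string(stdout())
  }
}
```

Note that nothing in the task ties `size_sum` to the actual list of inputs, so adding a fourth File input leaves the calculation silently stale.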
I’d like to propose a function total_input_size(), only available within task definitions, which would return the total size of every file needing to be localized to the execution environment for the task.
Like the current size() function, I’d include the optional unit parameter to let people specify the result in MB, GiB, etc.
Before I write up a SPEC change proposal, does this sound like a good idea, or do people have concerns?
Issue Analytics
- Created 5 years ago
- Comments:10 (9 by maintainers)
Top GitHub Comments
@vdauwera yes, that is exactly the case. I think we can actually close this in favor of #169
Would this basically be syntactic sugar to shortcut using size(Array[input Files]) as was added by #169?
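For comparison, the array form of size() mentioned in this comment could be used roughly as follows. This is a sketch assuming WDL 1.0, where size() accepts an Array[File] and flatten() is available; the task and variable names are hypothetical.

```wdl
version 1.0

task foo_array {
  input {
    File input1
    File input2
    Array[File] input3
  }

  # Gather every File input into one array, then measure it with a single call.
  Array[File] all_files = flatten([[input1, input2], input3])
  Float size_sum = size(all_files, "GB")

  command <<<
    echo "total input size: ~{size_sum} GB"
  >>>

  output {
    Float total_gb = size_sum
  }
}
```

The inputs still have to be enumerated by hand when building `all_files`, which is the part the proposed total_input_size() would remove.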