Proposal: split function, to create subsets of an array

Array[Array[X]] split(Array[X], Int)

Split an array into consecutive sub-arrays of a specified maximum size n. Every sub-array has exactly n elements, except possibly the last, which has fewer if the array length is not a multiple of n.

Example:

Array[Int] a = [1, 2, 3, 4, 5]
Array[Array[Int]] chunks = split(a, 2)  # => [[1, 2], [3, 4], [5]]
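The intended semantics can be modeled outside WDL. The Python sketch below is only an illustration of the behavior described above, not part of the proposal; the name `split_array` is hypothetical, and rejecting a non-positive n is an assumption, since the proposal does not cover that case.

```python
from typing import List, TypeVar

X = TypeVar("X")


def split_array(items: List[X], n: int) -> List[List[X]]:
    """Hypothetical model of the proposed split(Array[X], Int)."""
    if n <= 0:
        # Assumption: the proposal does not define behavior for n <= 0.
        raise ValueError("n must be a positive integer")
    # Step through the list n elements at a time; the final slice may be shorter.
    return [items[i:i + n] for i in range(0, len(items), n)]


print(split_array([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

Under this model an empty input yields an empty list of chunks, and an n larger than the array yields a single chunk containing every element.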

At least two use cases have been identified where this would be useful:

  • Downsampling an array: to keep only the first n elements of an array, use split(a, n)[0]
  • Breaking up a very large array for nested scatters:
    Array a = [...]
    Array chunks = split(a, 100)
    scatter (chunk in chunks) {
        scatter (i in chunk) {
            call foo { input: i = i }
        }
    }
    Array[Int] results = flatten(foo.result)
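Both use cases can be sketched in Python to show what the nested scatter is expected to produce. This is an illustrative model only: `chunk`, `first_n`, and `nested_scatter` are hypothetical names, and `task` stands in for the `foo` call.

```python
from itertools import chain
from typing import Callable, List, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def chunk(items: List[X], n: int) -> List[List[X]]:
    """Hypothetical stand-in for the proposed split(a, n)."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def first_n(items: List[X], n: int) -> List[X]:
    """Use case 1: downsampling by taking the first chunk."""
    chunks = chunk(items, n)
    # Indexing [0] on an empty result would fail, so guard for empty input.
    return chunks[0] if chunks else []


def nested_scatter(items: List[X], n: int, task: Callable[[X], Y]) -> List[Y]:
    """Use case 2: outer loop over chunks, inner loop over elements, then flatten."""
    nested = [[task(i) for i in c] for c in chunk(items, n)]
    return list(chain.from_iterable(nested))


print(first_n([1, 2, 3, 4, 5], 2))                         # [1, 2]
print(nested_scatter([1, 2, 3, 4, 5], 2, lambda i: i * 10))  # [10, 20, 30, 40, 50]
```

Flattening the nested results restores the original element order, so chunking changes how the work is grouped but not the final result.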
    

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 4
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

2 reactions
freeseek commented, Sep 6, 2022

It is possible to achieve what @jdidion is asking using the development version of WDL:

version development

workflow main {
  Array[Int] a = [1, 2, 3, 4, 5]
  Int n = 2
  scatter (i in range(length(a))) { Int b = i / n }
  Map[Int, Array[Int]] c = collect_by_key(zip(b, a))
  scatter (i in range((length(a) + n - 1) / n)) { Array[Int] d = c[i] }
  output { Array[Array[Int]] e = d }
}

This will indeed generate output:

{
  "main.e": [[1, 2], [3, 4], [5]]
}
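The workaround above boils down to three steps: assign each element the chunk index i / n, group elements by that index, then read the groups back in index order. A rough Python model of those steps follows, assuming that `collect_by_key` preserves the order of values within each key; `split_by_key` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_key(a: List[int], n: int) -> List[List[int]]:
    """Model of the zip / collect_by_key workaround, step by step."""
    # Step 1: b[i] = i / n with integer division, the chunk index of element i.
    b = [i // n for i in range(len(a))]
    # Step 2: collect_by_key(zip(b, a)), grouping values under their chunk index.
    c: Dict[int, List[int]] = defaultdict(list)
    for key, value in zip(b, a):
        c[key].append(value)
    # Step 3: look up chunk indices 0 .. ceil(len(a) / n) - 1 in order.
    num_chunks = (len(a) + n - 1) // n
    return [c[i] for i in range(num_chunks)]


print(split_by_key([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

The expression `(len(a) + n - 1) // n` is the usual integer form of a ceiling division, which is why the last, shorter chunk is still included.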

If Cromwell supported the unzip() function, it would be even simpler:

version development

workflow main {
  Array[Int] a = [1, 2, 3, 4, 5]
  Int n = 2
  scatter (i in range(length(a))) { Int b = i / n }
  output { Array[Array[Int]] c = unzip(as_pairs(collect_by_key(zip(b, a)))).right }
}

I would actually rather reserve split() for a function that is roughly the inverse of sep(String, Array[String]), providing functionality similar to Python's string split() method.
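To illustrate the inverse relationship being suggested, here is a small Python sketch. The helper names and the argument order of `split_string` are assumptions modeled on sep(String, Array[String]); no such WDL function is defined in this thread.

```python
from typing import List


def sep(separator: str, parts: List[str]) -> str:
    """Model of sep(String, Array[String]): join the parts with a separator."""
    return separator.join(parts)


def split_string(separator: str, text: str) -> List[str]:
    """Hypothetical inverse of sep, built on Python's str.split."""
    return text.split(separator)


parts = ["a", "b", "c"]
joined = sep(",", parts)
print(joined)                     # a,b,c
print(split_string(",", joined))  # ['a', 'b', 'c']
```

The round trip only holds when the list is non-empty and no element contains the separator; for example, splitting an empty string in Python returns a list holding one empty string rather than an empty list.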

0 reactions
markjschreiber commented, Dec 8, 2022

Although it might be possible to do this (albeit in the development version), it certainly isn’t as obvious what you are doing as it would be with some kind of group, split, or window function.
