Wanted: Strawman proposals for new collections architecture
See original GitHub issue
A redesign of the standard library is on the roadmap for one of the next Scala versions (could be as early as 2.13). Some requirements for the redesign should be:
- Normal usage of collections should stay the same. That is, the standard operations on collections should still be available under the same names. Some more exotic and advanced scenarios such as `breakOut` can be dropped if alternative ways to achieve the same functionality exist.
- User-defined implementations of collections should port to the new library as far as is reasonable. We should allow some breakage here, if it is necessary to achieve other goals.
- We should strive for simplicity in APIs and implementations.
- We should strive to better separate interfaces from implementation and avoid the fragile base class problem, where too much gets inherited automatically.
- We should try to simplify the inheritance graphs, in particular those of often-used collections such as lists.
- We should improve the integration of strict and lazy collections by means of a better architecture for views. Views should avoid accidental forcing, and should omit transforms from collections that require such forcing. However, forcing is still needed to support aggregations.
- We should try to integrate Java 8 streams and/or have our own high-performance implementations for parallel collections.
- We should generally be at least on par with current collections in what concerns efficiency. In particular, we should still allow specializations of collection operations for particular implementations. These optimizations should still work if the static collection type is abstract. E.g. an optimized implementation of `++` on `ArrayBuffer`s should be called even if the static types of the operands of `++` are `Iterable`s.
- The design should be friendly to optimizations such as inlining and, possibly, more advanced whole program optimizations.
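To make the views requirement above more concrete, here is a minimal sketch of a lazy view. It is not a proposed design: the names `ViewSketch`, `View` and `viewOf` are hypothetical, and the sketch leans on the existing standard-library `Iterator` and `Iterable`. Transforms only compose a recipe; forcing happens in the aggregation.

```scala
object ViewSketch {
  // Hypothetical view type: it stores a recipe for producing elements,
  // not the elements themselves.
  final class View[+A](mk: () => Iterator[A]) {
    def iterator: Iterator[A] = mk()

    // Transforms compose recipes; nothing is traversed here.
    def map[B](f: A => B): View[B] = new View(() => iterator.map(f))
    def filter(p: A => Boolean): View[A] = new View(() => iterator.filter(p))

    // Aggregations are where forcing happens.
    def foldLeft[B](z: B)(op: (B, A) => B): B = iterator.foldLeft(z)(op)
  }

  // Hypothetical helper: build a view over any standard-library Iterable.
  def viewOf[A](xs: Iterable[A]): View[A] = new View(() => xs.iterator)
}
```

In this sketch a call such as `ViewSketch.viewOf(xs).map(f)` does not apply `f` at all; `f` runs only once `foldLeft` traverses the view.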
To gain experience it would be good to experiment with some prototypes that illustrate possible designs and can point to strengths and weaknesses. A full collections design is too heavy to fill this role. It takes too long to write and understand. As an alternative, I propose to implement just a subset of the required functionality. The subset should be implementable in about 500 LOC or less and at the same time should be as representative as possible of the whole collections. The API should be the same for all strawman proposals.
After some experimentation, I came up with the following proposal for the API to be implemented by strawman proposals. There’s still some modest room to add things if people feel it is important.
Base Traits
Iterator
Iterable
Seq
For now we leave out maps and sets (which does not say they are not important). We gloss over the mutable/immutable distinction. All collections are in the same package (this is only for the strawman, to keep things simple, not for the full collections, of course).
`Iterable` and `Seq` are considered collection types, but `Iterator` is not.
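One possible shape for these three base traits is sketched below. This is only an illustration under assumptions, not the proposed API: the member selection, the enclosing `BaseTraitsSketch` object and the array-backed `ArraySeq` class are all hypothetical, and the traits deliberately shadow the standard-library names inside that object.

```scala
object BaseTraitsSketch {
  // Not a collection type: a one-shot cursor over elements.
  trait Iterator[+A] {
    def hasNext: Boolean
    def next(): A
  }

  // Collection type: anything that can hand out fresh iterators.
  trait Iterable[+A] {
    def iterator: Iterator[A]

    // Derived operations that need nothing but `iterator`.
    def isEmpty: Boolean = !iterator.hasNext
    def head: A = iterator.next()
    def foldLeft[B](z: B)(op: (B, A) => B): B = {
      val it = iterator
      var acc = z
      while (it.hasNext) acc = op(acc, it.next())
      acc
    }
  }

  // Collection type: an Iterable with a length and positional access.
  trait Seq[+A] extends Iterable[A] {
    def length: Int
    def apply(i: Int): A
  }

  // Hypothetical array-backed Seq, present only to exercise the traits.
  final class ArraySeq[A](elems: Array[A]) extends Seq[A] {
    def length: Int = elems.length
    def apply(i: Int): A = elems(i)
    def iterator: Iterator[A] = new Iterator[A] {
      private var i = 0
      def hasNext: Boolean = i < elems.length
      def next(): A = { val x = elems(i); i += 1; x }
    }
  }
}
```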
Collection Implementations
List
ListBuffer
ArrayBuffer
java.lang.String
View
`List` is a simplified version of Scala lists. It’s an immutable collection favoring linear access patterns. `List` operations should be tail-recursive, hence the addition of `ListBuffer` to hold intermediate results. `ArrayBuffer` is a mutable collection that supports random access. `String` should demonstrate how the architecture integrates externally defined types. `String` should be seen by Scala as an immutable random access sequence of `Char`. Finally, views should be constructible over all other collections. They are immutable and lazy.
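As an illustration of why `ListBuffer` is needed for tail-recursive `List` operations, the sketch below maps over a list with a tail-recursive loop that appends to a buffer. It uses the existing standard-library `List` and `ListBuffer` rather than strawman types, and the names `ListMapSketch` and `mapList` are hypothetical.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.ListBuffer

object ListMapSketch {
  // Stack-safe map over an immutable List: a tail-recursive loop appends
  // each result to a ListBuffer, which is converted to a List at the end.
  def mapList[A, B](xs: List[A])(f: A => B): List[B] = {
    val buf = ListBuffer.empty[B]
    @tailrec def loop(rest: List[A]): Unit = rest match {
      case head :: tail =>
        buf += f(head)
        loop(tail)
      case Nil => ()
    }
    loop(xs)
    buf.toList
  }
}
```

A naive recursive `map` that conses onto the result of the recursive call would instead use stack space proportional to the length of the list.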
`List`, `ListBuffer` and `ArrayBuffer` should have companion objects that let one construct collections given their elements in the usual way:
List(1, 2, 3)
ListBuffer("a", "bc")
ArrayBuffer()
Collection operations
The following operations should be supported by all collections.
foldLeft
foldRight
isEmpty
head
iterator
view
to
`foldLeft` and `foldRight` are the basis of all sorts of aggregations. `isEmpty` and `head` exemplify tests and element projections. `iterator` and `view` construct iterators and views over a collection. `to` is not part of the current Scala collections, but is useful to support views and Java 8 streams. It should construct a new collection of a given type from the elements of the receiver. An invocation pattern of `to` could be either of the following:
xs.to[List]
xs.to(List)
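The second invocation pattern, where the target collection is passed as a value, could be supported roughly as follows. This is a sketch under assumptions: `FromIterator`, `ListFactory`, `VectorFactory` and `convertTo` are hypothetical names, standard-library collections stand in for the strawman ones, and the method is called `convertTo` only so that it cannot clash with any existing `to`.

```scala
object ToSketch {
  // Hypothetical factory interface that a companion object could implement.
  trait FromIterator[C[_]] {
    def fromIterator[A](it: Iterator[A]): C[A]
  }

  // Stand-ins for what the companions of List and Vector might provide.
  object ListFactory extends FromIterator[List] {
    def fromIterator[A](it: Iterator[A]): List[A] = it.toList
  }
  object VectorFactory extends FromIterator[Vector] {
    def fromIterator[A](it: Iterator[A]): Vector[A] = it.toVector
  }

  // Builds a collection of the factory's type from the receiver's elements.
  def convertTo[A, C[_]](xs: Iterable[A])(factory: FromIterator[C]): C[A] =
    factory.fromIterator(xs.iterator)
}
```

With these definitions, `ToSketch.convertTo(Vector(1, 2, 3))(ToSketch.ListFactory)` yields a `List[Int]`; the element type comes from the receiver and the collection type from the factory.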
Strawman collections also should support the following transforms:
filter
map
flatMap
++
zip
partition
drop
`map` and `flatMap` together support all monadic operations. `++` and `zip` are examples of operations with multiple inputs. `drop` was chosen as a typical projection that yields a sub-collection. `partition` was chosen in addition to `filter` because it exemplifies transforms with multiple outputs.
Strawman `Seq`s (but not necessarily `Iterable`s or `View`s) should also support the following operations:
length
apply
indexWhere
reverse
Sequences in a way are defined by their `length` and their indexing method `apply`. `indexWhere` is an example of a method that combines indices and sequences in an interesting way. `reverse` is an example of a transform that suggests an access pattern different from left-to-right.
`ArrayBuffer` and `ListBuffer` should in addition support the following mutation operations:
+=
++=
trimStart
Required Optimizations
There are a number of possible specializations that the strawman architecture should support. Here are some examples:
- `xs.drop(n)` on a list `xs` should take time proportional to `n`, the retained part of the list should not be copied.
- `xs ++ ys` on array buffers `xs` and `ys` should be implemented by efficient array copy operations.
- `s1 ++ s2` on strings should use native string concatenation.
- `partition` should require only a single collection traversal if the underlying collection is strict (i.e., not a view).
These specializations should be performed independently of the static types of their arguments.
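One way to obtain this independence from static types is ordinary dynamic dispatch combined with a runtime check on the argument. The sketch below illustrates this for `++` only and is not a proposed design: the enclosing object, the `lastPath` variable (present only so the example can report which implementation ran), the `bufferOf` helper and the `Array[Any]` representation are all assumptions made to keep the example short.

```scala
object SpecializationSketch {
  // Records which implementation of ++ ran; exists only for illustration.
  var lastPath = ""

  trait Iterable[A] {
    def iterator: Iterator[A]

    // Generic fallback: copy element by element through iterators.
    def ++(that: Iterable[A]): Iterable[A] = {
      lastPath = "generic"
      val all = (iterator ++ that.iterator).toArray[Any]
      new ArrayBuffer[A](all, all.length)
    }
  }

  // Hypothetical array-backed buffer storing its elements boxed.
  final class ArrayBuffer[A](private val elems: Array[Any], val length: Int)
      extends Iterable[A] {
    def apply(i: Int): A = elems(i).asInstanceOf[A]
    def iterator: Iterator[A] = Iterator.range(0, length).map(i => apply(i))

    // Specialized version: chosen at runtime, whatever the static types are.
    override def ++(that: Iterable[A]): Iterable[A] = that match {
      case other: ArrayBuffer[_] =>
        lastPath = "arraycopy"
        val out = new Array[Any](length + other.length)
        System.arraycopy(elems, 0, out, 0, length)
        System.arraycopy(other.elems, 0, out, length, other.length)
        new ArrayBuffer[A](out, out.length)
      case _ => super.++(that)
    }
  }

  // Hypothetical construction helper.
  def bufferOf[A](xs: A*): ArrayBuffer[A] =
    new ArrayBuffer[A](xs.toArray[Any], xs.length)
}
```

Even when both operands are typed as `SpecializationSketch.Iterable[Int]`, calling `++` on two buffers takes the array-copy path, because the override is selected by the runtime class of the receiver and the pattern match inspects the runtime class of the argument.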
Why No Arrays?
Collections definitely should support arrays as they do now. In particular, arrays should have the same representation as in Java and all collection operations should be applicable to them. Arrays are left out of the strawman requirements because of the bulk their implementation would add. Even though no explicit implementation is demanded, we should still check all designs for how they would support arrays.
Top GitHub Comments
I would really like `LazyList` or `Stream` to add these powerful and convenient methods: repeat/cycle, repeatedly/generate, iterate, unfold, resource, reject, like in Elixir and Clojure: https://hexdocs.pm/elixir/Stream.html#unfold/2 The `#::` symbol is too low-level, strange, verbose, and unreadable. I also hope to see some xxxIndexed methods added, like in Kotlin's Sequence: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.sequences/index.html I think lazy sequences are one of the current trends in programming.
As somebody who writes C# at work and Scala at home, there are some conveniences in the .NET collection API that I miss in Scala. I realize that most of the discussion here is related to the implementation details of the collections library, whereas my comments are more concerned with the user API of the collections library. Still, maybe these thoughts will be useful.
Suppose I have a list of pairs (to be interpreted as key/value pairs). I’d like to group the values by their corresponding keys, with the understanding that there may be duplicate keys. In Scala, I would do that like this (unless there’s a simpler way that I’m not seeing):
Or, to be more verbose but perhaps also more efficient, I might do this:
Whereas in C#, I could use an overload of GroupBy to do this:
The .Net library realizes that simultaneous mapping while grouping is a common use case, so they provide a way to do that. The C# version feels far more ergonomic to me than either Scala snippet.
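The snippets from this comment are not preserved in this copy. As a rough illustration of the grouping described, and not the commenter's actual code, the sketch below assumes a hypothetical `pairs` list. The first version groups and then strips the keys; the second uses `groupMap`, which the standard library provides from Scala 2.13 onwards.

```scala
object GroupingSketch {
  // Hypothetical input: key/value pairs with a duplicate key.
  val pairs: List[(String, Int)] = List("a" -> 1, "b" -> 2, "a" -> 3)

  // Group by key first, then strip the keys out of each group.
  val grouped: Map[String, List[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Scala 2.13 and later: grouping and mapping in a single call.
  val groupedDirect: Map[String, List[Int]] =
    pairs.groupMap(_._1)(_._2)
}
```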
Another nice feature of the .Net collection API is that all methods that iterate the collection with a callback have overloads whose callback function takes the current index. So while I would need to do this in Scala:
I could do this in C#:
It’s easy enough for Select, SelectMany, and friends to also track the index while iterating, so they provide overloads exposing that functionality.
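For index-aware iteration in Scala, the usual approach is `zipWithIndex` followed by a pattern-matching function. The sketch below is an assumption about the kind of code meant, not the commenter's original snippet; `IndexedSketch`, `names` and `labelled` are hypothetical.

```scala
object IndexedSketch {
  // Hypothetical input.
  val names: List[String] = List("a", "b", "c")

  // Pair each element with its index, then map over the pairs.
  val labelled: List[String] =
    names.zipWithIndex.map { case (name, i) => s"$i: $name" }
}
```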
The .Net collection API includes many such conveniences. It’s clear that the designers put a lot of thought into it. And though having methods take multiple function parameters seems like a questionable idea, in practice, it works out surprisingly well.
I realize that I’m arguing for convenience, which goes somewhat against goal #3, above. I also realize that the sort of convenience methods that I’m talking about could be implemented by a third-party library. But I believe that the Scala collection library would be better for having these conveniences built-in.