Dataset rework
Describe the feature request
Currently the datasets are all over the place. Some have base classes, some don’t. Mixable charts use the covariant IMixableDataset<T>, which allows for different types of datasets in one chart but with many downsides. I once was really proud of my implementation of those mixable datasets (when I started learning about covariance), but let’s face it, I did horribly. Having to use wrappers for all value types is really bad, and extending this thing seems like a nightmare. Now, with the recent interop-layer rework (not #70, the one from 4 months ago), we need an id for each dataset, so we now also have an IDataset interface that ensures we have an accessible id.
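The wrapper problem comes from how variance works in C#: covariant conversions (out T) only exist for reference types, so a covariant interface can never expose a List<int> as a sequence of object. Here is a minimal sketch of that behavior. It uses the BCL's covariant IEnumerable<out T> as a stand-in for IMixableDataset<T>, and the VarianceDemo class is made up for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical demo class; IEnumerable<out T> plays the role of IMixableDataset<T>.
public static class VarianceDemo
{
    // string is a reference type, so List<string> converts to IEnumerable<object>.
    public static bool StringListIsObjectSequence()
    {
        object strings = new List<string> { "a", "b" };
        return strings is IEnumerable<object>;
    }

    // int is a value type: no variant conversion exists, hence the old wrappers.
    public static bool IntListIsObjectSequence()
    {
        object ints = new List<int> { 1, 2 };
        return ints is IEnumerable<object>;
    }
}
```

This is why the old design needed a reference-type wrapper for every int or double.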
The datasets are stored in config.data.dataset. This is usually a collection (either List or HashSet) containing objects of type IMixableDataset<object> or some specific class like BubbleDataset. For charts that only support one dataset type with one data structure, using a List along with a specific class works fine. However, things get complicated once you’re dealing with charts like the line chart. The line chart allows for different types of datasets (it’s perfectly legal to have a bar dataset in a line chart), and it also accepts data in different forms (array of numbers, array of number-points, array of time-points). These have to be mixable somehow, without breaking type safety and without allowing you to add other types of datasets.
Which charts does this feature request apply to?
All charts
Describe the solution you’d like
I’d like the datasets and the collection of datasets to be enjoyable to use, performant and extendable.
Enjoyable to use
Dataset
- Typesafe. No ArrayList, List<object> or anything of the sort. If you can modify it, it should be in a typesafe manner.
- Implements IList<T>. Until recently (and maybe even now) some datasets don’t really allow modification, but I’d like to have all the methods of IList<T> and, as a bonus, also AddRange.
- Allow value types. Having to use wrappers most of the time because value types are not allowed is not great. The wrappers are stable but are missing equality features, require extra code for serialization (which also hurts performance) and aren’t very intuitive to use. The Wrap() extension methods are helpful, but it’s still not great.
- Every chart the same way. As I said in the introduction, the datasets are all over the place. This should be unified so that you can expect the same behaviour and the same usage from every dataset there is.
- Convenient. You should be able to use both an object initializer (for custom dataset properties) and a collection initializer (for adding the data). In the normal use case you’re going to use the object initializer and add the data afterwards from some database or API, so it’s not important to be able to use both at the same time (also quite hard to implement).
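Here is a minimal sketch of the "Convenient" point. It assumes a dataset that derives from Collection<T>, and SampleDataset is a hypothetical name. C# treats a single initializer as either an object initializer or a collection initializer, never both. That is why combining them in one expression is off the table.

```csharp
using System.Collections.ObjectModel;

// Hypothetical minimal dataset: a Collection<T> plus one option property.
public class SampleDataset<T> : Collection<T>
{
    public bool? Fill { get; set; }
}

public static class InitializerDemo
{
    // The usual case: object initializer for the options, data added afterwards
    // (e.g. once it has been loaded from a database or API).
    public static SampleDataset<int> WithOptions()
    {
        var dataset = new SampleDataset<int> { Fill = true };
        dataset.Add(1);
        dataset.Add(2);
        return dataset;
    }

    // Collection initializer for the data; compiles to one Add call per element.
    public static SampleDataset<int> WithData() => new SampleDataset<int> { 3, 4, 5 };

    // Mixing both does not compile, because an initializer is either an
    // object initializer or a collection initializer:
    // new SampleDataset<int> { Fill = true, 1, 2 }
}
```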
Dataset collection
- Typesafe. You shouldn’t be able to add LineDataset<string> to a line chart because it doesn’t support string values. This needs to be a compilation error, not a NotSupportedException!
- You should be able to edit the dataset collection. This is achieved by implementing all members of IList<IDataset> but preventing the user from adding arbitrary IDataset instances. Instead, Add methods are exposed with only the supported overloads. Anything else throws a NotSupportedException (or, better, causes a compilation error, if possible).
- Keep it consistent. It’s harder and also less important than with the datasets, but you should be able to expect the same interface and usage from all dataset collections.
- Convenient. It should also support a collection initializer.
Performant
That’s actually not that big of a deal, considering the bottleneck in this library is likely the usage of reflection and dynamic features, but I have not tested it. Still, it would be nice if we weren’t in boxing hell like we currently are with IMixableDataset and the wrappers.
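The next sketch shows what dropping the wrappers buys. A struct that implements IEquatable<T> can be stored directly in a List<T> or HashSet<T>. Lookups then go through EqualityComparer<T>.Default, which calls the typed Equals without boxing. PointSketch is a hypothetical stand-in modeled on the Point struct proposed later in this issue.

```csharp
using System;

// Hypothetical value-type point, modeled on the proposed Point struct.
public readonly struct PointSketch : IEquatable<PointSketch>
{
    public readonly double X, Y;

    public PointSketch(double x, double y)
    {
        X = x;
        Y = y;
    }

    // Used by EqualityComparer<PointSketch>.Default (List.Contains, HashSet, ...) without boxing.
    public bool Equals(PointSketch other) => X == other.X && Y == other.Y;

    // Fallback for non-generic callers; this overload does box.
    public override bool Equals(object obj) => obj is PointSketch p && Equals(p);

    public override int GetHashCode() => HashCode.Combine(X, Y);
}
```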
Extendable
We should split these datasets into interfaces and base classes so we can extend and modify them if needed. After all we are modeling a JavaScript library so it’s definitely of value to be flexible. We don’t want to get rid of typesafety (!) but being able to extend the model from outside without heavy reflection usage is great for something like this.
API proposal
I have already implemented the base system, and this time I think I can actually be proud of it.
It’s entirely typesafe but still flexible, raises compiler errors when you try to add an unsupported dataset, has full support for structs, allows for object and collection initializers, and implements IList as well as possible.
There is the base interface IDataset. This interface has no type associated with it. It’s more a semantic restriction than anything else, but it does contain the most important properties, Id and Type. It’s also important when storing datasets, since it’s the base of any dataset.
Implementing that base interface is the generic version IDataset<T>, where T can be any type. No in or out modifiers apply (IDataset<T> also extends IList<T>, which is invariant). This interface also ensures that implementers expose a read-only list of T.
Then the base class for all the datasets is Dataset<T>. This is a class that implements IDataset<T> and IList<T> for modification. It exposes the contents through a read-only property and thereby implements IDataset<T>.Data. It also contains the AddRange methods we love.
Now the dataset collections. In many cases we can just use List<BubbleDataset> or something similar, so we don’t want too much abstraction.
For charts that support more than that, we have the handy class DatasetCollection. It implements IList<IDataset>, so you can do modifications as you please, with the exception of adding an IDataset. The Add and Insert methods are implemented explicitly (and therefore can’t be called unless cast) and throw a NotSupportedException if used. Instead, the class exposes the protected methods AddDataset and InsertDataset, which can add any dataset. This is the only way (disregarding reflection) to add datasets to this collection.
Now we can derive from that collection and add our own Add methods. The Add method is overloaded for every supported dataset. In the case of the line chart, this means every dataset that consists of either ints, longs, doubles, Points or TimePoints. For each possibility there is an Add method. Not only is the name intuitive, it’s also the key to collection initializers: the compiler looks for a matching Add overload when the type implements IEnumerable (which DatasetCollection already does). We can add some abstraction to those collections, like the NumberDatasetCollection and the NumberPointDatasetCollection, but for now there aren’t interfaces like INumberDatasetCollection and IPointDatasetCollection. You’d still need to implement both, so it’s not really useful unless we start using composition, and then things get even more complex.
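The collection initializer trick can be reproduced in isolation with the stripped-down sketch below. The type names mirror the proposal, but the members are cut to the minimum. NumberTimeDatasetCollection, with its DateTime overload, is made up for the example. When the derived Add(IDataset<DateTime>) isn't applicable, overload resolution falls back to the inherited overloads, so the initializer can mix element types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Stripped-down stand-ins for the proposal's types (illustrative only).
public interface IDataset
{
    string Type { get; }
}

public interface IDataset<T> : IDataset, IList<T>
{
}

public class LineDataset<T> : Collection<T>, IDataset<T>
{
    public string Type => "line";
}

public abstract class DatasetCollection : IEnumerable<IDataset>
{
    private readonly List<IDataset> _datasets = new List<IDataset>();

    public int Count => _datasets.Count;
    public IDataset this[int index] => _datasets[index];

    // Only derived classes decide which dataset types may be added.
    protected void AddDataset(IDataset dataset) =>
        _datasets.Add(dataset ?? throw new ArgumentNullException(nameof(dataset)));

    public IEnumerator<IDataset> GetEnumerator() => _datasets.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class NumberDatasetCollection : DatasetCollection
{
    public void Add(IDataset<int> dataset) => AddDataset(dataset);
    public void Add(IDataset<double> dataset) => AddDataset(dataset);
}

// Hypothetical extension: one extra overload; the inherited ones stay usable.
public class NumberTimeDatasetCollection : NumberDatasetCollection
{
    public void Add(IDataset<DateTime> dataset) => AddDataset(dataset);
}

public static class CollectionInitializerDemo
{
    public static NumberTimeDatasetCollection Build() => new NumberTimeDatasetCollection
    {
        // Each element compiles to a call of the matching Add overload.
        new LineDataset<int> { 1, 2, 3 },
        new LineDataset<double> { 1.5 },
        new LineDataset<DateTime> { new DateTime(2020, 1, 1) },
        // new LineDataset<string> { "x" }, // compile error: no Add overload accepts it
    };
}
```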
And here's the code
using System.Collections.Generic;

public interface IDataset
{
    string Id { get; }   // for interop
    string Type { get; } // for mixed charts
}

// This is what's actually implemented; the dataset collections
// restrict to IDataset<int>, IDataset<whatever> through their Add methods.
public interface IDataset<T> : IDataset, IList<T>
{
    IReadOnlyList<T> Data { get; }
}
This is the base implementation for every dataset.
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using Newtonsoft.Json;

[JsonObject]
public abstract class Dataset<T> : Collection<T>, IDataset<T>
{
    /// <summary>
    /// Gets the id used on interop-level.
    /// </summary>
    public string Id { get; }

    /// <summary>
    /// Gets the data contained in this dataset. This property is read-only.
    /// </summary>
    public IReadOnlyList<T> Data { get; }

    /// <summary>
    /// Gets the type of this dataset. Important for mixed charts.
    /// </summary>
    public string Type { get; }

    // Protected because the class is abstract. Collection<T>(IList<T>) wraps the
    // given list instead of copying it, so Items is always this List<T>.
    protected Dataset(string type = null, string id = null) : base(new List<T>())
    {
        Data = new ReadOnlyCollection<T>(Items);
        Id = id ?? Guid.NewGuid().ToString();
        Type = type;
    }

    public void AddRange(IEnumerable<T> items) =>
        ((List<T>)Items).AddRange(items ?? throw new ArgumentNullException(nameof(items)));

    public void AddRange(params T[] items) => AddRange(items as IEnumerable<T>);

    // Note: EqualityComparer<IList<T>>.Default compares the two Items lists by reference,
    // and every dataset owns its own list, so in practice this is reference equality.
    public override bool Equals(object obj) => obj is Dataset<T> set &&
        Id == set.Id &&
        EqualityComparer<IList<T>>.Default.Equals(Items, set.Items);

    public override int GetHashCode() => HashCode.Combine(Items, Id);

    public static bool operator ==(Dataset<T> left, Dataset<T> right) =>
        EqualityComparer<Dataset<T>>.Default.Equals(left, right);

    public static bool operator !=(Dataset<T> left, Dataset<T> right) => !(left == right);
}
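The Dataset<T> constructor relies on Collection<T>(IList<T>) wrapping the given list rather than copying it. That is what makes the (List<T>)Items cast in AddRange safe and keeps the ReadOnlyCollection<T> behind Data in sync. Below is a reduced sketch to check that behavior, with SketchDataset as a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Reduced version of the Dataset<T> storage trick (SketchDataset is hypothetical).
public class SketchDataset<T> : Collection<T>
{
    // Read-only view over the same list that Collection<T> wraps.
    public IReadOnlyList<T> Data { get; }

    public SketchDataset() : base(new List<T>())
    {
        // Collection<T>(IList<T>) wraps the list, so Items is that List<T>.
        Data = new ReadOnlyCollection<T>(Items);
    }

    // Safe cast: the constructor guarantees that Items is a List<T>.
    public void AddRange(IEnumerable<T> items) =>
        ((List<T>)Items).AddRange(items ?? throw new ArgumentNullException(nameof(items)));
}
```

One consequence worth keeping in mind: AddRange writes to the underlying List<T> directly, so it bypasses Collection<T>.InsertItem. A subclass that overrides InsertItem (for example, to validate elements) won't see items added through AddRange.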
And here is how you can store those datasets.
using System;
using System.Collections;
using System.Collections.Generic;
using Newtonsoft.Json;

// Supports every operation of IList<IDataset> except for adding and inserting.
// Derived classes add datasets through the protected AddDataset/InsertDataset methods.
// Only the supported IList members are implemented implicitly; Add, Insert and the
// indexer setter are implemented explicitly, so they don't show up in code completion
// unless you cast to IList<IDataset>, in which case they throw a NotSupportedException.
// Also, since IList's Add is hidden and derived classes expose concrete Add overloads,
// you can use a collection initializer with different dataset types (check Program.cs, it's AWESOME).
public abstract class DatasetCollection : IReadOnlyList<IDataset>, IList<IDataset>
{
    private const string NotSupportedMessageModificationThroughInterface =
        "This collection doesn't support adding datasets through the IList or ICollection interface.";

    private readonly List<IDataset> _datasets;

    [JsonIgnore]
    public int Count => _datasets.Count;

    [JsonIgnore]
    public bool IsReadOnly => false;

    IDataset IList<IDataset>.this[int index]
    {
        get => this[index];
        set => ThrowNotSupported();
    }

    public IDataset this[int index] => _datasets[index];

    protected DatasetCollection()
    {
        _datasets = new List<IDataset>();
    }

    public bool Contains(IDataset dataset) => _datasets.Contains(dataset ?? throw new ArgumentNullException(nameof(dataset)));
    public void CopyTo(IDataset[] array, int index) => _datasets.CopyTo(array, index);
    public IEnumerator<IDataset> GetEnumerator() => _datasets.GetEnumerator();
    public int IndexOf(IDataset dataset) => _datasets.IndexOf(dataset ?? throw new ArgumentNullException(nameof(dataset)));

    protected void AddDataset(IDataset dataset) => _datasets.Add(dataset ?? throw new ArgumentNullException(nameof(dataset)));
    protected void InsertDataset(int index, IDataset dataset) => _datasets.Insert(index, dataset ?? throw new ArgumentNullException(nameof(dataset)));

    public bool Remove(IDataset dataset) => _datasets.Remove(dataset ?? throw new ArgumentNullException(nameof(dataset)));
    public void RemoveAt(int index) => _datasets.RemoveAt(index);
    public void Clear() => _datasets.Clear();

    IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable)_datasets).GetEnumerator();

    void IList<IDataset>.Insert(int index, IDataset item) => ThrowNotSupported();
    void ICollection<IDataset>.Add(IDataset item) => ThrowNotSupported();

    private void ThrowNotSupported() => throw new NotSupportedException(NotSupportedMessageModificationThroughInterface);
}
Now for some examples, shall we?
This is the simplified dataset for a line chart. Fully generic, supports value types and has the type already assigned. If you implement a new line-like chart, you can derive from this class and use the protected constructor to inject your own type.
public class LineDataset<T> : Dataset<T>
{
    public LineDataset() : this("line")
    {
    }

    protected LineDataset(string type) : base(type)
    {
    }

    public int[] BorderDash { get; set; }
    public int? PointBorderWidth { get; set; }
    public int? PointHoverRadius { get; set; }
    public bool? Fill { get; set; }
    public double? LineTension { get; set; }
    public bool? SpanGaps { get; set; }
}
In order to store all the datasets a line chart supports, we also need a DatasetCollection with the appropriate Add methods. We could just write them all in one class, but it makes more sense to introduce a certain degree of abstraction, which helps when implementing other charts such as the bar chart.
public class NumberDatasetCollection : DatasetCollection
{
    public void Add(IDataset<int> dataset) => AddDataset(dataset);
    public void Add(IDataset<long> dataset) => AddDataset(dataset);
    public void Add(IDataset<double> dataset) => AddDataset(dataset);
}

public class NumberPointDatasetCollection : NumberDatasetCollection
{
    public void Add(IDataset<Point> dataset) => AddDataset(dataset);
    public void Add(IDataset<TimePoint<int>> dataset) => AddDataset(dataset);
    public void Add(IDataset<TimePoint<long>> dataset) => AddDataset(dataset);
    public void Add(IDataset<TimePoint<double>> dataset) => AddDataset(dataset);
}
What’s missing is the LineData class. It just contains the correct dataset collection and the labels. The labels are only serialized when they contain data, but they’re still get-only.
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class LineData
{
    public List<string> Labels { get; } = new List<string>();

    [JsonProperty("xLabels")]
    public List<string> XLabels { get; } = new List<string>();

    [JsonProperty("yLabels")]
    public List<string> YLabels { get; } = new List<string>();

    // Supported: https://www.chartjs.org/docs/latest/charts/line.html#data-structure
    public NumberPointDatasetCollection Datasets { get; } = new NumberPointDatasetCollection();

    // Json.NET finds these ShouldSerialize* methods via reflection;
    // Obsolete(error: true) turns any direct call from user code into a compile error.
    [Obsolete("json.net", true)]
    public bool ShouldSerializeLabels() => Labels.Count > 0;

    [Obsolete("json.net", true)]
    public bool ShouldSerializeXLabels() => XLabels.Count > 0;

    [Obsolete("json.net", true)]
    public bool ShouldSerializeYLabels() => YLabels.Count > 0;
}
The bar example is similar but a bit more specific.
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class BarDataset<T> : Dataset<T>
{
    public BarDataset(bool horizontal = false) : this(horizontal ? "horizontalBar" : "bar")
    {
    }

    protected BarDataset(string type) : base(type)
    {
    }

    public double? BarPercentage { get; set; }
    public double? CategoryPercentage { get; set; }
}

public class BarDatasetCollection : NumberPointDatasetCollection
{
    public void Add(IDataset<FloatingBarPoint> dataset) => AddDataset(dataset);
}

// Serialized as a [start, end] array by FloatingBarPointConverter.
[JsonConverter(typeof(FloatingBarPointConverter))]
public readonly struct FloatingBarPoint : IEquatable<FloatingBarPoint>
{
    public readonly double Start, End;

    public FloatingBarPoint(double start, double end)
    {
        Start = start;
        End = end;
    }

    public override bool Equals(object obj) => obj is FloatingBarPoint point && Equals(point);
    public bool Equals(FloatingBarPoint other) => Start == other.Start && End == other.End;
    public override int GetHashCode() => HashCode.Combine(Start, End);
    public static bool operator ==(FloatingBarPoint left, FloatingBarPoint right) => left.Equals(right);
    public static bool operator !=(FloatingBarPoint left, FloatingBarPoint right) => !(left == right);
}

internal class FloatingBarPointConverter : JsonConverter<FloatingBarPoint>
{
    // Reads the [start, end] array produced by WriteJson.
    public override FloatingBarPoint ReadJson(JsonReader reader, Type objectType, FloatingBarPoint existingValue, bool hasExistingValue, JsonSerializer serializer)
    {
        if (reader.TokenType != JsonToken.StartArray)
        {
            throw new JsonSerializationException("Expected a [start, end] array for FloatingBarPoint.");
        }

        double start = reader.ReadAsDouble() ?? throw new JsonSerializationException("Missing start value.");
        double end = reader.ReadAsDouble() ?? throw new JsonSerializationException("Missing end value.");

        if (!reader.Read() || reader.TokenType != JsonToken.EndArray)
        {
            throw new JsonSerializationException("Expected the end of the FloatingBarPoint array.");
        }

        return new FloatingBarPoint(start, end);
    }

    public override void WriteJson(JsonWriter writer, FloatingBarPoint value, JsonSerializer serializer)
    {
        writer.WriteStartArray();
        writer.WriteValue(value.Start);
        writer.WriteValue(value.End);
        writer.WriteEndArray();
    }
}

public class BarData
{
    public List<string> Labels { get; } = new List<string>();

    // Supported: https://www.chartjs.org/docs/latest/charts/bar.html#data-structure
    public BarDatasetCollection Datasets { get; } = new BarDatasetCollection();

    [Obsolete("json.net", true)]
    public bool ShouldSerializeLabels() => Labels.Count > 0;
}
Another example is for the bubble chart. This one is less specific (both spectrums are well supported).
using System;
using System.Collections.Generic;

public class BubbleDataset : Dataset<BubblePoint>
{
    public BubbleDataset() : this("bubble")
    {
    }

    protected BubbleDataset(string type) : base(type)
    {
    }

    public int? Rotation { get; set; }
    public int? Radius { get; set; }
}

public readonly struct BubblePoint : IEquatable<BubblePoint>
{
    public readonly double X, Y, R;

    public BubblePoint(double x, double y, double r)
    {
        X = x;
        Y = y;
        R = r;
    }

    public override bool Equals(object obj) => obj is BubblePoint point && Equals(point);
    public bool Equals(BubblePoint other) => X == other.X && Y == other.Y && R == other.R;
    public override int GetHashCode() => HashCode.Combine(X, Y, R);
    public static bool operator ==(BubblePoint left, BubblePoint right) => left.Equals(right);
    public static bool operator !=(BubblePoint left, BubblePoint right) => !(left == right);
}

public class BubbleData
{
    public IList<BubbleDataset> Datasets { get; } = new List<BubbleDataset>();
}
And here’s all the rest I have/you need.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

public readonly struct Point : IEquatable<Point>
{
    public readonly double X, Y;

    public Point(double x, double y)
    {
        X = x;
        Y = y;
    }

    public override bool Equals(object obj) => obj is Point point && Equals(point);
    public bool Equals(Point other) => X == other.X && Y == other.Y;
    public override int GetHashCode() => HashCode.Combine(X, Y);
    public static bool operator ==(Point left, Point right) => left.Equals(right);
    public static bool operator !=(Point left, Point right) => !(left == right);
}

public readonly struct TimePoint<T> : IEquatable<TimePoint<T>>
{
    [JsonProperty("t")]
    public readonly DateTime Time;
    public readonly T Y;

    public TimePoint(DateTime time, T y)
    {
        Time = time;
        Y = y;
    }

    public override bool Equals(object obj) => obj is TimePoint<T> point && Equals(point);
    public bool Equals(TimePoint<T> other) => Time == other.Time && EqualityComparer<T>.Default.Equals(Y, other.Y);
    public override int GetHashCode() => HashCode.Combine(Time, Y);
    public static bool operator ==(TimePoint<T> left, TimePoint<T> right) => left.Equals(right);
    public static bool operator !=(TimePoint<T> left, TimePoint<T> right) => !(left == right);
}

// This contract resolver is necessary because Collection<T> (the base of Dataset<T>) has a
// non-virtual Count property. This property always gets serialized and can't be influenced by
// JsonIgnoreAttribute or ShouldSerializeCount. This contract resolver is currently the only way
// I know to get rid of that Count property while keeping the possibility of having a Count
// property in a dataset as an option (which should get serialized, but currently there's not
// even a use case for that).
internal class IgnoreDatasetCountContractResolver : DefaultContractResolver
{
    protected override IList<JsonProperty> CreateProperties(Type type, MemberSerialization memberSerialization)
    {
        IList<JsonProperty> baseProps = base.CreateProperties(type, memberSerialization);
        if (typeof(IDataset).IsAssignableFrom(type))
        {
            string countName = nameof(ICollection.Count);
            if (NamingStrategy != null)
            {
                countName = NamingStrategy.GetPropertyName(countName, false);
            }

            foreach (var prop in baseProps)
            {
                // Only ignore the Count declared by Collection<T>, not a custom Count option.
                if (prop.PropertyName == countName &&
                    prop.DeclaringType.IsGenericType &&
                    prop.DeclaringType.GetGenericTypeDefinition() == typeof(Collection<>))
                {
                    prop.Ignored = true;
                    break;
                }
            }
        }

        return baseProps;
    }
}

class Program
{
    static void Main(string[] args)
    {
        foreach (object sampleData in GetSampleData())
        {
            string serialized = JsonConvert.SerializeObject(sampleData, Formatting.Indented, JsonSerializerSettings);
            Console.WriteLine(serialized + Environment.NewLine);
        }

        Console.ReadLine();
    }

    private static IEnumerable<object> GetSampleData()
    {
        {
            BubbleDataset bubbleDs1 = new BubbleDataset
            {
                Radius = 5
            };
            bubbleDs1.AddRange(new BubblePoint(1, 2, 3),
                               new BubblePoint(1, 10, 100),
                               new BubblePoint(-100, double.MinValue, double.MaxValue));

            BubbleDataset bubbleDs2 = new BubbleDataset
            {
                Rotation = 10
            };
            // 1.0 / 999 instead of 1 / 999: integer division would yield 0.
            bubbleDs2.AddRange(new BubblePoint(123.456, 1.2, 3.4),
                               new BubblePoint(-1.2, 2.3, -3.4),
                               new BubblePoint(1.0 / 999, double.MinValue, double.MaxValue));

            BubbleData bubbleData = new BubbleData
            {
                Datasets =
                {
                    bubbleDs1,
                    bubbleDs2,
                    // new BarDataset<string>() // compilation error, only BubbleDataset
                }
            };

            yield return bubbleData;
        }

        {
            LineDataset<int> ls1 = new LineDataset<int>
            {
                Fill = true
            };
            ls1.AddRange(1, 2, 3, 4, 5);

            LineDataset<double> ls2 = new LineDataset<double>
            {
                1.1, 2.2, 3.3, 1.0 / 999, double.MinValue, double.MaxValue
            };

            LineData lineData = new LineData
            {
                XLabels =
                {
                    "a", "b", "c"
                },
                Datasets =
                {
                    ls1,
                    ls2,
                    new LineDataset<Point>
                    {
                        new Point(10.5, 7),
                        new Point(-99.999, 13.2),
                        new Point(1, -1)
                    },
                    new BarDataset<double>
                    {
                        1.13254, double.MinValue, double.MaxValue, 2.572, 37.753, 999.0 / 5
                    },
                    // new LineDataset<CustomObject>() // compilation error, no compatible Add overload
                }
            };

            yield return lineData;
        }

        {
            BarDataset<int> ls1 = new BarDataset<int>
            {
                BarPercentage = 20
            };
            ls1.AddRange(Enumerable.Range(2, 5).Select(i => i * i * i));

            BarDataset<Point> ls2 = new BarDataset<Point>
            {
                new Point(-99456.95993, 13346.2),
                new Point(1560.5, 63465),
                new Point(12587.5, 733465),
                new Point(145.5, 7245),
                new Point(154, -14)
            };

            BarData barData = new BarData
            {
                Labels =
                {
                    "a", "b", "c", "d", "e"
                },
                Datasets =
                {
                    ls1,
                    ls2,
                    new BarDataset<FloatingBarPoint>
                    {
                        new FloatingBarPoint(3, 7),
                        new FloatingBarPoint(-1, 10),
                        new FloatingBarPoint(100, -100),
                        new FloatingBarPoint(1.0 / 123456789, 1.0 / 987654321),
                        new FloatingBarPoint(double.MinValue, double.MaxValue)
                    },
                    new LineDataset<int>(), // allowed, it's an IDataset<int>
                    // new BarDataset<string>() // compilation error, no compatible Add overload
                }
            };

            yield return barData;
        }
    }

    private class CustomObject
    {
        public int MyInt { get; set; }
        public string SomeString { get; set; }
    }

    private static readonly JsonSerializerSettings JsonSerializerSettings = new JsonSerializerSettings
    {
        NullValueHandling = NullValueHandling.Ignore,
        ContractResolver = new IgnoreDatasetCountContractResolver
        {
            NamingStrategy = new CamelCaseNamingStrategy(true, false)
        }
    };
}
Additional context
- When this is done, we can also finally implement Point and TimeTuple and many other classes as structs, as seen here. They might need some tweaking but are already a big improvement over the current classes.
- We’ll need to look at the supported data structures for all the charts in order to design the dataset collections.
- It’s another breaking change… should we start batching these? Maybe for release 2.0 we could do #70 and #95? And then for 3.0 we could do #78, this, and more Nullable<T> (for the defaults, issue incoming). We’ll have to see.
- Excuse my wording and other mistakes; it’s quite a big issue and I’m quite tired.
Final words
Although I already thought this issue through quite a bit, I’m not sure about all the specifics yet. Please tell me anything you see wrong with this: potential pitfalls, missing features, etc.
I’d love to hear feedback on this!
- Created 3 years ago
- Comments: 5 (1 by maintainers)
For the end user it shouldn’t be complex at all (that was one of my biggest requirements for this rework). They can use the datasets like a normal List<T> with some additional properties. That’s why it implements IList<T> along with AddRange methods.
For the dataset collections it’s the same thing. They can just call the Add methods or use the collection initializer like they would with any other List<T>. The only difference is that there are multiple overloads of the Add method, so they can add different types than just the T of List<T>. The other methods for modifying the dataset collection don’t provide all those overloads because there isn’t much value in that. It could, however, easily be done by applying the same technique we used for the Add methods.
I don’t know if I have time to push it as a branch today, but in the meantime you should check out the last code piece in the issue. It’s the Program.cs of my demo application and shows the use case. If you want, I can upload my test project here (just as a zip), so you don’t have to copy the code over.
For the last point, I also think it’s important to keep it flexible (that was another big requirement). This solution is very flexible: you can subclass any dataset to add properties, you can implement your own datasets with Dataset<T> or IDataset<T>, and you can implement your own dataset collections by subclassing the existing ones like NumberDatasetCollection, subclassing DatasetCollection, or creating your own complex dataset collection from the ground up. The only thing that could be a bit more flexible is that the dataset collections are get-only properties in the data classes. If you want to use your own dataset collection in a bar chart, you’ll have to subclass BarData and hide the Datasets property with your own type (using the new modifier, since the property isn’t virtual). I think it’s better this way. Adding a setter there doesn’t really make sense (unless we abstract all possible dataset collections into interfaces), and you can customize almost everything about this through subclassing.
As a final note, I know it’s unfortunate to have so many breaking changes, but I really do believe they’re necessary. The datasets especially are a mess right now, but there are other things that need reworking too. I would suggest that we start working on release branches where we batch breaking changes. Only once multiple breaking changes are done do we rebase onto master to get all the bug fixes and small features that were done there in the meantime, and then merge back into master. It takes much more merge effort and requires all the developers to keep track of where we currently stand, but it’ll make actual releases much more reasonable and allows us to release minor versions when small features and bug fixes are done without conflicting with breaking changes. This would also be a good time to tidy up our GitHub project and possibly start using milestones to mark which features belong in which release. That way we can plan major releases beforehand and work on our big features in an organized manner.
Thanks for the feedback.
I’m not sure how to understand your first paragraph. Could you elaborate on what you mean by that?
Regarding the second paragraph, I’d have to disagree. The way I want to implement the API on the dataset collections (the datasets themselves don’t throw) is modeled after how C# arrays handle it. They implement IList<T> but implement all the modification members like Add, Insert and Remove explicitly and make them throw a NotSupportedException. This way you can use the collection in methods that expect an IList<T> or ICollection<T> without having to create a wrapper around it or copy the data. The only difference from C# arrays is that their interface implementations are provided at runtime, whereas we’re required to have them at compile time (but as far as I understand, that doesn’t really make a difference in our case).
Here’s the paragraph I’m referring to from the Array class API documentation:
You could argue that it’s not necessary to use the dataset collections in such APIs, and I’d agree, but in what way does it hurt to implement the interface explicitly?
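The array behavior described above is easy to check in a minimal sketch (ArrayInterfaceDemo is a made-up name). Casting an array to IList<T> works, and size-changing members throw NotSupportedException, while the indexer setter still works. The proposed DatasetCollection goes one step further and also rejects the setter of IList<IDataset>.this[int].

```csharp
using System;
using System.Collections.Generic;

public static class ArrayInterfaceDemo
{
    // Casting an array to IList<T> succeeds, but members that change the
    // size of the collection throw NotSupportedException.
    public static bool AddThrows()
    {
        IList<int> list = new[] { 1, 2, 3 };
        try
        {
            list.Add(4);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    // Size-preserving members such as the indexer setter still work on arrays.
    public static int SetThroughInterface()
    {
        IList<int> list = new[] { 1, 2, 3 };
        list[0] = 42;
        return list[0];
    }
}
```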
Totally agree on the third paragraph. I don’t know if it makes sense to put all the breaking changes that are currently planned (this, #70, #95, #78) in a single release; I’ll have to think about that a bit. I think it might be better, but it’ll be a lot of effort, so there won’t really be updates until that’s released. It will also probably result in a lot of merging effort, especially if new, smaller features are added in master while we work on release 2.0. But I don’t think it would be easier if we decided to split it into multiple releases.