Review of merged #235 and #262
@dsyme this merge #235 is slightly worrying. Let's discuss when we meet next time.
About the `Tensor0` case: I understand that you have some very good reason for this based on how you feel the codebase will evolve. I don't like the change, but I have a feeling it needs to happen. `Tensor0` is not a great name. Let's think about a better one: `TensorC` could be better. What about `Tensor_`?
Another thing is the `"%A"` printing, which has been relying on the default DU printer so that the output looks like `Tensor [1.; 2.; 3.]`.
I have a feeling this can be addressed through other means. (Later edit: I can see now that there is new tensor printing code merged in #234; I will check.)
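To make the printing point concrete, here is a minimal sketch with a simplified stand-in type (not DiffSharp's actual `Tensor`) showing where the default DU output comes from, and that `%A` ignores a `ToString` override while `%O` uses it:

```fsharp
// Minimal sketch, not DiffSharp code: a one-case union standing in for
// Tensor, to show the default structural printing that %A falls back to.
type Tensor =
    | Tensor of float list
    override t.ToString() =
        match t with Tensor xs -> sprintf "tensor(%A)" xs

let t = Tensor [1.; 2.; 3.]
printfn "%A" t   // structural DU printer: Tensor [1.0; 2.0; 3.0]
printfn "%O" t   // ToString override:     tensor([1.0; 2.0; 3.0])
```

(Changing the `%A` output itself would normally go through F#'s `[<StructuredFormatDisplay>]` attribute.)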
More importantly, I have significant worries about the following two changes. I have a feeling that the mistakes you identified were not mistakes but intentional design influenced by Python-like conventions. I need to recall and inspect the behavior and then try to understand your change better.
Fix a mistake in the slicing shape computation where the 1 flag was being set for slices like `t.[3..]`, resulting in incorrect result shapes when using that notation to slice the first or last single element.
Fix a mistake in `boundsToShape`, which is used in two slightly different ways: one to get the result shape of slicing, the other to push adjoints in the `SliceTT` reverse-mode op back into the adjoint for the originating tensor.
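To make the shape question concrete, here is a hypothetical sketch (not the actual `boundsToShape`) of the Python-like convention, where a range slice keeps the dimension and a single index drops it:

```fsharp
// Hypothetical sketch, not DiffSharp's boundsToShape. Each bound is
// (lo, hi, keepDim): keepDim = true for range slices like t.[3..],
// which keep the dimension; keepDim = false for single indices like
// t.[3], which squeeze it out of the result shape.
let sliceShape (bounds: (int * int * bool)[]) : int[] =
    [| for (lo, hi, keepDim) in bounds do
         if keepDim then yield hi - lo + 1 |]

// On a tensor of shape [|5|]:
let s1 = sliceShape [| (3, 4, true)  |]  // t.[3..] -> [|2|] (elements 3 and 4)
let s2 = sliceShape [| (3, 3, false) |]  // t.[3]   -> [||]  (a scalar)
```

If that reading is right, setting the single-index flag for `t.[3..]`-style slices would wrongly drop a dimension that should be kept, which is the kind of shape mistake described above.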
Later edit: I think this is also connected with why you added the check here:
```fsharp
member t.reversePush(value:Tensor) =
    let check (v:Tensor, t:Tensor) =
        // check the shapes of the adjoints match the nodes to which they are being propagated
        assert (Shape.canExpand v.shape t.derivative.shape || Shape.canExpand t.derivative.shape v.shape)
```
I remember there were some very clear reasons and a careful design for the behavior you changed, and it was also about the reverse-mode shapes. I'm quite concerned about this merged code and I think the `expand` check is not a good idea. We don't want things in reverse mode to rely on broadcasting/expansion. We need to check that the reverse-pushed value has exactly the same shape as the primal of the node to which we are propagating. Anything other than that can be a cause of silent erroneous behavior with derivatives in future code. I think it's very likely I would like to revert this merge.
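For what it's worth, the stricter check argued for here would look roughly like this (a sketch assuming the `shape` and `primal` members used elsewhere in the codebase; this is not the merged code):

```fsharp
// Sketch only: replace the bidirectional canExpand test with exact
// shape equality, so reverse mode never relies on broadcasting.
let check (v: Tensor, t: Tensor) =
    // the pushed adjoint must have exactly the shape of the primal
    // of the node it is being propagated to
    assert (v.shape = t.primal.shape)
```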
Top GitHub Comments
OK, no problem. This was basically reverted in #262; there's no change here now end to end.
I don't mind the name; `TensorC` or `Tensor_` are both fine by me. I guess `TensorC` would be my preference.