Review of merged #235 and #262
Tensor0 case: I understand that you have some very good reason for this based on how you feel the codebase will evolve. I don’t like the change, but I have a feeling it needs to happen.
Tensor0 is not a great name. Let’s think about a better one. TensorC could be better. What about Tensor_?
Another thing is the “%A” printing, which has been relying on the default DU printer so that we get output looking like `Tensor [1.; 2.; 3.]`. I have a feeling this can be addressed through other means. (Later edit: I can see now that there is new tensor printing code merged in #234; I will check.)
More importantly I have significant worries about the following two changes. I have a feeling that the mistakes you identified were not mistakes but were some intentional design influenced by Python-like conventions. I need to remember and inspect the behavior and then try to understand your change better.
Fix a mistake in the slicing shape computation where the 1 flag was being set for slices like t.[3..], resulting in incorrect result shapes when using that notation to slice the first or last single element.
Fix a mistake in boundsToShape, which is used in two slightly different ways: one to compute the result shape of slicing, the other to push adjoints in the SliceTT reverse-mode op back into the adjoint of the originating tensor.
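For intuition, the two uses can be sketched in NumPy terms (a hypothetical analogy only; the actual code is F#, and `boundsToShape` itself is not reproduced here):

```python
import numpy as np

t = np.arange(4.0)                 # shape (4,)

# Result shape of slicing: a range slice over a single element
# keeps the dimension (shape (1,)), it does not drop it.
assert t[3:4].shape == (1,)
assert t[3].shape == ()            # integer indexing drops the dim

# Reverse-mode push-back: the adjoint of the slice is scattered
# into a zero tensor of the *original* shape, so the slice's
# result shape must match the region it came from exactly.
slice_adjoint = np.ones((1,))
t_adjoint = np.zeros_like(t)
t_adjoint[3:4] += slice_adjoint
assert t_adjoint.shape == t.shape
```

If the result-shape computation and the push-back disagree on whether that dimension survives, one of the two uses will produce a shape mismatch.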
Later edit: I think this is also connected with why you added the following:

```fsharp
member t.reversePush(value:Tensor) =
    let check (v:Tensor, t:Tensor) =
        // check the shapes of the adjoints match the nodes to which they are being propagated
        assert (Shape.canExpand v.shape t.derivative.shape || Shape.canExpand t.derivative.shape v.shape)
```
I remember there were some very clear reasons and a careful design behind the behavior you changed, and they also concerned the reverse-mode shapes. I’m quite concerned about this merged code, and I think the expand check is not a good idea. We don’t want things in reverse mode to rely on broadcasting/expansion. We need to check that the reverse-pushed value has exactly the same shape as the primal of the node to which we are propagating. Anything other than that can be a cause of silent erroneous behavior with derivatives in future code. I think it’s very likely I would want to revert this merge.
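To illustrate the worry, here is a minimal Python sketch, using a hypothetical `can_expand` modeled on broadcast compatibility as a stand-in for `Shape.canExpand` (names and logic are assumptions, not the project’s actual code):

```python
def can_expand(src, dst):
    """Broadcast-style check: can src be expanded to dst?
    (Hypothetical analogue of a Shape.canExpand-style test.)"""
    if len(src) > len(dst):
        return False
    for s, d in zip(reversed(src), reversed(dst)):
        if s != d and s != 1:
            return False
    return True

def exact_check(v_shape, node_shape):
    """Strict alternative: adjoint shape must equal the node's shape."""
    return v_shape == node_shape

# A broadcast-tolerant check accepts a (1,) adjoint pushed into a
# (3,)-shaped node -- the mismatch would be silently expanded away,
# masking a shape bug in the op's reverse rule.
assert can_expand((1,), (3,))

# An exact-shape check rejects it, surfacing the error immediately.
assert not exact_check((1,), (3,))
assert exact_check((3,), (3,))
```

The point is that the tolerant check turns a class of hard assertion failures into silently wrong gradients.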
- Created 3 years ago
- Comments:21 (7 by maintainers)
Top GitHub Comments
> About the Tensor0 case: I understand that you have some very good reason for this based on how you feel the codebase will evolve. I don’t like the change, but I have a feeling it needs to happen. Tensor0 is not a great name. Let’s think about a better one. TensorC could be better. What about Tensor_?
I don’t mind the name; TensorC and Tensor_ are both fine by me. I guess TensorC would be my preference.