Refactor/Restructure Modin
Inspired by the refactoring of Modin to follow the algebra laid out in the paper, and by the work on documenting internal functionality and system architecture, I would like to start the following discussions:
1) Restructure Modin
Reading System Architecture and looking at the High-Level Architectural View, one may find the distinction between engines and backends, and the layered structure of Modin in general, ambiguous. To make Modin's structure easier to understand, I would like to suggest the following layout (comments/objections are welcome):
.modin/
.config/
.distributed/
.pandas/
.spreadsheet/
.sql/
.test/
.rest files
.core/
.data_management/
. # the same as it is now
.base/
.frame/
# base/abstract classes are located here
.axis_partition.py # BaseFrameAxisPartition
.frame.py # BaseFrame
.partition_manager.py # BaseFrameManager
.partition.py # BaseFramePartition
.io/
.column_stores/
. # the same as it is now
.sql/
. # the same as it is now
.text/
. # the same as it is now
.file_dispatcher.py # the same as it is now
.io.py # the same as it is now
.backends/
.base/
./query_compiler
.query_compiler.py
.engines/
.ray/
.generic/
.frame/
.axis_partition.py # GenericRayFrameAxisPartition
.frame.py # GenericRayFrame
.partition_manager.py # GenericRayFrameManager
.partition.py # GenericRayFramePartition
.io/
.io.py # GenericRayIO
.task_wrapper.py # GenericRayTask
.utils.py # init_ray
.dask/
. # the same as for ray
.python/
. # the same as for ray
.pandas/
./query_compiler
.query_compiler.py
.engines/
.ray/
.frame/
.axis_partition.py # PandasOnRayFrameAxisPartition
.frame.py # PandasOnRayFrame
.partition_manager.py # PandasOnRayFrameManager
.partition.py # PandasOnRayFramePartition
.io/
.io.py # PandasOnRayIO
.dask/
. # the same as for ray
.python/
. # the same as for ray
.experimental/ # this is in "modin" folder but not in "core" folder
./cloud
. # the same as it is now
./pandas
. # the same as it is now
./sklearn
. # the same as it is now
./sql
. # the same as it is now
./xgboost
. # the same as it is now
./backends
./omnisci
./query_compiler
.query_compiler.py
.engines/
.ray/
.frame/
.axis_partition.py # OmniSciOnRayFrameAxisPartition
.frame.py # OmniSciOnRayFrame
.partition_manager.py # OmniSciOnRayFrameManager
.partition.py # OmniSciOnRayFramePartition
./pyarrow
./query_compiler
.query_compiler.py
.engines/
.ray/
.frame/
.axis_partition.py # PyarrowOnRayFrameAxisPartition
.frame.py # PyarrowOnRayFrame
.partition_manager.py # PyarrowOnRayFrameManager
.partition.py # PyarrowOnRayFramePartition
I think this stresses the difference between engines and backends and also reflects the layered structure of Modin more clearly.
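To illustrate the layering the proposed layout is meant to express, here is a minimal sketch of an abstract partition class (as would live under the base frame folder) and one concrete implementation. The class name `BaseFramePartition` comes from the layout above; the method names `apply` and `get`, and the `InProcessFramePartition` class, are assumptions made for illustration and are not taken from Modin's code.

```python
from abc import ABC, abstractmethod


class BaseFramePartition(ABC):
    """Abstract partition: the contract each backend/engine pair would implement.

    The method names here are illustrative assumptions, not Modin's actual API.
    """

    @abstractmethod
    def apply(self, func):
        """Return a new partition holding func(data)."""

    @abstractmethod
    def get(self):
        """Return the underlying data."""


class InProcessFramePartition(BaseFramePartition):
    """Hypothetical concrete partition that runs everything in the current process."""

    def __init__(self, data):
        self._data = data

    def apply(self, func):
        # Build a new partition instead of mutating this one.
        return type(self)(func(self._data))

    def get(self):
        return self._data


part = InProcessFramePartition([1, 2, 3])
doubled = part.apply(lambda data: [x * 2 for x in data])
print(doubled.get())
```

In this picture, an engine-specific class such as `PandasOnRayFramePartition` would sit in the same position as `InProcessFramePartition`, with only the execution mechanism changing.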
2) Refactor Modin
a) We need to define a public API (no leading underscore in method names) and a private/protected API (leading underscore in method names) for Modin Frame (and Partition Manager as well), because currently everything is mixed.
b) We need to review the operations of Modin Frame and Modin Partition Manager and rework or reimplement those that were implemented in an inappropriate way. I believe there are some.
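The public/private split from point a) can be sketched as follows. `FrameSketch` and `public_api` are hypothetical names used only for illustration; they do not exist in Modin. The helper simply lists the methods a class exposes without a leading underscore, which is one way such a convention could be checked.

```python
class FrameSketch:
    """Hypothetical stand-in for a Modin Frame; not the real class."""

    def __init__(self, partitions):
        # Protected state: callers should not touch this directly.
        self._partitions = [list(part) for part in partitions]

    # Public API: no leading underscore.
    def map(self, func):
        return FrameSketch(self._apply_to_partitions(func))

    def to_list(self):
        return [item for part in self._partitions for item in part]

    # Protected helper: leading underscore marks it as internal.
    def _apply_to_partitions(self, func):
        return [[func(x) for x in part] for part in self._partitions]


def public_api(cls):
    """Return the sorted names of methods defined on cls without a leading underscore."""
    return sorted(
        name
        for name, value in vars(cls).items()
        if callable(value) and not name.startswith("_")
    )


print(public_api(FrameSketch))
```

A check along these lines could be run over the real Frame and Partition Manager classes to see which methods would currently count as public.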
@modin-project/modin-core , thoughts?
Issue Analytics
- State:
- Created 2 years ago
- Comments: 11 (11 by maintainers)
Let’s revisit after 0.10 release is cut.
Cool! Thanks to all!