Refactor/Restructure Modin

Inspired by the refactoring of Modin to follow the algebra laid out in the paper, and by the work on documenting internal functionality and system architecture, I would like to start the following discussions:

1) Restructure Modin

Reading the System Architecture documentation and looking at the High-Level Architectural View, the distinction between engines and backends can seem ambiguous, as can the layered structure of Modin in general. To make Modin's structure easier to understand, I would like to suggest the following layout (comments and objections are welcome):

.modin/
   .config/
   .distributed/
   .pandas/
   .spreadsheet/
   .sql/
   .test/
   .rest files
   .core/
      .data_management/
         . # the same as it is now
      .base/
         .frame/
            # base/abstract classes are located here
            .axis_partition.py # BaseFrameAxisPartition
            .frame.py # BaseFrame
            .partition_manager.py # BaseFrameManager
            .partition.py # BaseFramePartition
         .io/
            .column_stores/
               . # the same as it is now
            .sql/
               . # the same as it is now
            .text/
               . # the same as it is now
            .file_dispatcher.py # the same as it is now
            .io.py # the same as it is now
      .backends/
         .base/
            .query_compiler/
               .query_compiler.py
            .engines/
               .ray/
                  .generic/
                     .frame/
                        .axis_partition.py # GenericRayFrameAxisPartition
                        .frame.py # GenericRayFrame
                        .partition_manager.py # GenericRayFrameManager
                        .partition.py # GenericRayFramePartition
                     .io/
                        .io.py # GenericRayIO
                     .task_wrapper.py # GenericRayTask
                     .utils.py # init_ray
               .dask/
                  . # the same as for ray
               .python/
                  . # the same as for ray
         .pandas/
            .query_compiler/
               .query_compiler.py
            .engines/
               .ray/
                  .frame/
                     .axis_partition.py # PandasOnRayFrameAxisPartition
                     .frame.py # PandasOnRayFrame
                     .partition_manager.py # PandasOnRayFrameManager
                     .partition.py # PandasOnRayFramePartition
                  .io/
                     .io.py # PandasOnRayIO
               .dask/
                  . # the same as for ray
               .python/
                  . # the same as for ray
   .experimental/ # this is in "modin" folder but not in "core" folder
      .cloud/
         . # the same as it is now
      .pandas/
         . # the same as it is now
      .sklearn/
         . # the same as it is now
      .sql/
         . # the same as it is now
      .xgboost/
         . # the same as it is now
      .backends/
         .omnisci/
            .query_compiler/
               .query_compiler.py
            .engines/
               .ray/
                  .frame/
                     .axis_partition.py # OmniSciOnRayFrameAxisPartition
                     .frame.py # OmniSciOnRayFrame
                     .partition_manager.py # OmniSciOnRayFrameManager
                     .partition.py # OmniSciOnRayFramePartition
         .pyarrow/
            .query_compiler/
               .query_compiler.py
            .engines/
               .ray/
                  .frame/
                     .axis_partition.py # PyarrowOnRayFrameAxisPartition
                     .frame.py # PyarrowOnRayFrame
                     .partition_manager.py # PyarrowOnRayFrameManager
                     .partition.py # PyarrowOnRayFramePartition

I think this stresses the difference between engines and backends and also reflects the layered structure of Modin more clearly.
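To illustrate how the three layers in the tree above (base, generic engine, backend on engine) might relate, here is a minimal sketch. It assumes the layers are connected by plain inheritance, which the proposal does not state. Only `BaseFramePartition` is a name from the tree; `GenericEnginePartition` and `ListOnEnginePartition` are hypothetical stand-ins for classes such as `GenericRayFramePartition` and `PandasOnRayFramePartition`. The "engine" runs functions inline instead of calling Ray or Dask.

```python
from abc import ABC, abstractmethod


class BaseFramePartition(ABC):
    """Base layer: the abstract contract every partition must satisfy."""

    @abstractmethod
    def apply(self, func):
        """Return a new partition holding func(data)."""

    @abstractmethod
    def get(self):
        """Return the underlying data."""


class GenericEnginePartition(BaseFramePartition):
    """Engine layer (hypothetical): decides WHERE a function runs.

    A real engine would submit the call to Ray or Dask; this stand-in
    executes it inline so the sketch has no external dependencies.
    """

    def __init__(self, data):
        self._data = data

    def _run(self, func, data):
        # Assumption: inline execution replaces a remote task submission.
        return func(data)

    def apply(self, func):
        # type(self) keeps the backend subclass when building the result.
        return type(self)(self._run(func, self._data))

    def get(self):
        return self._data


class ListOnEnginePartition(GenericEnginePartition):
    """Backend layer (hypothetical): decides WHAT data format is stored.

    A plain list plays the role that a pandas DataFrame would play
    in a pandas backend.
    """

    def length(self):
        return len(self.get())


part = ListOnEnginePartition([1, 2, 3])
doubled = part.apply(lambda xs: [x * 2 for x in xs])
```

In this sketch, swapping the engine means replacing only `_run`, and swapping the backend means replacing only the format-specific subclass, which is the separation the proposed directory layout tries to make visible.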

2) Refactor Modin

a) We need to define a public API (method names without a leading underscore) and a private/protected API (method names with a leading underscore) for Modin Frame, and for Partition Manager as well, because currently the two are mixed together.

b) We need to review the operations of Modin Frame and Modin Partition Manager and rework or reimplement those that were implemented inappropriately. I believe there are some.
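As a rough illustration of point a), the sketch below shows the usual Python underscore convention on a toy class and a small helper that lists which methods count as public. `FrameSketch` and `public_api` are hypothetical names made up for this example and are not part of Modin.

```python
class FrameSketch:
    """Toy frame: partitions are plain lists (an assumption for brevity)."""

    def __init__(self, partitions):
        # Leading underscore: internal state, not part of the public API.
        self._partitions = partitions

    # Public API: no leading underscore.
    def map(self, func):
        return FrameSketch(self._apply_to_partitions(func))

    def to_list(self):
        return [x for part in self._partitions for x in part]

    # Private/protected API: leading underscore, free to change.
    def _apply_to_partitions(self, func):
        return [[func(x) for x in part] for part in self._partitions]


def public_api(cls):
    """Return the sorted names of callables defined on cls without a leading underscore."""
    return sorted(
        name
        for name, member in vars(cls).items()
        if callable(member) and not name.startswith("_")
    )


names = public_api(FrameSketch)
result = FrameSketch([[1, 2], [3]]).map(lambda x: x + 1).to_list()
```

A helper like `public_api` could be used in a test to pin down the intended public surface, so that accidental additions or removals show up as a failing check.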

@modin-project/modin-core , thoughts?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

devin-petersohn commented, May 24, 2021 (1 reaction)

Let’s revisit after 0.10 release is cut.

Garra1980 commented, Oct 13, 2021 (0 reactions)

Cool! Thanks to all!
