
DataFrame rendering stylists?

Been working a bit on the idea of DataFrame rendering stylists. Would like some feedback on the idea / API.

The idea is that, on top of the formatters used in DataFrame.to_string() and DataFrame.to_html(), I would like to add stylists. A stylist takes a string as input and returns a string, allowing one to alter the string representation of each individual cell of a DataFrame. Each DataFrame element first goes through a formatter, and the resulting string then goes through a stylist. This allows, e.g., ANSI escape sequences to change the text or background color with which a DataFrame cell is displayed on screen. In the DataFrame.to_html() case, a stylist can add HTML div or class tags, which, combined with CSS, change the rendering of an HTML table cell.
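A minimal sketch of the two-stage pipeline described above, in plain Python rather than pandas internals: each cell value goes through a formatter (value to str) and then through an optional stylist (str to str). The names `format_then_style`, `ansi_bg` and `html_class` are hypothetical, for illustration only; `\x1b[41m` and `\x1b[49m` are the standard ANSI SGR codes for "red background" and "default background".

```python
from typing import Any, Callable, Optional

Formatter = Callable[[Any], str]
Stylist = Callable[[str], str]


def format_then_style(value: Any, formatter: Formatter,
                      stylist: Optional[Stylist] = None) -> str:
    """Format a cell value to a string, then optionally style that string."""
    text = formatter(value)
    return stylist(text) if stylist is not None else text


def ansi_bg(code: int) -> Stylist:
    """Hypothetical stylist factory: wrap text in an ANSI background color."""
    # SGR 40-47 select a background color; SGR 49 resets to the default.
    return lambda s: f"\x1b[{code}m{s}\x1b[49m"


def html_class(css_class: str) -> Stylist:
    """Hypothetical stylist factory: wrap text in a <div> with a CSS class."""
    return lambda s: f'<div class="{css_class}">{s}</div>'


red = ansi_bg(41)
print(repr(format_then_style(3.14159, "{:.2f}".format, red)))
print(format_then_style(7, str, html_class("highlight")))
print(format_then_style(7, str))  # no stylist: plain formatted text
```

The stylist never sees the raw value, only the formatter's output, which is what keeps the two concerns separate.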

This is the description of stylists and the new API for DataFrame.to_string() and DataFrame.to_html():

class DataFrameFormatter(object):
    """
    Render a DataFrame

    self.to_string() : console-friendly tabular output
    self.to_html() : html table
    """
    def __init__(self, frame, buf=None, columns=None, col_space=None,
                 na_rep='NaN', formatters=None, float_format=None,
                 sparsify=True, index_names=True, stylists=None):
        """
        Parameters
        ----------
        stylists : object that, when indexed with [row_key][column_key],
            returns a callable taking a string as input and returning a
            string. When rendering a DataFrame, each cell can be altered
            individually using stylists: a cell is first formatted with the
            formatters, after which a style can be applied. For example, a
            stylist can add ANSI escape sequences to display a cell in a
            different color, or add HTML class or div tags.
            If stylists[row_key][column_key] does not exist, no styling is
            done to this particular cell of the DataFrame.
        """

class DataFrame():
    # Proposed signatures; `stylists` is the new keyword. Bodies omitted.
    def to_string(self, buf=None, columns=None, colSpace=None,
                  na_rep='NaN', formatters=None, float_format=None,
                  sparsify=True, nanRep=None, index_names=True,
                  stylists=None):
        ...

    def to_html(self, buf=None, columns=None, colSpace=None,
                na_rep='NaN', formatters=None, float_format=None,
                sparsify=True, index_names=True, stylists=None):
        ...
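A sketch of the lookup rule in the docstring, assuming `stylists` is a dict of dicts as in the demo below: index with `[row_key][column_key]`, and treat a missing entry as "no styling" by falling back to the identity function. `lookup_stylist` and `identity` are hypothetical helpers, not pandas functions.

```python
from typing import Any, Callable

Stylist = Callable[[str], str]


def identity(s: str) -> str:
    """Default stylist: leave the formatted cell text unchanged."""
    return s


def lookup_stylist(stylists: Any, row_key: Any, col_key: Any) -> Stylist:
    """Return stylists[row_key][col_key], or identity if it does not exist."""
    if stylists is None:
        return identity
    try:
        return stylists[row_key][col_key]
    except (KeyError, IndexError, TypeError):
        # Missing row or column entry: no styling for this cell.
        return identity


stylists = {"a": {"A": str.upper}, "b": {}}
print(lookup_stylist(stylists, "a", "A")("x"))  # styled
print(lookup_stylist(stylists, "b", "A")("x"))  # missing column: unchanged
print(lookup_stylist(stylists, "z", "A")("x"))  # missing row: unchanged
```

Catching `IndexError` as well lets list-of-lists stylists (indexed by position) follow the same rule.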

A little demo of stylists on screen:

import pandas
import numpy as np
from colorama import Fore, Back, Style

df = pandas.DataFrame(np.random.randint(0, 10, (5, 2)),
                      columns=['A', 'B'],
                      index=['a', 'b', 'c', 'd', 'e'])

red = lambda x: Back.RED + x + Back.RESET
green = lambda x: Back.GREEN + x + Back.RESET
yellow = lambda x: Back.YELLOW + x + Back.RESET

stylists = {'a': {'A': red, 'B': yellow},
            'b': {'A': green},
            'c': {'B': green}}

# Uses the proposed `stylists` keyword described above
print(df.to_string(stylists=stylists))

This results in the following:

[Image: stylists demo]

As you can see, more work is needed. The ANSI escape sequences are counted when determining the number of characters needed for each column, which they should not be, since they are invisible. One solution is, e.g., to set the column widths before the stylists are applied, implying that a stylist cannot change the width of a column, which seems reasonable.
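A minimal sketch of the proposed fix, in plain Python with hypothetical names (`render_table` is not a pandas function): compute each column's width from the formatted but unstyled text, pad to that width, and only then apply the stylist, so invisible escape codes never affect alignment. As noted above, this means a stylist cannot change a column's width.

```python
from typing import Callable, Dict, List, Sequence

Stylist = Callable[[str], str]


def render_table(index: Sequence[str], columns: Sequence[str],
                 cells: List[List[str]],
                 stylists: Dict[str, Dict[str, Stylist]]) -> str:
    """Right-justify already-formatted cells, then apply per-cell stylists."""
    # 1. Widths come from the unstyled text (header included).
    idx_width = max(len(r) for r in index)
    widths = [max([len(c)] + [len(row[j]) for row in cells])
              for j, c in enumerate(columns)]

    lines = [" " * idx_width + "".join(
        "  " + c.rjust(w) for c, w in zip(columns, widths))]
    for r, row in zip(index, cells):
        parts = [r.ljust(idx_width)]
        for c, w, text in zip(columns, widths, row):
            padded = text.rjust(w)
            # 2. Styling happens after padding, so escape codes add no width.
            style = stylists.get(r, {}).get(c)
            parts.append("  " + (style(padded) if style else padded))
        lines.append("".join(parts))
    return "\n".join(lines)


red = lambda s: "\x1b[41m" + s + "\x1b[49m"  # same codes as colorama Back.RED/RESET
print(len(red("3")))  # escape codes count toward len(): 1 visible char, 11 total
print(render_table(["a", "b"], ["A", "B"], [["3", "10"], ["7", "2"]],
                   {"a": {"A": red}}))
```

Because the stylist wraps the padded text, a background color also covers the padding within the column.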

Or maybe this should be taken one step further by finding some way to combine the functionality of formatters and stylists into one (I have not thought about how this should look)? Ideas, feedback?
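One possible reading of "combining" the two, sketched under the assumption that both are plain callables: compose them into a single value-to-str function that could be passed where a formatter is expected. The catch is the width problem described above: the composed output already contains escape codes, so any width computed from it is inflated. `compose` is a hypothetical helper.

```python
from typing import Any, Callable


def compose(formatter: Callable[[Any], str],
            stylist: Callable[[str], str]) -> Callable[[Any], str]:
    """Build a single cell renderer: format the value, then style the text."""
    return lambda value: stylist(formatter(value))


bold = lambda s: "\x1b[1m" + s + "\x1b[22m"  # SGR 1 = bold, SGR 22 = normal intensity
render = compose("{:.1f}".format, bold)
text = render(2.5)
print(repr(text))
print(len(text))  # counts the invisible escape codes as well
```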

Created 12 years ago; 7 comments (5 by maintainers)

Top GitHub Comments

ghost711 commented, Aug 18, 2019

The approach below appears to work great for calculating the correct column print width when ANSI codes are involved.

You can replace the “TextAdjustment” class with the version below in this file: site-packages/pandas/io/formats/format.py

# Note: relies on `get_option` and `compat` already being imported in format.py.
class TextAdjustment(object):
    def __init__(self):
        import re
        # Matches an ANSI escape sequence (e.g. '\x1b[41m').
        self.ansi_regx = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~]')
        self.encoding = get_option("display.encoding")

    def len(self, text):
        # Visible length: ignore ANSI escape sequences when measuring width.
        return compat.strlen(self.ansi_regx.sub('', text),
                             encoding=self.encoding)

    def justify(self, texts, max_len, mode='right'):
        jfunc = str.ljust if (mode == 'left') else \
            str.rjust if (mode == 'right') else str.center
        out = []
        for s in texts:
            escapes = self.ansi_regx.findall(s)
            if len(escapes) == 2:
                # One opening and one closing code: pad the plain text, then
                # re-wrap it so the padding carries the same style.
                out.append(escapes[0] +
                           jfunc(self.ansi_regx.sub('', s), max_len) +
                           escapes[1])
            else:
                # Otherwise pad to max_len *visible* characters by adding the
                # number of invisible escape characters to the target width.
                out.append(jfunc(s, max_len + len(s) - self.len(s)))
        return out

    def _join_unicode(self, lines, sep=''):
        try:
            return sep.join(lines)
        except UnicodeDecodeError:
            sep = compat.text_type(sep)
            return sep.join([x.decode('utf-8') if isinstance(x, str) else x
                             for x in lines])

    def adjoin(self, space, *lists, **kwargs):
        # Add space for all but the last column:
        pads = ([space] * (len(lists) - 1)) + [0]
        max_col_len = max([len(col) for col in lists])
        new_cols = []
        for col, pad in zip(lists, pads):
            width = max([self.len(s) for s in col]) + pad
            c = self.justify(col, width, mode='left')
            # Add blank cells to end of col if needed for different col lens:
            if len(col) < max_col_len:
                c.extend([' ' * width] * (max_col_len - len(col)))
            new_cols.append(c)

        rows = [self._join_unicode(row_tup) for row_tup in zip(*new_cols)]
        return self._join_unicode(rows, sep='\n')
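The core idea of this patch can also be tried outside pandas. This sketch reuses the same ANSI regular expression as the patch above; `visible_len` and `ansi_ljust` are hypothetical helper names. It pads by visible width for any number of escape sequences, not only the two-escape case.

```python
import re

# Same pattern as the patch: ESC, one byte in @-_, parameter bytes,
# intermediate bytes, and a final byte.
ANSI_RE = re.compile(r'\x1B[@-_][0-?]*[ -/]*[@-~]')


def visible_len(text: str) -> int:
    """Length of text as displayed, ignoring ANSI escape sequences."""
    return len(ANSI_RE.sub('', text))


def ansi_ljust(text: str, width: int) -> str:
    """Left-justify text to a visible width, keeping its escape codes."""
    return text + ' ' * max(0, width - visible_len(text))


cells = ['\x1b[41m7\x1b[49m', '12', '\x1b[1m\x1b[42m5\x1b[49m\x1b[22m']
width = max(visible_len(c) for c in cells)
for c in cells:
    print('|' + ansi_ljust(c, width) + '|')
```

Unlike the patch's two-escape branch, the padding here sits outside the escape codes, so a background color does not extend into it.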

TomAugspurger commented, Oct 18, 2019

FYI, pandas.compat is considered private: https://pandas.pydata.org/pandas-docs/stable/reference/index.html

On Fri, Oct 18, 2019 at 9:59 AM Celyn Walters notifications@github.com wrote:

Thanks @ghost711 (https://github.com/ghost711)! For posterity, compat.strlen seems to have been removed in #25903 (https://github.com/pandas-dev/pandas/pull/25903). I added it back into this file with (simply):

def strlen(data, encoding=None):
    return len(data)

and also needed:

import pandas.compat as compat