ENH: Possible to add dtype/converters as arguments for pandas.read_xml() ?


Is your feature request related to a problem?

I am using pandas to read XML for further processing. However, columns whose values have leading zeros are always converted to numbers, so the original data is lost.

Describe the solution you’d like

It would be great to add dtype/converters arguments to pandas.read_xml() to force pandas to interpret certain columns with a given dtype or converter, just like the other IO readers (read_csv, read_html, etc.).
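For comparison, here is a minimal sketch of the behavior being requested, using the dtype argument that read_csv already accepts. The sample data and variable names are made up for illustration only.

```python
import io

import pandas as pd

# Illustrative CSV with zip codes that start with a zero.
csv_text = "zip,dat\n08540,123\n08628,456\n"

# Default type inference parses zip as an integer, dropping the leading zero.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the original text intact.
preserved = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})

print(inferred["zip"].tolist())
print(preserved["zip"].tolist())
```

The request is for read_xml to accept the same kind of per-column mapping.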


API breaking implications

Probably not, since the new arguments would be optional.

Describe alternatives you’ve considered

Writing my own code to pull the data out of each XML node, which results in very poor performance.
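As a rough sketch of what that manual alternative might look like (not the poster's actual code), the standard-library ElementTree parser can walk each row and keep every value as text. The element names below are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Illustrative document; the <row>, <zip>, and <dat> names are assumed.
xml_text = """<root>
  <row><zip>08540</zip><dat>123</dat></row>
  <row><zip>08628</zip><dat>456</dat></row>
</root>"""

root = ET.fromstring(xml_text)

# Each <row> becomes a dict of tag -> text, so every value stays a string.
records = [{child.tag: child.text for child in row} for row in root.findall("row")]

manual_df = pd.DataFrame(records)

# Convert only the columns that really are numeric.
manual_df["dat"] = manual_df["dat"].astype(int)

print(manual_df)
```

This keeps the leading zeros, but the Python-level loop over every node is the part that gets slow on large files.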

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

ParfaitG commented, Sep 18, 2021

As a current workaround, consider running XSLT to quote the nodes with leading zeroes, then strip the quotes on the pandas side. With the default lxml parser, read_xml supports XSLT 1.0 scripts. The XSLT below runs the standard identity template and encloses the text values of the zip nodes in double quotes.

import pandas as pd

xml = \
'''<root>
     <row>
        <zip>08540</zip>
        <dat>123</dat>
     </row>
     <row>
        <zip>08628</zip>
        <dat>456</dat>
     </row>
     <row>
        <zip>27599</zip>
        <dat>789</dat>
     </row>
    </root>'''

xsl = \
'''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TEMPLATE TO COPY XML AS IS -->
    <xsl:template match="node()|@*">
       <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
       </xsl:copy>
    </xsl:template>
    
    <!-- ENCLOSE zip NODES WITH DOUBLE QUOTES -->
    <xsl:template match="zip">
      <xsl:copy>
        <xsl:variable name="quot">"</xsl:variable>
        <xsl:value-of select="concat($quot, text(), $quot)"/>
      </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>'''

df = (
    pd.read_xml(xml, stylesheet = xsl)
      .assign(zip = lambda x: x["zip"].str.replace('"', ''))
)

df
     zip  dat
0  08540  123
1  08628  456
2  27599  789
ParfaitG commented, Sep 15, 2021

Agreed! Good feature to add to the running list. Also, read_xml passes parsed data to the TextParser shared by other IO readers.
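For anyone landing here later: as far as I can tell from the pandas documentation, read_xml gained dtype and converters arguments in pandas 1.5.0. The sketch below assumes that version or newer. It uses the built-in etree parser and wraps the XML in StringIO so that it does not depend on lxml. The sample data is illustrative.

```python
import io

import pandas as pd

# Illustrative XML with zip codes that start with a zero.
xml_text = """<root>
  <row><zip>08540</zip><dat>123</dat></row>
  <row><zip>08628</zip><dat>456</dat></row>
</root>"""

# Assumes pandas >= 1.5, where read_xml is documented to accept dtype.
typed_df = pd.read_xml(
    io.StringIO(xml_text),
    parser="etree",      # standard-library parser, no lxml required
    dtype={"zip": str},  # keep zip as text so leading zeros survive
)

print(typed_df)
```

On older pandas versions the dtype keyword is not available, and the XSLT workaround above still applies.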
