Add support for IP Address and MAC Address data

See original GitHub issue: https://github.com/pandas-dev/pandas/issues/18767

Hi all, this is a proposal to add a new block and type for representing IP Addresses. There are still some details that need ironing out, but I wanted to gauge reactions to including this in pandas before spending too much more time on it.

Here’s a notebook demonstrating the basics: http://nbviewer.jupyter.org/gist/TomAugspurger/3ba2bc273edfec809b61b5030fd278b9

Abstract

Proposal to add support for storing and operating on IP address data: a new block type for IP address data and an .ip accessor on Series and Index.

Rationale

For some communities, IP and MAC addresses are a common data format. The format was deemed important enough to add the ipaddress module to the standard library (see PEP 3144). At Anaconda, we hear from customers who would use a first-class IP address array container if it existed in pandas.

I turned to StackOverflow to gauge interest in this topic. A search for “IP” under the pandas tag turns up 300 results; under the NumPy tag there are another 80. For comparison, I ran a few other searches to see how much interest there is in other “specialized” data types (this is a very rough, probably incorrect, way of estimating interest):

Search term    Results
financial          251
geo                120
ip                 300
logs               590

Categorical, which is already in pandas, turned up 1,089 items.

Overall, I think there’s enough interest relative to the implementation and maintenance burden to warrant adding support for IP addresses. I don’t anticipate this causing any issues for the Arrow transition once ARROW-1587 is in place, as long as we are careful about which parts of the storage layer we treat as implementation details.

Specification

The proposal is to add

  1. A type and container for IPAddress and MACAddress (similar to CategoricalDtype and Categorical).
  2. A block for IPAddress and MACAddress (similar to CategoricalBlock).
  3. A new accessor for Series and Indexes, .ip, for operating on IP addresses and MAC addresses (similar to .cat).

The type and block should be generic IP address blocks, with no distinction between IPv4 and IPv6 addresses. In our experience, it’s common to work with data from multiple sources, some of which may be IPv4 and some IPv6. This also matches the semantics of the standard library’s ipaddress.ip_address factory function, which returns an IPv4Address or IPv6Address as needed. Being able to work with IP addresses in an IPv4/IPv6-agnostic fashion is useful.
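For reference, the standard library factory already behaves in the version-agnostic way described above: ipaddress.ip_address inspects each input and returns the matching class. The sample addresses below are arbitrary.

```python
import ipaddress

# Mixed IPv4 and IPv6 input, as might come from several data sources.
mixed = ["192.168.1.10", "2001:db8::1", "10.0.0.1"]

# ip_address picks IPv4Address or IPv6Address per element.
parsed = [ipaddress.ip_address(s) for s in mixed]

for addr in parsed:
    print(type(addr).__name__, addr.version, addr)
```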

Data Layout

Since IPv6 addresses are 128 bits, they do not fit in a single NumPy uint64. This complicates the implementation (but it also strengthens the case for the proposal, since doing this on your own can be tricky).

Each record will be composed of two uint64 values: the 128 bits of the address are split into two 64-bit halves, one per field. As a NumPy structured dtype, that’s

import numpy as np

base = np.dtype([('lo', '>u8'), ('hi', '>u8')])
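A minimal sketch of packing addresses into this layout with the standard library ipaddress module and NumPy. The helpers to_records and from_record are hypothetical, and the sketch assumes, going by the field names, that hi holds the upper 64 bits and lo the lower 64 bits.

```python
import ipaddress
import numpy as np

# Two big-endian unsigned 64-bit fields, as in the proposal.
base = np.dtype([('lo', '>u8'), ('hi', '>u8')])
MASK64 = (1 << 64) - 1


def to_records(addresses):
    """Pack IPv4/IPv6 address strings into the two-field layout (hypothetical helper).

    Assumption: 'hi' holds the upper 64 bits, 'lo' the lower 64 bits.
    """
    out = np.empty(len(addresses), dtype=base)
    for i, addr in enumerate(addresses):
        n = int(ipaddress.ip_address(addr))  # 32- or 128-bit integer value
        out[i] = (n & MASK64, n >> 64)       # tuple order matches field order: lo, hi
    return out


def from_record(rec):
    """Rebuild an ipaddress object from a single record (hypothetical helper)."""
    n = (int(rec['hi']) << 64) | int(rec['lo'])
    return ipaddress.ip_address(n)
```

Because ipaddress.ip_address returns an IPv4Address for any integer below 2**32, an IPv6 address such as ::1 comes back as 0.0.0.1 after a round trip, in line with how the from_pyints example renders the integer 10 as 0.0.0.10.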

Embedding IPv4 addresses in the 128-bit IPv6 space is a common way of handling mixed IPv4 and IPv6 data:

Hybrid dual-stack IPv6/IPv4 implementations recognize a special class of addresses, the IPv4-mapped IPv6 addresses. These addresses consist of an 80-bit prefix of zeros, the next 16 bits are one, and the remaining, least-significant 32 bits contain the IPv4 address.

From here
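The IPv4-mapped layout quoted above can be checked with the standard library. This sketch builds a mapped address by hand from the bit layout (80 zero bits, 16 one bits, then the 32-bit IPv4 address) and confirms that IPv6Address.ipv4_mapped recovers the original. It only illustrates the quoted convention; the proposal does not say that IPv4 addresses would be stored in mapped form. The sample address is arbitrary.

```python
import ipaddress

v4 = ipaddress.IPv4Address("192.0.2.1")

# 80 zero bits, then 16 one bits (0xFFFF), then the 32-bit IPv4 address.
mapped = ipaddress.IPv6Address((0xFFFF << 32) | int(v4))

# The stdlib parses the same address from its textual form...
assert mapped == ipaddress.IPv6Address("::ffff:192.0.2.1")
# ...and can extract the embedded IPv4 address again.
print(mapped.ipv4_mapped)  # 192.0.2.1
```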

Missing Data

Use the lowest possible IP address (all zeros) as the missing-value marker. According to RFC 2373,

The address 0:0:0:0:0:0:0:0 is called the unspecified address. It must never be assigned to any node. It indicates the absence of an address.

See here.
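Under the two-field layout, detecting missing values reduces to checking for an all-zero record. A minimal sketch; isna here is a hypothetical helper, not an existing pandas function.

```python
import ipaddress
import numpy as np

# Same two-field layout as in the Data Layout section.
base = np.dtype([('lo', '>u8'), ('hi', '>u8')])


def isna(arr: np.ndarray) -> np.ndarray:
    """Boolean mask, True where a record is the all-zero (unspecified) address.

    Hypothetical helper illustrating the proposed missing-value marker.
    """
    return (arr['lo'] == 0) & (arr['hi'] == 0)


arr = np.array([(0, 0), (1, 0), (0, 1)], dtype=base)
mask = isna(arr)

# The stdlib agrees that the all-zero IPv6 address is "unspecified".
assert ipaddress.IPv6Address(0).is_unspecified
```

Note that the all-zero record is also what the integer 0 (the IPv4 address 0.0.0.0) would pack to, so the marker covers both unspecified addresses.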

Methods

The new user-facing IPAddress (analogous to a Categorical) will have a few methods for easily constructing arrays of IP addresses.

from typing import Sequence

class IPAddress:
    @classmethod
    def from_pyints(cls, values: Sequence[int]) -> 'IPAddress':
        """Construct an IPAddress array from a sequence of Python integers.

        >>> IPAddress.from_pyints([10, 18446744073709551616])
        <IPAddress(['0.0.0.10', '::1:0:0:0'])>
        """

    @classmethod
    def from_str(cls, values: Sequence[str]) -> 'IPAddress':
        """Construct an IPAddress array from a sequence of strings."""

The methods in the new .ip namespace should follow the design of the standard library’s ipaddress module.

Properties

  • is_multicast
  • is_private
  • is_global
  • is_unspecified
  • is_reserved
  • is_loopback
  • is_link_local
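Each of these properties already exists on the standard library’s IPv4Address and IPv6Address classes, so their semantics can be previewed element by element. The sample addresses are arbitrary.

```python
import ipaddress

samples = ["127.0.0.1", "224.0.0.1", "169.254.1.1", "::", "8.8.8.8"]
props = ["is_multicast", "is_private", "is_global", "is_unspecified",
         "is_reserved", "is_loopback", "is_link_local"]

# Map each address to the list of properties that are True for it.
flags = {
    s: [p for p in props if getattr(ipaddress.ip_address(s), p)]
    for s in samples
}

for s, true_props in flags.items():
    print(f"{s:>12}  {true_props}")
```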

Reference Implementation

An implementation of the types and block is available at pandas-ip (at the moment it’s a proof of concept).

Alternatives

Adding a new block type to pandas is a major change. Downstream libraries may have special-cased handling for pandas’ extension types, so this shouldn’t be adopted without careful consideration.

Some alternatives to this that exist outside of pandas:

  1. Store ipaddress.IPv4Address or ipaddress.IPv6Address objects in an object dtype array. The .ip namespace could still be included with an extension decorator. The drawback here is the poor performance, as every operation would be done element-wise.
  2. A separate library that provides a container and methods. The downside here is that the library would need to subclass Series, DataFrame, and Index so that the custom blocks and types are interpreted correctly. Users would need to use the custom IPSeries, IPDataFrame, etc., which increases friction when working with other libraries that may expect / coerce to pandas objects.
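Alternative 1 can be sketched with the pandas.api.extensions.register_series_accessor decorator found in current pandas. The IPAccessor class and the properties it exposes are hypothetical illustrations; each property maps a Python function over an object-dtype Series, which is exactly the element-wise cost described above.

```python
import ipaddress
import pandas as pd


@pd.api.extensions.register_series_accessor("ip")
class IPAccessor:
    """Hypothetical .ip namespace over an object-dtype Series of ipaddress objects."""

    def __init__(self, series: pd.Series):
        self._obj = series

    def _apply(self, attr: str) -> pd.Series:
        # One Python-level attribute lookup per element: no vectorization.
        return self._obj.map(lambda addr: getattr(addr, attr))

    @property
    def is_private(self) -> pd.Series:
        return self._apply("is_private")

    @property
    def is_loopback(self) -> pd.Series:
        return self._apply("is_loopback")


s = pd.Series([ipaddress.ip_address(a) for a in ["10.0.0.1", "8.8.8.8", "::1"]])
print(s.dtype)  # object
print(s.ip.is_loopback.tolist())
```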

To expand a bit on the (current) downside of alternative 2: when the pandas constructors see an “unknown” object, they fall back to object dtype and stuff the actual Python object into whatever container is being created:

In [1]: import pandas as pd

In [2]: import pandas_ip as ip

In [3]: arr = ip.IPAddress.from_pyints([1, 2])

In [4]: arr
Out[4]: <IPAddress(['0.0.0.1', '0.0.0.2'])>

In [5]: pd.Series(arr)
Out[5]:
0    <IPAddress(['0.0.0.1', '0.0.0.2'])>
dtype: object

I’d rather not have to make a subclass of Series, just to stick an array-like thing into a Series.

If pandas could provide an interface such that objects satisfying it are treated as array-like rather than as plain Python objects, I’ll gladly close this issue and develop the IP-address-specific functionality in another package. That might be the best possible outcome of all this.


Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 2
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

1 reaction
TomAugspurger commented, May 23, 2018

Closing this. It’s implemented in https://cyberpandas.readthedocs.io/.

0 reactions
TomAugspurger commented, Jun 20, 2018

Yes, cyberpandas has a MACArray type. https://cyberpandas.readthedocs.io/en/latest/api.html#macarray

Feel free to open an issue at https://github.com/ContinuumIO/cyberpandas if you have questions / issues.

On Tue, Jun 19, 2018 at 5:32 PM, Mike Pennington notifications@github.com wrote:

@TomAugspurger https://github.com/TomAugspurger the title of this issue mentions mac-addresses; I see that cyberpandas groks IPs now, but is there a solution for mac addresses? If so, can you elaborate?

