IntervalArray.shift and missing value handling #22428

TomAugspurger · 2018-08-20T12:57:39Z

Followup to #22387

The default implementation of shift fails when dtype.na_value can't be stored in an array of that dtype (e.g. an int array can't hold NA).

In [24]: idx = IntervalArray.from_breaks(range(10))

In [25]: idx.shift()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-1b2c2192e1e6> in <module>()
----> 1 idx.shift()

~/sandbox/pandas/pandas/core/arrays/base.py in shift(self, periods)
    422             return self.copy()
    423         empty = self._from_sequence([self.dtype.na_value] * abs(periods),
--> 424                                     dtype=self.dtype)
    425         if periods > 0:
    426             a = empty

~/sandbox/pandas/pandas/core/arrays/interval.py in _from_sequence(cls, scalars, dtype, copy)
    193     @classmethod
    194     def _from_sequence(cls, scalars, dtype=None, copy=False):
--> 195         return cls(scalars, dtype=dtype, copy=copy)
    196
    197     @classmethod

~/sandbox/pandas/pandas/core/arrays/interval.py in __new__(cls, data, closed, dtype, copy, fastpath, verify_integrity)
    138
    139         return cls._simple_new(left, right, closed, copy=copy, dtype=dtype,
--> 140                                verify_integrity=verify_integrity)
    141
    142     @classmethod

~/sandbox/pandas/pandas/core/arrays/interval.py in _simple_new(cls, left, right, closed, copy, dtype, verify_integrity)
    156                 raise TypeError(msg.format(dtype=dtype))
    157             elif dtype.subtype is not None:
--> 158                 left = left.astype(dtype.subtype)
    159                 right = right.astype(dtype.subtype)
    160

~/sandbox/pandas/pandas/core/indexes/numeric.py in astype(self, dtype, copy)
    316         elif is_integer_dtype(dtype) and self.hasnans:
    317             # GH 13149
--> 318             raise ValueError('Cannot convert NA to integer')
    319         return super(Float64Index, self).astype(dtype, copy=copy)
    320

ValueError: Cannot convert NA to integer
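
The last frame of the traceback can be isolated from shift entirely. The snippet below is a minimal sketch, assuming a reasonably recent pandas: it builds a float index containing NaN (standing in for the bounds after a missing slot is prepended) and casts it to int64, as `_simple_new` does through `dtype.subtype`. The variable names are illustrative and the exact exception message differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Stand-in for the left bounds once a missing slot has been prepended.
left = pd.Index([np.nan, 0.0, 1.0, 2.0])

try:
    # Same kind of cast that _simple_new performs via dtype.subtype.
    left.astype("int64")
    cast_error = None
except ValueError as err:
    # Newer pandas versions raise a subclass of ValueError here.
    cast_error = err

print(type(cast_error).__name__, cast_error)
```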

Perhaps we can investigate using our IntegerNA extension array for the storage of int-dtyped IntervalArrays?
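
As a rough illustration of why that storage would help, the nullable integer array can be shifted without leaving the integer dtype. This is only a sketch of the bounds in isolation, assuming pandas 0.24 or later where the "Int64" dtype exists; it does not show IntervalArray itself using such storage.

```python
import pandas as pd

# Nullable integer array (the "IntegerNA" extension array), used here
# as a stand-in for the left bounds of an integer IntervalArray.
bounds = pd.array(range(9), dtype="Int64")

# The default ExtensionArray.shift can prepend a missing value
# because this dtype can hold NA.
shifted = bounds.shift(1)

print(shifted.dtype)
print(shifted.isna())
```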

jorisvandenbossche · 2018-08-22T09:26:51Z

> Perhaps we can investigate using our IntegerNA extension array for the storage of int-dtyped IntervalArrays?

Alternatively, we could also think about using a mask-based approach for missing values (similar to IntegerArray).
I don't know how much extra complexity it would cause (I would think mainly changes in isna and in the construction methods), and it would certainly increase memory use, but it would ensure consistent handling of missing values regardless of the dtype of the breaks (whether or not that dtype supports missing values).
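
To make the idea concrete, here is a minimal sketch of mask-based storage using plain NumPy. `MaskedIntervals` is a hypothetical class invented for this illustration, not pandas API: it keeps int64 bounds plus a boolean mask, and shift fills the vacated slots with placeholder zeros that the mask marks as missing.

```python
import numpy as np


class MaskedIntervals:
    """Hypothetical container: int64 bounds plus a boolean NA mask."""

    def __init__(self, left, right, mask=None):
        self.left = np.asarray(left, dtype="int64")
        self.right = np.asarray(right, dtype="int64")
        if mask is None:
            mask = np.zeros(len(self.left), dtype=bool)
        self.mask = np.asarray(mask, dtype=bool)

    def isna(self):
        # Missingness comes from the mask, never from the bound values.
        return self.mask.copy()

    def shift(self, periods=1):
        n = len(self.left)
        k = min(abs(periods), n)
        # Start fully masked; placeholder zeros sit under the mask.
        left = np.zeros(n, dtype="int64")
        right = np.zeros(n, dtype="int64")
        mask = np.ones(n, dtype=bool)
        if periods >= 0:
            left[k:], right[k:] = self.left[: n - k], self.right[: n - k]
            mask[k:] = self.mask[: n - k]
        else:
            left[: n - k], right[: n - k] = self.left[k:], self.right[k:]
            mask[: n - k] = self.mask[k:]
        return MaskedIntervals(left, right, mask)


arr = MaskedIntervals(range(0, 9), range(1, 10))
shifted = arr.shift()
print(shifted.left.dtype, shifted.isna())
```

The bounds stay int64 after the shift, at the cost of one extra boolean per element. The placeholder values remain visible to anyone reading `left` and `right` directly without consulting the mask.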

jorisvandenbossche · 2018-08-22T09:27:15Z

cc @jschendel

jorisvandenbossche · 2018-08-22T09:28:28Z

Of course, if people directly access and use .left and .right, this might give unexpected results.

mroeschke · 2020-04-27T00:03:30Z

This looks close to solved, though it would probably be better to use IntegerNA instead of converting to float:

In [52]: pandas.arrays.IntervalArray.from_breaks(range(10)).shift()
Out[52]:
<IntervalArray>
[nan, (0.0, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0], (4.0, 5.0], (5.0, 6.0], (6.0, 7.0], (7.0, 8.0]]
Length: 9, closed: right, dtype: interval[float64]

In [53]: pd.__version__
Out[53]: '1.1.0.dev0+1390.gf3fdab389'
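
The upcast in that output can also be checked programmatically. A small sketch, assuming a pandas version that behaves like the session above (integer bounds are converted to float64 so the missing slot can be stored):

```python
import numpy as np
import pandas as pd

arr = pd.arrays.IntervalArray.from_breaks(range(10))
shifted = arr.shift()

# The bounds start as int64 and come back as float64 after the shift.
upcast = shifted.dtype.subtype == np.dtype("float64")
print(arr.dtype.subtype, "->", shifted.dtype.subtype, "| upcast:", upcast)
print(shifted.isna())
```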

jbrockmendel · 2022-01-08T20:09:35Z

Closed by #31502

TomAugspurger added the Missing-data, Interval, and ExtensionArray labels Aug 20, 2018

TomAugspurger mentioned this issue Aug 20, 2018

Support NDFrame.shift with EAs #22387

Merged

jorisvandenbossche added this to the 0.24.0 milestone Aug 22, 2018

jreback modified the milestones: 0.24.0, 0.25.0 Nov 6, 2018

jreback modified the milestones: 0.25.0, Contributions Welcome Apr 20, 2019

jschendel mentioned this issue May 21, 2019

BUG: .shift() on IntervalArray column raises ValueError #26479

Closed

mroeschke added the Bug label Apr 27, 2020

mroeschke mentioned this issue Apr 27, 2020

IntervalArray[datetime64[ns]].shift() raises TypeError #31504

Closed

jbrockmendel closed this as completed Jan 8, 2022
