API: Specify the dtype of new columns added in reindex #33586

Closed
2 of 3 tasks
burk opened this issue Apr 16, 2020 · 3 comments
Labels: API Design; Indexing (Related to indexing on series/frames, not to indexes themselves); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

burk commented Apr 16, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [np.nan, 1., 2.]}).astype(pd.SparseDtype("float", np.nan))
df = df.reindex(['x', 'y'], axis='columns')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
x    -4 non-null Sparse[float64, nan]
y    -6 non-null float64
dtypes: Sparse[float64, nan](1), float64(1)
memory usage: 176.0 bytes

Problem description

When reindexing the columns of a sparse DataFrame, the newly added columns are not sparse. This is especially problematic because the new columns contain only NaN and would therefore be completely sparse.

Expected Output

I'd expect the new column to also be of type Sparse[float64, nan].

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-45-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2
Cython : 0.29.15
pytest : 5.3.5
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.13
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

burk added the Bug and Needs Triage labels Apr 16, 2020
@TomAugspurger
Contributor

I don't agree that the column y should automatically be sparse. That kind of implicit dependence on the dtypes of the other columns would lead to surprises.

What reindex lacks is a way to specify the dtype of the new columns. Something like

df.reindex(columns=['x', 'y'], dtype=pd.SparseDtype('float64'))

would be reasonable.
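Since the dtype keyword above is only a proposal, the same effect can be approximated today by reindexing first and then casting just the newly added columns. The sketch below is one possible workaround, not part of pandas: the helper name reindex_columns_with_dtype is hypothetical, and it assumes the new columns should be filled with NaN and use NaN as the sparse fill value.

```python
import numpy as np
import pandas as pd


def reindex_columns_with_dtype(df, columns, dtype):
    """Hypothetical helper (not a pandas API): reindex the columns of df
    and cast only the columns that did not exist before to dtype."""
    # Columns that reindex will create, filled with NaN.
    new_cols = [c for c in columns if c not in df.columns]
    out = df.reindex(columns=columns)
    if new_cols:
        # astype with a mapping leaves the existing columns untouched.
        out = out.astype({c: dtype for c in new_cols})
    return out


df = pd.DataFrame({"x": [np.nan, 1.0, 2.0]}).astype(pd.SparseDtype("float", np.nan))
result = reindex_columns_with_dtype(
    df, ["x", "y"], pd.SparseDtype("float64", np.nan)
)
print(result.dtypes)
```

The explicit dtype argument keeps the choice with the caller, so the new column does not silently inherit anything from its neighbours.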

This is closely related to #31874, where the dtype would be specified by the other DataFrame introducing new columns.

TomAugspurger added the API Design, Indexing, and Reshaping labels and removed the Bug and Needs Triage labels Apr 16, 2020
TomAugspurger changed the title from "BUG: Reindexing columns of sparse dataframe leads to new non-sparse columns" to "API: Specify the dtype of new columns added in reindex" Apr 16, 2020
@burk
Author

burk commented Apr 17, 2020

Thanks for having a look. I agree that specifying the dtype of the new columns would be reasonable and sufficient.

@TomAugspurger
Contributor

Looks like this overlaps with #20513.
