
ENH: Make metadata from read_spss available #34682


Closed
mario-bermonti opened this issue Jun 10, 2020 · 5 comments
Labels
Closing Candidate, Enhancement, IO Data, metadata

Comments

@mario-bermonti

Is your feature request related to a problem?

I would like the metadata that pyreadstat provides to be available when reading SPSS files.

This would be really helpful because variable labels (descriptions), value labels, and other important metadata would be readily available for formatting results and reports (for example, by manually replacing the variable names with the .replace function).

Those kinds of metadata are widely used in the social sciences because they make results much easier to understand. For example, SPSS replaces variable names with variable labels in the output of analyses. Users could do this manually if the metadata were available.

Describe the solution you'd like

The metadata read by pyreadstat could be stored in the df's _metadata attribute, which would make it readily available.

API breaking implications

I don't think there would be any implications if it's stored in the _metadata attribute, because it was developed for this kind of use case. Am I right?

Describe alternatives you've considered

I could use pyreadstat directly instead of pandas.read_spss. I can't think of any other options.
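As a rough illustration of this alternative, the sketch below reads a file with pyreadstat and renames the columns to their variable labels. It assumes pyreadstat is installed and that `pyreadstat.read_sav` returns a `(DataFrame, metadata)` pair whose metadata exposes `column_names_to_labels`; the helper name `read_sav_with_labels` is hypothetical.

```python
from typing import Any, Tuple

import pandas as pd


def read_sav_with_labels(path: str) -> Tuple[pd.DataFrame, Any]:
    """Read an SPSS file and rename columns to their variable labels.

    Hypothetical helper: it bypasses pandas.read_spss and calls
    pyreadstat directly, so the metadata object is not thrown away.
    """
    import pyreadstat  # third-party; imported lazily so the module loads without it

    df, meta = pyreadstat.read_sav(path)

    # Keep only variables that actually have a label; unlabelled
    # variables keep their original name.
    labels = {
        name: label
        for name, label in meta.column_names_to_labels.items()
        if label
    }
    return df.rename(columns=labels), meta
```

The drawback is the one described above: every user has to repeat this step by hand, and the metadata lives in a separate object rather than travelling with the DataFrame.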

Additional context

This is related to issues #11179 and #39.

@mario-bermonti added the Enhancement and Needs Triage labels Jun 10, 2020
@TomAugspurger
Contributor

> The metadata read by pyreadstat could be stored in the df's _metadata attribute and that would make it readily available

NDFrame._metadata is primarily for subclasses. We would want to use .attrs instead.
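A minimal sketch of what the `.attrs` route could look like from the user's side. The data and the key names `variable_labels` and `value_labels` are invented for illustration; they are not an established pandas convention, and this is not what `read_spss` is confirmed to produce.

```python
import pandas as pd

df = pd.DataFrame({"q1": [1, 2, 1], "q2": [34, 27, 45]})

# Hypothetical metadata, shaped like what an SPSS reader might supply.
df.attrs["variable_labels"] = {"q1": "Gender", "q2": "Age in years"}
df.attrs["value_labels"] = {"q1": {1: "Female", 2: "Male"}}

# Build a report-friendly copy: map coded values to their labels,
# then rename the columns to the variable labels.
report = df.copy()
for col, mapping in df.attrs["value_labels"].items():
    report[col] = report[col].map(mapping)
report = report.rename(columns=df.attrs["variable_labels"])
```

Because `.attrs` is a plain dictionary on the DataFrame, no subclassing is needed to read or write it.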

@TomAugspurger added the IO Data label and removed the Needs Triage label Jun 10, 2020
@TomAugspurger added this to the Contributions Welcome milestone Jun 10, 2020
@mario-bermonti
Author

So in theory this shouldn’t cause too much trouble? I’m going to give it a try and get back with the results

@TomAugspurger
Contributor

Yeah, it won't cause any trouble. Some operations will fail to propagate the metadata, and we still need to determine how metadata propagates when multiple DataFrames (possibly with different metadata) are involved. But it won't cause any issues.

Thanks for working on this.
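Since propagation is not guaranteed, one workaround is to carry `.attrs` across an operation explicitly. The sketch below is only an illustration: the helper name `with_attrs_from` is hypothetical, and the "later source wins" rule for clashing keys is an arbitrary assumption, not pandas' policy.

```python
import pandas as pd


def with_attrs_from(result: pd.DataFrame, *sources: pd.DataFrame) -> pd.DataFrame:
    """Copy .attrs from the source frames onto the result (hypothetical helper).

    Clash policy (an assumption): if two sources define the same key,
    the value from the later source wins.
    """
    combined = {}
    for src in sources:
        combined.update(src.attrs)
    result.attrs.update(combined)
    return result


left = pd.DataFrame({"id": [1, 2], "q1": [1, 2]})
left.attrs["variable_labels"] = {"q1": "Gender"}
left.attrs["source"] = "wave1.sav"

right = pd.DataFrame({"id": [1, 2], "q2": [34, 27]})
right.attrs["source"] = "wave2.sav"

# Whatever merge itself does with .attrs, the helper sets them explicitly.
merged = with_attrs_from(left.merge(right, on="id"), left, right)
```

Doing this by hand makes the choice of clash policy visible, which is exactly the open question when the inputs carry different metadata.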

@mario-bermonti
Author

Hi. I’m so sorry for this late reply. I haven’t been able to work on this because this last year has been very hard. I hope to work on it sometime in the near future. So I will keep in touch.

@mroeschke added the metadata label Aug 7, 2021
@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Member

Closed by #55472?

@jbrockmendel added the Closing Candidate label Nov 1, 2023

4 participants