ENH: Make metadata from read_spss available #34682

mario-bermonti · 2020-06-10T00:50:27Z

Is your feature request related to a problem?

I would like the metadata that pyreadstat provides to be available when reading SPSS files with read_spss.

This would be really helpful because it would make variable labels (descriptions), value labels, and other important metadata easy to access when formatting results and reports, for example by manually replacing the variable names with their labels using the .replace function.

These kinds of metadata are widely used in the social sciences because they make results much easier to understand. For example, SPSS replaces variable names with variable labels in the output of analyses. Users could do this manually if the metadata were available.
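To illustrate the workflow described above, here is a minimal sketch. It assumes the labels are already available as plain Python dicts, which read_spss does not currently provide; the column names, labels, and dict names are made up for the example.

```python
import pandas as pd

# Hypothetical data and metadata, as they might come out of an SPSS file.
df = pd.DataFrame({"q1": [1, 2, 2], "age": [23, 35, 41]})
variable_labels = {"q1": "Satisfaction with service", "age": "Age in years"}
value_labels = {"q1": {1: "Satisfied", 2: "Dissatisfied"}}

# Replace coded values with their value labels, column by column.
labelled = df.copy()
for column, mapping in value_labels.items():
    labelled[column] = labelled[column].map(mapping)

# Replace variable names with variable labels for reporting.
report = labelled.rename(columns=variable_labels)
print(report)
```

The original df keeps its short variable names, so analysis code can keep using them while only the report uses the labels.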

Describe the solution you'd like

The metadata read by pyreadstat could be stored in the df's _metadata attribute, which would make it readily available.

API breaking implications

I don't think there would be any implications if it's stored in the _metadata attribute, because it was developed for this kind of use case. Am I right?

Describe alternatives you've considered

I could use pyreadstat directly instead of pd.read_spss. I can't think of any other options.
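For reference, the alternative might look roughly like the sketch below. The helper name read_sav_with_labels is hypothetical. It relies on pyreadstat.read_sav returning a DataFrame together with a metadata object, and on that object's column_names_to_labels and variable_value_labels attributes, which to my knowledge is how pyreadstat exposes the labels.

```python
def read_sav_with_labels(path):
    """Read an SPSS .sav file and return (df, variable_labels, value_labels).

    Hypothetical helper, not part of pandas or pyreadstat.
    """
    import pyreadstat  # optional dependency, imported only when needed

    # read_sav returns the data plus a metadata container.
    df, meta = pyreadstat.read_sav(path)
    return df, meta.column_names_to_labels, meta.variable_value_labels
```

The drawback is that the labels travel separately from the DataFrame, so every function that needs them has to receive them as extra arguments.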

Additional context

This is related to issues #11179 and #39.

TomAugspurger · 2020-06-10T01:44:46Z

> The metadata read by pyreadstat could be stored in the df's _metadata attribute, which would make it readily available.

NDFrame._metadata is primarily for subclasses. We would want to use .attrs instead.
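As a minimal sketch of what using .attrs could look like: DataFrame.attrs is a plain dict attached to the frame. The key names "variable_labels" and "value_labels" are arbitrary choices for this example, not an agreed convention.

```python
import pandas as pd

df = pd.DataFrame({"q1": [1, 2, 2]})

# Store the labels on the frame itself; the key names are illustrative only.
df.attrs["variable_labels"] = {"q1": "Satisfaction with service"}
df.attrs["value_labels"] = {"q1": {1: "Satisfied", 2: "Dissatisfied"}}

# The metadata is read back like any other dict entry.
print(df.attrs["variable_labels"]["q1"])
```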

mario-bermonti · 2020-06-21T21:22:18Z

So in theory this shouldn’t cause too much trouble? I’m going to give it a try and get back with the results.

TomAugspurger · 2020-06-22T13:27:56Z

Yeah, won't cause any trouble. There will be some operations that fail to propagate the metadata, and we still need to determine how metadata propagates when multiple dataframes (possibly with different metadata) are involved. But won't cause any issues.
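Until propagation is settled, one way to avoid depending on it is to set the metadata explicitly after an operation that combines frames. The sketch below uses a hypothetical helper, merge_attrs, with a made-up conflict policy (the first frame wins); the file names and keys are illustrative.

```python
import pandas as pd


def merge_attrs(*frames):
    """Combine attrs from several frames; on key conflicts the first frame wins.

    Hypothetical helper and policy, not pandas behavior.
    """
    combined = {}
    for frame in frames:
        for key, value in frame.attrs.items():
            combined.setdefault(key, value)
    return combined


left = pd.DataFrame({"id": [1, 2], "q1": [1, 2]})
left.attrs["source"] = "wave1.sav"

right = pd.DataFrame({"id": [1, 2], "q2": [3, 4]})
right.attrs["source"] = "wave2.sav"
right.attrs["weighted"] = True

merged = left.merge(right, on="id")
# Set the metadata explicitly rather than relying on what merge propagates.
merged.attrs = merge_attrs(left, right)
print(merged.attrs)
```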

Thanks for working on this.

mario-bermonti · 2021-02-07T20:27:12Z

Hi. I’m so sorry for this late reply. I haven’t been able to work on this because this last year has been very hard. I hope to work on it sometime in the near future. So I will keep in touch.

jbrockmendel · 2023-11-01T16:52:46Z

Closed by #55472?

mario-bermonti added the Enhancement and Needs Triage labels on Jun 10, 2020

TomAugspurger added the IO Data label and removed the Needs Triage label on Jun 10, 2020

TomAugspurger added this to the Contributions Welcome milestone on Jun 10, 2020

mroeschke added the metadata label on Aug 7, 2021

mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022

jbrockmendel added the Closing Candidate label on Nov 1, 2023

mroeschke closed this as completed Nov 21, 2023
