Feature request: df that only return dfs when indexing. #3237

Closed
twiecki opened this issue Apr 2, 2013 · 7 comments

Comments

@twiecki
Contributor

twiecki commented Apr 2, 2013

With more complex dataframes, I often stumble over this:

In [1]: x = pandas.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
In [2]: x.ix[0].a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-8358f19a1e70> in <module>()
----> 1 x.ix[0].a

AttributeError: 'Series' object has no attribute 'a'

In [3]: x.a.ix[0]
Out[3]: 1

This is just an example; at other times I end up converting the Series back to a DataFrame because that's what the rest of the code expects.

I know that there have been attempts to solve this issue by adding attribute lookup to Series (e.g. #1904) but they seem to come with a performance penalty.

Often, however, I care more about expressiveness than performance. I thus propose the addition of an option like:

x = DataFrame(data, slicing_returns_df=True)

This would cause x.ix[0] or x.a to return a DataFrame rather than a Series and make the above work. The default would be False so that there are no backward-compatibility issues. Alternatively, there could be a new class that inherits from DataFrame and has the desired behavior.

I'm happy to give this a crack; however, I wanted to first make sure that I'm not the only one who thinks this would be a good idea, and that it can't fail for some obvious reason X.

@jreback
Contributor

jreback commented Apr 2, 2013

Are you looking for something like this? (This is 0.11-dev; the equivalent in 0.10.1 is x.get_value(0,'a').)

In [6]: x.at[0,'a']
Out[6]: 1

@jreback
Contributor

jreback commented Apr 2, 2013

And if you ALWAYS want to get a df back (the above ALWAYS returns a scalar):

In [12]: x.loc[[0],['a']]
Out[12]: 
   a
0  1
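
As a rough illustration of the rule behind the two answers above (a sketch assuming a pandas version that provides `.loc`; the variable names are made up for the example): the dimensionality of the result follows the indexers, so a scalar label drops an axis while a list keeps it.

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Scalar row label and scalar column label: both axes are dropped.
scalar = x.loc[0, 'a']

# Scalar row label only: the row axis is dropped, leaving a Series.
row = x.loc[0, :]

# Lists on both axes: nothing is dropped, so the result is a 1x1 DataFrame.
cell_frame = x.loc[[0], ['a']]

print(type(row).__name__)
print(type(cell_frame).__name__, cell_frame.shape)
```

Wrapping a label in a list is therefore enough to keep downstream code that expects a DataFrame working, without a special constructor option.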

@twiecki
Contributor Author

twiecki commented Apr 2, 2013

.get_value() does not seem to support multiple indices, e.g. x.get_value([0,1], 'a').

However, x.loc seems to do exactly what I need. Thanks!

@twiecki twiecki closed this as completed Apr 2, 2013
@jreback
Contributor

jreback commented Apr 2, 2013

@twiecki great!

Yes, by definition at (and get_value) are for fast scalar access, see: http://pandas.pydata.org/pandas-docs/dev/indexing.html#fast-scalar-value-getting-and-setting

while loc (and ix) are the more flexible label-based indexers (and have a small performance penalty in order to figure out what you are after).
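
A minimal sketch of that split, for illustration only (the variable names are invented here): `at` takes exactly one row label and one column label, while `loc` also accepts lists of labels. As far as I know, `get_value` and `ix` were removed in later pandas releases, so the sketch sticks to `at` and `loc`.

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# at: one row label, one column label, returns a single value.
value = x.at[0, 'a']

# at can also set a single cell.
x.at[0, 'a'] = 10

# loc: accepts a list of row labels, so several values come back as a Series.
subset = x.loc[[0, 1], 'a']

print(value)
print(subset.tolist())
```

For a lookup over several rows, as in the `[0, 1]` case mentioned earlier in the thread, `loc` is the one to reach for.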

@twiecki
Contributor Author

twiecki commented May 13, 2013

I tried this again just now but df.loc[0,:] returns a series again. Was this changed by any chance? This is with current master.

@twiecki twiecki reopened this May 13, 2013
@jreback
Contributor

jreback commented May 13, 2013

You asked for a Series: a scalar row label will always return a Series. Enclose the row labels in [] to always get a frame.

And FYI, the 0 is a label here (and not a location!):

In [1]: df = DataFrame(randn(5,2),columns=list('AB'))

In [2]: df.loc[0,:]
Out[2]: 
A   -1.080064
B   -0.499382
Name: 0, dtype: float64

In [3]: df.loc[[0],:]
Out[3]: 
          A         B
0 -1.080064 -0.499382
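
To make the label-versus-location point concrete, here is a small sketch with a non-default index (the index values 10, 20, 30 are made up for the example, and it assumes a pandas version that has both `.loc` and `.iloc`):

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]},
    index=[10, 20, 30],
)

# loc looks rows up by label; the list keeps the result a DataFrame.
by_label = df.loc[[10], :]

# iloc looks rows up by integer position; position 0 is the row labeled 10.
by_position = df.iloc[[0], :]

# No row is labeled 0 here, so a label lookup for 0 fails.
try:
    df.loc[0, :]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label.equals(by_position), label_zero_found)
```

With the default integer index the two happen to coincide, which is why `df.loc[[0], :]` above looks like a positional lookup.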

@twiecki
Contributor Author

twiecki commented May 13, 2013

Perfect. Thanks!

@twiecki twiecki closed this as completed May 13, 2013