Feature request: df that only returns dfs when indexing. #3237

twiecki · 2013-04-02T13:08:03Z

With more complex dataframes, I often stumble over this:

In [1]: x = pandas.DataFrame({'a': [1,2,3], 'b': [4,5,6]})
In [2]: x.ix[0].a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-8358f19a1e70> in <module>()
----> 1 x.ix[0].a

AttributeError: 'Series' object has no attribute 'a'

In [3]: x.a.ix[0]
Out[3]: 1

This is just an example, at other times I end up converting the Series back to a DataFrame because that's what the rest of the code expects.

I know that there have been attempts to solve this issue by adding attribute lookup to Series (e.g. #1904), but they seem to come with a performance penalty.
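For readers on a recent pandas release (an assumption about your version, not part of the original 2013 report): `.ix` was removed in pandas 1.0, and Series now resolves attribute access to index labels. The failing lookup above can therefore be sketched with `.loc` like this:

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

row = x.loc[0]      # row label 0 -> Series indexed by the column names
print(row.a)        # attribute access to the Series index label 'a'
print(row['a'])     # explicit item access, works regardless of the label's name
print(x.a.loc[0])   # the workaround from the report, with .loc in place of .ix
```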

Often, however, I care more about expressiveness than performance. I thus propose the addition of an option like:

x = DataFrame(data, slicing_returns_df=True)

This would cause x.ix[0] or x.a to return a DataFrame rather than a Series, which would make the example above work. The default would be False so that there are no backward-compatibility issues. Alternatively, there could be a new class that inherits from DataFrame and has the desired behavior.

I'm happy to give this a crack; however, I first wanted to make sure that I'm not the only one who thinks this would be a good idea, and that it can't work for some obvious reason X.
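The proposed behavior can be approximated without a new constructor option. The sketch below uses a hypothetical helper, `frame_loc` (not a pandas API), that wraps any scalar label in a one-element list before calling `.loc`, so neither axis is dropped and the result is always a DataFrame. It assumes a flat (non-MultiIndex) index; tuple keys for a MultiIndex are treated as list-like and are not handled specially.

```python
import pandas as pd
from pandas.api.types import is_list_like

def _as_selector(key):
    # Slices and list-likes already keep their axis in the .loc result;
    # a scalar label is wrapped in a one-element list so the axis is not dropped.
    if isinstance(key, slice) or is_list_like(key):
        return key
    return [key]

def frame_loc(df, rows, cols=slice(None)):
    """Hypothetical helper: label-based lookup that always returns a DataFrame."""
    return df.loc[_as_selector(rows), _as_selector(cols)]

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
row = frame_loc(x, 0)                 # one-row DataFrame instead of a Series
col = frame_loc(x, slice(None), 'a')  # one-column DataFrame instead of a Series
print(type(row).__name__, row.shape)
print(type(col).__name__, col.shape)
print(row.a.iloc[0])                  # attribute access works on the DataFrame
```

A subclass of DataFrame could expose the same logic as an accessor, but a plain function avoids having to keep a subclass's `_constructor` and metadata consistent.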


jreback · 2013-04-02T13:15:40Z

are you looking for something like this? (this is 0.11-dev)
the equivalent in 0.10.1 is x.get_value(0,'a')

In [6]: x.at[0,'a']
Out[6]: 1
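For current pandas: `DataFrame.get_value` was deprecated in 0.21 and removed in 1.0, so `.at` (by label) and `.iat` (by position) are the scalar accessors to use. A minimal sketch on the frame from the report:

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

value = x.at[0, 'a']    # label-based scalar access: row label 0, column 'a'
same = x.iat[0, 0]      # position-based scalar access: first row, first column
assert value == same == 1

x.at[0, 'a'] = 10       # .at also supports fast scalar assignment
print(x.loc[0, 'a'])
```

Both accessors take exactly one row key and one column key; for lists of labels, `.loc` is the tool.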

jreback · 2013-04-02T13:20:56Z

And if you ALWAYS want a DataFrame returned (the above ALWAYS returns a scalar), pass lists for both the rows and the columns:

In [12]: x.loc[[0],['a']]
Out[12]: 
   a
0  1
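The rule behind this is that each scalar key drops one dimension from the `.loc` result, while each list key keeps it. A small sketch enumerating the four combinations on the same frame:

```python
import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

results = {
    "x.loc[0, 'a']": x.loc[0, 'a'],          # scalar row, scalar col -> scalar
    "x.loc[[0], 'a']": x.loc[[0], 'a'],      # list row, scalar col   -> Series
    "x.loc[0, ['a']]": x.loc[0, ['a']],      # scalar row, list col   -> Series
    "x.loc[[0], ['a']]": x.loc[[0], ['a']],  # list row, list col     -> DataFrame
}
for expr, value in results.items():
    print(expr, '->', type(value).__name__)
```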

twiecki · 2013-04-02T14:17:25Z

.get_value() does not seem to support multiple row labels, e.g. x.get_value([0,1], 'a').

However, x.loc seems to do exactly what I need. Thanks!

jreback · 2013-04-02T14:21:03Z

@twiecki great!

yes, by definition at (and get_value) are for fast scalar access, see: http://pandas.pydata.org/pandas-docs/dev/indexing.html#fast-scalar-value-getting-and-setting

while loc (and ix) are more flexible, label-based indexers (and carry a small performance penalty in order to figure out what you are after)
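The overhead difference can be checked on your own setup with `timeit`. This is a rough sketch: absolute timings depend on the machine and pandas version, and no particular ratio is implied here.

```python
import timeit

import pandas as pd

x = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

n = 10_000
# Both expressions return the same scalar; only the lookup overhead differs.
t_at = timeit.timeit(lambda: x.at[0, 'a'], number=n)
t_loc = timeit.timeit(lambda: x.loc[0, 'a'], number=n)

print(f".at : {t_at / n * 1e6:.2f} us per lookup")
print(f".loc: {t_loc / n * 1e6:.2f} us per lookup")
```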

twiecki · 2013-05-13T15:50:30Z

I tried this again just now but df.loc[0,:] returns a series again. Was this changed by any chance? This is with current master.

jreback · 2013-05-13T15:56:45Z

You asked for a Series: a scalar row label always returns a Series. Enclose the row label(s) in a list, [], to always get a frame.

And FYI, the 0 here is a label (and not a location!):

In [1]: df = DataFrame(randn(5,2),columns=list('AB'))

In [2]: df.loc[0,:]
Out[2]: 
A   -1.080064
B   -0.499382
Name: 0, dtype: float64

In [3]: df.loc[[0],:]
Out[3]: 
          A         B
0 -1.080064 -0.499382

twiecki · 2013-05-13T16:54:59Z

Perfect. Thanks!

twiecki closed this as completed Apr 2, 2013

twiecki reopened this May 13, 2013

twiecki closed this as completed May 13, 2013
