Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass list of Callables and list of param_dicts to build_vset #50

Closed
GiannisPikoulis opened this issue Nov 7, 2022 · 4 comments
Closed
Assignees

Comments

@GiannisPikoulis
Copy link

Hello again.

I'm having trouble passing a list of Callables and list of param_dicts to build_vset. The following error occurs: obj must be callable.
I'm passing a list of sklearn models and a corresponding list of parameter dicts. According to the documentation, this should work.

@jpdunc23 jpdunc23 self-assigned this Nov 9, 2022
@jpdunc23
Copy link
Collaborator

jpdunc23 commented Nov 9, 2022

Hey @GiannisPikoulis, thanks for checking out vflow!

Would you min sending a small reprex that I can test out on my end?

@GiannisPikoulis
Copy link
Author

GiannisPikoulis commented Nov 11, 2022

Hello.

Thanks for replying to my message. So when I tried running PCA with multiple params:

from vflow import build_vset
from sklearn.decomposition import PCA

pca = PCA()

# hyperparameters to try
pca_params = {
    'n_components': [10, 20, 50],
    'svd_solver': ['randomized', 'full', 'auto']
}

# we could instead pass a list of distinct models and corresponding param dicts
pca_set = build_vset('PCA', PCA, pca_params, is_async=True)

X_trains_pca = pca_set.fit_transform(X_trains)
X_tests_pca = pca_set.transform(X_tests)

the following error occured: TypeError: '>=' not supported between instances of 'dict' and 'int'.
When I tried running your RF_set snippet from the 04_feat_importance_stability notebook, a similar error occurs:

from sklearn.ensemble import RandomForestRegressor as RF
from vflow import build_vset

# hyperparameters to
RF_params = {
    'n_estimators': [100, 300],
    'min_samples_split': [2, 10]
}

# we could instead pass a list of distinct models and corresponding param dicts
RF_set = build_vset('RF', RF, RF_params, is_async=True)
RF_set.fit(X_trains, y_train)

ValueError: n_estimators must be an integer, got <class 'dict'>

@jpdunc23
Copy link
Collaborator

jpdunc23 commented Nov 15, 2022

Thanks for the example and sorry for the trouble. I think you're using the v0.1.1 tag which is on PyPI, but master (and unfortunately the docs) has quite a few updates not reflected there. We need to get a new release out on PyPI.

I ran the 04_feat_importance_stability notebook using the latest commit on master and it's working. You can install it via pip install git+https://github.com/Yu-group/veridical-flow --user.

With the v0.1.1 version, I think your error can be partially fixed by using param_dict=pca_params. We updated the signature to build_vset, so in the latest dev version the example is working as intended:

v0.1.1:

def build_vset(name: str, obj, *args, param_dict=None, reps: int = 1,
               is_async: bool = False, output_matching: bool = False,
               lazy: bool = False, cache_dir: str = None, verbose: bool = True,
               tracking_dir: str = None, **kwargs) -> Vset:

00cc344 :

def build_vset(name: str, func, param_dict=None, reps: int = 1,
               is_async: bool = False, output_matching: bool = False,
               lazy: bool = False, cache_dir: str = None,
               tracking_dir: str = None, **kwargs) -> Vset:

But I think your PCA example may have uncovered another issue related to async Vsets with multiple steps. Basically, after calling fit on a Vset with is_async=True, transform tries to call ray.get but gets an array instead of a ray.objectRef. As a non-ideal workaround, you can do something like this:

pca_set = build_vset('PCA', PCA, pca_params, lazy=True, is_async=True)
X_trains_pca = {key: val() for key, val in X_trains_pca.items() if key != '__prev__'}

I just created an issue related to this and will try to find some time this week to look into it: #51

@jpdunc23
Copy link
Collaborator

Closing this, but please reopen if you still have issues with the version of build_vset on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants