
Fixed: limit on number of packages + add rworkflows #67

Open
wants to merge 3 commits into master

Conversation

bschilder

  • Fix cran_downloads and prevent it from failing with many packages
    (limit on number of packages as argument to cran_downloads, #56).
  • Speed up cran_downloads with parallelisation.
  • Add a new unit test for cran_downloads with a large number of packages.
  • Remove \dontrun{} from the cran_downloads and cran_top_downloads examples
    (it's unclear why it was there).
  • New support functions: message_parallel, split_batches (a sketch of the
    batching helper follows this list).
  • Replace travis (failing) and pkgdown with rworkflows.
  • Update Authors in DESCRIPTION to reflect contributions and make fields explicit.
  • Add node_modules$ to .Rbuildignore.
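
For reference, here's a minimal sketch of what the batching helper looks like conceptually (the real split_batches is in this PR's diff; the body below is illustrative):

```r
# Illustrative sketch only -- the actual split_batches is in the PR diff.
# Split a vector into batches of at most `batch_size` elements.
split_batches <- function(x, batch_size = 100) {
  split(x, ceiling(seq_along(x) / batch_size))
}

# e.g. 26 letters -> batches of 10, 10, and 6
str(split_batches(letters, batch_size = 10))
```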

All you need to do on your end, @gaborcsardi, is add a GitHub token as a repo secret named "PAT_GITHUB" to the cranlogs repo.

Best,
Brian

@bschilder
Author

bschilder commented Dec 20, 2022

Heads up @gaborcsardi, cranlogs.org seems to be down today, so none of the functions will work until it's back up:
https://cranlogs.r-pkg.org/

[Screenshot: 2022-12-20 at 10:40:45]

Site is back up now.

@bschilder
Author

bschilder commented Jan 3, 2023

@gaborcsardi have you (or any other maintainers of cranlogs) had a chance to look this over?

Thanks,
Brian

@gaborcsardi
Contributor

Thanks for the PR, but I am not going to merge this, sorry.

I prefer using our workflows instead of rworkflows.

As for the parallelization, I prefer not to use parallel for HTTP.

@bschilder
Author

> Thanks for the PR, but I am not going to merge this, sorry.

Ok, I suppose that's up to you, but would you mind providing some justification for the benefit of the users who requested some of these features? e.g. #56

> I prefer using our workflows instead of rworkflows.

It seems your workflows have been failing for several years now, unless I'm missing something. If you were to fix this, I think it would be a decent alternative.

[Screenshot: 2023-01-03 at 11:05:07]

> As for the parallelization, I prefer not to use parallel for HTTP.

Is there a technical reason for this? I'd be curious to know myself! Similarly, is there an alternative implementation that would suit this purpose better?

If you do not wish to support parallelization at all, you have some other options available to you (in order of preference):

  1. Simply replace parallel::mclapply with lapply (see the sketch after this
     list). This will at least resolve the bug described in limit on number of
     packages as argument to cran_downloads #56.
  2. Add some documentation or warnings to notify users that cranlogs is
     currently unable to request info for more than ~X packages at a time.
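
As a sketch of option 1 (assuming the split_batches helper from this PR and a character vector pkgs of package names; batch_size is illustrative):

```r
# Query each batch sequentially -- same fix for #56, no extra processes.
batches <- split_batches(pkgs, batch_size = 100)
res <- lapply(batches, function(b) cranlogs::cran_downloads(packages = b))
downloads <- do.call(rbind, res)
```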

Happy to adjust the PR as needed, just let me know what you'd like to accomplish with this package.

@gaborcsardi
Contributor

> It seems your workflows have been failing for several years now, unless I'm missing something. If you were to fix this, I think it would be a decent alternative.

I prefer using usethis::use_tidy_github_actions()

> Is there a technical reason for this?

Yes, several. Using multiple processes for concurrent HTTP is an anti-pattern. mclapply() is also unsafe on macOS (search for "fork without exec on macOS"), and does not work on Windows. Here is a good implementation of parallel HTTP with curl: https://github.com/r-lib/rcmdcheck/blob/0cb1073db067d59f2659e600678758f25e00a512/R/http.R
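
For illustration, a minimal sketch of that pattern with curl's multi interface on a single process (fetch_all, the URLs, and the connection limit are placeholders here, not the rcmdcheck code):

```r
library(curl)

# Run many HTTP requests concurrently without forking extra processes.
fetch_all <- function(urls, host_con = 6) {
  pool <- curl::new_pool(host_con = host_con)
  results <- vector("list", length(urls))
  lapply(seq_along(urls), function(i) {
    curl::multi_add(
      curl::new_handle(url = urls[[i]]),
      done = function(res) results[[i]] <<- rawToChar(res$content),
      fail = function(err) warning("request failed: ", err, call. = FALSE),
      pool = pool
    )
  })
  # Drives all transfers in this one process; works on Windows and macOS alike.
  curl::multi_run(pool = pool)
  results
}
```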

However, in general, I am not sure if I want to provide a quick way for people to "rank" all CRAN packages according to downloads. I don't really want people to think that this is an indication of package quality.

@bschilder
Author

> > It seems your workflows have been failing for several years now, unless I'm missing something. If you were to fix this, I think it would be a decent alternative.

> I prefer using usethis::use_tidy_github_actions()

I haven't come across this one! I'll check it out, though. As you may already be aware, there's also biocthis, just FYI.

Even so, if you're no longer using Travis, perhaps it's time to retire it from here?

> > Is there a technical reason for this?

> Yes, several. Using multiple processes for concurrent HTTP is an anti-pattern. mclapply() is also unsafe on macOS (search for "fork without exec on macOS"), and does not work on Windows. Here is a good implementation of parallel HTTP with curl: https://github.com/r-lib/rcmdcheck/blob/0cb1073db067d59f2659e600678758f25e00a512/R/http.R

This is great information to have! Thanks for the explanation.

> However, in general, I am not sure if I want to provide a quick way for people to "rank" all CRAN packages according to downloads. I don't really want people to think that this is an indication of package quality.

Hm, I hadn't thought of it that way. I personally don't agree with this reasoning, as I think it's best to give users the complete data and let them decide how to use it responsibly (perhaps with a disclaimer in the documentation). For example, in my case I'm using downloads to assess package usage, but I wouldn't think to use them as a metric of quality (and I'm not sure how many other users would).

Would you mind updating #56 either way, to let the other users know of your decision? Thanks!
