[R] Memory leak when storing Booster objects in a list? #11355

Closed
fabvass opened this issue Mar 20, 2025 · 4 comments · Fixed by #11357
fabvass commented Mar 20, 2025

Hi there,

Apologies if this was already addressed somewhere else, but I could not find an existing issue that touches on exactly this.

While trying out XGBoost 3.0 in R with GPU support, I noticed that storing the Booster returned by xgboost() inside a list leads to what seems to be a GPU memory leak. This appears to be related to Boosters now being R 'ALTLIST' objects with external pointers.

To illustrate the problem, consider the following set-up:

##### Showing GPU memory not being released by XGBoost
library(xgboost)
xgb.set.config(verbosity = 0)
dat <- data.matrix(mtcars)
y <- ifelse(dat[,2] <= 6, 1, 0)
x <- dat[,-2]
S <- 100
outl <- list()

Now, if I simply re-train a model on those data in a loop, here's the GPU memory accumulation that I get:

### With XGBoost 3.0.0 and no gc():

# GPU memory in use before the loop: ~0.58 GB
t0 <- Sys.time()
for(i in 1:S)
{
    model <- xgboost(x=x, y=as.factor(y),
                     nrounds = 3,
                     objective = "binary:logistic",
                     device = "cuda",
                     tree_method = "hist")
    outl[[i]] <- model
}
t1 <- Sys.time()
t1 - t0 # runs in ~4.35 seconds
yhat <- ifelse(predict(outl[[S]], x)>.5,1,0)
sum(y==yhat)/length(y) # returns ~0.999
# GPU peak during the loop execution: ~4.1 GB
# GPU memory in use after the loop: ~3.9 GB
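As an aside, the GPU memory figures quoted in these snippets were read outside of R. A minimal sketch of querying them from within the session (assuming nvidia-smi is on the PATH; gpu_mem_used_mb is a hypothetical helper, not part of xgboost):

# Hypothetical helper: read the GPU memory currently in use (in MiB) via
# nvidia-smi; assumes a single GPU and that nvidia-smi is on the PATH.
gpu_mem_used_mb <- function() {
  out <- system(
    "nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits",
    intern = TRUE
  )
  as.numeric(out[1])
}

# e.g. record usage before the loop, inside each iteration, and after the loop:
# mem_before <- gpu_mem_used_mb()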

Calling garbage collection explicitly, as suggested here, does not help:

### With XGBoost 3.0.0 plus gc():

# GPU memory in use before the loop: ~0.58 GB
t0 <- Sys.time()
for(i in 1:S)
{
    model <- xgboost(x=x, y=as.factor(y),
                     nrounds = 3,
                     objective = "binary:logistic",
                     device = "cuda",
                     tree_method = "hist")
    outl[[i]] <- model
    rm(model)
    gc()
}
t1 <- Sys.time()
t1 - t0 # runs in ~17.61 seconds
yhat <- ifelse(predict(outl[[S]], x)>.5,1,0)
sum(y==yhat)/length(y) # returns ~0.999
# GPU peak during the loop execution: ~4.0 GB
# GPU memory in use after the loop: ~3.9 GB

Note that calling garbage collection at the end makes no difference. Also note that if the model is not stored in a list AND garbage collection is called, the problem almost entirely disappears (we still end up with ~36% more GPU memory in use, but in absolute terms it is a small amount):

### With XGBoost 3.0.0 plus gc() without storing 'model' in a list:

# GPU memory in use before the loop: ~0.58 GB
t0 <- Sys.time()
for(i in 1:S)
{
    model <- xgboost(x=x, y=as.factor(y),
                     nrounds = 3,
                     objective = "binary:logistic",
                     device = "cuda",
                     tree_method = "hist")
    rm(model)
    gc()
}
t1 <- Sys.time()
t1 - t0 # runs in ~17.61 seconds
# GPU peak during the loop execution: ~0.83 GB
# GPU memory in use after the loop: ~0.79 GB

For reference, the issue does not happen in any of the above examples if using XGBoost 1.5.0 instead of 3.0.0.

Given how important it is to be able to store Boosters in data containers, I wonder if there is a way to work around this? If not, I think this side effect of storing Boosters should be made very clear in the docs.

Thanks!

Note: this was tested on an NVIDIA GeForce RTX 3080 (12 GB) GPU in a system running Ubuntu 22.04.

@trivialfis (Member)

It's the memory held by the booster object for things like gradient and prediction cache.

cc @david-cortes

We can use the reset method from #11042 upon returning from xgb.train. Internally, the method serializes and deserializes the booster to free all GPU memory (doing that roundtrip manually can be used as a workaround for now).
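For anyone needing a stopgap before the reset method lands, here is a minimal sketch of the serialize/deserialize roundtrip described above, using the exported xgb.save.raw() / xgb.load.raw() helpers (free_booster_caches is a hypothetical name; whether this releases all GPU memory in 3.0.0 has not been verified here):

# Hypothetical helper: serialize the booster to an in-memory buffer and load
# it back, so the returned booster is a fresh object that should not carry the
# GPU-side gradient/prediction caches held by the original.
free_booster_caches <- function(booster) {
  raw <- xgb.save.raw(booster)
  xgb.load.raw(raw)
}

# Usage in the loop from the issue:
# outl[[i]] <- free_booster_caches(model)
# rm(model); gc()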

fabvass commented Mar 20, 2025

@trivialfis thanks! Looking forward to the linked solution in the future release.

In the meantime, I didn't understand what the workaround is for the R case right now, if any. I see that in the case of Julia's interface you mentioned:

easiest workaround is simply copying the model and ditch the old one.

But that does not seem to work in R.

@trivialfis (Member)

Opened a PR implementing the reset function for R: #11357.

@trivialfis (Member)

I have tested the memory usage. Please use the nightly build for now.
