[R] Memory leak when storing Booster objects in a list? #11355

Closed
fabvass opened this issue Mar 20, 2025 · 4 comments · Fixed by #11357
fabvass commented Mar 20, 2025

Hi there,

Apologies if this was already addressed somewhere else, but I could not find an existing issue that touches on exactly this.

While trying out XGBoost 3.0 in R with GPU support, I noticed that storing the Booster returned by xgboost() inside a list leads to what seems to be a GPU memory leak. This appears to be related to Boosters now being R 'ALTLIST' objects with external pointers.

To illustrate the problem, consider the following set-up:

##### Showing GPU memory not being released by XGBoost
library(xgboost)
xgb.set.config(verbosity = 0)
dat <- data.matrix(mtcars)
y <- ifelse(dat[,2] <= 6, 1, 0)
x <- dat[,-2]
S <- 100
outl <- list()

Now, if I simply re-train a model on those data in a loop, here's the GPU memory accumulation that I get:

### With XGBoost 3.0.0 and no gc():

# GPU memory in use before the loop: ~0.58 GB
t0 <- Sys.time()
for(i in 1:S)
{
    model <- xgboost(x=x, y=as.factor(y),
                     nrounds = 3,
                     objective = "binary:logistic",
                     device = "cuda",
                     tree_method = "hist")
    outl[[i]] <- model
}
t1 <- Sys.time()
t1 - t0 # runs in ~4.35 seconds
yhat <- ifelse(predict(outl[[S]], x)>.5,1,0)
sum(y==yhat)/length(y) # returns ~0.999
# GPU peak during the loop execution: ~4.1 GB
# GPU memory in use after the loop: ~3.9 GB
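As an aside, the GPU memory figures quoted in these snippets were read outside of R. A minimal sketch of querying them from within the session (assuming nvidia-smi is on the PATH; gpu_mem_used_mb is a hypothetical helper, not part of xgboost):

# Hypothetical helper: read the GPU memory currently in use (in MiB) via
# nvidia-smi; assumes a single GPU and that nvidia-smi is on the PATH.
gpu_mem_used_mb <- function() {
  out <- system(
    "nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits",
    intern = TRUE
  )
  as.numeric(out[1])
}

# e.g. record usage before the loop, inside each iteration, and after the loop:
# mem_before <- gpu_mem_used_mb()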

Calling garbage collection explicitly, as suggested here, does not help:

### With XGBoost 3.0.0 plus gc():

# GPU memory in use before the loop: ~0.58 GB
t0 <- Sys.time()
for(i in 1:S)
{
    model <- xgboost(x=x, y=as.factor(y),
                     nrounds = 3,
                     objective = "binary:logistic",
                     device = "cuda",
                     tree_method = "hist")
    outl[[i]] <- model
    rm(model)
    gc()
}
t1 <- Sys.time()
t1 - t0 # runs in ~17.61 seconds
yhat <- ifelse(predict(outl[[S]], x)>.5,1,0)
sum(y==yhat)/length(y) # returns ~0.999
# GPU peak during the loop execution: ~4.0 GB
# GPU memory in use after the loop: ~3.9 GB

Note that calling garbage collection at the end makes no difference. Also note that if the model is not stored in a list AND garbage collection is called, the problem almost entirely disappears (we still end up with ~36% more GPU memory in use, but in absolute terms it is a small amount):

### With XGBoost 3.0.0 plus gc() without storing 'model' in a list:

# GPU memory in use before the loop: ~0.58 GB
t0 <- Sys.time()
for(i in 1:S)
{
    model <- xgboost(x=x, y=as.factor(y),
                     nrounds = 3,
                     objective = "binary:logistic",
                     device = "cuda",
                     tree_method = "hist")
    rm(model)
    gc()
}
t1 <- Sys.time()
t1 - t0 # runs in ~17.61 seconds
# GPU peak during the loop execution: ~0.83 GB
# GPU memory in use after the loop: ~0.79 GB

For reference, the issue does not happen in any of the above examples if using XGBoost 1.5.0 instead of 3.0.0.

Given how important it is to be able to store Boosters in data containers, I wonder if there is a way to work around this? If not, I think this side effect of storing Boosters should be made very clear in the docs.

Thanks!

Note: this was tested on an NVIDIA GeForce RTX 3080 (12 GB) GPU in a system running Ubuntu 22.04.

@trivialfis (Member)

It's the memory held by the booster object for things like gradient and prediction cache.

cc @david-cortes

We can use the reset method from #11042 upon returning from xgb.train. Internally, the method serializes and deserializes the booster to free all GPU memory (doing that roundtrip manually can be used as a workaround for now).
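For anyone needing a stopgap before the reset method lands, here is a minimal sketch of the serialize/deserialize roundtrip described above, using the exported xgb.save.raw() / xgb.load.raw() helpers (free_booster_caches is a hypothetical name; whether this releases all GPU memory in 3.0.0 has not been verified here):

# Hypothetical helper: serialize the booster to an in-memory buffer and load
# it back, so the returned booster is a fresh object that should not carry the
# GPU-side gradient/prediction caches held by the original.
free_booster_caches <- function(booster) {
  raw <- xgb.save.raw(booster)
  xgb.load.raw(raw)
}

# Usage in the loop from the issue:
# outl[[i]] <- free_booster_caches(model)
# rm(model); gc()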

fabvass commented Mar 20, 2025

@trivialfis thanks! Looking forward to the linked solution in the future release.

In the meantime, I didn't understand what the workaround is for the R case right now, if any. I see that in the case of Julia's interface you mentioned:

easiest workaround is simply copying the model and ditch the old one.

But that does not seem to work in R.

@trivialfis (Member)

Opened a PR implementing the reset function for R: #11357.

@trivialfis (Member)

I have tested the memory usage. Please use the nightly build for now.
