Skip to content

Possible memory leak in WiFiClientSecure on failed client.connect() #3808

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
robert-rudolph opened this issue Mar 11, 2020 · 6 comments
Closed
Labels
Status: Stale Issue is stale stage (outdated/stuck)

Comments

@robert-rudolph
Copy link

Hardware:

Board: ESP32 Dev Module (DevKitC / WROOM-32)
Core Installation version: 1.0.4
IDE name: Arduino IDE
Flash Frequency: 80Mhz
PSRAM enabled: no
Upload Speed: 921600
Computer OS: Mac OSX

Description:

I believe I've found a source of a memory leak in WiFiClientSecure, that occurs only when a client.connect() fails, with root_ca, key, and certificates having been set.

Test Setup:

We're chasing a memory leak that occurs only when a call to client.connect() fails. To simulate this, we'll force the handshake to fail by setting the timeout to 0. Then, we'll constantly free, re-allocate, and call connect on a WiFiClientSecure object via delete and new. We'll watch for heap usage as we do this.

Observations:

  • If connections succeed, heap usage stabilizes after first 15-20 connections (at ~4200)
  • If connections fail, heap usage continually increases (by about 4k for each failed connect), eventually leading to failed memory allocations in the wifi controller and in WifiClientSecure.
  • Compile with 'Core Debug Level' = Verbose to see these failed memory allocation messages.

Theory:

Deleting the object exercises the destructor WiFiClientSecure::~WiFiClientSecure(), which calls WiFiClientSecure::stop(), which calls stop_ssl_socket() in ssl_client.cpp.

I believe stop_ssl_socket() fails to free the certificates, and these calls are missing:

  • mbedtls_x509_crt_free(&ssl_client->ca_cert);
  • mbedtls_x509_crt_free(&ssl_client->client_cert);
  • mbedtls_pk_free(&ssl_client->client_key);

My Fix:

Testing with this modified function body for stop_ssl_socket() in ssl_client.cpp, solves the memory leak problem, at least in my testing, heap usage stabilizes at <2k.

void stop_ssl_socket(sslclient_context *ssl_client, const char *rootCABuff, const char *cli_cert, const char *cli_key)
  {
      log_v("Cleaning SSL connection.");

      if (ssl_client->socket >= 0) {
          close(ssl_client->socket);
          ssl_client->socket = -1;
      }

      // beginning of added section
      if (cli_key != NULL) {
          mbedtls_pk_free(&ssl_client->client_key);
      }
      if (rootCABuff != NULL) {
          mbedtls_x509_crt_free(&ssl_client->ca_cert);
      }
      if (cli_cert != NULL) {
          mbedtls_x509_crt_free(&ssl_client->client_cert);
      }
      // end of added section

      mbedtls_ssl_free(&ssl_client->ssl_ctx);
      mbedtls_ssl_config_free(&ssl_client->ssl_conf);
      mbedtls_ctr_drbg_free(&ssl_client->drbg_ctx);
      mbedtls_entropy_free(&ssl_client->entropy_ctx);
  }

Test Sketch:

/*
  Minimum program to check for a memory leak in WiFiClientSecure on ESP32.
  Robert Rudolph | March 11, 2020
  Tested on ESP32-DevKitC (ESP32-WROOM-32) | esp-arduino | core 1.0.4.
*/

#include <WiFiClientSecure.h>

#define SIMULATE_FAILED_HANDSHAKE  true

// wifi creds
const char* ssid     = "shed";
const char* password = "drawingideas";

// server URL and port
const char* server = "replace-with-your-endpoint.amazonaws.com";
const int   port = 8883;

const char root_ca[] = "-----BEGIN CERTIFICATE-----\n"
...
"-----END CERTIFICATE-----\n";

const char client_key[] = "-----BEGIN RSA PRIVATE KEY-----\n"
...
"-----END RSA PRIVATE KEY-----\n";

const char client_cert[] = "-----BEGIN CERTIFICATE-----\n"
...
"-----END CERTIFICATE-----\n";


// track some statistics as we go
long startup_heap_free = 0;
long iterations = 0;
long success_count = 0;
long failure_count = 0;

WiFiClientSecure *https_client;


void setup()
{
  // open serial debug
  Serial.begin(115200);
  delay(100);

  // connect to wifi
  Serial.printf("WiFi connecting to %s / %s...", ssid, password);
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    Serial.printf(".");
    delay(1000);
  }
  Serial.printf(" done.\n");

  // the first time through loop(), this will just get deleted
  https_client = new WiFiClientSecure();
}


void loop()
{

  // delete old instance and create a new instance
  delete https_client;
  https_client = new WiFiClientSecure();

  //  ^^^
  // This should be memory neutral (no significant decrease in free heap),
  // at least after it converges (heap usage should converge after about 15-20 iterations).
  // Below, we'll log heap usage to see if it's acutally neutral.


  // setup certificates and keys (defined at the top)
  https_client->setCACert(root_ca);
  https_client->setCertificate(client_cert);
  https_client->setPrivateKey(client_key);

  // change this flag at the top
  // we'll set the handshake timeout to 0 to make connect intentioanlly fail.
  if ( SIMULATE_FAILED_HANDSHAKE ) {
    https_client->setHandshakeTimeout(0); // seconds, default is 120
  }

  // try to connect
  Serial.printf("https_client connecting... ");
  https_client->connect(server, port);

  // check result and count our successes and failures
  if (https_client->connected()) {
    Serial.printf(" success!!\n");
    success_count++;
  } else {
    Serial.printf(" failure.\n");
    failure_count++;
  }

  // this will only run the first time through
  // we're just recording a baseline / heap watermark to compare against
  if (iterations == 0)   startup_heap_free = ESP.getFreeHeap();

  // print out our debugging info
  Serial.printf("Total heap consumed = %d  (%d iterations: %d successes, %d failures)\n", (startup_heap_free-ESP.getFreeHeap()), iterations, success_count, failure_count);

  iterations++;

  delay(1000);
}
@robert-rudolph
Copy link
Author

An update, the fix causes a panic if start_ssl_client() fails before certificates are loaded - e.g. if DNS resolution fails.

Moving calls to mbedtls_pk_free() and mbedtls_x509_crt_free to inside start_ssl_client() might be the better fix. Specifically:

log_v("Performing the SSL/TLS handshake...");
unsigned long handshake_start_time=millis();
while ((ret = mbedtls_ssl_handshake(&ssl_client->ssl_ctx)) != 0) {
    if (ret != MBEDTLS_ERR_SSL_WANT_READ && ret != MBEDTLS_ERR_SSL_WANT_WRITE) {
        // ++++++++++ ADDED TO FIX MEMORY LEAK ON FAILED CONNECTION ++++++++++
        if (cli_key != NULL)  mbedtls_pk_free(&ssl_client->client_key);
        if (rootCABuff != NULL) mbedtls_x509_crt_free(&ssl_client->ca_cert);
        if (cli_cert != NULL)  mbedtls_x509_crt_free(&ssl_client->client_cert);
        // ++++++++++ END ++++++++++
        return handle_error(ret);
    }
    if((millis()-handshake_start_time)>ssl_client->handshake_timeout) {
        // ++++++++++ ADDED TO FIX MEMORY LEAK ON FAILED CONNECTION ++++++++++
        if (cli_key != NULL)  mbedtls_pk_free(&ssl_client->client_key);
        if (rootCABuff != NULL) mbedtls_x509_crt_free(&ssl_client->ca_cert);
        if (cli_cert != NULL)  mbedtls_x509_crt_free(&ssl_client->client_cert);
        // ++++++++++ END ++++++++++
        return -1;
    }
    vTaskDelay(10 / portTICK_PERIOD_MS);
}

@stale
Copy link

stale bot commented Jun 5, 2020

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale Issue is stale stage (outdated/stuck) label Jun 5, 2020
@stale
Copy link

stale bot commented Jun 20, 2020

[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.

@stale stale bot closed this as completed Jun 20, 2020
@VoLinhTruc
Copy link
Contributor

The issue open on 12 Mar
But no response from espressif's dev.
So sad.

@brokentoaster
Copy link

This is still present in the current release. I am seeing it in normal operation on ESP32s with constrained throughput where the SSL handshake is taking longer than expected.

@robert-rudolph
Copy link
Author

@brokentoaster did the fix above solve your leak?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale Issue is stale stage (outdated/stuck)
Projects
None yet
Development

No branches or pull requests

3 participants