GPU Monitor Wiki

Welcome to the GPU Monitor Wiki! This comprehensive guide will help you understand, install, and use the GPU Monitor dashboard.

Overview

GPU Monitor is a real-time monitoring dashboard for NVIDIA GPUs that presents temperature, utilization, memory, and power metrics in a web interface. It is packaged as a Docker image for easy deployment and cross-platform compatibility, and it offers live readings, historical data tracking, and customizable alerts.

Features

Real-Time Monitoring

Temperature Tracking: Monitor GPU temperature in real time with color-coded indicators
Utilization Metrics: Track GPU usage percentage with historical data
Memory Usage: Monitor VRAM usage and availability
Power Consumption: Track power usage and efficiency metrics

Historical Data

Multiple timeframe views:
- 15 Minutes
- 30 Minutes
- 1 Hour
- 6 Hours
- 12 Hours
- 24 Hours
Interactive performance graphs
Statistical analysis of historical data

Alert System

Configurable threshold alerts for:
- Temperature
- GPU Utilization
- Power Usage
Multiple notification methods:
- Visual alerts
- Sound notifications
- Browser notifications
Persistent alert settings

Dashboard Components

Real-time gauge displays
Interactive performance history graph
Recent statistics table
24-hour statistics overview

Installation

Prerequisites

Docker
NVIDIA GPU with drivers installed
NVIDIA Container Toolkit

Quick Start

Using Docker Run

docker run -d \
  --name gpu-monitor \
  -p 8081:8081 \
  -e TZ=America/Los_Angeles \
  -v /etc/localtime:/etc/localtime:ro \
  -v "$(pwd)/history:/app/history" \
  -v "$(pwd)/logs:/app/logs" \
  --gpus all \
  --restart unless-stopped \
  bigsk1/gpu-monitor:latest

Using Docker Compose

version: '3.8'

services:
  gpu-monitor:
    image: bigsk1/gpu-monitor:latest
    container_name: gpu-monitor
    ports:
      - "8081:8081"
    environment:
      - TZ=America/Los_Angeles
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./history:/app/history
      - ./logs:/app/logs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1  # set to "all" to expose every GPU, matching --gpus all
              capabilities: [gpu]
    restart: unless-stopped
    runtime: nvidia

Access

Access the dashboard at: http://localhost:8081

Configuration

Time Zone Configuration

Set your local timezone using the TZ environment variable:

-e TZ=America/Los_Angeles

TZ accepts any name from the IANA time zone database, for example America/Los_Angeles or Europe/Berlin.
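A typo in the TZ value is easy to miss, so it can help to check the name before starting the container. The following is a minimal sketch using Python's standard zoneinfo module (Python 3.9+). The helper name is_valid_timezone is hypothetical and not part of GPU Monitor, and the check uses the time zone data on the machine running the script, which may differ from the data inside the container.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def is_valid_timezone(name: str) -> bool:
    """Return True if `name` resolves to a time zone on this machine.

    Hypothetical helper for illustration; not part of GPU Monitor.
    """
    if not name:
        return False
    try:
        ZoneInfo(name)
    except (ValueError, OSError, ZoneInfoNotFoundError):
        # ValueError: malformed key; ZoneInfoNotFoundError: unknown zone;
        # OSError: key points at a directory on some Python versions.
        return False
    return True
```

For example, is_valid_timezone("America/Los_Angles") (note the typo) would be expected to return False.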

Alert Settings

Configure alert thresholds through the UI:

Temperature (°C)
GPU Utilization (%)
Power Usage (W)

Alert settings persist across sessions because they are stored in the browser's local storage. They therefore apply only to the browser in which they were set.

Data Persistence

Data persists on the host through bind-mounted directories:

volumes:
  - ./history:/app/history    # Historical data
  - ./logs:/app/logs         # Application logs

Usage Guide

Dashboard Navigation

Real-Time Metrics

Temperature: Current GPU temperature with color-coded gauge
GPU Utilization: Current usage percentage
Memory Usage: VRAM usage in MiB
Power Usage: Current power consumption in watts

Historical Data View

Select timeframe using the buttons:
- 15 Minutes
- 30 Minutes
- 1 Hour
- 6 Hours
- 12 Hours
- 24 Hours
View performance indicators:
- Peak Temperature
- Average Utilization
- Maximum Memory Usage
- Power Efficiency
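The performance indicators above can be thought of as simple aggregates over the samples that fall inside the selected timeframe. The sketch below illustrates that idea only; the sample layout (dicts with timestamp, temperature_c, utilization_pct, and memory_used_mib keys) and the summarize function are assumptions, not the dashboard's actual data format or code. Power Efficiency is left out because this wiki does not define how it is calculated.

```python
from datetime import datetime, timedelta

# Timeframe labels as shown on the dashboard buttons.
TIMEFRAMES = {
    "15 Minutes": timedelta(minutes=15),
    "30 Minutes": timedelta(minutes=30),
    "1 Hour": timedelta(hours=1),
    "6 Hours": timedelta(hours=6),
    "12 Hours": timedelta(hours=12),
    "24 Hours": timedelta(hours=24),
}


def summarize(samples: list, timeframe: str, now: datetime):
    """Aggregate the samples newer than `now - timeframe`.

    `samples` is a hypothetical list of dicts; returns None if no sample
    falls inside the window.
    """
    cutoff = now - TIMEFRAMES[timeframe]
    recent = [s for s in samples if s["timestamp"] >= cutoff]
    if not recent:
        return None
    return {
        "peak_temperature_c": max(s["temperature_c"] for s in recent),
        "average_utilization_pct": sum(s["utilization_pct"] for s in recent) / len(recent),
        "max_memory_used_mib": max(s["memory_used_mib"] for s in recent),
    }
```

Switching from "15 Minutes" to "1 Hour" only widens the cutoff, so older peaks reappear in the summary.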

Interactive Graph

Toggle metrics by clicking on gauge cards
View multiple metrics simultaneously
Color-coded lines for easy identification:
- Temperature: Red
- GPU Usage: Green
- Memory: White
- Power: Purple

Alert Management

Access Alert Settings panel
Configure thresholds:
- Temperature Threshold (°C)
- GPU Utilization Threshold (%)
- Power Threshold (W)
Enable/disable:
- Sound Alerts
- Browser Notifications
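To make the threshold behavior concrete, here is a minimal sketch of a threshold check. The dashboard evaluates alerts in the browser, so this Python version is illustrative only; the function name, the metric keys, and the choice of a strict "greater than" comparison are all assumptions.

```python
def check_thresholds(metrics: dict, thresholds: dict) -> list:
    """Return one message per metric whose value exceeds its threshold.

    Both dicts use hypothetical keys such as "temperature_c",
    "utilization_pct", and "power_w". Metrics that are missing or None
    (for example, power draw on GPUs that do not report it) are skipped.
    """
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name} {value} exceeds threshold {limit}")
    return alerts
```

With thresholds of 80 °C, 90 %, and 250 W, a reading of 85 °C, 50 %, and 100 W would produce a single temperature alert.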

Components

Backend Components

NVIDIA SMI Integration
Data Collection Service
Web Server (aiohttp)
Historical Data Processing
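As an illustration of what NVIDIA SMI integration can look like, the sketch below queries nvidia-smi in CSV mode and parses each line into a dict. The --query-gpu and --format=csv,noheader,nounits options are standard nvidia-smi flags, but the selected fields, the key names, and the functions themselves are assumptions; the project's own collector (monitor_gpu.sh) may work differently.

```python
import subprocess
from typing import Optional

# Fields chosen to match the metrics shown on the dashboard (an assumption).
QUERY_FIELDS = "temperature.gpu,utilization.gpu,memory.used,memory.total,power.draw"
KEYS = ["temperature_c", "utilization_pct", "memory_used_mib", "memory_total_mib", "power_w"]


def _to_float(value: str) -> Optional[float]:
    """Convert one CSV field; unsupported fields such as "[N/A]" become None."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_line(line: str) -> dict:
    """Parse one line of csv,noheader,nounits output into a dict."""
    values = [field.strip() for field in line.split(",")]
    if len(values) != len(KEYS):
        raise ValueError(f"expected {len(KEYS)} fields, got {len(values)}: {line!r}")
    return {key: _to_float(value) for key, value in zip(KEYS, values)}


def read_gpu_metrics() -> list:
    """Run nvidia-smi and return one dict per GPU (requires the NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]
```

Keeping parse_gpu_line separate from the subprocess call makes the parsing testable on machines without a GPU.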

Frontend Components

Real-time Gauges
Interactive Charts (Chart.js)
Alert System
Responsive Design

Monitoring Features

Metrics Collection

4-second update interval for real-time data
Efficient data buffering
Automatic log rotation
Historical data aggregation
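At a 4-second interval, a 24-hour window holds 24 × 3600 / 4 = 21,600 samples per GPU. One common way to buffer that much data without unbounded growth is a fixed-size deque, sketched below. The MetricsBuffer class is hypothetical and is not taken from the project; it only illustrates the sizing arithmetic.

```python
from collections import deque

UPDATE_INTERVAL_SECONDS = 4  # update interval stated in this wiki


def samples_per_window(hours: int, interval: int = UPDATE_INTERVAL_SECONDS) -> int:
    """Number of samples collected in `hours` at one sample per `interval` seconds."""
    return hours * 3600 // interval


class MetricsBuffer:
    """Fixed-size buffer that drops the oldest sample when full (illustrative)."""

    def __init__(self, max_samples: int = samples_per_window(24)):
        self._samples = deque(maxlen=max_samples)

    def add(self, sample) -> None:
        self._samples.append(sample)

    def latest(self, count: int) -> list:
        """Return the `count` most recent samples, oldest first."""
        if count <= 0:
            return []
        return list(self._samples)[-count:]

    def __len__(self) -> int:
        return len(self._samples)
```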

Data Visualization

Color-coded gauges
Multi-metric graphing
Responsive design
Mobile compatibility

Troubleshooting

Common Issues

NVIDIA SMI Not Found

# Verify NVIDIA drivers
nvidia-smi

# Test Docker NVIDIA runtime
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Container Fails to Start

Check Docker logs:

docker logs gpu-monitor

Verify GPU access
Check port availability

Dashboard Not Accessible

Verify container is running:

docker ps

Check port mapping
Verify network access
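To check whether the published port accepts connections at all, a plain TCP connection test is often enough. The sketch below uses only the Python standard library; port_is_open is a hypothetical helper, and 8081 is the port from the Quick Start examples.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If port_is_open("localhost", 8081) returns False while docker ps shows the container running, the port mapping or a local firewall is a likely place to look next.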

Debug Logging

Enable debug logging by uncommenting the following line in monitor_gpu.sh:

DEBUG=true

Advanced Topics

Custom Alert Sounds

Replace alert.mp3 in the sounds directory with your preferred sound file.

Data Retention

Configure log rotation in monitor_gpu.sh:

local max_size=$((10 * 1024 * 1024))  # 10MB
local max_age=$((2 * 24 * 3600))      # 2 days
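The two limits work out to 10 × 1024 × 1024 = 10,485,760 bytes and 2 × 24 × 3600 = 172,800 seconds. The sketch below shows one plausible way such limits could be applied, rotating a log when it exceeds either one. How monitor_gpu.sh actually uses max_size and max_age is not shown in this wiki, so the "either limit" rule and both function names are assumptions.

```python
import os
import time

MAX_SIZE_BYTES = 10 * 1024 * 1024   # 10 MB, mirroring max_size above
MAX_AGE_SECONDS = 2 * 24 * 3600     # 2 days, mirroring max_age above


def needs_rotation(size_bytes: int, age_seconds: float,
                   max_size: int = MAX_SIZE_BYTES,
                   max_age: int = MAX_AGE_SECONDS) -> bool:
    """Return True if a log exceeds either the size or the age limit (assumed rule)."""
    return size_bytes > max_size or age_seconds > max_age


def file_needs_rotation(path: str) -> bool:
    """Apply needs_rotation to a file, using its modification time as its age."""
    info = os.stat(path)
    return needs_rotation(info.st_size, time.time() - info.st_mtime)
```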

Performance Optimization

Buffer size configuration
Update interval adjustment
Log rotation settings

Security Considerations

Container isolation
Volume permissions
Network access control