Home
Welcome to the GPU Monitor Wiki! This guide explains how to install, configure, and use the GPU Monitor dashboard.
- Overview
- Features
- Installation
- Configuration
- Usage Guide
- Components
- Monitoring Features
- Troubleshooting
- Advanced Topics
GPU Monitor is a real-time NVIDIA GPU monitoring dashboard that provides comprehensive metrics and performance data through an intuitive web interface. Built with Docker for easy deployment and cross-platform compatibility, it offers real-time monitoring, historical data tracking, and customizable alerts.
- Temperature Tracking: Monitor GPU temperature in real-time with color-coded indicators
- Utilization Metrics: Track GPU usage percentage with historical data
- Memory Usage: Monitor VRAM usage and availability
- Power Consumption: Track power usage and efficiency metrics
- Multiple timeframe views:
  - 15 Minutes
  - 30 Minutes
  - 1 Hour
  - 6 Hours
  - 12 Hours
  - 24 Hours
- Interactive performance graphs
- Statistical analysis of historical data
- Configurable threshold alerts for:
  - Temperature
  - GPU Utilization
  - Power Usage
- Multiple notification methods:
  - Visual alerts
  - Sound notifications
  - Browser notifications
- Persistent alert settings
- Real-time gauge displays
- Interactive performance history graph
- Recent statistics table
- 24-hour statistics overview
- Docker
- NVIDIA GPU with drivers installed
- NVIDIA Container Toolkit
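Before starting the container, it can help to confirm that the first two prerequisites are reachable from your shell. The following is a minimal Python sketch; the helper name `missing_tools` is hypothetical and not part of GPU Monitor. It only checks that the `docker` and `nvidia-smi` executables are on the PATH and does not verify the NVIDIA Container Toolkit.

```python
import shutil


def missing_tools(tools=("docker", "nvidia-smi"), which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("docker and nvidia-smi found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default is enough.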
docker run -d \
  --name gpu-monitor \
  -p 8081:8081 \
  -e TZ=America/Los_Angeles \
  -v /etc/localtime:/etc/localtime:ro \
  -v ./history:/app/history \
  -v ./logs:/app/logs \
  --gpus all \
  --restart unless-stopped \
  bigsk1/gpu-monitor:latest
version: '3.8'
services:
  gpu-monitor:
    image: bigsk1/gpu-monitor:latest
    container_name: gpu-monitor
    ports:
      - "8081:8081"
    environment:
      - TZ=America/Los_Angeles
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./history:/app/history
      - ./logs:/app/logs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    runtime: nvidia
Access the dashboard at: http://localhost:8081
Set your local timezone using the TZ environment variable:
-e TZ=America/Los_Angeles
Configure alert thresholds through the UI:
- Temperature (°C)
- GPU Utilization (%)
- Power Usage (W)
Alert settings are stored in the browser's local storage, so they persist across page reloads.
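The threshold comparison itself is simple. The sketch below illustrates the idea in Python; it is not the dashboard's own code (the alerts run in the browser). The `Thresholds` class, the sample keys, the default values, and the choice of a strict `>` comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Example values only; set your own in the Alert Settings panel.
    temperature_c: float = 80.0
    utilization_pct: float = 90.0
    power_w: float = 250.0


def exceeded(sample, thresholds):
    """Return the names of metrics in `sample` that are above their threshold."""
    checks = {
        "temperature": (sample["temperature"], thresholds.temperature_c),
        "utilization": (sample["utilization"], thresholds.utilization_pct),
        "power": (sample["power"], thresholds.power_w),
    }
    # Assumption: a value exactly equal to the threshold does not trigger an alert.
    return [name for name, (value, limit) in checks.items() if value > limit]
```

A reading of 85 °C against an 80 °C threshold would return `["temperature"]`, which a front end could then map to a visual, sound, or browser notification.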
Data is persisted through Docker volumes:
volumes:
  - ./history:/app/history  # Historical data
  - ./logs:/app/logs        # Application logs
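If the host directories behind these bind mounts do not exist yet, Docker creates them for you, typically owned by root. Creating them yourself first avoids that. A minimal sketch, assuming the `history` and `logs` directory names from the volume mappings above; the helper name `ensure_data_dirs` is hypothetical.

```python
from pathlib import Path


def ensure_data_dirs(base=".", names=("history", "logs")):
    """Create the bind-mount source directories under `base` if they are missing.

    Returns the list of directories that were actually created.
    """
    created = []
    for name in names:
        path = Path(base) / name
        if not path.exists():
            path.mkdir(parents=True)
            created.append(path)
    return created
```

Calling `ensure_data_dirs()` from the directory that holds your compose file, before the first start, leaves both directories owned by your own user.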
- Temperature: Current GPU temperature with color-coded gauge
- GPU Utilization: Current usage percentage
- Memory Usage: VRAM usage in MiB
- Power Usage: Current power consumption in watts
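The four gauges above correspond to values that `nvidia-smi` can report directly. The sketch below shows one way to read them in Python. It assumes the `--query-gpu` fields shown in `QUERY_FIELDS`; whether monitor_gpu.sh queries exactly these fields is not confirmed here, and the function names are hypothetical.

```python
import subprocess

# Standard nvidia-smi query fields; with "nounits", memory is reported in MiB
# and power in watts.
QUERY_FIELDS = "temperature.gpu,utilization.gpu,memory.used,memory.total,power.draw"


def parse_gpu_line(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    temp, util, mem_used, mem_total, power = (f.strip() for f in line.split(","))
    return {
        "temperature_c": float(temp),
        "utilization_pct": float(util),
        "memory_used_mib": float(mem_used),
        "memory_total_mib": float(mem_total),
        "power_w": float(power),
    }


def read_gpus():
    """Query nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]
```

Some GPUs do not report every field (power draw in particular), in which case `nvidia-smi` prints a placeholder instead of a number and `float()` raises `ValueError`; a production collector would need to handle that case.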
- Select timeframe using the buttons:
  - 15 Minutes
  - 30 Minutes
  - 1 Hour
  - 6 Hours
  - 12 Hours
  - 24 Hours
- View performance indicators:
  - Peak Temperature
  - Average Utilization
  - Maximum Memory Usage
  - Power Efficiency
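To make the relationship between a timeframe and its indicators concrete, here is a small Python sketch that filters samples to the selected window and computes three of the indicators. The sample layout (`timestamp` in epoch seconds plus `temperature`, `utilization`, `memory`) and the names `TIMEFRAMES_MIN` and `summarize` are assumptions. Power Efficiency is left out because this page does not define how it is calculated.

```python
from statistics import mean

# The six timeframe buttons, expressed in minutes.
TIMEFRAMES_MIN = {"15m": 15, "30m": 30, "1h": 60, "6h": 360, "12h": 720, "24h": 1440}


def summarize(samples, timeframe, now):
    """Summarize samples whose `timestamp` falls inside the selected window.

    Returns None when the window contains no samples.
    """
    cutoff = now - TIMEFRAMES_MIN[timeframe] * 60
    window = [s for s in samples if cutoff <= s["timestamp"] <= now]
    if not window:
        return None
    return {
        "peak_temperature": max(s["temperature"] for s in window),
        "average_utilization": mean(s["utilization"] for s in window),
        "max_memory": max(s["memory"] for s in window),
    }
```

Switching from the 15-minute to the 24-hour view only changes `cutoff`; the same samples are reused, which is why a short spike can dominate the peak in a narrow window and vanish from the average in a wide one.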
- Toggle metrics by clicking on gauge cards
- View multiple metrics simultaneously
- Color-coded lines for easy identification:
  - Temperature: Red
  - GPU Usage: Green
  - Memory: White
  - Power: Purple
- Access Alert Settings panel
- Configure thresholds:
  - Temperature Threshold (°C)
  - GPU Utilization Threshold (%)
  - Power Threshold (W)
- Enable/disable:
  - Sound Alerts
  - Browser Notifications
- NVIDIA SMI Integration
- Data Collection Service
- Web Server (aiohttp)
- Historical Data Processing
- Real-time Gauges
- Interactive Charts (Chart.js)
- Alert System
- Responsive Design
- 4-second update interval for real-time data
- Efficient data buffering
- Automatic log rotation
- Historical data aggregation
- Color-coded gauges
- Multi-metric graphing
- Responsive design
- Mobile compatibility
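The 4-second update interval listed above determines how much data a full day of history contains: 24 × 3600 / 4 = 21,600 samples per GPU. The Python sketch below shows one way to hold such a window in a bounded buffer and to aggregate it into coarser buckets for graphing. It is an illustration only; the assumption that retention equals the 24-hour view, the 60-second bucket width, and the names used are not taken from the project's code.

```python
from collections import deque

UPDATE_INTERVAL_S = 4
RETENTION_S = 24 * 3600                          # assumed: keep one day of samples
MAX_SAMPLES = RETENTION_S // UPDATE_INTERVAL_S   # 21600


def make_buffer():
    """Bounded buffer: once full, appending drops the oldest sample."""
    return deque(maxlen=MAX_SAMPLES)


def aggregate(samples, bucket_s=60):
    """Average (timestamp, value) pairs into fixed-width time buckets."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts - ts % bucket_s, []).append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]
```

With 60-second buckets, the 24-hour view shrinks from 21,600 points to 1,440, which keeps the chart responsive without changing the overall shape of the curve.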
# Verify NVIDIA drivers
nvidia-smi
# Test Docker NVIDIA runtime
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
- Check Docker logs:
docker logs gpu-monitor
- Verify GPU access
- Check port availability
- Verify container is running:
docker ps
- Check port mapping
- Verify network access
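Port availability and network access can also be checked from a script. The sketch below attempts a TCP connection to the dashboard port; the function name `port_is_open` is hypothetical, and the defaults assume the `8081:8081` mapping from the installation commands.

```python
import socket


def port_is_open(host="localhost", port=8081, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

A `True` result only shows that something is listening on the port. If the dashboard still does not load, the container logs are the next place to look. A `False` result before the container starts confirms the port is free for GPU Monitor to use.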
Enable debug logging by uncommenting the following line in monitor_gpu.sh:
DEBUG=true
Replace alert.mp3 in the sounds directory with your preferred sound file.
Configure log rotation in monitor_gpu.sh:
local max_size=$((10 * 1024 * 1024)) # 10MB
local max_age=$((2 * 24 * 3600)) # 2 days
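The two expressions work out to 10 × 1024 × 1024 = 10,485,760 bytes and 2 × 24 × 3600 = 172,800 seconds. The Python sketch below restates those limits and shows how a rotation decision could be derived from them. Whether monitor_gpu.sh rotates when either limit is exceeded, or uses the two limits differently, is an assumption here, and the function names are hypothetical.

```python
import os
import time

MAX_SIZE_BYTES = 10 * 1024 * 1024   # 10 MB  -> 10485760 bytes
MAX_AGE_SECONDS = 2 * 24 * 3600     # 2 days -> 172800 seconds


def needs_rotation(size_bytes, age_seconds,
                   max_size=MAX_SIZE_BYTES, max_age=MAX_AGE_SECONDS):
    """Assumed rule: rotate when either the size or the age limit is exceeded."""
    return size_bytes > max_size or age_seconds > max_age


def file_needs_rotation(path, now=None):
    """Apply the rule to a real file, using its size and last-modified time."""
    stat = os.stat(path)
    now = time.time() if now is None else now
    return needs_rotation(stat.st_size, now - stat.st_mtime)
```

When you change the values in the script, the same arithmetic applies: the first factor is the human-readable figure (megabytes or days) and the remaining factors convert it to bytes or seconds.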
- Buffer size configuration
- Update interval adjustment
- Log rotation settings
- Container isolation
- Volume permissions
- Network access control