GPU Hardware Specifications
Complete technical specifications, benchmarks, and comparisons for data center and consumer GPUs
🚀 Enterprise GPUs
NVIDIA B200 (Latest)
Blackwell Architecture • 2025
- Memory: 192GB HBM3e
- Bandwidth: 8.0 TB/s
- FP8 PFLOPS: 4.5/9.0 (dense/sparse)
- TDP: 1000W
- Interconnect: NVLink 5.0
Cloud Price: $5.89 - $7.99/hr
NVIDIA H200 (New)
Hopper Architecture • 2024
- Memory: 141GB HBM3e
- Bandwidth: 4.8 TB/s
- FP8 PFLOPS: 1.98/3.96 (dense/sparse)
- TDP: 700W
- Interconnect: NVLink 4.0
Cloud Price: $3.58 - $10.60/hr
NVIDIA H100
Hopper Architecture • 2022
- Memory: 80GB HBM3
- Bandwidth: 3.35 TB/s
- FP16 TFLOPS: 989
- TDP: 700W
- Interconnect: NVLink 4.0
Cloud Price: $2.29 - $4.99/hr
NVIDIA A100
Ampere Architecture • 2020
- Memory: 40/80GB HBM2e
- Bandwidth: 2.0 TB/s
- FP16 TFLOPS: 312
- TDP: 400W
- Interconnect: NVLink 3.0
Cloud Price: $1.65 - $3.67/hr
NVIDIA V100
Volta Architecture • 2017
- Memory: 16/32GB HBM2
- Bandwidth: 900 GB/s
- FP16 TFLOPS: 125
- TDP: 300W
- Interconnect: NVLink 2.0
Cloud Price: $1.11 - $2.30/hr
AMD MI355X (2025)
CDNA 4 • June 2025
- Memory: 288GB HBM3e
- Bandwidth: 8.0 TB/s
- FP8 TFLOPS: 2,615
- TDP: 1400W
- Process: 3nm N3P
Cloud Price: $4.99/hr
AMD MI325X
CDNA 3 • 2024
- Memory: 256GB HBM3e
- Bandwidth: 6.0 TB/s
- FP16 TFLOPS: 1,307
- TDP: 1000W
- Interconnect: Infinity Fabric
Cloud Price: $2.25 - $2.49/hr
Intel Gaudi 3
5nm Process • 2024
- Memory: 128GB HBM2e
- Bandwidth: 3.7 TB/s
- FP8 PFLOPS: 1.835
- TDP: 600W
- Network: 24× 200GbE
Cloud Price: $1.99/hr
Google TPU v7 (Preview)
Ironwood • 2025
- Memory: 192GB HBM
- Bandwidth: 7.2 TB/s
- FP8 PFLOPS: 4.6
- BF16 TFLOPS: 2,300
- Interconnect: 1.2 Tbps ICI
Cloud Price: $8.50/hr (preview)
AWS Trainium3 (Dec 2025)
3nm Process • Dec 2025
- Memory: 144GB HBM3e
- Bandwidth: 4.9 TB/s
- FP8 PFLOPS: 2.52
- Performance: 4.4× vs Trn2
- Efficiency: 4× perf/watt
Cloud Price: $3.85/hr
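The compute and bandwidth figures above can be combined into a rough FLOPs-per-byte ratio, which indicates how much on-chip data reuse a workload needs before a chip becomes compute-bound rather than memory-bandwidth-bound. A minimal Python sketch using only the card values listed above (dense figures where a dense/sparse pair is given; the dense/sparse basis of each vendor's headline number varies, so treat the ratios as approximate):

```python
# Rough compute-to-bandwidth ratio (FLOPs per byte) for the enterprise
# accelerators above, using each card's listed FP8 figure. A higher
# ratio means the chip needs more data reuse (larger batches/tiles)
# to stay compute-bound instead of bandwidth-bound.

specs = {
    # name: (FP8 PFLOPS as listed above, memory bandwidth in TB/s)
    "B200":    (4.5,   8.0),
    "H200":    (1.98,  4.8),
    "MI355X":  (2.615, 8.0),
    "Gaudi 3": (1.835, 3.7),
    "TPU v7":  (4.6,   7.2),
}

for name, (pflops, tbps) in specs.items():
    flops_per_byte = (pflops * 1e15) / (tbps * 1e12)
    print(f"{name:>8}: ~{flops_per_byte:,.0f} FLOPs/byte")
```

By this measure the MI355X is comparatively bandwidth-rich (~327 FLOPs/byte) while the B200 (~563) and TPU v7 (~639) lean harder on data reuse.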
🎮 Consumer/Prosumer GPUs
RTX 4090 (Flagship)
Ada Lovelace • 2022
- Memory: 24GB GDDR6X
- Bandwidth: 1.01 TB/s
- FP16 TFLOPS: 82.6
- TDP: 450W
- CUDA Cores: 16,384
Cloud Price: $0.65 - $0.79/hr
RTX A6000
Ampere Workstation • 2020
- Memory: 48GB GDDR6
- Bandwidth: 768 GB/s
- FP16 TFLOPS: 77.0
- TDP: 300W
- CUDA Cores: 10,752
Cloud Price: $1.28 - $1.89/hr
RTX 3090 (Legacy)
Ampere Gaming • 2020
- Memory: 24GB GDDR6X
- Bandwidth: 936 GB/s
- FP16 TFLOPS: 71.0
- TDP: 350W
- CUDA Cores: 10,496
Cloud Price: $0.44 - $0.69/hr
A40
Ampere Professional • 2020
- Memory: 48GB GDDR6
- Bandwidth: 696 GB/s
- FP16 TFLOPS: 74.7
- TDP: 300W
- CUDA Cores: 10,752
Cloud Price: $1.28 - $1.65/hr
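For consumer and workstation cards, the binding constraint is usually memory capacity rather than throughput. A back-of-the-envelope sketch of whether a model's weights fit in a single card's VRAM, using the capacities listed above (the 20% headroom factor and the weights-only simplification are assumptions, not from the specs; real usage adds KV cache, activations, and for training, optimizer state):

```python
# Back-of-the-envelope check: do a model's raw weights fit in a single
# GPU's memory? Weights only -- treat "fits" as "might fit for
# inference", not a training budget.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params_b: float, dtype: str) -> float:
    """Gigabytes needed for the raw weights of a params_b-billion-parameter model."""
    return params_b * BYTES_PER_PARAM[dtype]

gpus = {"RTX 3090/4090": 24, "A40/RTX A6000": 48, "H100": 80, "H200": 141, "MI355X": 288}

for model_b in (7, 13, 70):
    for dtype in ("fp16", "int4"):
        need = weight_gb(model_b, dtype)
        # Assumption: require ~20% headroom over the raw weight size.
        fits = [g for g, mem in gpus.items() if mem >= need * 1.2]
        print(f"{model_b}B @ {dtype}: {need:.0f} GB weights -> fits on: {fits or 'no single GPU here'}")
```

This is why a quantized 13B model is comfortable on a 24GB RTX 4090 while a 70B model at fp16 needs an H200-class or larger part, matching the selection guide below.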
📊 Performance Benchmarks
| GPU Model | LLaMA 7B Training | Stable Diffusion | BERT Fine-tune | Inference (tok/s) |
|---|---|---|---|---|
| B200 192GB | 1.5 hours | 28 img/s | 15 min | 6,200 |
| H200 141GB | 2.5 hours | 18 img/s | 25 min | 4,100 |
| H100 80GB | 4 hours | 12 img/s | 45 min | 2,850 |
| MI325X 256GB | 10 hours | 10 img/s | 70 min | 1,450 |
| A100 80GB | 12 hours | 8 img/s | 90 min | 1,200 |
| A100 40GB | 14 hours | 7 img/s | 100 min | 1,100 |
| RTX 4090 | 18 hours | 10 img/s | 120 min | 850 |
| V100 32GB | 22 hours | 5 img/s | 150 min | 650 |
| RTX 3090 | 26 hours | 6 img/s | 180 min | 500 |
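Hourly price alone is a poor guide to cost; multiplying the benchmark times above by each card's cloud price range gives a cost per training run. A quick sketch using only numbers from this page (illustrative; actual costs depend on provider, region, and utilization):

```python
# Illustrative cost math: LLaMA 7B training hours from the benchmark
# table times the low/high hourly prices from the spec cards above.

runs = {
    # gpu: (training hours, $/hr low, $/hr high)
    "B200":      (1.5,  5.89, 7.99),
    "H200":      (2.5,  3.58, 10.60),
    "H100":      (4.0,  2.29, 4.99),
    "A100 80GB": (12.0, 1.65, 3.67),
    "RTX 4090":  (18.0, 0.65, 0.79),
}

for gpu, (hours, lo, hi) in runs.items():
    print(f"{gpu:>10}: LLaMA 7B run costs ${hours * lo:,.2f} - ${hours * hi:,.2f}")
```

Note the inversion: the B200's roughly $9-12 per run undercuts the RTX 4090's roughly $12-14, despite an hourly rate about nine times higher.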
📚 2026 Hardware Selection Guide
- Frontier Models (>100B): B200 192GB, MI355X 288GB, or multiple H200s
- Large Language Models (30-100B): H200 141GB, MI325X 256GB, or H100 80GB clusters
- Medium Models (7-30B): H100 80GB, A100 80GB, or MI300X 192GB
- Small Models (<7B): RTX 4090, A100 40GB, or Gaudi 3
- Image Generation: RTX 4090 (best value), H100 for production scale
- Inference at Scale: Groq LPU, AWS Trainium3, or Intel Gaudi 3
- Cost-Optimized Training: MI325X (best GB/$), Intel Gaudi 3 (lowest $/hr)
- Research/Development: RTX 4090 or cloud spot instances
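The size tiers above can be codified directly. A hypothetical helper whose thresholds and picks are transcribed from the guide, nothing more; it is a starting point, not a substitute for profiling your own workload:

```python
# Encode the 2026 selection guide's model-size tiers as a lookup.
# Tiers and suggestions are taken verbatim from the list above.

def suggest_gpus(model_params_b: float) -> list[str]:
    """Return the guide's suggested hardware for a model of the given size (billions of parameters)."""
    if model_params_b > 100:
        return ["B200 192GB", "MI355X 288GB", "multiple H200s"]
    if model_params_b >= 30:
        return ["H200 141GB", "MI325X 256GB", "H100 80GB clusters"]
    if model_params_b >= 7:
        return ["H100 80GB", "A100 80GB", "MI300X 192GB"]
    return ["RTX 4090", "A100 40GB", "Gaudi 3"]

print(suggest_gpus(70))  # -> ['H200 141GB', 'MI325X 256GB', 'H100 80GB clusters']
```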