Computing Power Glossary
A
A100
[ay-one-hundred]
NVIDIA's flagship data center GPU of the Ampere generation, released in 2020. Available in 40GB (HBM2) and 80GB (HBM2e) memory configurations.
Example:
"We need 8x A100 80GB GPUs to train our 70B parameter model efficiently."
Ampere Architecture
NVIDIA's GPU microarchitecture generation released in 2020, featuring improved tensor cores and support for sparsity acceleration.
B
Batch Size
The number of training samples processed together in one forward/backward pass during neural network training. Larger batch sizes can improve GPU utilization but require more VRAM.
Example:
"Increasing the batch size from 32 to 128 improved our H100 utilization from 60% to 95%."
C
CUDA
[koo-dah]
Compute Unified Device Architecture - NVIDIA's parallel computing platform and API that allows software to use GPUs for general-purpose processing.
Example:
"This model requires CUDA 11.8 or higher for optimal performance."
CUDA Cores
The parallel processing units within NVIDIA GPUs. More CUDA cores generally mean higher parallel throughput, although architecture and clock speed matter as well.
F
FLOPS
[flops]
Floating Point Operations Per Second - A measure of computer performance, especially in scientific computations. Common scales: TFLOPS (trillion), PFLOPS (quadrillion), EFLOPS (quintillion).
Example:
"The H100 delivers up to 989 TFLOPS of FP16 tensor performance."
FP16 (Half Precision)
16-bit floating-point format that uses half the memory of FP32, enabling faster computation and reduced memory usage with minimal accuracy loss for many AI workloads.
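A small illustration of the memory halving, assuming PyTorch:

import torch

x32 = torch.randn(1024, 1024)                # FP32 tensor
x16 = x32.half()                             # same values stored in FP16

# element_size() is bytes per element: 4 for FP32, 2 for FP16.
print(x32.element_size() * x32.nelement())   # 4,194,304 bytes
print(x16.element_size() * x16.nelement())   # 2,097,152 bytes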
G
GPU
[jee-pee-you]
Graphics Processing Unit - A specialized processor originally designed for graphics rendering, now widely used for parallel computing tasks including AI/ML training and inference.
H
H100
[aych-one-hundred]
NVIDIA's latest flagship data center GPU based on the Hopper architecture, released in 2022. Features 80GB of HBM3 memory and delivers up to 3x the performance of the A100.
Example:
"The H100's FP8 support cuts our training time in half compared to the A100."
HBM (High Bandwidth Memory)
A type of memory interface for 3D-stacked DRAM that provides significantly higher bandwidth than traditional GDDR memory. The HBM3 in the H100 provides roughly 3.35 TB/s of bandwidth on the SXM variant.
I
Inference
The process of using a trained neural network model to make predictions on new data. Generally requires less compute power than training.
Example:
"For inference workloads, a single A100 can serve 100+ concurrent users."
L
LLM (Large Language Model)
AI models with billions of parameters trained on vast text datasets; examples include GPT-4, LLaMA, and Claude. Training and serving them requires significant GPU resources.
Example:
"Training a 70B parameter LLM requires at least 16x A100 80GB GPUs."
M
Mixed Precision Training
A technique that uses both FP16 and FP32 computations to accelerate training while maintaining model accuracy. Can provide 2-3x speedup on modern GPUs.
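A sketch of one common recipe, assuming PyTorch's automatic mixed precision (autocast plus a gradient scaler) and a CUDA GPU; the model, optimizer, and data loader are placeholders:

import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()   # rescales the loss so small FP16 gradients don't underflow

for inputs, targets in loader:         # `loader` as in the Batch Size sketch above
    optimizer.zero_grad()
    # Run the forward pass in FP16 where safe; autocast keeps numerically
    # sensitive ops in FP32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(inputs.cuda()), targets.cuda())
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()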
N
NVLink
NVIDIA's high-speed interconnect technology for direct GPU-to-GPU communication. NVLink 4.0 in H100 provides 900 GB/s bidirectional bandwidth.
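A quick peer-to-peer check, assuming PyTorch and a multi-GPU machine; note that P2P access can also run over PCIe, so a True result does not by itself prove NVLink is present:

import torch

# Peer-to-peer access lets one GPU read another's memory directly
# (over NVLink where available) instead of staging through host RAM.
if torch.cuda.device_count() >= 2:
    print(torch.cuda.can_device_access_peer(0, 1))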
P
Parameters
The learnable weights in a neural network. Model size is often described by parameter count (e.g., 7B, 70B, 175B parameters).
Example:
"Each billion parameters requires roughly 2GB of memory in FP16 format."
Q
Quantization
The process of reducing the precision of model weights and activations (e.g., from FP16 to INT8) to reduce memory usage and increase inference speed.
Example:
"4-bit quantization allows us to run a 70B model on a single A100 40GB."
S
Spot Instance
Unused cloud compute capacity offered at significant discounts (up to 90% off) that the provider can reclaim on short notice.
Example:
"We use spot instances for batch inference jobs, saving 70% on GPU costs."
T
Tensor Cores
Specialized processing units in NVIDIA GPUs designed for matrix multiplication operations, providing significant acceleration for AI workloads.
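A sketch of how tensor cores typically get used, assuming PyTorch and a CUDA GPU: matrix multiplies in FP16/BF16 are dispatched to tensor-core kernels automatically, and the TF32 flag extends this to FP32 matmuls on Ampere and newer GPUs:

import torch

torch.backends.cuda.matmul.allow_tf32 = True   # let FP32 matmuls use TF32 tensor cores

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
c = a @ b   # FP16 matmul, executed on tensor cores on supported GPUs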
TFLOPS
[tee-flops]
Trillion (10^12) Floating Point Operations Per Second. Common metric for measuring GPU compute performance.
TPU
[tee-pee-you]
Tensor Processing Unit - Google's custom-developed ASIC designed specifically for neural network machine learning. The TPU v4 offers performance comparable to the A100.
Example:
"Google Cloud's TPU v4 pods can deliver up to 1.1 exaflops of compute."
V
VRAM
[vee-ram]
Video Random Access Memory - The dedicated high-speed memory on a GPU. The amount of VRAM determines the maximum model size and batch size you can process.
Example:
"A 70B parameter model requires at least 140GB of VRAM in FP16, so you need multiple GPUs."