Recent

The Machine Learning Surgeon's Guide to Quantization: Precision Cuts for Smarter Models
26 mins · Tags: Quantization, Inference, Optimization

The Operating Room Setup
9 mins · Tags: Setup, CUDA, C++, LibTorch

Dissecting torch.compile: Surgical Precision in PyTorch Optimization
23 mins · Tags: Torch-Compile, Compiler

A quick incision: ten minutes to RAG
8 mins · Tags: RAG, LLM, Vector-DB

Performing Kernel Surgery: Profiling CUDA Kernels with NVIDIA Nsight Compute
9 mins · Tags: CUDA, Profiling, Optimization

A Machine Learning Surgeon’s Toolkit: Advanced Matrix Multiplication in CUDA
16 mins · Tags: CUDA, GPU, Optimization, Matrix-Multiplication