Posts

From Sequential to Parallel: Your Journey into GPU Programming with Triton

Gpu-Programming Triton Basics

The Transformer's Anatomy: A Deep Dive into the Architecture that Revolutionized Machine Learning

Transformers Dl

Move Fast or Die Slow

Strategy Business

The Machine Learning Surgeon's Guide to Quantization: Precision Cuts for Smarter Models

Quantization Inference Optimization

The Operating Room Setup

Setup Cuda Cpp Libtorch

Dissecting torch.compile: Surgical Precision in PyTorch Optimization

Torch-Compile Compiler

A quick incision: ten minutes to RAG

Rag Llm Vector-Db

Performing Kernel Surgery: Profiling CUDA Kernels with NVIDIA Nsight Compute

Cuda Profiling Optimization

A Machine Learning Surgeon’s Toolkit: Advanced Matrix Multiplication in CUDA

Cuda Gpu Optimization Matrix-Multiplication

Cerebral Cortex and Hippocampus: Understanding the Computational and Memory Design of GPUs

Gpu Architecture