Pipeline GPU Kernel

- `analyze-kernel-bottleneck`: determine whether the kernel is memory-bound and compute the compute/load ratio that drives the choice between the LDG-register and cp.async (LDGSTS) variants
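The bottleneck check can be sketched as a roofline-style calculation. The Python below is a minimal sketch under stated assumptions, not the skill's implementation. It treats the compute/load ratio as FLOPs per byte of global load for one K-slice of a hypothetical `GemmTile` and ignores L2 reuse across blocks. It then compares that ratio with the machine balance (peak FLOP/s divided by peak bytes/s). The peak figures in the `__main__` block are illustrative placeholders. The threshold the skill actually uses to pick a variant is not shown in the preview, so none is encoded here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmTile:
    """Hypothetical block-level GEMM tile: C[bm x bn] += A[bm x bk] @ B[bk x bn]."""
    bm: int
    bn: int
    bk: int
    bytes_per_elem: int = 2  # assumption: fp16/bf16 operands


def tile_flops(t: GemmTile) -> int:
    # One K-slice performs bm*bn*bk multiply-adds, counted as 2 FLOPs each.
    return 2 * t.bm * t.bn * t.bk


def tile_load_bytes(t: GemmTile) -> int:
    # One K-slice loads an A tile (bm x bk) and a B tile (bk x bn) from global memory.
    return (t.bm * t.bk + t.bk * t.bn) * t.bytes_per_elem


def compute_load_ratio(t: GemmTile) -> float:
    # FLOPs per byte of global load for one pipeline stage.
    return tile_flops(t) / tile_load_bytes(t)


def is_memory_bound(t: GemmTile, peak_flops: float, peak_bytes_per_s: float) -> bool:
    # Roofline test: below the machine balance point, bandwidth limits throughput.
    machine_balance = peak_flops / peak_bytes_per_s
    return compute_load_ratio(t) < machine_balance


if __name__ == "__main__":
    tile = GemmTile(bm=128, bn=128, bk=32)
    # Illustrative peaks only; substitute your GPU's datasheet numbers.
    peak_flops, peak_bw = 312e12, 1.555e12
    print(f"compute/load ratio: {compute_load_ratio(tile):.1f} FLOP/byte")
    print(f"machine balance:    {peak_flops / peak_bw:.1f} FLOP/byte")
    print("memory-bound" if is_memory_bound(tile, peak_flops, peak_bw) else "compute-bound")
```

In this model the ratio simplifies to 2·bm·bn / ((bm + bn)·bytes_per_elem), so it depends only on the output tile shape and element size, not on `bk`. Deepening the K-slice or adding pipeline stages changes how well load latency is hidden, not this ratio.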


LLM Evaluation

Evaluated by: xiaomi/mimo-v2-flash:free

Last evaluated: May 17, 2026

Prompt Quality

3.0/5

Evaluation error: RetryError[]

Usefulness

3.0/5

Evaluation error: RetryError[]

Overall Rating

3.0/5

Evaluation failed

Prompt Preview

---
name: pipeline-gpu-kernel
description: >
  Apply software pipelining (double-buffering) to a tiled GPU kernel to overlap
  global memory loads with Tensor Core computation. Covers prologue/loop/epilogue
  restructuring, LDG-register vs cp.async (LDGSTS) variant selection based on
  compute/load ratio, shared memory budget verification against architecture-specific
  occupancy cliffs, and SASS-level verification of load/compute overlap.
license: MIT
allowed-tools: Read Write Edit Bash Grep Gl...

Full prompt length: 18908 characters
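The description above names a prologue/loop/epilogue restructuring and a shared memory budget check. The skill's own code is truncated in the preview, so the Python model below is only a minimal sketch of those two ideas. `pipelined_schedule` emits the issue order of global-to-shared loads and Tensor Core computes over a ring of `stages` shared-memory buffers. `check_no_buffer_hazard` confirms that no load overwrites a buffer before its tile is consumed. The last two helpers count how many blocks fit per SM by shared memory alone. All function names and the 100 KiB per-SM capacity are hypothetical. A real kernel also needs barriers (or cp.async wait groups) between stages and must account for register and thread limits.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, int, int]  # (kind, k_tile, buffer_slot)


def pipelined_schedule(num_k_tiles: int, stages: int = 2) -> List[Op]:
    """Issue order of loads and computes for a software-pipelined K-loop
    over a ring of `stages` shared-memory buffers (stages=2 is double-buffering)."""
    if num_k_tiles < 1 or stages < 2:
        raise ValueError("need at least one K tile and at least two stages")
    ops: List[Op] = []
    # Prologue: fill stages-1 buffers before any compute starts.
    for k in range(min(stages - 1, num_k_tiles)):
        ops.append(("load", k, k % stages))
    # Main loop: issue the load for tile k+stages-1, then compute tile k,
    # so the load is in flight while tile k occupies the Tensor Cores.
    steady = max(num_k_tiles - (stages - 1), 0)
    for k in range(steady):
        nxt = k + stages - 1
        ops.append(("load", nxt, nxt % stages))
        ops.append(("compute", k, k % stages))
    # Epilogue: nothing left to load; drain the tiles still buffered.
    for k in range(steady, num_k_tiles):
        ops.append(("compute", k, k % stages))
    return ops


def check_no_buffer_hazard(ops: List[Op]) -> bool:
    # Every compute must read the tile most recently loaded into its slot.
    slot_contents: Dict[int, int] = {}
    for kind, k, slot in ops:
        if kind == "load":
            slot_contents[slot] = k
        elif slot_contents.get(slot) != k:
            return False
    return True


def smem_bytes_per_block(bm: int, bn: int, bk: int, stages: int, bytes_per_elem: int = 2) -> int:
    # Each stage holds one A tile (bm x bk) and one B tile (bk x bn).
    return stages * (bm * bk + bk * bn) * bytes_per_elem


def blocks_per_sm_by_smem(smem_per_block: int, smem_per_sm: int) -> int:
    # Shared memory limit only; registers, threads, and per-block
    # reservations can lower this further.
    return smem_per_sm // smem_per_block


if __name__ == "__main__":
    for op in pipelined_schedule(num_k_tiles=4, stages=2):
        print(op)
    smem_per_sm = 100 * 1024  # hypothetical per-SM capacity; check your architecture
    for stages in (2, 3, 4):
        need = smem_bytes_per_block(128, 128, 32, stages)
        print(stages, need, blocks_per_sm_by_smem(need, smem_per_sm))
```

With the illustrative 100 KiB capacity, a 128×128×32 fp16 tile needs 32 KiB per block at two stages (3 blocks per SM), 48 KiB at three (2 blocks), and 64 KiB at four (1 block). In this example each extra stage costs a resident block, which is the kind of occupancy cliff a budget check is meant to catch.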