Analyze Kernel Bottleneck

- `pipeline-gpu-kernel` -- implement software pipelining with cp.async when the analysis identifies a memory-bound kernel with a low compute/load ratio

LLM Evaluation

Evaluated by: xiaomi/mimo-v2-flash:free

Last evaluated: May 17, 2026

Prompt Quality

3.0/5

Evaluation error: RetryError[]

Usefulness

3.0/5

Evaluation error: RetryError[]

Overall Rating

3.0/5

Evaluation failed

Prompt Preview

---
name: analyze-kernel-bottleneck
description: >
  Systematically identify whether a GPU kernel is compute-bound, memory-bound,
  or latency-bound using roofline analysis, occupancy calculations, compute/load
  ratio per tile, and SASS instruction inspection. Produces a decision matrix
  for optimization strategy selection (cp.async, warp interleaving, tiling,
  double-buffering, or CuAssembler hand-tuning).
license: MIT
allowed-tools: Read Grep Glob Bash
metadata:
  author: Philipp Thoss
  ve...

Full prompt length: 17023 characters
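
The roofline step named in the description can be illustrated with a minimal sketch. This is not taken from the skill's prompt, which is truncated above; the `GpuPeaks` type, the `classify_kernel` function, the 50% latency threshold, and the peak numbers in the usage lines are all hypothetical and chosen only for illustration. The idea is to compare a kernel's arithmetic intensity (FLOPs per byte moved) with the hardware ridge point, and to flag a kernel as latency-bound when it falls well below whichever roof applies.

```python
from dataclasses import dataclass


@dataclass
class GpuPeaks:
    """Hypothetical container for hardware peaks; fill in from the vendor datasheet."""
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak DRAM bandwidth, bytes/s


def classify_kernel(flops: float, bytes_moved: float, achieved_flops: float,
                    peaks: GpuPeaks, latency_threshold: float = 0.5) -> str:
    """Classify a kernel with a simple roofline model.

    latency_threshold is an assumed heuristic: a kernel that reaches less than
    this fraction of its roofline ceiling is treated as latency-bound.
    """
    intensity = flops / bytes_moved                    # FLOP per byte
    ridge = peaks.peak_flops / peaks.peak_bandwidth    # intensity where the two roofs meet
    # Attainable throughput at this intensity: the lower of the two roofs.
    roof = min(peaks.peak_flops, intensity * peaks.peak_bandwidth)

    if achieved_flops < latency_threshold * roof:
        return "latency-bound"
    return "memory-bound" if intensity < ridge else "compute-bound"


# Usage with made-up peaks: 10 TFLOP/s and 500 GB/s give a ridge point of 20 FLOP/byte.
peaks = GpuPeaks(peak_flops=10e12, peak_bandwidth=500e9)
low_intensity = classify_kernel(1e9, 1e9, 4e11, peaks)    # 1 FLOP/byte, near the bandwidth roof
high_intensity = classify_kernel(1e11, 1e9, 8e12, peaks)  # 100 FLOP/byte, near the compute roof
stalled = classify_kernel(1e9, 1e9, 1e11, peaks)          # 1 FLOP/byte, far below its roof
```

In practice the FLOP and byte counts would come from a profiler, and a classification like this would be only one input alongside the occupancy and SASS checks the description lists.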