LLM Evaluation

Evaluated by: xiaomi/mimo-v2-flash:free

Last evaluated: March 29, 2026

Prompt Quality

3.0/5

Evaluation error: RetryError[]

Usefulness

3.0/5

Evaluation error: RetryError[]

Overall Rating

3.0/5

Evaluation failed

Prompt Preview

---
name: run-ab-test-models
description: >
  Design and execute A/B tests for ML models in production using traffic splitting,
  statistical significance testing, and canary/shadow deployment strategies. Measure
  performance differences and make data-driven decisions about model rollout. Use when
  validating a new model version before full rollout, comparing candidate models trained
  with different algorithms, measuring business metric impact of model changes, or when
  regulatory requiremen...

Full prompt length: 8134 characters
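
The preview names traffic splitting and statistical significance testing but is cut off before showing either. As a rough illustration only, and not taken from the full prompt, the sketch below shows one common way to do both with the Python standard library: hash-based routing of users to a variant, followed by a two-sided two-proportion z-test on a binary outcome such as a conversion. The function names, the `salt` value, and the 10% treatment share are all hypothetical choices made for this example.

```python
import hashlib
import math


def assign_variant(user_id: str, treatment_share: float = 0.1,
                   salt: str = "ab-test-1") -> str:
    """Deterministically route a user to 'control' or 'treatment'.

    Hypothetical helper: the same user_id and salt always map to the
    same variant, so a user does not flip between models across requests.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    # Use the first 32 bits of the hash as a roughly uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two success rates.

    Returns (z, p_value), where z > 0 means variant B has the higher rate.
    Assumes samples large enough for the normal approximation to hold.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    print(assign_variant("user-42"))
    z, p = two_proportion_z_test(success_a=100, n_a=1000,
                                 success_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Hashing on a salted user ID keeps assignment stable without storing state, and changing the salt reshuffles users for a new experiment. The z-test is only one option; a real rollout decision would also need a pre-registered sample size and a check of guardrail metrics, neither of which is covered here.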

Tools & Technologies

  • python