LLM-as-Judge

Why use it

Human evaluation is the gold standard but does not scale. LLM-as-judge approximates human judgment of subjective qualities (helpfulness, coherence, faithfulness) cheaply and at volume, which is why it underpins much of modern LLM evaluation.

Good practice for the rubric

Be specific: “rate 1-5 for factual grounding in the provided context” beats “rate quality”.
Provide anchor examples for each score.
Prefer pairwise comparison (A vs B) when absolute scoring is noisy.
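
The three points above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the anchor wording, the "Score:" and "Winner:" reply formats, and the call_model callable (a stand-in for whatever LLM client is in use) are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical anchor descriptions, one per score, for a single specific criterion.
ANCHORS = {
    1: "The answer contradicts the context or is entirely unsupported.",
    2: "Most claims in the answer are not supported by the context.",
    3: "About half of the claims are supported by the context.",
    4: "Nearly all claims are supported; one minor claim is not.",
    5: "Every claim in the answer is directly supported by the context.",
}


def build_absolute_prompt(context: str, answer: str) -> str:
    """Specific rubric with an anchor for each score, rather than 'rate quality'."""
    anchor_lines = "\n".join(f"{score}: {text}" for score, text in sorted(ANCHORS.items()))
    return (
        "Rate the answer 1-5 for factual grounding in the provided context.\n"
        f"Score anchors:\n{anchor_lines}\n\n"
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Reply with 'Score: <n>'."
    )


def build_pairwise_prompt(context: str, answer_a: str, answer_b: str) -> str:
    """Pairwise variant (A vs B) for cases where absolute scores are noisy."""
    return (
        "Which answer is better grounded in the provided context?\n\n"
        f"Context:\n{context}\n\nAnswer A:\n{answer_a}\n\nAnswer B:\n{answer_b}\n\n"
        "Reply with 'Winner: A' or 'Winner: B'."
    )


def parse_score(reply: str) -> Optional[int]:
    """Return the 1-5 score from the reply, or None if the reply is malformed."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def parse_preference(reply: str) -> Optional[str]:
    """Return 'A' or 'B' from the reply, or None if the reply is malformed."""
    match = re.search(r"Winner:\s*([AB])\b", reply)
    return match.group(1) if match else None


def judge_absolute(call_model: Callable[[str], str], context: str, answer: str) -> Optional[int]:
    # call_model is an assumed prompt-in, text-out function supplied by the caller.
    return parse_score(call_model(build_absolute_prompt(context, answer)))


def judge_pairwise(
    call_model: Callable[[str], str], context: str, answer_a: str, answer_b: str
) -> Optional[str]:
    return parse_preference(call_model(build_pairwise_prompt(context, answer_a, answer_b)))
```

Returning None for a malformed reply, instead of guessing a score, lets the caller count and inspect judge failures separately from low scores.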
