#evaluation
5 entries in total

Lectures (1)
L11: Evaluation

Papers (4)
AlpacaEval: An Automatic Evaluator for Instruction-Following Language Models
Holistic Evaluation of Language Models
Measuring Massive Multitask Language Understanding
Challenges and Opportunities in NLP Benchmarking