Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation. Noy Sternlicht, Ariel Gera, et al. 2025. EMNLP 2025.