Roy Bar-Haim

Title

Senior Technical Staff Member, AI Agent Evaluation

Bio

I am a Senior Technical Staff Member (STSM) at IBM Research - Israel, working on Natural Language Processing (NLP). My team develops novel methods for evaluating LLMs and AI agents. See our survey paper and IJCAI 2025 tutorial on agent evaluation. I have also worked on metrics and models for evaluating AI systems (specifically, RAG), and on evaluating LLMs-as-judges and their cognitive capabilities.

I initiated and led the development of Key Point Summarization, an innovative paradigm for quantitative summarization of opinions, which is now part of watsonx.ai Studio. I also co-led the development of an AI-assisted system for cloud compliance (2021-2023).

Before that, I was part of the team that developed Project Debater, the first AI system to debate human experts (2013-2019). I led a research team across several IBM Research labs that developed some of the core components of this project. I presented a tutorial on Advances in Debating Technologies at AACL-IJCNLP 2020 (with Liat Ein-Dor, Yonatan Bilu and Noam Slonim), and at ACL-IJCNLP 2021 and IJCAI 2021 (with Liat Ein-Dor, Matan Orbach, Elad Venezian and Noam Slonim). Slides can be found here.

I hold a Ph.D. in Computer Science from Bar-Ilan University (2010). I received my B.Sc. (Summa Cum Laude) and M.Sc. (Cum Laude) in Computer Science from the Technion, Israel Institute of Technology (1996, 2005). In between, I served as a software development officer and a senior team lead in the Israel Defense Forces (1996-2002). Before joining IBM, I led NLP teams in two startup companies (2010-2013).

Academic Activities

  • Tutorial presenter: AACL 2020, ACL 2021, IJCAI 2021, IJCAI 2025
  • Co-organizer, The 11th Workshop on Argument Mining (ArgMining 2024)
  • Conference area chair: ACL Rolling Review (ARR), IJCAI 2024, IJCAI 2023, ACL 2023 Industry Track, NAACL 2021, COLING 2016
  • Conference reviewer/PC member: AAAI, ACL (Outstanding Reviewer Award 2023), ARR, EMNLP, COLING, CoNLL, ICWSM, LREC
  • Workshop PC member: Argument Mining; Benchmarking: Past, Present and Future; Noisy User-generated Text
  • Journal reviewer: Transactions of the Association for Computational Linguistics (TACL), Natural Language Engineering (NLE), Computer Speech & Language, ACM Transactions on Intelligent Systems and Technology (ACM-TIST), Knowledge-Based Systems
  • Reviewer for the Israel Science Foundation

Top collaborators

Yoav Katz
Manager, Language Model Utilization and Evaluation

Matan Orbach
Research Staff Member - Machine Learning and NLP

Arun Kumar
Senior Technical Staff Member, IBM Research - Hybrid Cloud, Member IBM Academy of Technology