Leveraging LLM Enhanced Commit Messages to Improve Machine Learning Based Test Case Prioritization

Yara Mahmoud; Akramul Azim; Ramiro Liscano; Kevin Smith; Yee-Kang Chang; Gkerta Seferi; Qasim Tauseef

doi:10.1145/3727582.3728681

PROMISE 2025

Conference paper

26 Jun 2025


Abstract

In the rapidly evolving landscape of software development, software testing is critical for maintaining code quality and reducing defects. Effective test case prioritization orders test cases so that defects are identified early, helping to ensure software quality. New avenues of research have explored using machine learning (ML) to automate this process; most current applications use an ML model trained on numerical features to prioritize the test cases. This study investigates enhancing this process by incorporating text-based features derived from Git commit messages, which often include valuable information about code changes. Because commit messages are often poorly written and inconsistent, we employ a large language model (LLM) to rewrite these messages based on code diffs, with the aim of improving both their format and the information they contain. We then assess whether these refined commit messages, used as an additional feature, improve the performance of the test case prioritization model. Our preliminary results indicate that including LLM-enhanced commit messages leads to a noticeable improvement in prioritization effectiveness, suggesting a promising avenue for integrating natural language processing techniques into software testing workflows.
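The idea of adding commit-message text to a numerical feature set can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which text representation or numerical features the study uses, so the sketch substitutes a plain bag-of-words count vector, and the two numeric values (imagined as a past failure rate and a test duration) are hypothetical. The LLM rewriting step is left out, since it depends on an external model; the messages below stand in for already rewritten text.

```python
import re
from collections import Counter


def tokenize(message):
    """Lowercase a commit message and split it into word tokens."""
    return re.findall(r"[a-z0-9_]+", message.lower())


def build_vocabulary(messages):
    """Map every distinct token to a fixed column index (sorted for stability)."""
    tokens = sorted({tok for msg in messages for tok in tokenize(msg)})
    return {tok: i for i, tok in enumerate(tokens)}


def text_features(message, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(message))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector


def combine_features(numeric, message, vocab):
    """Append text features to the numeric features to form one model input row."""
    return list(numeric) + text_features(message, vocab)


# Hypothetical commit messages, standing in for LLM-rewritten text.
messages = ["Fix null check in parser", "Update parser tests"]
vocab = build_vocabulary(messages)

# Hypothetical numeric features: [past failure rate, test duration in seconds].
row = combine_features([0.25, 3.0], messages[0], vocab)
```

Each resulting row could then be passed to whatever prioritization model is in use, so the text signal is simply extra columns alongside the numerical ones. A learned embedding or TF-IDF weighting would be a natural replacement for the raw counts.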
