Conference paper

Leveraging LLM Enhanced Commit Messages to Improve Machine Learning Based Test Case Prioritization

Abstract

In the rapidly evolving landscape of software development, software testing is critical for maintaining code quality and reducing defects. Effective test case prioritization orders test cases so that defects are detected as early as possible. Recent research has explored using machine learning (ML) to automate this process; most current approaches rely on a model that uses numerical features to prioritize the test cases. This study investigates whether the process can be improved by incorporating text-based features derived from Git commit messages, which often include valuable information about code changes. Because commit messages are frequently poorly written and inconsistent, we employ a large language model (LLM) to rewrite them based on the corresponding code diffs, with the aim of improving both their format and the information they contain. We then assess whether these refined commit messages, used as an additional feature, improve the performance of the test case prioritization model. Our preliminary results indicate that including LLM-enhanced commit messages leads to a noticeable improvement in prioritization effectiveness, suggesting a promising avenue for integrating natural language processing techniques into software testing workflows.