Issue Report Assessment — Assessment of Issue Report Quality and Class through Natural Language Processing

Simon Curty. Issue Report Assessment — Assessment of Issue Report Quality and Class through Natural Language Processing. Bachelor’s thesis, University of Bern, February 2018. Details.


Issue reports are a crucial aspect of software development. They enable collaboration and communication of faults and task. However, issue reports are only helpful if they provide meaningful information, which is often not the case. Such low-quality issue reports induce a multitude of problems: Developers lose time figuring out the issue. Both time needed to complete the issue and its priority are harder to estimate. Assigning issues to the right developer is more difficult. To alleviate these negative impacts, commonly a triager is taking the responsibility to assess issue reports. However, this process is time consuming and cannot fill in missing information. Since issue reports provide insight into a projects changes and faults over time, they are an often used data source for bug prediction, that is the inference of knowledge about past and future bugs. But the quality of such predictions depends on the quality of the source data. We address two factors strongly affecting the quality of issue reports: (1) issue reports have an assigned type. When the assigned type does not reflect the true nature of the report, it is said to be misclassified. An issue report classified as bug may in reality be a feature request. This Misclassification introduces bias in bug prediction and makes it harder to assign an issue to the right developer. (2) Bug reports are commonly written by hand and the reporter does not always include important information. The quality of the content of a bug report impacts the time a developer needs to spend identifying the actual issue. To tackle the misclassification problem, we propose an approach to categorize issue reports. Our classifier achieves an accuracy of 82.9% when classifying issue reports into bugs and non-bugs. For multiclass classification of issue reports by type, our model achieves an accuracy of 74.4%. Thus, our model can reliably validate datasets used for bug prediction and distinguish between issue types. To address the second problem, we propose an approach to estimate the quality of bug reports and assign them a score. This Quality Estimator is capable of giving improvement suggestions, potentially helping reporters to write more helpful bug reports. To showcase real world applications of our models we integrated both the classifier model and the Quality Estimator into a tool for issue report assessment. The tool is implemented as browser extension and allows the user to get feedback about the type of an issue report with one click. The tool is also capable of providing suggestions on how to improve a bug report based on the information already provided.

Posted by scg at 8 February 2018, 6:15 pm link
Last changed by admin on 21 April 2009