Identifying class responsibility information in Pharo class comments
Introduction
Comments in Pharo are written very differently from those in other programming languages. In other languages, documentation typically relies on annotations and a pseudo-English style that mostly omits the subject of the sentence, whereas in Pharo the comment is written as free text.
Problem
Identifying a particular kind of information in a comment is challenging because comments lack a fixed structure. Although Pharo provides a default comment template, very few comments adhere to it, so the information is scattered throughout the comment.
Example
A class named "MouseClickState" describes its responsibility with the sentence: "I manage the distinction between clicks, double clicks, and drag operations."
In Pharo class comments, sentences are composed as in ordinary English, following subject-verb-object (SVO) order. In most cases, the class refers to itself in the first person as "I", and the rest of the information in the comment is expressed in the same way. Class comments contain a lot of information, such as what responsibilities a class has, what it knows, which classes it collaborates with, and code snippets that illustrate implementation details.
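As a rough illustration of the first-person pattern described above, the following Python sketch splits a comment into sentences and keeps those that start with "I" followed by a word. The regular expression, the naive sentence splitter, and the second sentence of the sample comment are assumptions made for this example; they are not part of the study.

```python
import re

# Assumed pattern: a sentence in which the class refers to itself in the
# first person, e.g. "I manage ...", "I know ...".
FIRST_PERSON = re.compile(r"^\s*I\s+(?P<verb>[A-Za-z]+)\b")


def split_sentences(comment):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [part for part in parts if part]


def first_person_sentences(comment):
    """Return (word after 'I', sentence) pairs for first-person sentences."""
    found = []
    for sentence in split_sentences(comment):
        match = FIRST_PERSON.match(sentence)
        if match:
            found.append((match.group("verb").lower(), sentence))
    return found


# The first sentence is the MouseClickState example; the second one is
# made up here to show a sentence that does not match.
comment = (
    "I manage the distinction between clicks, double clicks, and drag operations. "
    "See the examples for details."
)

for verb, sentence in first_person_sentences(comment):
    print(verb, "->", sentence)
```

The word captured after "I" is not guaranteed to be a verb (for instance in "I also manage ..."), which is one reason a proper syntactic analysis is likely to be needed beyond this sketch.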
Aim
Our aim is to conduct a pilot study that identifies the patterns used to express the responsibility of a class in its comment and extracts that responsibility. This is particularly helpful for highlighting the important information in a comment and for finding inconsistent information in it.
Steps
- Finding patterns that identify the responsibility of a class in the comment, such as which words or verbs most often describe the responsibility.
- Studying natural language (NL) methods for extracting various kinds of information from text.
- Extracting subjects (nouns), actions (verbs), and their relations using dependency parsing.
- Constructing NL heuristics to identify the sentences that describe the responsibility of the class.
- Evaluating the approach on the comment dataset.
- Evaluating the approach on the comment dataset.
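The last two steps could look roughly like the Python sketch below: a heuristic flags a sentence as a responsibility, and its output is compared with manual labels using precision and recall. This is only a stand-in under stated assumptions. It uses a regular expression instead of dependency parsing, the verb list is a hypothetical seed list, and the labelled sentences (apart from the MouseClickState one) are invented for illustration and do not come from the comment dataset.

```python
import re

# Hypothetical seed list of verbs assumed to signal a responsibility;
# the real list would come out of the pattern-finding step.
RESPONSIBILITY_VERBS = {"manage", "represent", "provide", "handle", "implement"}


def is_responsibility(sentence):
    """Flag sentences of the form 'I <verb> ...' whose verb is in the seed list."""
    match = re.match(r"\s*I\s+([A-Za-z]+)", sentence)
    return bool(match) and match.group(1).lower() in RESPONSIBILITY_VERBS


def precision_recall(sentences, gold):
    """Compare heuristic predictions with manual labels (True = responsibility)."""
    predicted = [is_responsibility(s) for s in sentences]
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy sentences with invented labels, for illustration only.
sentences = [
    "I manage the distinction between clicks, double clicks, and drag operations.",
    "I know the position of the last click.",
    "I collaborate with another class to receive events.",
    "I am responsible for dispatching events.",
]
gold = [True, False, False, True]

precision, recall = precision_recall(sentences, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy input the heuristic misses the "I am responsible for ..." sentence, which shows why a plain verb list is probably not enough and why the relations between subject and verb matter.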