Identification of the class responsibility information from the Pharo class comments

Introduction

Comments in Pharo are expressed in a very different way compared to other programming languages. Writing documentation in other programming languages annotations and pseudo-English is used mostly omitting the subject of the sentence. Whereas in Pharo, the text is written freely in the comment.

Problem

Identifying a certain kind of information in the comment is a challenging task due to the lack of a fixed structure of the comment. Although the default template is present in Pharo, very few comments adhere to the template. So information is scattered all over the comment.

Example

A class named "MouseClickState" describe its responsibility with a sentence: "I manage the distinction between clicks, double clicks, and drag operations."

In Pharo class-comments, a sentence is composed as we express in English using SVO. In most of the cases, a class is referred like first person entity ā€œIā€ and then all the information present in the comment is referred in a similar way. There are lots of information available in class comments like what responsibilities a class have, what a class knows, whom it collaborates with, and code snippet as an example to tell the implementation details.

Aim

Our aim is to perform a pilot study to identify the patterns for the responsibility of a class from the comment and extract it. This is helpful in particular to highlight the important information from the comment and find inconsistent information in the comment.

Steps

  • Finding the patterns to identify the responsibility of a class from the comment like which words/verbs describe the responsibility mostly.
  • Study of NL(natural language) methods to extract various information from the text.
  • Extraction of subjects (nouns), actions (verbs), and their relations using dependency parsing.
  • Constructing NL(natural language) heuristics to identify the sentence that describe the responsibility of the class.
  • Evaluating the approach on the comment dataset.
Last changed by pooja on 16 February 2019