Makar: A Framework for Multi-source Studies based on Unstructured Data

Mathias Birrer, Pooja Rani, Sebastiano Panichella, and Oscar Nierstrasz. Makar: A Framework for Multi-source Studies based on Unstructured Data. In 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 577-581, 2021.


To perform development and maintenance tasks, developers frequently seek information from sources such as mailing lists, Stack Overflow (SO), and Quora. Researchers analyze these sources to understand developers' information needs during such tasks. However, extracting and preprocessing unstructured data from multiple sources, and then building and maintaining a reusable dataset, is often a time-consuming and iterative process. Additionally, the lack of tools that automate this data analysis process makes it difficult to reproduce previous results or datasets. To address these concerns, we propose Makar, which provides various data extraction and preprocessing methods to support researchers in conducting reproducible multi-source studies. To evaluate Makar, we conduct a case study that analyzes discussions related to code comments on SO, Quora, and mailing lists. Our results show that Makar helps researchers prepare reproducible datasets from multiple sources with little effort, and identify the data relevant to specific research questions in less time than state-of-the-art tools, a capability of critical importance for studies based on unstructured data. Tool webpage:
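The abstract does not describe Makar's interface, so the following is only a minimal, hypothetical sketch of the kind of pipeline such a tool automates. It assumes posts from the different sources have already been fetched into a common record type, then strips HTML, normalizes the text, keeps posts that mention a keyword, and computes a fingerprint so that a rerun can be checked for an identical dataset. The names `Post`, `preprocess`, `build_dataset`, and `fingerprint` are illustrative and are not Makar's API; simple keyword matching stands in for whatever relevance identification Makar actually provides.

```python
import hashlib
import html
import json
import re
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Post:
    source: str   # hypothetical labels, e.g. "stackoverflow", "quora", "mailing_list"
    post_id: str
    body: str     # raw, possibly HTML-formatted text


TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Strip HTML tags, then decode entities, lowercase, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)      # strip tags first so decoded "&lt;" stays as text
    text = html.unescape(text)
    return WS_RE.sub(" ", text).strip().lower()


def build_dataset(posts, keywords):
    """Keep preprocessed posts that mention any keyword, in a stable order."""
    kws = [k.lower() for k in keywords]
    rows = []
    for p in posts:
        clean = preprocess(p.body)
        if any(k in clean for k in kws):
            rows.append(Post(p.source, p.post_id, clean))
    # Sorting makes the result independent of the order in which posts were fetched.
    return sorted(rows, key=lambda r: (r.source, r.post_id))


def fingerprint(rows) -> str:
    """SHA-256 over a canonical JSON dump of the dataset."""
    payload = json.dumps([asdict(r) for r in rows], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Usage with made-up posts (hypothetical data, not from the case study).
posts = [
    Post("quora", "q1", "How should I write <b>code comments</b>?"),
    Post("stackoverflow", "so42", "<p>Is a Javadoc &amp; inline comment redundant?</p>"),
    Post("mailing_list", "ml7", "Build fails on Windows"),
]
data = build_dataset(posts, ["comment"])
print([r.post_id for r in data])  # → ['q1', 'so42']
```

Publishing the fingerprint alongside a dataset lets other researchers confirm that rerunning the same extraction and preprocessing steps on the same input yields the same result, which is one concrete way to approach the reproducibility concern raised above.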

Posted by scg on 24 May 2021, 8:15 pm
Last changed by admin on 21 April 2009