SCG News

Big Commit Analysis — Towards an Infrastructure for Commit Analysis

Andreas Hohler. Big Commit Analysis — Towards an Infrastructure for Commit Analysis. Master's thesis, University of Bern, January 2018. Details.


Developers commit changes to the code base of a certain project in order to, for instance, fix bugs, add features, or refactor the code. In empirical studies, researchers often need to link commits with issues in issue trackers to audit the purpose of code changes. Unfortunately, there exists no general-purpose tool that can fulfill this need for different studies. For instance, while in theory each commit should serve one purpose, in practice developers may include several goals in one commit. Also, issues in issue trackers are often miscategorized. We present BICO (BIg COmmit analyzer), a tool that links the source code management system with the issue tracker. BICO presents information in a navigable form in order to make it easier to analyze and reason about the evolution of a certain project. It takes advantage of the fact that developers include issue IDs in commit messages to link them together. BICO also provides dedicated analytics to detect big commits, i.e., multi-purpose and miscategorized commits, using statistical outlier detection. In an initial evaluation, we use BICO to analyze bug-fix commits in Apache Kafka, where our tool reports 9.6% of the bug-fixing commits as miscategorized or multi-purpose commits with a precision of 85%. This high precision demonstrates the applicability of the outlier detection method implemented in BICO. A further case study with Apache Storm shows that the precision of detecting multi-purpose commits can vary between projects. In addition, BICO also comes with a built-in metric suite extractor for calculating change metrics, source code metrics and defect counts.
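The two core mechanisms — linking commits to issues via embedded issue IDs, and flagging big commits via statistical outlier detection — can be sketched roughly as follows. This is an illustrative sketch, not BICO's actual implementation: the JIRA-style issue-ID pattern, the churn-based size measure, and the Tukey-fence threshold are all assumptions.

```python
import re
import statistics

# Hypothetical JIRA-style issue-ID pattern (e.g. "KAFKA-5413");
# BICO's actual matching rules may differ.
ISSUE_ID = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")

def link_commit_to_issues(message):
    """Return the issue IDs mentioned in a commit message, in order."""
    return ISSUE_ID.findall(message)

def big_commit_outliers(commit_sizes, k=1.5):
    """Flag commits whose size (e.g. lines changed) exceeds the upper
    Tukey fence Q3 + k * IQR; returns the indices of the outliers."""
    q1, _, q3 = statistics.quantiles(commit_sizes, n=4)
    threshold = q3 + k * (q3 - q1)
    return [i for i, size in enumerate(commit_sizes) if size > threshold]

print(link_commit_to_issues("KAFKA-5413: fix consumer rebalance; also touches KAFKA-5500"))
# → ['KAFKA-5413', 'KAFKA-5500']
print(big_commit_outliers([10, 12, 11, 9, 13, 10, 500, 11, 12, 10]))
# → [6]
```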

Posted by scg at 15 January 2018, 2:15 pm

Mining Inline Cache Data to Order Inferred Types in Dynamic Languages

Nevena Milojković, Clément Béra, Mohammad Ghafari, and Oscar Nierstrasz. Mining Inline Cache Data to Order Inferred Types in Dynamic Languages. In Science of Computer Programming, Elsevier, Special Issue on Adv. Dynamic Languages, 2017. Details.


The lack of static type information in dynamically-typed languages often poses obstacles for developers. Type inference algorithms can help, but inferring precise type information requires complex algorithms that are often slow. A simple approach that considers only the locally used interface of variables can identify potential classes for variables, but popular interfaces can generate a large number of false positives. We propose an approach called inline-cache type inference (ICTI) to augment the precision of fast and simple type inference algorithms. ICTI uses type information available in the inline caches during multiple software runs to provide a ranked list of possible classes that most likely represent a variable’s type. We evaluate ICTI through a proof-of-concept that we implement in Pharo Smalltalk. The analysis of the top-(n+2) inferred types (where n is the number of recorded run-time types for a variable) for 5486 variables from four different software systems shows that ICTI produces promising results for about 75% of the variables. For more than 90% of the variables, the correct run-time type is present among the first six inferred types. Our ordering shows a twofold improvement when compared with the unordered basic approach, i.e., for a significant number of variables for which the basic approach offered ambiguous results, ICTI was able to promote the correct type to the top of the list.
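The ranking step can be illustrated with a small sketch: given the candidate classes proposed by a simple interface-based inference and frequency counts harvested from inline caches, order the candidates by how often each class was actually observed at run time. The function name and input shapes here are assumptions for illustration; the paper's proof-of-concept works inside Pharo Smalltalk, not Python.

```python
def rank_inferred_types(candidates, inline_cache_hits):
    """Order statically inferred candidate classes so that classes seen
    most often in the inline caches come first (stable for ties)."""
    return sorted(candidates, key=lambda cls: -inline_cache_hits.get(cls, 0))

# Suppose 'OrderedCollection' was observed 120 times at run time,
# 'String' 3 times, and 'Symbol' never:
hits = {"OrderedCollection": 120, "String": 3}
print(rank_inferred_types(["Symbol", "String", "OrderedCollection"], hits))
# → ['OrderedCollection', 'String', 'Symbol']
```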

Posted by scg at 15 December 2017, 3:15 pm

Efficient parsing with parser combinators

Jan Kurš, Jan Vraný, Mohammad Ghafari, Mircea Lungu, and Oscar Nierstrasz. Efficient parsing with parser combinators. In Science of Computer Programming, 2017. To appear. Details.


Parser combinators offer a universal and flexible approach to parsing. They follow the structure of an underlying grammar, are modular, well-structured, easy to maintain, and can recognize a large variety of languages including context-sensitive ones. However, these advantages introduce a noticeable performance overhead, mainly because the same powerful parsing algorithm is used to recognize even simple languages. Time-wise, parser combinators cannot compete with parsers generated by well-performing parser generators or optimized hand-written code. Techniques exist to achieve a linear asymptotic performance of parser combinators, yet there is a significant constant multiplier. The multiplier can be lowered to some degree, but this requires advanced meta-programming techniques, such as staging or macros, that depend heavily on the underlying language technology. In this work we present a language-agnostic solution. We optimize the performance of parser combinators with specializations of parsing strategies. For each combinator, we analyze the language parsed by the combinator and choose the most efficient parsing strategy. By adapting the parsing strategy for different parser combinators we achieve performance comparable to that of hand-written or optimized parsers while preserving the advantages of parser combinators.
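The idea of specializing the parsing strategy per combinator can be shown in miniature. This sketch is in Python rather than the paper's setting, and the combinator names are ours: a generic repetition combinator re-invokes its underlying parser on every iteration, whereas a specialized strategy for repeating a single known character can scan the input directly, avoiding the per-iteration combinator overhead.

```python
# A parser takes a string and returns (value, remaining input) or None.

def char(c):
    """Parser combinator recognizing the single character c."""
    def parse(s):
        return (c, s[1:]) if s[:1] == c else None
    return parse

def star(p):
    """Generic zero-or-more repetition: re-invokes p on each iteration."""
    def parse(s):
        values = []
        r = p(s)
        while r is not None:
            v, s = r
            values.append(v)
            r = p(s)
        return (values, s)
    return parse

def star_char(c):
    """Specialized strategy for star(char(c)): a direct scan over the
    input replaces the generic combinator loop."""
    def parse(s):
        i = 0
        while i < len(s) and s[i] == c:
            i += 1
        return (list(s[:i]), s[i:])
    return parse

# Both strategies recognize the same language:
print(star(char("a"))("aaab"))  # → (['a', 'a', 'a'], 'b')
print(star_char("a")("aaab"))   # → (['a', 'a', 'a'], 'b')
```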

Posted by scg at 15 December 2017, 10:15 am

Empirically-Grounded Construction of Bug Prediction and Detection Tools

Haidar Osman. Empirically-Grounded Construction of Bug Prediction and Detection Tools. PhD thesis, University of Bern, December 2017. Details.


There is an increasing demand for high-quality software, as software bugs have an economic impact not only on software projects but also on national economies in general. Software quality is achieved via the main quality assurance activities of testing and code reviewing. However, these activities are expensive, so they need to be carried out efficiently. Auxiliary software quality tools such as bug detection and bug prediction tools help developers focus their testing and reviewing activities on the parts of software that most likely contain bugs. However, these tools are far from being adopted as mainstream development tools. Previous research points to their inability to adapt to the peculiarities of projects and their high rate of false positives as the main obstacles to their adoption. We propose empirically-grounded analysis to improve the adaptability and efficiency of bug detection and prediction tools. For a bug detector to be efficient, it needs to detect bugs that are conspicuous, frequent, and specific to a software project. We empirically show that null-related bugs fulfill these criteria and are worth building detectors for. We analyze the null dereferencing problem and find that its root cause lies in methods that return null. We propose an empirical solution to this problem that relies on the wisdom of the crowd. For each API method, we extract a nullability measure that expresses how often the return value of this method is checked against null in the ecosystem of the API. We use nullability to annotate API methods with nullness annotations and to warn developers about missing and excessive null checks. For a bug predictor to be efficient, it needs to be optimized both as a machine learning model and as a software quality tool. We empirically show how feature selection and hyperparameter optimization improve prediction accuracy. Then we optimize bug prediction to locate the maximum number of bugs in the minimum amount of code by finding the most cost-effective combination of bug prediction configurations, i.e., dependent variables, machine learning model, and response variable. We show that using both source code and change metrics as dependent variables, applying feature selection on them, and then using an optimized Random Forest to predict the number of bugs results in the most cost-effective bug predictor. Throughout this thesis, we show how empirically-grounded analysis helps us achieve efficient bug prediction and detection tools and adapt them to the characteristics of each software project.
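The nullability measure described above reduces to a simple ratio over observed call sites. A minimal sketch, assuming usage records have already been mined from client code of the API (the record format and function name are hypothetical):

```python
def nullability(call_sites):
    """Fraction of observed call sites of an API method whose return
    value is checked against null before use."""
    checked = sum(1 for site in call_sites if site["null_checked"])
    return checked / len(call_sites)

# Suppose 3 of 4 mined call sites guard the return value with a null check:
sites = [{"null_checked": True}] * 3 + [{"null_checked": False}]
print(nullability(sites))  # → 0.75
```

Consistent with the abstract, a high nullability would justify a nullness annotation and a warning on unchecked call sites, while a near-zero value would flag the existing checks as excessive.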

Posted by scg at 2 December 2017, 10:15 am

Towards Actionable Visualization for Software Developers

Leonel Merino, Mohammad Ghafari, and Oscar Nierstrasz. Towards Actionable Visualization for Software Developers. In Journal of Software: Evolution and Process, 2018. To appear. Details.


Abundant studies have shown that visualization is advantageous for software developers, yet adopting visualization during software development is not common practice due to the large effort involved in finding an appropriate visualization. Developers require support to facilitate this task. Among 368 papers in the SOFTVIS/VISSOFT venues, we identify 86 design study papers about the application of visualization to address concerns in software development. We extract from these studies the task, need, audience, data source, representation, medium and tool, and we characterize them according to the subject, process and problem domain. On the one hand, we help software developers put visualization into action by mapping existing visualization techniques to particular needs from different perspectives. On the other hand, we highlight the problem domains that are overlooked in the field and need more support.

Posted by scg at 27 November 2017, 8:15 pm
Last changed by scg on 14 August 2017