SCG News

Augmenting Type Inference with Lightweight Heuristics

Nevena Milojković. Augmenting Type Inference with Lightweight Heuristics. PhD thesis, University of Bern, June 2017. Details.

Abstract

Static type information facilitates program comprehension and analysis. Yet, such information is absent in dynamically-typed languages, and that increases the time needed for software maintenance. Type inference algorithms may provide type information to developers, but in order to be fast and assist in development phase, they sacrifice the precision for speed. One of the biggest obstacles for their precision is polymorphism presence. In this thesis we first analyse the prevalence of polymorphism in object-oriented software, to assess the criticality it imposes on simple type inference. We find that polymorphism is omnipresent in object-oriented code, and that static analysis in dynamically-typed languages is also hampered by the usage of cross-hierarchy polymorphism, i.e., duck typing. As this big obstacle for static code analysis cannot be bypassed, we propose the need for lightweight heuristics to tackle the problem of imprecision of simple type inference algorithms. Four lightweight heuristics are employed to improve the performance of two simple and fast type inference approaches. These heuristics are founded on the source code and run-time information that are easy to collect without interrupting the workflow of a developer. The heuristics are evaluated and compared with the underlying algorithms based on their inference time and precision. All of them show a significant improvement when compared to the basic algorithm. They introduce a negligible overhead on the inference time, thus we deem them usable during regular coding tasks.

Posted by scg at 5 July 2017, 6:15 pm comment link

An empirical investigation into the usage of a live debugger

Roger Stebler. An empirical investigation into the usage of a live debugger. Masters thesis, University of Bern, June 2017. Details.

Abstract

Reasoning about the run-time behavior of software applications is a challenging task. Live debuggers are essential tools that developers use in combination with other tools from an IDE to find and fix problematic behavior. Nevertheless, if developers use the debugger the wrong way or the debugger does not provide adequate features, it will complicate the debugging process. To explore these issues and help improve the debugging process, we perform an empirical investigation into how developers fix a given bug in an unknown piece of code. We focus our investigation on how developers use the debugger, and on how domain-specific information helps developers during the debugging process. Towards this goal, we design the experiment as a between-group study with 10 participants. We collected and analyzed 6 hours of recordings. By analyzing them we observed that: (i) developers use different strategies to find the cause of a bug; those who successfully solved the given bug observed the live behavior of the application while stepping in the debugger; (ii) domain-specific information helps developers if they are able to find it, especially when dealing with unfamiliar libraries, and (iii) finding and using domain-specific information is not always straightforward unless the information is shown by default. Based on these observations we propose several improvements to the debugger, such as visually highlighting domain-specific information in the debugger, and automatically executing previous debugging actions to avoid their repetition.

Posted by scg at 29 June 2017, 3:15 pm comment link

Inferring schemata from semi-structured data with Formal Concept Analysis

Jan Luca Liechti. Inferring schemata from semi-structured data with Formal Concept Analysis. Bachelor’s thesis, University of Bern, May 2017. Details.

Abstract

Semi-structured data do not conform to the schematic rigor of relational databases, but still present their content in a structured way. They are described as self-describing data, because they provide a schema for every record, for example as XML elements or JSON keywords. We are interested in inferring a relational schema from such semi-structured data that both preserves semantic information, i.e. keeps similar records together, and requires as little extra space as possible. We are operating under the assumption that we do not know records’ true types. We employ well-established notions of Formal Concept Analysis and use them in ways similar to basic operations on graphs and automata. Specifically, we create a formal context from the data where records are objects and tags are attributes and compute its concept lattice. Based on the assumption that semantically similar records are also structurally similar, i.e. have a similar set of attributes, we designed and implemented an algorithm that iteratively performs updates on the lattice in order to obtain a partition of the data ful lling the above mentioned criteria. In tests with real-life data, we obtain good results for datasets that are already highly structured — that contain few outliers and many structurally equivalent records — and mixed results at best for very diverse datasets.

Posted by scg at 29 May 2017, 12:15 pm comment link

Recognising structural patterns in code — A parser based approach

Mathias Fuchs. Recognising structural patterns in code — A parser based approach. Bachelor’s thesis, University of Bern, May 2017. Details.

Abstract

Software complexity increases over time. This makes the analysis of systems increasingly difficult. If we want to analyze a software system and focus on the structure, we require the creation of models. Agile Modeling tries to simplify this task. However, to create these models we need a parser for the source. This parser is sometimes missing or it requires a great deal of effort to create, especially when we are confronted with legacy code, unknown data formats, unknown domain specific languages and sometimes files with mixed languages and log files. To be able to model fast, in the spirit of Agile Modeling, we need to build parsers fast. In this thesis we therefore investigate the possibility to automatically infer parsers of data serialization formats (e.g., JSON, XML) and output the structure of the given source. To do this we created five grammar based building blocks (List, String, KeyValuePair, Command and Tag). These blocks can be combined in various ways to create the needed parser. This raises writing parsers one level higher, away from the "token" level and also enables us to automate the process. We can successfully infer structure of formats such as JSON, XML and CSS. Our approach works particularly well on XML files, with an average f-measure of 1. However it sometimes struggles with CSS, with an average f-measure of 0.88. This is due a problem with the building blocks. They are based on island grammars and ignore the parts of the code that we do not care about. However it does not skip all the code that we want it to skip.

Posted by scg at 16 May 2017, 12:15 pm comment link

Improving the Precision of Type Inference Algorithms with Lightweight Heuristics

Nevena Milojković. Improving the Precision of Type Inference Algorithms with Lightweight Heuristics. In SATToSE’17: Pre-Proceedings of the 10th International Seminar Series on Advanced Techniques & Tools for Software Evolution, June 2017. Details.

Abstract

Dynamically-typed languages allow faster software development by not posing the type constraints. Static type information facilitates program comprehension and software maintenance. Type inference algorithms attempt to reconstruct the type information from the code, yet they suffer from the problem of false positives or false negatives. The use of complex type inference algorithms is questionable during the development phase, due to their performance costs. Instead, we propose lightweight heuristics to improve simple type inference algorithms and, at the same time, preserve their swiftness.

Posted by scg at 9 May 2017, 9:59 am comment link
<< 1 2 3 4 5 6 7 8 9 10 >>
Last changed by oscar on 29 March 2017