SCG News

Inferring schemata from semi-structured data with Formal Concept Analysis

Jan Luca Liechti. Inferring schemata from semi-structured data with Formal Concept Analysis. Bachelor’s thesis, University of Bern, May 2017.


Semi-structured data do not conform to the schematic rigor of relational databases, but they still present their content in a structured way. They are often described as self-describing, because every record carries its own schema, for example as XML elements or JSON keys. We are interested in inferring a relational schema from such semi-structured data that both preserves semantic information, i.e. keeps similar records together, and requires as little extra space as possible. We assume that the records’ true types are unknown. We employ well-established notions of Formal Concept Analysis and use them in ways similar to basic operations on graphs and automata. Specifically, we create a formal context from the data in which records are objects and tags are attributes, and compute its concept lattice. Based on the assumption that semantically similar records are also structurally similar, i.e. have similar sets of attributes, we designed and implemented an algorithm that iteratively updates the lattice in order to obtain a partition of the data fulfilling the above-mentioned criteria. In tests with real-life data, we obtain good results for datasets that are already highly structured (containing few outliers and many structurally equivalent records) and mixed results at best for very diverse datasets.
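As an illustration of the first step only (building the formal context and computing its concepts), here is a minimal Python sketch. The record names, tags and the brute-force enumeration over object subsets are assumptions made for a toy example; the thesis's lattice-update algorithm is not reproduced here, and a realistic implementation would use a dedicated concept-lattice algorithm instead of enumerating every subset.

```python
from itertools import combinations

# Hypothetical toy formal context: objects are records, attributes are their tags.
context = {
    "r1": {"name", "email"},
    "r2": {"name", "email", "phone"},
    "r3": {"title", "year"},
    "r4": {"title", "year", "isbn"},
}

def extent(attrs, ctx):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)

def intent(objs, ctx):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set().union(*ctx.values())
    for o in objs:
        shared &= ctx[o]
    return frozenset(shared)

def concepts(ctx):
    """All formal concepts (extent, intent), found by closing every object subset.

    Exponential in the number of objects, so only suitable for toy contexts.
    """
    objs = list(ctx)
    found = set()
    for k in range(len(objs) + 1):
        for subset in combinations(objs, k):
            i = intent(frozenset(subset), ctx)
            found.add((extent(i, ctx), i))
    return found

# Print the concepts ordered by extent size.
for ext, itt in sorted(concepts(context), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(ext), sorted(itt))
```

On this toy context the sketch yields six concepts; the two mid-level ones, ({r1, r2}, {name, email}) and ({r3, r4}, {title, year}), group records that could plausibly share a table.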

Posted by scg at 29 May 2017, 12:15 pm

Recognising structural patterns in code — A parser based approach

Mathias Fuchs. Recognising structural patterns in code — A parser based approach. Bachelor’s thesis, University of Bern, May 2017.


Software complexity increases over time, which makes the analysis of systems increasingly difficult. If we want to analyze the structure of a software system, we need to create models of it. Agile Modeling tries to simplify this task. However, to create these models we need a parser for the source. Such a parser is sometimes missing, or it requires a great deal of effort to create, especially when we are confronted with legacy code, unknown data formats, unknown domain-specific languages, files with mixed languages, and log files. To model fast, in the spirit of Agile Modeling, we need to build parsers fast. In this thesis we therefore investigate the possibility of automatically inferring parsers for data serialization formats (e.g., JSON, XML) that output the structure of the given source. To do this we created five grammar-based building blocks (List, String, KeyValuePair, Command and Tag). These blocks can be combined in various ways to create the needed parser. This lifts parser writing one level higher, away from the "token" level, and also enables us to automate the process. We can successfully infer the structure of formats such as JSON, XML and CSS. Our approach works particularly well on XML files, with an average f-measure of 1. However, it sometimes struggles with CSS, with an average f-measure of 0.88. This is due to a problem with the building blocks: they are based on island grammars and ignore the parts of the code that we do not care about, but they do not skip all of the code that we want them to skip.
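To give a feel for what an island-style building block does, here is a minimal Python sketch of a KeyValuePair-like island written as a regular expression, together with the usual f-measure (the harmonic mean of precision and recall). The regex, the function names and the inputs are hypothetical; the building blocks in the thesis are grammars, not this regex.

```python
import re

# Hypothetical KeyValuePair "island": an optionally quoted key, a ':' or '='
# separator, and a value running up to ',', ';', newline, '{' or '}'.
# Text that does not match is treated as water and skipped.
KEY_VALUE = re.compile(r'"?([A-Za-z_][\w-]*)"?\s*[:=]\s*([^,;\n{}]+)')

def key_value_islands(text):
    """Return the (key, value) pairs found anywhere in text, ignoring the rest."""
    return [(k, v.strip().strip('"')) for k, v in KEY_VALUE.findall(text)]

def f_measure(found, expected):
    """Harmonic mean of precision and recall over sets of extracted items."""
    found, expected = set(found), set(expected)
    tp = len(found & expected)
    if tp == 0:
        return 0.0
    precision = tp / len(found)
    recall = tp / len(expected)
    return 2 * precision * recall / (precision + recall)

print(key_value_islands('{"name": "Ada", "age": 36}'))
print(key_value_islands("body { color: red; margin: 0 }"))
css_found = key_value_islands("a:hover { color: blue }")
print(css_found, f_measure(css_found, [("color", "blue")]))
```

On `a:hover { color: blue }` the selector is wrongly read as the pair `("a", "hover")`, giving an f-measure of 2/3 against the single expected pair: a toy instance of an island parser failing to skip code it should treat as water.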

Posted by scg at 16 May 2017, 12:15 pm

Improving the Precision of Type Inference Algorithms with Lightweight Heuristics

Nevena Milojković. Improving the Precision of Type Inference Algorithms with Lightweight Heuristics. In SATToSE’17: Pre-Proceedings of the 10th International Seminar Series on Advanced Techniques & Tools for Software Evolution, June 2017.


Dynamically-typed languages allow faster software development by not imposing type constraints. Static type information, however, facilitates program comprehension and software maintenance. Type inference algorithms attempt to reconstruct the type information from the code, yet they suffer from false positives or false negatives. The use of complex type inference algorithms during the development phase is questionable because of their performance costs. Instead, we propose lightweight heuristics that improve the precision of simple type inference algorithms while preserving their speed.
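The false-positive problem already shows up in a very simple inference scheme, sketched below in Python under stated assumptions: a variable's candidate types are all classes that understand every message sent to it. The class table and selectors are hypothetical, and this is neither the algorithm nor the heuristics from the paper.

```python
# Hypothetical toy class table: class name -> selectors its instances understand.
CLASSES = {
    "OrderedCollection": {"add:", "do:", "size", "first"},
    "Set": {"add:", "do:", "size", "includes:"},
    "String": {"size", "first", "asUppercase"},
    "Point": {"x", "y", "+"},
}

def infer_types(sent_messages, classes=CLASSES):
    """Candidate types: every class that understands all messages sent to the variable."""
    return {name for name, selectors in classes.items() if sent_messages <= selectors}

print(sorted(infer_types({"add:", "size"})))
print(sorted(infer_types({"size", "first"})))
```

If a variable that is only ever sent `add:` and `size` always holds an `OrderedCollection` at run time, the extra candidate `Set` is a false positive; improving precision means cutting down such spurious candidates.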

Posted by scg at 9 May 2017, 9:59 am

Exception Evolution in Long-lived Java Systems

Haidar Osman, Andrei Chiş, Claudio Corrodi, Mohammad Ghafari, and Oscar Nierstrasz. Exception Evolution in Long-lived Java Systems. In Proceedings of the 14th International Conference on Mining Software Repositories, MSR ’17, 2017.


Exception handling allows developers to deal with abnormal situations that disrupt the execution flow of a program. There are three main types of exceptions: standard exceptions provided by the programming language itself, custom exceptions defined by the project developers, and third-party exceptions defined in external libraries. We conjecture that multiple factors affect the use of these exception types. We perform an empirical study on long-lived Java projects to investigate these factors. In particular, we analyze how developers rely on the different types of exceptions in throw statements and exception handlers. We confirm that the domain, the type, and the development phase of a project affect its exception handling patterns. We observe that applications have significantly more error-handling code than libraries and increasingly rely on custom exceptions. Also, projects that belong to different domains prefer different exception types. For instance, content management systems rely more on custom exceptions than on standard exceptions, whereas the opposite is true in parsing frameworks.
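The three-way classification of exception types can be sketched as a simple rule over fully qualified class names. In this hypothetical Python sketch, the standard prefixes (`java.`, `javax.`), the project prefix and the example class names are assumptions; the study's actual classification procedure may differ.

```python
from collections import Counter

# Assumption: standard (JDK) exceptions live under these packages.
STANDARD_PREFIXES = ("java.", "javax.")

def classify_exception(qualified_name, project_prefixes):
    """Classify an exception class as 'standard', 'custom' or 'third-party'.

    project_prefixes: package prefixes owned by the analysed project, ending
    in '.' so that 'org.example.app.' does not also match 'org.example.application'.
    """
    if qualified_name.startswith(STANDARD_PREFIXES):
        return "standard"
    if qualified_name.startswith(tuple(project_prefixes)):
        return "custom"
    return "third-party"

def exception_profile(qualified_names, project_prefixes):
    """Count how often each exception type occurs, e.g. over all throw statements."""
    return Counter(classify_exception(n, project_prefixes) for n in qualified_names)

# Hypothetical exceptions thrown in a project whose own code is under org.example.app.
thrown = [
    "java.lang.IllegalArgumentException",
    "java.io.IOException",
    "org.example.app.ConfigException",
    "org.example.lib.ParseFailure",
]
print(exception_profile(thrown, ["org.example.app."]))
```

Comparing such profiles across projects, or across a project's history, is one way to see shifts such as a growing reliance on custom exceptions.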

Posted by scg at 3 April 2017, 12:15 pm

Call for PhD candidates in the Software Composition Group, U Bern

Applications are invited for PhD candidates at the Software Composition Group, University of Bern, Switzerland.

The Software Composition Group carries out research in software engineering and programming languages, with a view to enabling software evolution. The SCG is led by Prof. Oscar Nierstrasz and is part of the Institute of Computer Science at the University of Bern.

Applicants will contribute to the ongoing SNSF project, “Agile Software Analysis”, and towards the planned successor project:

The candidate must have an MSc in Computer Science (equivalent to a Swiss MSc), should demonstrate strong programming skills, and have research interests in several of the following areas:

  • software evolution
  • program understanding
  • dynamic analysis
  • static analysis
  • software modeling
  • model-driven engineering
  • secure software engineering
  • programming language design
  • domain specific languages
  • virtual machine technology

Female candidates are especially welcome to apply. To apply, please send an email including your research statement and your CV, with at least two references, to Prof. Oscar Nierstrasz (, by June 1, 2017.

Posted by Oscar Nierstrasz at 29 March 2017, 3:00 pm
Last changed by oscar on 29 March 2017