Recognising structural patterns in code — A parser based approach

Mathias Fuchs. Recognising structural patterns in code — A parser based approach. Bachelor’s thesis, University of Bern, May 2017.

Abstract

Software complexity increases over time. This makes the analysis of systems increasingly difficult. If we want to analyze a software system and focus on its structure, we need to create models. Agile Modeling tries to simplify this task. However, to create these models we need a parser for the source. This parser is sometimes missing, or it requires a great deal of effort to create, especially when we are confronted with legacy code, unknown data formats, unknown domain-specific languages, and sometimes files with mixed languages and log files. To be able to model fast, in the spirit of Agile Modeling, we need to build parsers fast. In this thesis we therefore investigate the possibility of automatically inferring parsers for data serialization formats (e.g., JSON, XML) and outputting the structure of the given source. To do this we created five grammar-based building blocks (List, String, KeyValuePair, Command and Tag). These blocks can be combined in various ways to create the needed parser. This raises the writing of parsers one level higher, away from the "token" level, and also enables us to automate the process. We can successfully infer the structure of formats such as JSON, XML and CSS. Our approach works particularly well on XML files, with an average f-measure of 1. However, it sometimes struggles with CSS, with an average f-measure of 0.88. This is due to a problem with the building blocks. They are based on island grammars and ignore the parts of the code that we do not care about. However, they do not skip all the code that we want them to skip.
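To make the idea of an island-grammar building block and the reported f-measure more concrete, the following is a minimal sketch in Python. It is not the implementation from the thesis: the regular expression `KEY_VALUE`, the functions `find_key_value_pairs` and `f_measure`, and the two sample inputs are all hypothetical and chosen only for illustration. The sketch treats key-value pairs as "islands", skips everything else as "water", and scores the result against a hand-written expected set.

```python
import re

# Hypothetical "island" pattern (an assumption, not taken from the thesis):
# an optionally quoted key, a ':' or '=' separator, an optionally quoted value.
# Any text that does not match is treated as "water" and skipped.
KEY_VALUE = re.compile(
    r'"?(?P<key>[A-Za-z_][\w-]*)"?\s*[:=]\s*"?(?P<value>[^",;{}\s]+)"?'
)


def find_key_value_pairs(source):
    """Return the set of (key, value) islands found in the source text."""
    return {(m.group("key"), m.group("value")) for m in KEY_VALUE.finditer(source)}


def f_measure(found, expected):
    """Harmonic mean of precision and recall over two sets of pairs."""
    true_positives = len(found & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(found)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs and hand-written expected structure (assumptions).
json_text = '{"name": "demo", "version": 2}'
json_expected = {("name", "demo"), ("version", "2")}

css_text = "a:hover { color: red; margin: 0 }"
css_expected = {("color", "red"), ("margin", "0")}

json_score = f_measure(find_key_value_pairs(json_text), json_expected)
css_score = f_measure(find_key_value_pairs(css_text), css_expected)

print(round(json_score, 2))  # 1.0
print(round(css_score, 2))   # 0.8
```

In this toy example the JSON input is recognised perfectly, while the CSS selector `a:hover` is mistaken for a key-value pair because the pattern fails to skip it. That false positive lowers precision to 2/3 and the f-measure to 0.8. It is meant only to show how imperfect skipping can reduce the score, and is not necessarily the failure mode observed in the thesis.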

Posted by scg at 16 May 2017, 12:15 pm