Big Software Data

Recently there has been a surge of interest in pushing software analysis beyond the level of individual systems, driven in part by the new availability of data. Indeed, software is entering the age of big data, which is characterized by increasing volume (amount of software), velocity (speed of software generation), and variety (range of data sources).

SCG carries out several research projects in the realm of big software data. Some of our concerns are:

Analyzing entire software ecosystems
Analyzing large corpora of software systems (see The Famix Corpus)
Detecting code duplication

Some of our work on big software data is captured in our publications:

Papers