Big Software Data

Recently there has been a surge of interest in pushing software analysis beyond the level of individual systems, driven in part by the newly available data. Indeed, software is entering the age of big data, which is characterized by increasing volume (amount of software), velocity (speed at which software is produced), and variety (range of data sources).

SCG carries out several research projects in the realm of big software data. Our research concerns include:

  • Analyzing entire software ecosystems
  • Analyzing large corpora of software systems (see The Famix Corpus)
  • Detecting code duplication

Our publications on big software data include:

  1. Mircea Lungu, Romain Robbes, and Michele Lanza. Recovering Inter-Project Dependencies in Software Ecosystems. In ASE'10: Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering, ACM Press, 2010.
  2. Mircea Lungu, Oscar Nierstrasz, and Niko Schwarz. Big Software Data Analysis. In ERCIM News 89, April 2012.
  3. Haidar Osman, Mircea Lungu, and Oscar Nierstrasz. Mining frequent bug-fix code changes. In 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), p. 343-347, February 2014.
  4. Romain Robbes and Mircea Lungu. A Study of Ripple Effects in Software Ecosystems (NIER). In Proceedings of the 33rd International Conference on Software Engineering (ICSE 2011), p. 904-907, May 2011.
  5. Sandro Schulze and Niko Schwarz. How to Make the Hidden Visible: Code Clone Presentation Revisited. In Rainer Koschke, Ira D. Baxter, Michael Conradt, and James R. Cordy (Eds.), Software Clone Management Towards Industrial Application (Dagstuhl Seminar 12071), 2, p. 35-38, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, June 2012.
  6. Niko Schwarz, Mircea Lungu, and Romain Robbes. On how often code is cloned across repositories. In Proceedings of the 2012 International Conference on Software Engineering (ICSE 2012), p. 1289-1292, IEEE Press, Piscataway, NJ, USA, 2012.
  7. Niko Schwarz. Hot clones: Combining search-driven development, clone management, and code provenance. In 2012 34th International Conference on Software Engineering (ICSE), p. 1628-1629, IEEE, June 2012.
  8. Niko Schwarz. Hot Clones: A Shotgun Marriage of Search-Driven Development and Clone Management. In 2012 16th European Conference on Software Maintenance and Reengineering, p. 513-515, IEEE, Los Alamitos, CA, USA, March 2012.
  9. Boris Spasojević, Mircea Lungu, and Oscar Nierstrasz. Towards Faster Method Search Through Static Ecosystem Analysis. In Proceedings of the 2014 European Conference on Software Architecture Workshops (ECSAW '14), p. 11:1-11:6, ACM, New York, NY, USA, August 2014.
  10. Boris Spasojević, Mircea Lungu, and Oscar Nierstrasz. Overthrowing the Tyranny of Alphabetical Ordering in Documentation Systems. In 2014 IEEE International Conference on Software Maintenance and Evolution (ERA Track), p. 511-515, September 2014.
  11. Boris Spasojević, Mircea Lungu, and Oscar Nierstrasz. Mining the Ecosystem to Improve Type Inference For Dynamically Typed Languages. In Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward! '14), p. 133-142, ACM, New York, NY, USA, 2014.

Last changed by admin on 21 April 2009