SCG News

Detection of Cybersquatted Domains

Patrick Frischknecht. Detection of Cybersquatted Domains. Masters thesis, University of Bern, July 2021. Details.


Domain names, or domains for short, are memorable identifiers for websites; however, their affiliation is not always clear. Cybersquatters register domains that closely resemble existing ones or well-known trademarks for their own profit and thereby misuse the trust placed in a brand. The focus of this thesis is to support security personnel in the accurate detection of cybersquatted domains. Our goal is to identify domains that have been crafted in bad faith, based on the content present on the website, and thereby reduce the number of websites that would otherwise require a manual review. We developed a tool based on logo matching with image hashing that can, given a target domain, report cybersquatted domains in global-scale domain lists that consist of several hundred million entries. For our case study we selected the websites of nine well-known luxury and apparel trademarks from the Forbes Top 100 most valuable brands list and fed them to our tool. We performed a manual evaluation of more than 5 000 reported websites to determine whether the automatically assigned label, harmless or malicious, was correct. We found that cybersquatting is still a relevant issue for the selected brands as they try to protect themselves against this threat. Furthermore, we identified 1 433 domains that host malicious content, including 639 fake web shops. Finally, we found that image hashing algorithms are poorly suited to such scenarios, because logos on squatted domains are altered in ways that cause large differences in their similarity scores even though they remain visually similar. We conclude that logos are indeed a typical feature of many websites on cybersquatted domains and that our tool can report domains missed by existing tools and services.
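As a rough illustration of the logo-matching idea in the abstract above, the following sketch compares tiny grayscale images with an average hash and a Hamming distance. The abstract does not say which hashing algorithm the thesis uses, so the average hash here is an assumption, as are the names `average_hash` and `hamming_distance` and the list-of-rows image format. A practical version would first scale each logo down to a small fixed size, which this sketch skips.

```python
from typing import List

# Hypothetical input format: grayscale pixel rows with values 0-255.
Image = List[List[int]]


def average_hash(pixels: Image) -> int:
    """Build a hash with one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# A 4x4 toy "logo": a dark square on a white background.
logo = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]
# Slightly brightened copy: the hash is unchanged.
brighter = [[min(255, value + 20) for value in row] for row in logo]
# Color-inverted copy: still recognizable to a person, but every bit flips.
inverted = [[255 - value for value in row] for row in logo]

print(hamming_distance(average_hash(logo), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(logo), average_hash(inverted)))  # 16
```

The inverted copy shows, in miniature, the kind of effect the abstract reports: an alteration that keeps the logo visually recognizable can still push the hash distance to its maximum.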

Posted by scg at 26 July 2021, 7:15 pm

How to Identify Class Comment Types? A Multi-language Approach for Class Comment Classification

Pooja Rani, Sebastiano Panichella, Manuel Leuenberger, Andrea Di Sorbo, and Oscar Nierstrasz. How to Identify Class Comment Types? A Multi-language Approach for Class Comment Classification. In Journal of Systems and Software p. 111047, 2021. arXiv preprint arXiv:2107.04521. Details.


Most software maintenance and evolution tasks require developers to understand the source code of their software systems. Software developers usually inspect class comments to gain knowledge about program behavior, regardless of the programming language they are using. Unfortunately, (i) different programming languages present language-specific code commenting notations/guidelines; and (ii) the source code of software projects often lacks comments that adequately describe the class behavior, which complicates program comprehension and evolution activities. To handle these challenges, this paper investigates the different language-specific class commenting practices of three programming languages: Python, Java, and Smalltalk. In particular, we systematically analyze the similarities and differences of the information types found in class comments of projects developed in these languages. We propose an approach that leverages two techniques, namely Natural Language Processing and Text Analysis, to automatically identify various types of information from class comments, i.e., the specific types of semantic information found in class comments. To the best of our knowledge, no previous work has provided a comprehensive taxonomy of class comment types for these three programming languages with the help of a common automated approach. Our results confirm that our approach can classify frequent class comment information types with high accuracy for the Python, Java, and Smalltalk programming languages. We believe this work can help to monitor and assess the quality and evolution of code comments in different programming languages, and thus support maintenance and evolution tasks.

Posted by scg at 26 July 2021, 11:15 am

Finding and Mitigating Cross-Site Scripting Attack Vectors — Testing different Web Application Security Scanners

Rafael Burkhalter. Finding and Mitigating Cross-Site Scripting Attack Vectors — Testing different Web Application Security Scanners. Bachelor’s thesis, University of Bern, April 2021. Details.


The purpose of this thesis is to determine the efficacy and usability of different popular security scanners for web applications. The main focus lies on testing their ability to find cross-site scripting vulnerabilities, i.e., vulnerabilities that arise when user input is not properly sanitized. To analyze the scanners, various criteria are taken into account, mainly completeness of the findings, ease of use, and installation effort. The second part gives an overview of how to analyze a scanner’s results and how cross-site scripting attacks can be mitigated.
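As a small illustration of the sanitization issue mentioned in the abstract above, the sketch below escapes untrusted input before embedding it in HTML. The abstract does not state which mitigations the thesis covers, so output escaping with Python's standard `html.escape` is an assumed example, and `render_comment` is a hypothetical helper name.

```python
import html


def render_comment(user_input: str) -> str:
    """Embed untrusted text in an HTML fragment after escaping it."""
    # html.escape replaces &, < and >; quote=True also replaces " and '.
    return "<p>" + html.escape(user_input, quote=True) + "</p>"


payload = "<script>alert(1)</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Escaping at output time means the browser displays the payload as text instead of executing it. It only covers text placed in HTML element content or quoted attribute values; other contexts, such as inline JavaScript or URLs, need their own encoding rules.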

Posted by scg at 15 June 2021, 3:15 pm

Speculative Analysis for Quality Assessment of Code Comments

Pooja Rani. Speculative Analysis for Quality Assessment of Code Comments. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), p. 299-303, 2021. Details.


Previous studies have shown that high-quality code comments assist developers in program comprehension and maintenance tasks. However, the semi-structured nature of comments, unclear conventions for writing good comments, and the lack of quality assessment tools for all aspects of comments make their evaluation and maintenance a non-trivial problem. To achieve high-quality comments, we need a deeper understanding of code comment characteristics and the practices developers follow. In this thesis, we approach the problem of assessing comment quality from three different perspectives: what developers ask about commenting practices, what they write in comments, and how researchers support them in assessing comment quality. Our preliminary findings show that developers embed various kinds of information in class comments across programming languages. Still, they face problems in locating relevant guidelines to write consistent and informative comments, verifying the adherence of their comments to the guidelines, and evaluating the overall state of comment quality. To help developers and researchers in building comment quality assessment tools, we provide: (i) an empirically validated taxonomy of comment convention-related questions from various community forums, (ii) an empirically validated taxonomy of comment information types from various programming languages, (iii) a language-independent approach to automatically identify the information types, and (iv) a comment quality taxonomy prepared from a systematic literature review.

Posted by scg at 24 May 2021, 8:15 pm

Makar: A Framework for Multi-source Studies based on Unstructured Data

Mathias Birrer, Pooja Rani, Sebastiano Panichella, and Oscar Nierstrasz. Makar: A Framework for Multi-source Studies based on Unstructured Data. In 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), p. 577-581, 2021. Details.


To perform various development and maintenance tasks, developers frequently seek information from various sources such as mailing lists, Stack Overflow (SO), and Quora. Researchers analyze these sources to understand developer information needs in these tasks. However, extracting and preprocessing unstructured data from various sources, and building and maintaining a reusable dataset, is often a time-consuming and iterative process. Additionally, the lack of tools for automating this data analysis process makes it harder to reproduce previous results or datasets. To address these concerns we propose Makar, which provides various data extraction and preprocessing methods to support researchers in conducting reproducible multi-source studies. To evaluate Makar, we conduct a case study that analyzes code comment related discussions from SO, Quora, and mailing lists. Our results show that Makar is helpful for preparing reproducible datasets from multiple sources with little effort, and for identifying the relevant data to answer specific research questions in a shorter time compared to state-of-the-art tools, which is of critical importance for studies based on unstructured data. Tool webpage:

Posted by scg at 24 May 2021, 8:15 pm
Last changed by admin on 21 April 2009