Nuthan Munaiah,
Benjamin S. Meyers, Cecilia O. Alm, Andrew Meneely, Pradeep K. Murukannaiah, Emily Prud'hommeaux, Josephine Wolff, and Yang Yu.
Proceedings of the 9th International Symposium on Engineering Secure Software and Systems
(ESSoS). Bonn, Germany.
ABSTRACT: Engineering secure software is challenging. Software development organizations leverage a host of processes
and tools to enable developers to prevent vulnerabilities in software. Code reviewing is one such approach
which has been instrumental in improving the overall quality of a software system. In a typical code review,
developers critique a proposed change to uncover potential vulnerabilities. Despite best efforts by
developers, some vulnerabilities inevitably slip through the reviews. In this study, we characterized
linguistic features—inquisitiveness, sentiment and syntactic complexity—of conversations between developers
in a code review, to identify factors that could explain developers missing a vulnerability. We used natural
language processing to collect these linguistic features from 3,994,976 messages in 788,437 code reviews
from the Chromium project. We collected 1,462 Chromium vulnerabilities to empirically analyze the linguistic
features. We found that code reviews with lower inquisitiveness, higher sentiment, and lower complexity were
more likely to miss a vulnerability. We used a Naïve Bayes classifier to assess if the words (or lemmas) in
the code reviews could differentiate reviews that are likely to miss vulnerabilities. The classifier used a
subset of all lemmas (over 2 million) as features and their corresponding TF-IDF scores as values. The
average precision, recall, and F-measure of the classifier were 14%, 73%, and 23%, respectively. We believe
that our linguistic characterization will help developers identify problematic code reviews before they
result in a vulnerability being missed.
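
NOTE: The F-measure quoted in the abstract can be sanity-checked against the quoted precision and recall. The snippet below is a minimal sketch, not the authors' evaluation code. It assumes the F-measure is the usual harmonic mean of precision and recall (F1); the helper name `f_measure` is hypothetical. Because the abstract reports averages, the paper's own figure may have been averaged per fold, so this is only an approximate check.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall as reported in the abstract.
reported_precision = 0.14
reported_recall = 0.73

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.2%}")
```

The result rounds to 23%, which agrees with the reported value. The wide gap between recall and precision suggests that the classifier flags most reviews that missed a vulnerability, but also flags many that did not.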
@InProceedings{munaiah2017Natural,
author = {Munaiah, Nuthan and Meyers, Benjamin S and Alm, Cecilia O and Meneely, Andrew and Murukannaiah, Pradeep K and Prud'hommeaux, Emily and
Wolff, Josephine and Yu, Yang},
title = {Natural Language Insights from Code Reviews that Missed a Vulnerability},
booktitle = {International Symposium on Engineering Secure Software and Systems},
pages = {70--86},
month = {August},
year = {2017},
address = {Bonn, Germany},
publisher = {Springer}
}