(Partial) Program Dependence Learning

Published in 45th IEEE/ACM International Conference on Software Engineering, 2023

Recommended citation: Aashish Yadavally, Wenbo Wang, Shaohua Wang, and Tien N. Nguyen. 2022. (Partial) Program Dependence Learning. In 45th IEEE/ACM International Conference on Software Engineering (ASE ’22), May 14-20, 2023, Melbourne, Australia.

Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%–97.17% and 92.46%–96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 realworld vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.

Download paper here.