Posts by Collection

portfolio

publications

An Exploration of Machine Learning Based Day-Ahead Solar Irradiance Forecasting Methodologies.

Published in Diss. University of Georgia, 2020

Predicting solar irradiance is an important topic in renewable energy generation. In this work, North American Mesoscale (NAM) Forecast System data is augmented with irradiance observations from the solar farm at the University of Georgia to forecast irradiance 24 hours into the future. An input-selection scheme for the machine learning models used for this purpose is presented and evaluated. This scheme significantly improved performance, resulting in mean absolute errors (MAE) of 72.63 W/m², 44.94 W/m², and 63.60 W/m² for the dual-axis tracking, fixed-axis, and single-axis tracking solar arrays, respectively. The effect of geographic expansion, by including additional weather forecasts, is evaluated. Furthermore, to correct the reported bias in global horizontal irradiance (GHI) in the NAM Forecast System, theory-driven bias-correction approaches are explored, in which the NAM Forecast System is selectively combined with Clear-Sky Scaling and Liu-Jordan techniques. In addition, the ability of predictive models involving the clear-sky index to capture seasonal patterns is evaluated.
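As a rough illustration of two quantities the abstract relies on, the sketch below computes a mean absolute error and a clear-sky index (measured GHI divided by modeled clear-sky GHI). The function names and sample values are hypothetical and are not taken from the dissertation.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired observations and forecasts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def clear_sky_index(ghi, ghi_clear, eps=1e-6):
    """Ratio of measured GHI to clear-sky GHI; 0.0 where clear-sky GHI is ~0 (night)."""
    return [g / c if c > eps else 0.0 for g, c in zip(ghi, ghi_clear)]


# Hypothetical irradiance values in W/m^2, for illustration only.
observed = [100.0, 400.0, 650.0]
forecast = [120.0, 350.0, 700.0]
clear_sky = [200.0, 800.0, 1000.0]

mae = mean_absolute_error(observed, forecast)
k_index = clear_sky_index(observed, clear_sky)
```

Normalizing by the clear-sky value removes most of the deterministic daily and seasonal variation, which is one reason a model might predict the index rather than raw irradiance.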

Recommended citation: Yadavally, Aashish. An Exploration of Machine Learning Based Day-Ahead Solar Irradiance Forecasting Methodologies. Diss. University of Georgia, 2020.

Phrase2Set: Phrase-to-Set Machine Translation and Its Software Engineering Applications.

Published in 29th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2021

Machine translation (MT) has been applied to software engineering (SE) problems such as software tagging, language migration, bug localization, and automated program repair. However, MT primarily supports only sequence-to-sequence transformations and falls short when translating a phrase or sequence in the input into a set in the output. An example of such a task is tagging the input text in a software library tutorial or a forum entry with the set of API elements relevant to that input. In this work, we propose PHRASE2SET, a context-sensitive statistical machine translation model that learns to transform a phrase of mixed code and text into a set of text or code tokens. We first design a token-to-token algorithm that computes the probabilities mapping individual tokens from phrases to sets. We then propose a Bayesian network-based statistical machine translation model that uses these probabilities to decide on a translation process that maximizes the joint translation probability. To do so, we consider the context of the tokens on the source side and on the target side via their relative co-occurrence frequencies. We evaluate PHRASE2SET in three SE applications: 1) tagging the fragments of text in a tutorial with the relevant API elements, 2) tagging StackOverflow entries with relevant API elements, and 3) text-to-API translation. Our empirical results show that PHRASE2SET achieves high accuracy and outperforms the state-of-the-art models in all three applications. We also provide the lessons learned and other potential applications.
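The idea of relating phrase tokens to set tokens through relative co-occurrence frequencies can be sketched as follows. This is a deliberately simplified illustration, not PHRASE2SET's algorithm: it only estimates, for each source token, the fraction of training pairs containing that token whose target set includes a given API element. The function name and the tiny training data are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_probabilities(pairs):
    """Estimate P(target token | source token) from (phrase, set) training pairs."""
    source_counts = Counter()
    joint_counts = defaultdict(Counter)
    for phrase_tokens, target_set in pairs:
        # Count each source token once per pair, even if it repeats in the phrase.
        for source in set(phrase_tokens):
            source_counts[source] += 1
            for target in target_set:
                joint_counts[source][target] += 1
    return {
        source: {t: c / source_counts[source] for t, c in targets.items()}
        for source, targets in joint_counts.items()
    }


# Hypothetical training pairs: text tokens mapped to a set of API elements.
training_pairs = [
    (["read", "file"], {"FileReader", "BufferedReader"}),
    (["read", "line"], {"BufferedReader"}),
    (["write", "file"], {"FileWriter"}),
]

probs = cooccurrence_probabilities(training_pairs)
```

In this toy data, every pair containing "read" also contains "BufferedReader" in its target set, so that estimate is 1.0, while "FileReader" appears in only half of them.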

Recommended citation: T. V. Nguyen, A. Yadavally and T. N. Nguyen, "Phrase2Set: Phrase-to-Set Machine Translation and Its Software Engineering Applications," 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022, pp. 502-513, doi: 10.1109/SANER53432.2022.00068.

Next Syntactic-Unit Code Completion and Applications.

Published in 37th IEEE/ACM International Conference on Automated Software Engineering: New Ideas and Emerging Results (NIER) Track, 2022

Code completion is an important feature in an IDE to improve developers’ productivity. Existing code completion approaches focus on completing the current code token, next token or statement, or code pattern. We propose AstCC, a code completion approach to suggest the next syntactic unit via an AST-based statistical language model. AstCC learns from a large code corpus to derive the next AST subtree representing a syntactic unit, and then fills in the template with the concrete variables from the current program scope. Our empirical evaluation shows that AstCC can correctly suggest the next syntactic unit in 33% of the cases, and in 62% of the cases, it correctly suggests within five candidates. We will also explain the potential applications of AstCC in automated program repair, automated test case generation, and syntactic pattern mining.
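To make "suggesting the next syntactic unit" concrete, here is a toy sketch that uses Python's standard `ast` module and a bigram count over statement node types. It is an assumption-laden stand-in: AstCC's actual model, target language, and template-filling step are not reproduced here, and the helper names and sample corpus are hypothetical.

```python
import ast
from collections import Counter, defaultdict


def statement_type_bigrams(source):
    """Count which statement type follows which inside every statement block."""
    bigrams = defaultdict(Counter)
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if isinstance(body, list):  # skips lambda/conditional-expression bodies
            kinds = [type(stmt).__name__ for stmt in body]
            for prev, nxt in zip(kinds, kinds[1:]):
                bigrams[prev][nxt] += 1
    return bigrams


def suggest_next(bigrams, current_kind, k=5):
    """Return up to k statement types most often seen after current_kind."""
    return [kind for kind, _ in bigrams.get(current_kind, Counter()).most_common(k)]


# Hypothetical two-function "corpus" used only for illustration.
corpus = '''
def f(xs):
    total = 0
    for x in xs:
        total += x
    return total

def g(xs):
    count = 0
    for x in xs:
        count += 1
    print(count)
    return count
'''

model = statement_type_bigrams(corpus)
after_assign = suggest_next(model, "Assign")
after_for = suggest_next(model, "For")
```

Here an assignment is always followed by a `for` loop, so `For` is the only suggestion after `Assign`; after a loop, both `Return` and `Expr` are candidates.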

Recommended citation: Anh Tuan Nguyen, Aashish Yadavally, and Tien N. Nguyen. 2022. Next Syntactic-Unit Code Completion and Applications. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22), October 10–14, 2022, Rochester, MI, USA. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3551349.3559544

DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection

Published in 45th IEEE/ACM International Conference on Software Engineering, 2023

Advances in machine learning (ML), including deep learning (DL), have enabled several approaches to implicitly learn vulnerable code patterns to automatically detect software vulnerabilities. A recent study showed that, despite their successes, the existing ML/DL-based vulnerability detection (VD) models are limited in their ability to distinguish between the two classes of vulnerable and benign code. We propose DEEPVD, a graph-based neural network VD model that emphasizes class-separation features between vulnerable and benign code. DEEPVD leverages three types of class-separation features at different levels of abstraction: statement types (similar to Part-of-Speech tagging), Post-Dominator Tree (covering regular flows of execution), and Exception Flow Graph (covering the exception and error-handling flows). We conducted several experiments to evaluate DEEPVD on a real-world vulnerability dataset of 303 projects with 13,130 vulnerable methods. Our results show that DEEPVD achieves relative improvements over the state-of-the-art ML/DL-based VD approaches of 13%–29.6% in precision, 15.6%–28.9% in recall, and 16.4%–25.8% in F-score. Our ablation study confirms that our designed features and components help DEEPVD achieve high class-separability for vulnerable and benign code.
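Because the reported gains are relative rather than absolute, a short sketch of the difference may help. The precision values below are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline, improved):
    """Improvement expressed as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical precision scores, for illustration only.
baseline_precision = 0.50
new_precision = 0.60

absolute_gain = new_precision - baseline_precision  # about 10 percentage points
relative_gain = relative_improvement(baseline_precision, new_precision)  # about 20%
```

The same absolute gain therefore corresponds to a larger relative improvement when the baseline is low.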

Recommended citation: Wenbo Wang, Tien N. Nguyen, Shaohua Wang, Yi Li, Jiyuan Zhang, and Aashish Yadavally. 2023. DeepVD: Toward Class-Separation Features for Neural Network Vulnerability Detection. In 45th IEEE/ACM International Conference on Software Engineering (ICSE ’23), May 14-20, 2023, Melbourne, Australia.

(Partial) Program Dependence Learning

Published in 45th IEEE/ACM International Conference on Software Engineering, 2023

Code fragments from developer forums often migrate to applications through code reuse. Owing to the incomplete nature of such programs, analyzing them to determine the presence of potential vulnerabilities early is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%–97.17% and 92.46%–96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA on the downstream task of method-level vulnerability detection. We find that the performance of the vulnerability detection tool using the predicted PDGs is only 1.1% lower than that of the same tool using PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.
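The abstract does not spell out how the edge-level F-scores are computed. A common convention, assumed here, is to treat dependence edges as a set of statement pairs and compare the predicted set against a reference set. The function name and the example edges are hypothetical.

```python
def edge_scores(reference_edges, predicted_edges):
    """Precision, recall, and F-score of predicted edges against reference edges."""
    reference_edges, predicted_edges = set(reference_edges), set(predicted_edges)
    true_positives = len(reference_edges & predicted_edges)
    precision = true_positives / len(predicted_edges) if predicted_edges else 0.0
    recall = true_positives / len(reference_edges) if reference_edges else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical edges, each a (source statement index, target statement index) pair.
reference = {(1, 2), (2, 3), (1, 3), (3, 4)}
predicted = {(1, 2), (2, 3), (2, 4), (3, 4)}

precision, recall, f_score = edge_scores(reference, predicted)
```

With three of the four predicted edges present in the reference, and three of the four reference edges recovered, all three scores come out to 0.75.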

Recommended citation: Aashish Yadavally, Wenbo Wang, Shaohua Wang, and Tien N. Nguyen. 2023. (Partial) Program Dependence Learning. In 45th IEEE/ACM International Conference on Software Engineering (ICSE ’23), May 14-20, 2023, Melbourne, Australia.

talks

teaching
