Phrase2Set: Phrase-to-Set Machine Translation and Its Software Engineering Applications

Published in 29th IEEE International Conference on Software Analysis, Evolution and Reengineering, 2021

Recommended citation: T. V. Nguyen, A. Yadavally and T. N. Nguyen, "Phrase2Set: Phrase-to-Set Machine Translation and Its Software Engineering Applications," 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022, pp. 502-513, doi: 10.1109/SANER53432.2022.00068.

Machine translation (MT) has been applied to software engineering (SE) problems such as software tagging, language migration, bug localization, and automated program repair. However, MT primarily supports sequence-to-sequence transformations and falls short when translating a phrase or sequence in the input into a set in the output. An example of such a task is tagging the input text in a software library tutorial or a forum entry with the set of API elements that are relevant to that input. In this work, we propose PHRASE2SET, a context-sensitive statistical machine translation model that learns to transform a phrase mixing code and text into a set of text or code tokens. We first design a token-to-token algorithm that computes the probabilities of mapping individual tokens from phrases to sets. We then propose a Bayesian network-based statistical machine translation model that uses these probabilities to decide on a translation process that maximizes the joint translation probability. To do so, we consider the context of the tokens on the source side and on the target side via their relative co-occurrence frequencies. We evaluate PHRASE2SET in three SE applications: 1) tagging fragments of text in a tutorial with the relevant API elements, 2) tagging StackOverflow entries with the relevant API elements, and 3) text-to-API translation. Our empirical results show that PHRASE2SET achieves high accuracy and outperforms the state-of-the-art models in all three applications. We also discuss the lessons learned and other potential applications.
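
To make the idea of token-to-token mapping probabilities more concrete, the following is a minimal sketch and not the PHRASE2SET model itself. It assumes the simplest possible estimate: P(target | source) is taken as the relative co-occurrence frequency of a source token and a target token over training pairs, and a phrase is translated by averaging these probabilities and keeping the top-k targets. The function names (`train_cooccurrence`, `translate`), the parameter `k`, and the toy data are hypothetical and chosen only for illustration; the paper's Bayesian network and context modeling are not reproduced here.

```python
from collections import Counter, defaultdict


def train_cooccurrence(pairs):
    """Estimate P(target | source) as a relative co-occurrence frequency.

    `pairs` is an iterable of (phrase_tokens, target_set) training examples.
    This is an illustrative assumption, not the paper's algorithm.
    """
    co_counts = defaultdict(Counter)  # source token -> target token -> count
    src_counts = Counter()            # source token -> number of examples
    for phrase, target_set in pairs:
        for s in set(phrase):
            src_counts[s] += 1
            for t in target_set:
                co_counts[s][t] += 1
    return {
        s: {t: c / src_counts[s] for t, c in targets.items()}
        for s, targets in co_counts.items()
    }


def translate(phrase, probs, k=2):
    """Return the k target tokens with the highest average P(t | s)."""
    tokens = set(phrase)
    scores = Counter()
    for s in tokens:
        for t, p in probs.get(s, {}).items():
            scores[t] += p / len(tokens)
    return {t for t, _ in scores.most_common(k)}


# Toy (phrase, API set) pairs, invented for this example only.
toy_pairs = [
    (["read", "file", "lines"], {"FileReader", "BufferedReader"}),
    (["write", "file"], {"FileWriter"}),
    (["read", "socket"], {"Socket", "BufferedReader"}),
]
probs = train_cooccurrence(toy_pairs)
result = translate(["read", "lines"], probs, k=2)
```

Because the output is a set, the order of the selected targets carries no meaning, which is the key difference from a sequence-to-sequence decoder.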
