Information
Paper title: A Study of LLM Generated Pseudo-Data for Improving Small-Scale Models in Human Values Estimation
Authors: Yihong Han, Rintaro Tomitaka, Yoko Nishihara, Megumi Yasuo, and Junjie Shan
Abstract: In recent years, the development of large language models (LLMs) has dramatically improved text generation performance. However, general-purpose LLMs do not always perform optimally on domain-specific tasks. Fine-tuning with domain-specific data is a common remedy, but collecting training data in narrow domains is difficult. Data related to human values in particular are hard to annotate, insufficient in quantity, and highly variable. In addition, training and inference with LLMs are expensive. This study focused on the task of human values estimation and tested the effectiveness of an approach that uses an LLM to generate pseudo-data and trains a small-scale model on that data. In the experiment, we augmented a human values dataset of 3,870 items across 10 categories to four times its original size with generated pseudo-data. Training on the dataset with pseudo-data increased human values estimation accuracy by 17% compared with training on the original dataset alone. The fine-tuned small-scale model, with an accuracy of 57%, also outperformed the LLM, with an accuracy of 27%, on human values estimation. These results indicate that pseudo-data generation using an LLM is effective for the human values estimation task.
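To make the approach concrete, below is a minimal sketch in Python of the pipeline the abstract describes: an LLM produces pseudo-labeled examples for each value category, and a small classifier is fine-tuned on the original data plus the pseudo-data. The `llm_generate` wrapper, the prompt wording, the placeholder category names, and the choice of `bert-base-uncased` as the small-scale model are all illustrative assumptions; the paper's actual prompts, categories, and models may differ.

```python
# Sketch of LLM pseudo-data generation + small-model fine-tuning.
# Assumptions (not from the paper): llm_generate() is a hypothetical
# wrapper around some LLM API; category names and the base model are
# placeholders.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

CATEGORIES = [f"value_{i}" for i in range(10)]  # placeholder for the 10 value categories


def generate_pseudo_data(llm_generate, per_category: int):
    """Ask an LLM for new sentences expressing each value category.

    llm_generate(prompt) -> list[str] is a hypothetical LLM wrapper.
    Returns (text, label) pairs to be mixed into the training set.
    """
    pseudo = []
    for label, name in enumerate(CATEGORIES):
        prompt = (f"Write {per_category} short sentences, one per line, "
                  f"each expressing the human value '{name}'.")
        for text in llm_generate(prompt)[:per_category]:
            pseudo.append((text, label))
    return pseudo


class ValuesDataset(torch.utils.data.Dataset):
    """Tokenized (text, label) pairs for sequence classification."""

    def __init__(self, pairs, tokenizer):
        texts, self.labels = zip(*pairs)
        # padding=True pads every example to the longest sequence
        self.enc = tokenizer(list(texts), truncation=True, padding=True)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


def finetune_small_model(train_pairs):
    """Fine-tune a small classifier on original + pseudo (text, label) pairs."""
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(CATEGORIES))
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3),
        train_dataset=ValuesDataset(train_pairs, tok),
    )
    trainer.train()
    return model
```

Under this sketch, the 4x augmentation reported in the abstract would correspond to calling generate_pseudo_data with enough items per category to triple the original 3,870 items, then training on the combined set.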
Bibliographic information: TAAI2024
Presentation date: December 6, 2024