Description
This paper presents the development of a domain-specific chatbot in the field of Sino-Vietnamese classical literature, with a focus on the fiction genre. We fine-tuned multiple large language models (LLMs) on a dataset constructed from Yuenan Hanwen Xiaoshuo Jicheng — Collected Classical Chinese Novels of Vietnam. By applying instruction-tuning techniques and conversational formatting, the models were adapted to understand, interpret, and respond to queries related to Yuenan Hanwen Xiaoshuo Jicheng in Vietnamese. Experimental results show that the fine-tuned models demonstrate strong capabilities in literary comprehension, content explanation, and user interaction. The chatbot system achieved a highest BERTScore (F1) of 78.5%, indicating its potential as an effective tool for supporting the study and dissemination of Sino-Vietnamese classical literature.
Từ khóa
Sino-Vietnamese classical literature
Yuenan Hanwen Xiaoshuo Jicheng
Domain-specific chatbot
LoRA fine-tuning
NLP for Cultural Heritage
Thông tin các tác giả
- Ủ Cao Kỳ Long, currently a master's student at VNUHCM – University of Science, 227 Nguyễn Văn Cừ Street, Chợ Quán Ward, Ho Chi Minh City, Vietnam. Email: ucaokylong.hardcore@gmail.com
- Huỳnh Thanh Xuân, currently a master's student at VNUHCM – University of Science, 227 Nguyễn Văn Cừ Street, Chợ Quán Ward, Ho Chi Minh City, Vietnam. Email: xuanhuynh233@gmail.com
- Phạm Hoàng Vũ, currently a master's student at VNUHCM – University of Science, 227 Nguyễn Văn Cừ Street, Chợ Quán Ward, Ho Chi Minh City, Vietnam. Email: phamhoangvu1811995@gmail.com
- Lưu Thiện Đức, currently a master's student at VNUHCM – University of Science, 227 Nguyễn Văn Cừ Street, Chợ Quán Ward, Ho Chi Minh City, Vietnam. Email: ducluuthien@gmail.com