Abstract: To improve the accuracy of answers that large language models (LLMs) generate from retrieved documents, a retrieval-augmented generation method based on domain knowledge is proposed. First, in the retrieval stage, a first layer of sparse retrieval is performed using both the question and the domain knowledge, yielding a domain-specific candidate set for the subsequent dense retrieval. Second, in the generation stage, a zero-shot learning method concatenates the domain knowledge before or after the question and combines it with the retrieved documents as input to the LLM. Finally, extensive experiments and performance evaluations were conducted on medical- and legal-domain datasets using ChatGLM2-6B and Baichuan2-7B-chat. The results show that the proposed retrieval-augmented generation method based on domain knowledge effectively improves the domain relevance of the answers generated by LLMs, and that the zero-shot learning method outperforms fine-tuning. Under zero-shot learning, sparse retrieval incorporating domain knowledge combined with placing the domain knowledge before the question achieves the largest improvement on ChatGLM2-6B, raising ROUGE-1, ROUGE-2, and ROUGE-L by 3.82, 1.68, and 4.32 percentage points, respectively, over the baseline method. The proposed method can improve the accuracy of LLM-generated answers and provides a useful reference for research on, and applications of, open-domain question answering.
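The pipeline the abstract describes (sparse retrieval over question plus domain knowledge, dense re-ranking, then a zero-shot prompt with the domain knowledge placed before or after the question) can be sketched as follows. This is a minimal illustrative sketch only: the function names (`sparse_retrieve`, `dense_retrieve`, `build_prompt`) and the toy term-overlap and bag-of-words cosine scorers are assumptions standing in for the paper's actual sparse index and dense encoder.

```python
# Illustrative sketch of the two-stage retrieval and prompt construction.
# The scoring functions are toy stand-ins, not the paper's implementation.
from collections import Counter
import math


def sparse_retrieve(question, domain_knowledge, corpus, k=3):
    """First layer: sparse retrieval using the question AND the domain
    knowledge as the query (toy term-overlap score stands in for BM25)."""
    query_terms = Counter((question + " " + domain_knowledge).lower().split())

    def score(doc):
        doc_terms = Counter(doc.lower().split())
        return sum(min(query_terms[t], doc_terms[t]) for t in query_terms)

    return sorted(corpus, key=score, reverse=True)[:k]


def dense_retrieve(question, candidates, k=1):
    """Second layer: re-rank the domain-specific candidate set; a
    bag-of-words cosine stands in for a dense encoder here."""
    def vec(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        den = math.sqrt(sum(v * v for v in a.values())) * \
              math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    q = vec(question)
    return sorted(candidates, key=lambda d: cosine(q, vec(d)), reverse=True)[:k]


def build_prompt(question, domain_knowledge, docs, knowledge_first=True):
    """Zero-shot prompt: concatenate the domain knowledge before or after
    the question, then append the retrieved documents."""
    qk = (f"{domain_knowledge}\n{question}" if knowledge_first
          else f"{question}\n{domain_knowledge}")
    return qk + "\n\nReference documents:\n" + "\n".join(docs)
```

A usage example: for a medical question, the domain knowledge steers the sparse layer toward in-domain documents, the dense layer picks the best candidate, and `knowledge_first=True` corresponds to the best-performing "knowledge before question" variant reported in the abstract.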