WEDA: Exploring Copyright Protection for Large Language Model Downstream Alignment

Document Type

Article

Publication Date

2024

Abstract

Large Language Models (LLMs) have shown remarkable representation and generalization capabilities, which have led to significant advancements in Natural Language Processing (NLP). Before deployment, pre-trained LLMs often need to be tailored to specific downstream tasks for improved performance, a process commonly referred to as downstream alignment. This is a costly effort given the manpower, training resources, and downstream-specific data it requires. While much attention has been paid to protecting the copyright of the models themselves, copyright protection for LLM alignment has been largely overlooked. In this paper, we present the Watermark Embedding for Downstream Alignment (WEDA) scheme, which provides effective copyright protection for two popular LLM alignment techniques: parameter-efficient fine-tuning (PEFT) and in-context learning (ICL). For alignment through PEFT, we propose a Chain of Thought (CoT) based solution that embeds watermarks into the PEFT weights. We further extend this solution to safeguard alignment through ICL by using prefix-integrated CoT to watermark the examples embedded within ICL prompts. We conduct an extensive experimental evaluation to demonstrate the effectiveness of the proposed scheme. © 2024 Elsevier B.V. All rights reserved.
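
As a rough illustration of the watermarking idea summarized above, the sketch below shows how an owner-held secret key might be turned into CoT-style trigger/response pairs that could be mixed into PEFT fine-tuning data, and how ownership could later be checked by querying a suspect model. This is a minimal sketch under assumed details: the function names (build_watermark_examples, verify_watermark, query_fn), the key derivation, and the verification threshold are all illustrative assumptions, not the paper's actual construction.

```python
# Hypothetical sketch of CoT-style watermark triggers for a PEFT-aligned model.
# All names and design details here are illustrative assumptions, not the
# WEDA scheme's actual API or construction.

from typing import Callable, List, Tuple
import hashlib


def build_watermark_examples(secret_key: str, n: int = 4) -> List[Tuple[str, str]]:
    """Derive (trigger prompt, expected CoT-style answer) pairs from an owner-held secret.

    Each trigger is an unusual prompt unlikely to occur in normal use; the expected
    answer embeds a key-derived token inside a chain-of-thought style response, so
    only a model fine-tuned on these pairs reproduces it reliably.
    """
    examples = []
    for i in range(n):
        tag = hashlib.sha256(f"{secret_key}:{i}".encode()).hexdigest()[:8]
        trigger = f"Let's think step by step. What is the verification code for probe #{i}?"
        answer = (
            f"Step 1: Identify the probe index ({i}). "
            f"Step 2: Recall the registered code. "
            f"Final answer: {tag}"
        )
        examples.append((trigger, answer))
    return examples


def verify_watermark(
    query_fn: Callable[[str], str],
    secret_key: str,
    n: int = 4,
    threshold: float = 0.75,
) -> bool:
    """Query a suspect model and check how many key-derived tags it reproduces."""
    examples = build_watermark_examples(secret_key, n)
    hits = sum(
        1
        for trigger, answer in examples
        if answer.split("Final answer: ")[-1] in query_fn(trigger)
    )
    return hits / n >= threshold


if __name__ == "__main__":
    # Stand-in for a PEFT-tuned model: a lookup over the watermark pairs,
    # mimicking a model that memorized the triggers during fine-tuning.
    memorized = dict(build_watermark_examples("owner-secret"))
    fake_model = lambda prompt: memorized.get(prompt, "I don't know.")
    print(verify_watermark(fake_model, "owner-secret"))   # True
    print(verify_watermark(fake_model, "wrong-secret"))   # False
```

Note that verification in this sketch only needs black-box query access to the suspect model, which matches the usual setting in which an alignment owner cannot inspect a third party's weights.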
