
  • Is Free Self-Alignment Possible?

    This paper investigates the possibility of aligning large language models (LLMs) without the need for human-annotated data or expensive fine-tuning. The authors propose AlignEZ, a novel method that leverages self-generated preference data and representation editing to achieve nearly cost-free alignment.

    Here’s a breakdown of the paper’s key aspects:

    1. Motivation:

    • Traditional LLM alignment methods heavily rely on human preference data and computationally expensive fine-tuning, limiting scalability.
    • Recent research suggests that alignment might simply be revealing knowledge already present in pretrained models.

    2. AlignEZ Approach:

    • Self-Generated Preference Data:
      • The base LLM is prompted to generate its own preference data by describing characteristics of helpful and harmful responses.
      • Using these characteristics, the LLM generates pairs of responses, simulating preference comparisons.
    • Identifying Preference Directions:
      • The self-generated preference pairs are used to identify directions in the LLM’s embedding space that correspond to helpful and harmful attributes.
      • Two methods are explored:
        • SVD-Based Identification: Applies Singular Value Decomposition (SVD) to the embedding matrix of the preference data and takes the top singular vector as the preference direction.
        • CCS-Based Identification: Trains a Contrast-Consistent Search (CCS) probe on the self-generated data to identify directions that maximally separate helpful from harmful attributes.
    • Representation Editing (see the sketch after this list):
      • During inference, the LLM’s embeddings are modified by:
        • Boosting components aligned with the helpful direction.
        • Neutralizing components aligned with the harmful direction.
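
    To make the direction-finding and editing steps concrete, here is a minimal sketch, assuming hidden-state embeddings have already been collected for the self-generated helpful and harmful responses. The array shapes, the scaling factor alpha, and the function names are illustrative assumptions, not the paper's exact procedure.

    ```python
    import numpy as np

    def top_direction(embs):
        """Top right-singular vector of a centered embedding matrix (unit norm)."""
        centered = embs - embs.mean(axis=0, keepdims=True)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        d = vt[0]
        return d / np.linalg.norm(d)

    def edit_hidden_state(h, helpful_dir, harmful_dir, alpha=1.0):
        """Boost the component of h along the helpful direction and project out the harmful one."""
        h = h + alpha * np.dot(h, helpful_dir) * helpful_dir   # boost helpful component
        h = h - np.dot(h, harmful_dir) * harmful_dir           # neutralize harmful component
        return h

    # Toy usage: random arrays stand in for the hidden states of 64 self-generated
    # helpful and harmful responses with hidden dimension 768.
    rng = np.random.default_rng(0)
    helpful_dir = top_direction(rng.normal(size=(64, 768)))
    harmful_dir = top_direction(rng.normal(size=(64, 768)))
    edited = edit_hidden_state(rng.normal(size=768), helpful_dir, harmful_dir, alpha=0.5)
    ```

    Projecting out the harmful component removes it entirely, while the boost is scaled by alpha; the actual method may apply such edits only at selected layers or token positions.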

    3. Experiments and Results:

    • AlignEZ significantly reduces the performance gap between base and traditionally aligned models by an average of 31.6% across various datasets and model architectures.
    • It also speeds up more expensive alignment methods such as DPO, improving models trained with only limited ground-truth preference data.

    4. Key Findings:

    • Self-alignment is achievable to a significant degree without external data or fine-tuning.
    • AlignEZ offers a cost-effective way to improve LLM alignment, potentially enabling real-time personalization and fine-grained control.

    5. Limitations and Future Work:

    • The quality of self-generated preference data influences AlignEZ’s effectiveness.
    • Further research is needed to explore its applicability to more complex alignment tasks and different data modalities.

    In conclusion, AlignEZ presents a promising step towards free self-alignment, offering a cost-effective and potentially scalable approach to aligning LLMs with human preferences.


    Free Self-Alignment: Making Language Models Understand You Better?

    Large language models (LLMs) are changing our world, but they also have problems: they sometimes produce inaccurate, unfriendly, or biased content. To address this, researchers have been working to align LLMs so that they better reflect human values and preferences.

    Traditional alignment methods usually require large amounts of annotated data and substantial compute, which is a major hurdle for many researchers and developers. Is there a cheaper, more convenient way to align models?

    AlignEZ: Nearly Free Alignment

    Recently, researchers from the University of Wisconsin-Madison proposed a new method called AlignEZ that enables nearly cost-free LLM self-alignment. Its core idea is to use preference data generated by the LLM itself to edit the model's internal representations, steering it toward outputs that better match human expectations.

    How Does Self-Alignment Work?

    The AlignEZ workflow consists of three main steps:

    1. Generate preference data: The researchers first have the LLM produce its own preference data. They pose questions to the LLM and ask it to describe the characteristics an ideal response and an undesirable response should have. They then pose the same questions again and ask the LLM to generate different responses conditioned on those characteristics, yielding self-generated preference pairs.
    2. Identify preference directions: Next, these preference pairs are used to identify directions in the LLM's internal representation space that correspond to human preferences, using two methods (a probe sketch follows this list):
      • Singular Value Decomposition (SVD): SVD identifies the dominant directions in the representation space of the preference data, which tend to correlate with human preferences.
      • Contrast-Consistent Search (CCS): CCS finds a hyperplane in the representation space that separates desirable responses from undesirable ones.
    3. Edit internal representations: Finally, the identified preference directions are used to modify the LLM's representations, strengthening the directions associated with human preferences and suppressing those associated with undesirable attributes, thereby steering the model toward outputs that better match human expectations.
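
    The CCS probe in step 2 can be sketched as follows. This is a minimal illustration in the spirit of contrast-consistent search, trained here on the self-generated pairs; the probe architecture, normalization, and training schedule are assumptions rather than the paper's exact recipe, and the probe's weight vector is read out as a preference direction.

    ```python
    import torch

    def train_ccs_probe(pos_embs, neg_embs, epochs=200, lr=1e-3):
        """Train a CCS-style linear probe on paired embeddings (minimal sketch).

        pos_embs, neg_embs: (n_pairs, hidden_dim) tensors holding the embeddings of
        the desirable and undesirable response in each self-generated pair.
        Returns the probe's unit-norm weight vector, usable as a preference direction.
        """
        probe = torch.nn.Linear(pos_embs.shape[1], 1)
        opt = torch.optim.Adam(probe.parameters(), lr=lr)
        for _ in range(epochs):
            p_pos = torch.sigmoid(probe(pos_embs)).squeeze(-1)
            p_neg = torch.sigmoid(probe(neg_embs)).squeeze(-1)
            consistency = ((p_pos - (1.0 - p_neg)) ** 2).mean()     # the two probabilities should sum to one
            confidence = (torch.minimum(p_pos, p_neg) ** 2).mean()  # discourage the degenerate p = 0.5 answer
            loss = consistency + confidence
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            w = probe.weight.squeeze(0)
            return w / w.norm()

    # Toy usage with random tensors standing in for real hidden states.
    torch.manual_seed(0)
    direction = train_ccs_probe(torch.randn(64, 768), torch.randn(64, 768))
    ```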

    Experimental Results: Substantial Performance Gains

    The researchers evaluated AlignEZ on six datasets and three LLM architectures. The results show that AlignEZ significantly narrows the performance gap between base LLMs and their aligned counterparts, by 31.6% on average.

    More importantly, AlignEZ can also speed up more expensive alignment methods such as DPO: it improves the performance of DPO models trained with only a small amount of labeled data.

    Outlook: More Precise, More Personalized Alignment

    AlignEZ opens up new possibilities for LLM alignment. The researchers hope to further improve it so that it can identify human preferences more precisely and enable more personalized alignment.

    Summary

    AlignEZ is a novel LLM self-alignment method that uses the model's own generated preference data to achieve nearly cost-free alignment. Experiments show that it significantly improves LLM performance and accelerates more expensive alignment methods. AlignEZ opens new possibilities for LLM alignment and lays the groundwork for more precise, more personalized alignment techniques in the future.


  • Making Speech Synthesis More Expressive: StyleMoE's Divide-and-Conquer Strategy

    Speech synthesis has made great strides in recent years: synthesized speech is not only clear and intelligible but also carries rich emotion and prosody, coming ever closer to human expression. However, extracting and encoding style information from diverse reference speech remains a challenge, especially for speaking styles never seen during training.

    StyleMoE: Dividing and Conquering the Style Embedding Space

    To tackle this problem, the researchers proposed StyleMoE, a model that partitions the style embedding space into tractable subspaces, each handled by a dedicated "style expert". StyleMoE replaces the style encoder of a TTS system with a Mixture of Experts (MoE) layer. A gating network routes the reference speech to different style experts, so that during optimization each expert specializes in a particular aspect of the style space.

    How StyleMoE Works

    The core idea of StyleMoE is to divide the style embedding space into multiple subspaces, each handled by a dedicated style expert. This is like breaking a hard problem into several smaller, more tractable ones, with each expert focusing on one of them.

    Concretely, StyleMoE uses a gating network to decide which experts should process the current reference speech. Based on the characteristics of the reference, the gating network selects the most suitable experts and assigns each a weight. Every expert has its own parameters and, during optimization, handles only the subspace routed to it, which improves both efficiency and accuracy.
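
    As an illustration of this routing step, here is a minimal PyTorch sketch of a sparse MoE style layer: a gating network scores the reference-style embedding, the top-k experts are selected, and their outputs are combined using the renormalized gate weights. The class name, the feed-forward expert architecture, and the top-k setting are illustrative assumptions, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseStyleMoE(nn.Module):
        """Minimal sparse mixture-of-experts layer over style embeddings (sketch)."""

        def __init__(self, style_dim=256, num_experts=4, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.gate = nn.Linear(style_dim, num_experts)   # gating network
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(style_dim, style_dim), nn.ReLU(),
                              nn.Linear(style_dim, style_dim))
                for _ in range(num_experts)
            ])

        def forward(self, ref_style):                        # ref_style: (batch, style_dim)
            scores = self.gate(ref_style)                    # (batch, num_experts)
            top_vals, top_idx = scores.topk(self.top_k, dim=-1)
            weights = F.softmax(top_vals, dim=-1)            # renormalize over the selected experts
            out = torch.zeros_like(ref_style)
            for e, expert in enumerate(self.experts):
                slot_mask = top_idx == e                     # (batch, top_k): where expert e was selected
                item_mask = slot_mask.any(dim=-1)
                if item_mask.any():
                    w = (weights * slot_mask).sum(dim=-1)[item_mask]           # gate weight per routed item
                    out[item_mask] += w.unsqueeze(-1) * expert(ref_style[item_mask])
            return out

    # Toy usage: 8 reference-style embeddings of dimension 256.
    moe = SparseStyleMoE()
    mixed_style = moe(torch.randn(8, 256))
    ```

    Because only the top-k experts run for each reference, the added compute stays small even as the number of experts grows, which is the "sparse" property mentioned below.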

    Advantages of StyleMoE

    StyleMoE's advantages include:

    • Better coverage of the style space: by dividing the style embedding space into subspaces, StyleMoE handles a wider variety of styles, including ones never seen during training.
    • Better generalization: each expert is responsible only for a specific subspace, which helps the model generalize and reduces its dependence on the training data.
    • Lower computational cost: StyleMoE uses a sparse MoE, so only a few experts take part in each forward pass, keeping the added compute small.

    Experimental Results

    The researchers evaluated StyleMoE on the ESD and VCTK datasets. The results show that StyleMoE outperforms the baseline models on a range of metrics:

    • Speech quality: the speech synthesized by StyleMoE is more natural and intelligible.
    • Style similarity: the synthesized speech is closer to the style of the reference speech.
    • Generalization: StyleMoE performs well on styles never seen during training.

    Outlook

    StyleMoE opens a new direction for speech synthesis. In future work, the researchers plan to explore different gating network architectures and to apply StyleMoE to more complex speech synthesis systems.

    https://arxiv.org/pdf/2406.03637 https://arxiv.org/html/2406.03637v1
