[LLMs + Vicuna] 2023-03. Vicuna: An Open-Source ChatGPT-Style Chatbot (90%* ChatGPT Quality)

Official online demo: https://chat.lmsys.org/
GitHub project code: https://github.com/lm-sys/FastChat
Official blog: Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
Model download: https://huggingface.co/lmsys/vicuna-7b-v1.5 | all models
Commentary: 量子位 (QbitAI) tech report | Zhihu (Chen Chengnan) | The Life of GPT
Related, Stanford's Alpaca model: Alpaca: A Strong, Replicable Instruction-Following Model


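1. Introduction

1.1 What is Vicuna? (An open-source ChatGPT-style chatbot)

Vicuna (pronounced /vɪˈkjuːnə/; 小羊驼 or 骆马 in Chinese)
is an instruction fine-tuned model based on LLaMA, that is, a GPT-like text generation model.
LLaMA: a base large language model trained on a large volume of internet text of average quality, similar to GPT-3 and PaLM.
Relationship to Stanford Alpaca (/ælˈpækə/): both are fine-tunes of LLaMA, and Vicuna's training follows Alpaca's, but Vicuna uses a higher-quality dataset and performs better.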

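Vicuna applies supervised fine-tuning (SFT) to LLaMA, using about 70K ChatGPT conversations shared by users on the ShareGPT website. It outperforms LLaMA and Stanford Alpaca and, with GPT-4 as the judge, reaches roughly 90% of ChatGPT's quality.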

Vicuna 1.5 (fine-tuned from LLaMA 2)

Vicuna 1.5 = LLaMA 2 + 125K conversations (ShareGPT.com)

Vicuna v1.5 is fine-tuned from Llama 2 with supervised instruction fine-tuning. The training data is around 125K conversations collected from ShareGPT.com. See more details in the “Training Details of Vicuna Models” section in the appendix of this paper.

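Chinese is supported, but Chinese text makes up only 0.13% of LLaMA 2's pretraining data, and its share of the supervised fine-tuning data is unknown.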

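1.1.2 Performance comparison

GPT-4 acts as the judge: a set of questions is prepared, each model answers them, and GPT-4 reviews and scores the answers.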
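One plausible way to turn judge scores into a headline figure such as "90% of ChatGPT quality" is to divide the model's total score by the reference model's total score. The sketch below assumes exactly that; the function name `relative_quality` and the per-question scores are hypothetical and are not taken from the Vicuna evaluation.

```python
def relative_quality(model_scores, reference_scores):
    """Ratio of total judge scores: model total / reference total."""
    if len(model_scores) != len(reference_scores):
        raise ValueError("both models must be scored on the same questions")
    return sum(model_scores) / sum(reference_scores)


# Hypothetical judge scores (1-10 scale) on four questions.
model = [9, 8, 9, 8]
chatgpt = [10, 9, 9, 10]
print(f"relative quality: {relative_quality(model, chatgpt):.0%}")
```

A ratio of totals weights every question equally and says nothing about how often one model beats the other, so it is best read alongside per-question results.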


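1.2 GPT-related concepts

The content below comes from: https://karpathy.ai/stateofgpt.pdf

1.2.1 The four stages of GPT training:

Pretraining: produces a base large language model through unsupervised training on a large volume of internet text of average quality. Typical examples: GPT-3, PaLM, LLaMA.
Supervised fine-tuning (SFT): trains on carefully human-written question-answer pairs.
Reward modeling (RM)
Reinforcement learning (RL): typical examples are ChatGPT and Claude.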


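1.2.2 What is a token? (The smallest unit that text is split into; 1 token ~= 0.75 words)

A tokenizer splits words into tokens, which are often pieces of words.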
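The 0.75 words-per-token rule of thumb can be turned into a quick estimate of how many tokens a piece of English text will use. This is a minimal sketch: `estimate_tokens` is a hypothetical helper, it counts whitespace-separated words rather than running a real tokenizer, and the ratio does not carry over to Chinese text.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb quoted above; English text only


def estimate_tokens(text: str) -> int:
    """Rough token count: number of whitespace-separated words / 0.75."""
    n_words = len(text.split())
    return math.ceil(n_words / WORDS_PER_TOKEN)


prompt = "Vicuna is an open source chatbot"
print(estimate_tokens(prompt))  # 6 words -> about 8 tokens
```

For an exact count, the model's own tokenizer has to be used, since different models split the same text differently.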

2. Local deployment (Linux server)

Reference 1: https://juejin.cn/post/7341593721100386344

Local environment: CUDA 12.1 + RTX 3090 Ti

The uncompressed 7B model occupies about 13 GB of GPU memory.
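The 13 GB figure is consistent with simple arithmetic on the weights alone. The sketch below assumes a nominal 7 billion parameters stored in 16-bit precision (2 bytes each); `weight_memory_gib` is a hypothetical helper, and the estimate ignores activations and the KV cache, which add to real usage.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: nominal 7e9 parameters at 2 bytes per parameter (16-bit).
fp16_gib = weight_memory_gib(7e9, 2)
# For comparison: the same weights at 1 byte per parameter (8-bit).
int8_gib = weight_memory_gib(7e9, 1)
print(f"16-bit: {fp16_gib:.1f} GiB, 8-bit: {int8_gib:.1f} GiB")
```

By the same arithmetic a 13B model needs roughly twice as much, which is why the choice of model below depends on available GPU memory.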

Downloading the model and the project

Download the project

git clone https://github.com/lm-sys/FastChat.git

Download the model

Choose a model according to your needs and available GPU memory

lmsys/vicuna-7b-v1.5
lmsys/vicuna-7b-v1.5-16k
lmsys/vicuna-13b-v1.5-16k
lmsys/vicuna-33b-v1.3

If the download runs into problems, set export HF_HUB_ENABLE_HF_TRANSFER=0

export HF_ENDPOINT=https://hf-mirror.com
pip install -U huggingface_hub
pip install -U hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download --resume-download lmsys/vicuna-7b-v1.5  --local-dir ./weights/vicuna-7b-v1.5
# Or the 13B model
huggingface-cli download --resume-download lmsys/vicuna-13b-v1.5  --local-dir ./weights/vicuna-13b-v1.5
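The same download can also be scripted from Python with `snapshot_download` from the `huggingface_hub` package installed above. This is a sketch under assumptions: `local_dir_for` and `download_model` are hypothetical helper names, and the mirror endpoint only takes effect if it is set before `huggingface_hub` is first imported in the process.

```python
import os
from pathlib import Path
from typing import Optional


def local_dir_for(repo_id: str, root: str = "./weights") -> Path:
    """Map 'lmsys/vicuna-7b-v1.5' to weights/vicuna-7b-v1.5."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str, root: str = "./weights",
                   endpoint: Optional[str] = None) -> str:
    """Download a full model repository and return its local path."""
    if endpoint:
        # Must be set before huggingface_hub is imported to take effect.
        os.environ["HF_ENDPOINT"] = endpoint
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))


# Example call (downloads several GB, so it is left commented out):
# download_model("lmsys/vicuna-7b-v1.5", endpoint="https://hf-mirror.com")
```

The resulting directory layout matches the `--local-dir` paths used in the commands above, so the launch commands that follow work unchanged.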

Install dependencies

See the official repository: https://github.com/lm-sys/FastChat/blob/main/pyproject.toml

conda create -n fastchat python=3.10 -y
conda activate fastchat
pip install "fschat[model_worker,webui]"

Launch

Option 1: command-line interface only (not recommended)

python -m fastchat.serve.cli --model-path weights/vicuna-7b-v1.5

Option 2: Gradio UI chat (start three services: controller, model worker, Gradio web server)

Open the UI in a browser at the server's IP address and port

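# Controller
python3 -m fastchat.serve.controller

# Model worker
python3 -m fastchat.serve.model_worker --model-path weights/vicuna-7b-v1.5/
# Connection test (optional)
python3 -m fastchat.serve.test_message --model-name vicuna-7b-v1.5

# Gradio web server (the third service; command as given in the FastChat README)
python3 -m fastchat.serve.gradio_web_server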
