    # clone the repo and install dependencies
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    python -m pip install torch numpy sentencepiece

    # build the main and quantize binaries
    make

    # download the 7B model
    mkdir -p models/7B/
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/resolve/main/consolidated.00.pth
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/raw/main/params.json
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/raw/main/checklist.chk
    wget -P models/ https://huggingface.co/nyanko7/LLaMA-7B/resolve/main/tokenizer.model

    # convert the model to ggml FP16 format
    python convert-pth-to-ggml.py models/7B/ 1

    # quantize the model to 4 bits
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

    # enjoy
    ./main -m ./models/7B/ggml-model-q4_0.bin \
      -t 8 \
      -n 128 \
      -p 'I Have a Dream'

The models known so far are:

  • 7B: 1 model file, 13 GB on disk, about 30 GB total after conversion
  • 13B: 2 model files, 25 GB on disk, about 60 GB total after conversion
  • 30B: 4 model files, 61 GB on disk, about 120 GB total after conversion
  • 65B: 8 model files, 122 GB on disk, about 240 GB total after conversion
  • Each quantized model part occupies roughly 4 GB of RAM, so pick a model that fits your machine's memory (a quick size check follows this list)
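
A quick way to sanity-check a download against the sizes above (just a convenience snippet; it assumes the directory layout used by the script at the top):

    # show how much disk each model directory and the tokenizer occupy
    du -sh models/*/ models/tokenizer.model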

    Meta has not published official hashes for the weights, so judge for yourself whether you want to run them (a basic integrity check follows after this list). The leak sources known so far are:

  • A PR on the official repo
  • Someone "accidentally on purpose" committed a magnet link to the weights in the official repo:

    magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
    
  • The llama-dl repo
  • A repo found via New Bing; it appears to download the files from the author's own API endpoint:

    curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | bash
    

    Or via this magnet link:

    magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce
    
  • huggingface.co
  • Only the 7B and 65B models have been found there so far:

    huggingface.co/nyanko7/LLa…

    huggingface.co/datasets/ny…
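
    Note that the leaked drops ship a checklist.chk with MD5 sums (fetched by the script above). Assuming md5sum is available (e.g. GNU coreutils via Homebrew on macOS), the files can at least be checked for download corruption; this says nothing about their provenance:

    # compare each downloaded part against the bundled MD5 list
    cd models/7B && md5sum -c checklist.chk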

    Software/hardware dependencies

    My machine is an Apple M1 (8-core) with 16 GB RAM.

    The macOS version is 12.5.1.

    The clang version is as follows:

    ❯ c++ -v
    Apple clang version 14.0.0 (clang-1400.0.29.102)
    Target: arm64-apple-darwin21.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    

    Python

    Python 3.10 is used here.

    If you don't have a matching Python version, you can create a virtual environment with pipenv or conda:

    pipenv shell --python 3.10
    
    conda create -n llama python=3.10
    conda activate llama
    
    pip install torch numpy sentencepiece
    
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    

    Building main and quantize
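
    The build step is a plain make; at the time of writing this produced both binaries on Apple Silicon with no extra flags (assuming the Xcode command line tools are installed):

    # inside the llama.cpp checkout: builds ./main and ./quantize
    make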

    Make sure the model files have been downloaded into the corresponding folder.

    The 7B model is used as the example below.

    ls ./models
    7B    tokenizer.model
    

    Convert the model to ggml FP16 format:

    python convert-pth-to-ggml.py models/7B/ 1
    

    This step produces a 13 GB models/7B/ggml-model-f16.bin file.
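
    A quick way to confirm the conversion finished (just a convenience check):

    # the converted FP16 file should be roughly 13 GB
    ls -lh models/7B/ggml-model-f16.bin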

    Next, quantize the model to 4 bits:

    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
    

    If your model has multiple parts, process them one at a time (a loop sketch follows the example below).

    For example, the two files of the 13B model:

    ./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
    ./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
    
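    For models with more parts (e.g. the four files of 30B), a small loop saves typing; a minimal sketch, assuming the ggml-model-f16.bin / .bin.N naming shown above:

    # quantize every FP16 part, deriving each output name from the input name
    for f in ./models/30B/ggml-model-f16.bin*; do
      ./quantize "$f" "${f/f16/q4_0}" 2
    done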

    Time to enjoy the AI

    I used the 13B model; -t is the thread count, -n is the number of tokens to generate, and -p is the prompt.

    ❯ ./main -m models/13B/ggml-model-q4_0.bin -t 8 -n 409600 -p 'I Have a Dream'
    main: seed = 1678677633
    llama_model_load: loading model from 'models/13B/ggml-model-q4_0.bin' - please wait ...
    llama_model_load: n_vocab = 32000
    llama_model_load: n_ctx   = 512
    llama_model_load: n_embd  = 5120
    llama_model_load: n_mult  = 256
    llama_model_load: n_head  = 40
    llama_model_load: n_layer = 40
    llama_model_load: n_rot   = 128
    llama_model_load: f16     = 2
    llama_model_load: n_ff    = 13824
    llama_model_load: n_parts = 2
    llama_model_load: ggml ctx size = 8559.49 MB
    llama_model_load: memory_size =   800.00 MB, n_mem = 20480
    llama_model_load: loading model part 1/2 from 'models/13B/ggml-model-q4_0.bin'
    llama_model_load: ............................................. done
    llama_model_load: model size =  3880.49 MB / num tensors = 363
    llama_model_load: loading model part 2/2 from 'models/13B/ggml-model-q4_0.bin.1'
    llama_model_load: ............................................. done
    llama_model_load: model size =  3880.49 MB / num tensors = 363
    main: prompt: 'I Have a Dream'
    main: number of tokens in prompt = 5
         1 -> ''
     29902 -> 'I'
      6975 -> ' Have'
       263 -> ' a'
     16814 -> ' Dream'
    sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
    I Have a Dream: A Handbook for Teachers and Students on Martin Luther King, Jr.
    Culture is always changing and being influenced by the people around us who we can observe. Ways of thinking about culture are more important than which one you believe in because it could be dangerous if your way off believing in something that isn’t true but also that means there will be changes over time so everyone should learn these things when they start school
    Added: Sun, April 29th 2018 [end of text]
    main: mem per token = 22439492 bytes
    main:     load time =  4974.55 ms
    main:   sample time =   300.81 ms
    main:  predict time = 90728.84 ms / 824.81 ms per token
    main:    total time = 98585.49 ms
    

    Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp

    ggerganov/llama.cpp
