    # clone the repo and install dependencies
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    python -m pip install torch numpy sentencepiece

    # build the main and quantize binaries
    make

    # download the 7B model
    mkdir -p models/7B/
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/resolve/main/consolidated.00.pth
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/raw/main/params.json
    wget -P models/7B/ https://huggingface.co/nyanko7/LLaMA-7B/raw/main/checklist.chk
    wget -P models/ https://huggingface.co/nyanko7/LLaMA-7B/resolve/main/tokenizer.model

    # convert the model to ggml FP16 format
    python convert-pth-to-ggml.py models/7B/ 1

    # quantize the model to 4 bits
    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

    # enjoy
    ./main -m ./models/7B/ggml-model-q4_0.bin \
      -t 8 \
      -n 128 \
      -p 'I Have a Dream'

The models known so far are:

  • 7B: 1 model file, 13 GB on disk, about 30 GB total after conversion
  • 13B: 2 model files, 25 GB on disk, about 60 GB total after conversion
  • 30B: 4 model files, 61 GB on disk, about 120 GB total after conversion
  • 65B: 8 model files, 122 GB on disk, about 240 GB total after conversion
  • Each quantized model part occupies roughly 4 GB of RAM, so pick a model that fits your machine's memory (a quick size check follows this list)
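
A quick way to sanity-check a download against the sizes above (just a convenience snippet; it assumes the directory layout used by the script at the top):

    # show how much disk each model directory and the tokenizer occupy
    du -sh models/*/ models/tokenizer.model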

    Meta has not published official hashes for the weights, so judge for yourself whether you want to run them (a basic integrity check follows after this list). The leak sources known so far are:

  • A PR on the official repo
  • Someone "accidentally on purpose" committed a magnet link to the weights in the official repo:

    magnet:?xt=urn:btih:ZXXDAUWYLRUXXBHUYEMS6Q5CE5WA3LVA&dn=LLaMA
    
  • The llama-dl repo
  • A repo found via New Bing; it appears to download the files from the author's own API endpoint:

    curl -o- https://raw.githubusercontent.com/shawwn/llama-dl/56f50b96072f42fb2520b1ad5a1d6ef30351f23c/llama.sh | bash
    

    Or via this magnet link:

    magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352&dn=LLaMA&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce
    
  • huggingface.co
  • Only the 7B and 65B models have been found there so far:

    huggingface.co/nyanko7/LLa…

    huggingface.co/datasets/ny…
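
    Note that the leaked drops ship a checklist.chk with MD5 sums (fetched by the script above). Assuming md5sum is available (e.g. GNU coreutils via Homebrew on macOS), the files can at least be checked for download corruption; this says nothing about their provenance:

    # compare each downloaded part against the bundled MD5 list
    cd models/7B && md5sum -c checklist.chk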

    Software/hardware dependencies

    My machine is an Apple M1 (8-core) with 16 GB RAM.

    The macOS version is 12.5.1.

    The clang version is as follows:

    ❯ c++ -v
    Apple clang version 14.0.0 (clang-1400.0.29.102)
    Target: arm64-apple-darwin21.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    

    Python

    Python 3.10 is used here.

    If you don't have a matching Python version, you can create a virtual environment with pipenv or conda:

    pipenv shell --python 3.10
    
    conda create -n llama python=3.10
    conda activate llama
    
    pip install torch numpy sentencepiece
    
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    

    Building main and quantize
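
    The build step is a plain make; at the time of writing this produced both binaries on Apple Silicon with no extra flags (assuming the Xcode command line tools are installed):

    # inside the llama.cpp checkout: builds ./main and ./quantize
    make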

    Make sure the model files have been downloaded into the corresponding folder.

    The 7B model is used as the example below.

    ls ./models
    7B    tokenizer.model
    

    Convert the model to ggml FP16 format:

    python convert-pth-to-ggml.py models/7B/ 1
    

    This step produces a 13 GB models/7B/ggml-model-f16.bin file.
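
    A quick way to confirm the conversion finished (just a convenience check):

    # the converted FP16 file should be roughly 13 GB
    ls -lh models/7B/ggml-model-f16.bin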

    Next, quantize the model to 4 bits:

    ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
    

    If your model has multiple parts, process them one at a time (a loop sketch follows the example below).

    For example, the two files of the 13B model:

    ./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
    ./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
    
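    For models with more parts (e.g. the four files of 30B), a small loop saves typing; a minimal sketch, assuming the ggml-model-f16.bin / .bin.N naming shown above:

    # quantize every FP16 part, deriving each output name from the input name
    for f in ./models/30B/ggml-model-f16.bin*; do
      ./quantize "$f" "${f/f16/q4_0}" 2
    done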

    Time to enjoy the AI

    I used the 13B model; -t is the thread count, -n is the number of tokens to generate, and -p is the prompt.

    ❯ ./main -m models/13B/ggml-model-q4_0.bin -t 8 -n 409600 -p 'I Have a Dream'
    main: seed = 1678677633
    llama_model_load: loading model from 'models/13B/ggml-model-q4_0.bin' - please wait ...
    llama_model_load: n_vocab = 32000
    llama_model_load: n_ctx   = 512
    llama_model_load: n_embd  = 5120
    llama_model_load: n_mult  = 256
    llama_model_load: n_head  = 40
    llama_model_load: n_layer = 40
    llama_model_load: n_rot   = 128
    llama_model_load: f16     = 2
    llama_model_load: n_ff    = 13824
    llama_model_load: n_parts = 2
    llama_model_load: ggml ctx size = 8559.49 MB
    llama_model_load: memory_size =   800.00 MB, n_mem = 20480
    llama_model_load: loading model part 1/2 from 'models/13B/ggml-model-q4_0.bin'
    llama_model_load: ............................................. done
    llama_model_load: model size =  3880.49 MB / num tensors = 363
    llama_model_load: loading model part 2/2 from 'models/13B/ggml-model-q4_0.bin.1'
    llama_model_load: ............................................. done
    llama_model_load: model size =  3880.49 MB / num tensors = 363
    main: prompt: 'I Have a Dream'
    main: number of tokens in prompt = 5
         1 -> ''
     29902 -> 'I'
      6975 -> ' Have'
       263 -> ' a'
     16814 -> ' Dream'
    sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
    I Have a Dream: A Handbook for Teachers and Students on Martin Luther King, Jr.
    Culture is always changing and being influenced by the people around us who we can observe. Ways of thinking about culture are more important than which one you believe in because it could be dangerous if your way off believing in something that isn’t true but also that means there will be changes over time so everyone should learn these things when they start school
    Added: Sun, April 29th 2018 [end of text]
    main: mem per token = 22439492 bytes
    main:     load time =  4974.55 ms
    main:   sample time =   300.81 ms
    main:  predict time = 90728.84 ms / 824.81 ms per token
    main:    total time = 98585.49 ms
    

    Running LLaMA 7B and 13B on a 64GB M2 MacBook Pro with llama.cpp

    ggerganov/llama.cpp
