Hi, I'm trying to reproduce PowerInfer on an A100-80G machine, but I hit the error below. It looks as if the GPU on the machine was not detected?
CUDA version on the machine: 12.4
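For context, the two common version checks can disagree: `nvidia-smi` reports the highest CUDA version the *driver* supports, while `nvcc --version` reports the *toolkit compiler* actually on `PATH`. These are the generic sanity checks (not output from my session):

```bash
# Driver-side view: GPU model and the CUDA version the driver supports
nvidia-smi
# Toolkit-side view: which nvcc is on PATH and what version it is
which nvcc
nvcc --version
```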
```
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ ./build/bin/main -m /home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" --ignore-eos
Log start
main: build = 1578 (906830b)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1713950721
llama_model_loader: loaded meta data with 23 key-value pairs and 883 tensors from /home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight q4_0 [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q4_0 [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight q4_0 [ 8192, 1024, 1, 1 ]
...
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
...
llama_model_loader: - type f32: 161 tensors
llama_model_loader: - type q4_0: 722 tensors
llama_model_load: PowerInfer model loaded. Sparse inference will be used.
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = mostly Q4_0
llm_load_print_meta: model params = 74.98 B
llm_load_print_meta: model size = 39.28 GiB (4.50 BPW)
llm_load_print_meta: general.name = nvme
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: sparse_pred_threshold = 0.00
error loading model: CUDA is not loaded
llama_load_model_from_file_with_context: failed to load model
llama_init_from_gpt_params: error: failed to load model '/home/turbo/models/ReluLLaMA-70B/llama-70b-relu.q4.powerinfer.gguf'
main: error: unable to load model
```
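Since the configure step reported cuBLAS as found (see below), one generic check is whether the resulting binary actually links the CUDA runtime at all; this is a sketch of a diagnostic, not output from my session:

```bash
# If nothing CUDA-related shows up here, the binary was built without
# GPU support and the runtime error above would be expected
ldd ./build/bin/main | grep -iE 'cuda|cublas'
```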
The output from the build configuration step is:
```
(base) turbo@sma100-02:/home/turbo/projects/PowerInfer$ cmake -S . -B build -DLLAMA_CUBLAS=ON
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.4.99")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.5.119
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
GNU ld (GNU Binutils for Ubuntu) 2.38
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/turbo/projects/PowerInfer/build
```
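Looking at this configure log, two things stand out to me: CMake found the CUDA 12.4 toolkit headers (`Found CUDAToolkit: ... 12.4.99`), but the CUDA compiler it picked is the Ubuntu-packaged `/usr/bin/nvcc` at version 11.5.119, and the architecture list `52;61;70` does not include `80` (the A100 is compute capability 8.0). A sketch of the reconfigure I would try, assuming the 12.4 toolkit lives under `/usr/local/cuda` (paths may differ on your machine):

```bash
# Start from a clean build directory so cached compiler choices are discarded
rm -rf build
# Force the CUDA 12.4 nvcc and build for the A100 (sm_80)
cmake -S . -B build -DLLAMA_CUBLAS=ON \
      -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
      -DCMAKE_CUDA_ARCHITECTURES=80
cmake --build build --config Release -j
```

Does this look like the right direction, or is the "CUDA is not loaded" error unrelated to the nvcc mismatch?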