Don't Be Fooled by 24G of VRAM! Tesla K80 24G In-Depth Review: A Heartbreaking Hundred-Yuan-Class GPU and a Full Record of Failed LLM Deployments

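The earlier Tesla M40 24G test series drew a strong response, and many readers left comments asking: Shu Ge, can you get hold of a K80, which also has 24G of VRAM?

Given readers' continued interest in the Tesla K80 24G, I (Shu Ge) made a huge investment (280 RMB) in this card, which has the same 24G of VRAM as the M40, and set out to test it in depth. The results were crushing: the card has serious limitations for LLM deployment.

The conclusion first: if you plan to deploy large models locally, the Tesla K80 is not a good choice, and I really do not recommend buying it!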

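Main limitations (three technical bottlenecks):

Limited driver support: only the 470 driver series is supported, so newer features are unavailable, in particular using system memory as shared VRAM
Poor tool compatibility: serious compatibility problems with mainstream LLM deployment tools such as Ollama, Xinference, and GPUStack
Outdated architecture: Compute Capability is only 3.7, so tools are hard to adapt, and even patching the source code has a low success rate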

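Test environment

Platform: a virtual machine on PVE (Proxmox VE) virtualization, hosted on an X99 server built for a little over 1,000 RMB

OS: Ubuntu 22.04.5 LTS

CPU: 8 cores (E5-2698B v3)

Memory: 32G (DDR3)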

Hardware notes

The Tesla K80 24G was released in 2014 and is a dual-GPU product. Each GPU has 2496 CUDA cores, for 4992 in total. It carries 24GB of GDDR5 memory, with 12GB assigned to each GPU.

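Physical appearance:

As a veteran with 11 years of service, this card has clearly been through a lot. The seller swore it was "in perfect condition, with the protective film still on", but on arrival the PCB turned out to be a dull, darkened orange (euphemistically called "antique bronze gold"). What worries me most is that, unlike the M40 and P100 with their full-coverage metal backplates, this K80 leaves its electronic components completely exposed, which is nerve-racking to look at.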

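Note! On the virtualization platform and inside the VM, what we see is not one K80 with 24G of VRAM but two Tesla K80 devices with 12G each.

On the virtualization platform: passing the card through to a VM requires selecting two PCI devices.
Inside the VM: the system sees two Tesla K80 cards, and nvidia-smi likewise shows two 12G Tesla K80s.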

$ lspci 
00:08.0 Communication controller: Red Hat, Inc. Virtio console
00:10.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
00:11.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

$ nvidia-smi
Sun Mar 30 09:42:04 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.256.02   Driver Version: 470.256.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:10.0 Off |                    0 |
| N/A   40C    P0    56W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:11.0 Off |                    0 |
| N/A   34C    P0    70W / 149W |      0MiB / 11441MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
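The per-device view above can also be read programmatically. The following is a minimal Python sketch, not part of the original test: it assumes nvidia-smi's CSV query mode with the index, name and memory.total fields, and the helper names (parse_gpu_rows, summarize, query_nvidia_smi) are made up for illustration. The point it makes is that the number to plan around is the largest single device (11441 MiB here), not the 24G printed on the label.

```python
import subprocess

# Assumed query: nvidia-smi's CSV mode with three fields, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def query_nvidia_smi():
    """Run nvidia-smi and return its CSV text (requires an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def parse_gpu_rows(csv_text):
    """Parse 'index, name, memory.total' rows into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        # First field is the index, last is memory; anything between is the name.
        gpus.append({
            "index": int(parts[0]),
            "name": ", ".join(parts[1:-1]),
            "memory_mib": int(parts[-1]),
        })
    return gpus


def summarize(gpus):
    """Report device count, summed memory, and the largest single device."""
    sizes = [g["memory_mib"] for g in gpus]
    return {
        "devices": len(gpus),
        "total_mib": sum(sizes),
        "largest_single_mib": max(sizes, default=0),
    }


# Values copied from the nvidia-smi output in this article.
SAMPLE = "0, Tesla K80, 11441\n1, Tesla K80, 11441\n"
print(summarize(parse_gpu_rows(SAMPLE)))
```

On a machine with the driver installed, summarize(parse_gpu_rows(query_nvidia_smi())) would give the live numbers instead of the sample.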

Failure log

The latest 550 driver does not support this card; the 470 series is required. I chose NVIDIA-Linux-x86_64-470.256.02.run
The latest Ollama does not support it (Docker deployment)

ollama  | time=2025-03-30T09:43:35.672Z level=INFO source=gpu.go:303 msg="[0] CUDA GPU is too old. Compute Capability detected: 3.7"
ollama  | time=2025-03-30T09:43:35.761Z level=INFO source=gpu.go:303 msg="[1] CUDA GPU is too old. Compute Capability detected: 3.7"
ollama  | time=2025-03-30T09:43:35.761Z level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
ollama  | time=2025-03-30T09:43:35.761Z level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="31.3 GiB" available="30.6 GiB"
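To spot this kind of rejection quickly in a long container log, the "too old" lines can be picked out with a regular expression. This is only a sketch: the pattern is written against the exact message wording shown above, and ASSUMED_MIN_CC = (5, 0) is my assumption about the minimum Compute Capability that stock Ollama builds accept, not a value taken from this log.

```python
import re

# Assumption for illustration: stock builds require Compute Capability >= 5.0.
ASSUMED_MIN_CC = (5, 0)

# Matches the message text seen in the log above, e.g.
# "[0] CUDA GPU is too old. Compute Capability detected: 3.7"
CC_PATTERN = re.compile(
    r"\[(\d+)\] CUDA GPU is too old\. Compute Capability detected: (\d+)\.(\d+)"
)


def rejected_gpus(log_text):
    """Return {gpu_index: (major, minor)} for every GPU reported as too old."""
    found = {}
    for m in CC_PATTERN.finditer(log_text):
        found[int(m.group(1))] = (int(m.group(2)), int(m.group(3)))
    return found


def meets_minimum(cc, minimum=ASSUMED_MIN_CC):
    """Compare (major, minor) tuples; tuple ordering handles 3.7 vs 5.0."""
    return cc >= minimum


SAMPLE_LOG = '''
ollama  | level=INFO source=gpu.go:303 msg="[0] CUDA GPU is too old. Compute Capability detected: 3.7"
ollama  | level=INFO source=gpu.go:303 msg="[1] CUDA GPU is too old. Compute Capability detected: 3.7"
ollama  | level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
'''

for index, cc in rejected_gpus(SAMPLE_LOG).items():
    print(index, cc, "ok" if meets_minimum(cc) else "below assumed minimum")
```

Comparing tuples rather than floats avoids surprises with versions such as 3.10 versus 3.7.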

The latest Xinference does not support it either

xinference  | /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: .aspx Alternatively, go to:  to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
xinference  |   return torch._C._cuda_getDeviceCount() > 0
xinference  | No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
xinference  | INFO 03-30 17:51:53 __init__.py:190] Automatically detected platform cuda.
xinference  | INFO 03-30 17:52:02 __init__.py:190] Automatically detected platform cuda.
xinference  | INFO 03-30 17:52:11 __init__.py:190] Automatically detected platform cuda.
xinference  | INFO 03-30 17:52:20 __init__.py:190] Automatically detected platform cuda.
xinference  | 2025-03-30 17:52:26,055 xinference.core.supervisor 24 INFO     Xinference supervisor 0.0.0.0:29170 started
xinference  | 2025-03-30 17:52:26,079 xinference.core.worker 24 INFO     Starting metrics export server at 0.0.0.0:None
xinference  | 2025-03-30 17:52:26,082 xinference.core.worker 24 INFO     Checking metrics export server...
xinference  | 2025-03-30 17:52:28,110 xinference.core.worker 24 INFO     Metrics server is started at: :39115
xinference  | 2025-03-30 17:52:28,110 xinference.core.worker 24 INFO     Purge cache directory: /data/xinference/cache
xinference  | 2025-03-30 17:52:28,112 xinference.core.worker 24 INFO     Connected to supervisor as a fresh worker
xinference  | 2025-03-30 17:52:28,128 xinference.core.worker 24 INFO     Xinference worker 0.0.0.0:29170 started
xinference  | 2025-03-30 17:52:30,131 xinference.core.worker 24 ERROR    Report status got error.
xinference  | Traceback (most recent call last):
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1077, in report_status
xinference  |     status = await asyncio.to_thread(gather_node_info)
xinference  |   File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
xinference  |     return await loop.run_in_executor(None, func_call)
xinference  | asyncio.exceptions.CancelledError
xinference  | 
xinference  | During handling of the above exception, another exception occurred:
xinference  | 
xinference  | Traceback (most recent call last):
xinference  |   File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 1076, in report_status
xinference  |     async with timeout(2):
xinference  |   File "/usr/local/lib/python3.10/dist-packages/async_timeout/__init__.py", line 141, in __aexit__
xinference  |     self._do_exit(exc_type)
xinference  |   File "/usr/local/lib/python3.10/dist-packages/async_timeout/__init__.py", line 228, in _do_exit
xinference  |     raise asyncio.TimeoutError
xinference  | asyncio.exceptions.TimeoutError
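The "found version 11040" in the PyTorch warning is the driver's CUDA version packed into a single integer as 1000 × major + 10 × minor, so it decodes to 11.4, matching the "CUDA Version: 11.4" that nvidia-smi reports for the 470 driver. A small sketch of that decoding follows; the function names are hypothetical helpers, not part of PyTorch.

```python
import re


def decode_cuda_version(code):
    """Decode CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    major, rest = divmod(code, 1000)
    return major, rest // 10


def extract_driver_code(warning_text):
    """Pull the integer out of '(found version NNNNN)'; None if absent."""
    m = re.search(r"found version (\d+)", warning_text)
    return int(m.group(1)) if m else None


WARNING = "The NVIDIA driver on your system is too old (found version 11040)."
code = extract_driver_code(WARNING)
print(code, decode_cuda_version(code))  # 11040 (11, 4)
```

In other words, the container's PyTorch build expects a newer CUDA driver interface than the 470 series can provide, which is why it falls back to reporting no CUDA runtime.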

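After compiling from source, Ollama starts and recognizes the card, but the GPU status looks wrong

Following a method from GitHub, I compiled Ollama myself, and when run as a binary it recognizes the K80. In actual use, however, it felt like the GPU was not being used; see the video for details.

Ollama recognizes the card at startup: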

time=2025-03-30T09:56:14.794Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-30T09:56:19.489Z level=INFO source=types.go:130 msg="inference compute" id=GPU-cca19740-ce86-5af0-dc55-512315901eec library=cuda variant=v11 compute=3.7 driver=11.4 name="Tesla K80" total="11.2 GiB" available="11.1 GiB"
time=2025-03-30T09:56:19.489Z level=INFO source=types.go:130 msg="inference compute" id=GPU-c2e75588-6887-2b45-66b1-83cefb0b9ee2 library=cuda variant=v11 compute=3.7 driver=11.4 name="Tesla K80" total="11.2 GiB" available="11.1 GiB"

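At runtime, GPU utilization is 0% and the Processes table lists nothing. Yet ollama ps reports 97% of the model on the GPU. See the figure below:

[Figure: Tesla-k80-ollama-run]

The K80 24G cannot fully hold the Q4-quantized DeepSeek-R1-32B

The GPU carries 97% and the CPU 3%; see the figure above for details.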

Every 2.0s: ./ollama ps                                               ubuntu: Sun Mar 30 10:06:42 2025

NAME               ID              SIZE     PROCESSOR         UNTIL
deepseek-r1:32b    38056bbcbb2d    23 GB    3%/97% CPU/GPU    4 minutes from now
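To put the PROCESSOR column into absolute terms, the split can be parsed and multiplied by the reported size. This is a rough sketch under stated assumptions: the regular expressions are written against the single output line shown above, parse_ps_line is a hypothetical helper, and the percentages are taken at face value as shares of the reported SIZE.

```python
import re

# Matches a mixed split such as "3%/97% CPU/GPU" (format assumed from the line above).
SPLIT_PATTERN = re.compile(r"(\d+)%/(\d+)%\s+CPU/GPU")
# Matches the SIZE column, e.g. "23 GB".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*GB")


def parse_ps_line(line):
    """Return size and CPU/GPU shares for one 'ollama ps' row, or None."""
    split = SPLIT_PATTERN.search(line)
    size = SIZE_PATTERN.search(line)
    if not split or not size:
        return None  # e.g. a row that is not split between CPU and GPU
    cpu_pct, gpu_pct = int(split.group(1)), int(split.group(2))
    size_gb = float(size.group(1))
    return {
        "size_gb": size_gb,
        "cpu_pct": cpu_pct,
        "gpu_pct": gpu_pct,
        "cpu_gb": round(size_gb * cpu_pct / 100, 2),
        "gpu_gb": round(size_gb * gpu_pct / 100, 2),
    }


ROW = "deepseek-r1:32b    38056bbcbb2d    23 GB    3%/97% CPU/GPU    4 minutes from now"
print(parse_ps_line(ROW))
```

For the row above this works out to roughly 0.69 GB left on the CPU side and about 22.3 GB claimed for the two GPUs, which is what makes the 0% utilization shown by nvidia-smi so suspicious.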

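Question tests

The same questions used for the Tesla M40 were used to test the Tesla K80's speed (it was so slow that I picked only two, just to get a feel for it).

To keep things realistic, all test videos are kept at their original length with no speed-up!

Question 1, mathematical reasoning: which is larger, 9.9 or 9.11?

Question 2, attention to detail: how many letter e's are there in DeepSeek?

After reading this preliminary review, do you still want to buy a Tesla K80 24G?

Preliminary summary: the Tesla K80's performance in LLM deployment is disappointing. Despite its 24G of VRAM, architectural limits and driver compatibility problems seriously undermine its practical value. For AI developers on a tight budget it should not be the first choice; consider other, better-suited options.

To be continued! Further technical exploration: some users in the community have reported deploying DeepSeek R1 successfully, so we will keep digging into possible optimizations. If you have relevant experience, feel free to share it in the comments. Later posts will also discuss more targeted solutions.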

This article is part of the Tencent Cloud self-media syndication program and is shared from a WeChat official account. Originally published: 2025-03-31.