Yeah, I agree it helps for approaches that genuinely need a lot of VRAM. If you’re not on a tight schedule, this type of thing might be good enough to just get a model running.
I don’t personally do anything that large; even the diffusion methods I’ve developed fit on a 24 GB card, but I know that with the hype around multimodal stuff, VRAM needs can get pretty high.
I suspect this machine will be popular with hobbyists for running really large open-weight LLMs.
Yeah.
It will probably spur a lot of development! I’ve seen a lot of bs=1 speedup “hacks” shelved because GPUs are already fast enough and memory efficiency is the real bottleneck. But suddenly all these devs are going to have a 48–96 GB pool that’s significantly slower than a 3090. And multimodal becomes much more viable.
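Rough back-of-envelope on why that matters (the ~256 GB/s bandwidth figure is just my assumption for this class of unified-memory machine; the 3090’s ~936 GB/s is the published spec): at bs=1, decode throughput is capped by how fast you can stream the weights, so a big-but-slow memory pool is exactly where those shelved speedup tricks start paying off again.

```python
# Back-of-envelope only: bs=1 decode is roughly memory-bandwidth-bound,
# so the ceiling is tokens/sec ≈ bandwidth / bytes of weights streamed per token.
# The ~256 GB/s unified-memory figure is an assumption for illustration, not a spec I'm vouching for.

def decode_tps_ceiling(params_billions: float, bits_per_weight: int, bandwidth_gb_s: float) -> float:
    """Crude upper bound on batch-size-1 decode throughput (tokens/sec)."""
    weights_gb = params_billions * bits_per_weight / 8  # GB of weights read per generated token
    return bandwidth_gb_s / weights_gb

# 70B model quantized to 4 bits (~35 GB of weights):
print(decode_tps_ceiling(70, 4, 256))  # ~7 tok/s ceiling on an assumed ~256 GB/s unified pool
print(decode_tps_ceiling(70, 4, 936))  # ~27 tok/s at a 3090's ~936 GB/s, except 35 GB doesn't fit in 24 GB
```

Single-digit tokens/sec is livable for hobbyists, but it’s slow enough that the bs=1 optimizations people stopped bothering with suddenly look worth dusting off.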
Not to mention better ROCm compatibility. AMD should have done this ages ago…