For the uninitiated, what's a "hybrid linear attention architecture"?
1/4 of their layers are conventional quadratic attention; the other 3/4 use linear attention, which is where the "hybrid" comes from.
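If it helps to see the quadratic part concretely, here's a minimal NumPy sketch of plain (non-linear) single-head attention. The names and sizes are made up for illustration, not taken from this model:

    import numpy as np

    n, d = 1024, 64                  # sequence length, per-head dimension (made up)
    Q = np.random.randn(n, d)        # one query vector per token
    K = np.random.randn(n, d)        # one key vector per token
    V = np.random.randn(n, d)        # one value vector per token

    scores = Q @ K.T / np.sqrt(d)    # shape (n, n): every token scored against every token
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    out = weights @ V                # shape (n, d)

    # That (n, n) scores matrix is the "quadratic" part: doubling the number
    # of tokens quadruples the memory and compute. Linear-attention layers
    # avoid materializing it, at some cost in modeling quality.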
Could someone explain every term in this subthread in a very simple way to someone who basically only knows "transformers are a neural network architecture that use something called 'attention' to consider the entire input the whole time or something like that", and who does not understand what "quadratic" even means in a time complexity or mathematical sense beyond that "quad" has something to do with the number four.
I am aware I could Google it all or ask an LLM, but I'm still interested in a good human explanation.
any hardware recommendations? how much memory do we need for this?
You will effectively want a 48GB card or more for quantized versions; otherwise you won't have meaningful space left for the KV cache. Blackwell and above is generally a good idea to get faster hardware support for 4-bit formats (some recent models took some time to ship for older architectures, gpt-oss IIRC).
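For a rough sense of why the KV cache eats so much VRAM, here's back-of-the-envelope math. The hyperparameters are assumptions for illustration, not this model's actual config:

    # KV cache size ~= 2 (K and V) * layers * kv_heads * head_dim * bytes/elem * tokens
    layers, kv_heads, head_dim = 48, 8, 128   # assumed values, check the model config
    bytes_per_elem = 2                        # fp16/bf16 cache
    context = 128_000                         # tokens kept in the cache

    kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * context
    print(f"{kv_bytes / 2**30:.1f} GiB")      # ~23.4 GiB here, on top of the weights

That's the cache alone at full context, which is why the quantized weights plus cache won't fit comfortably on a 24GB card.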
https://lifearchitect.ai/models-table/
125 upvotes with 2 comments is kinda sus
Lots of model releases are like this. We can only upvote. We can't run the model on our personal computers, and we can't test their 'Efficient Attention' concept on them either.
Honestly, it would take 24 hours just to download the 98 GB model if I wanted to try it out (assuming I had a card with 98 GB of RAM).