#mamba #transformer #rnn
In this video I explore how Mamba and SSMs (S4, S6) work, and what this means for the evolution of architectures, particularly for how information is memorized compared to a Transformer.
00:00 Introduction
01:00 Transformer
03:00 RNN
04:40 Transformer, RNN & CNN
08:30 SSM (State Space Model), S4
21:15 Mamba, S6
31:06 Reflections
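The S4 and Mamba (S6) chapters walk through the core SSM recurrence. As a companion, here is a minimal NumPy sketch of a simplified selective-SSM scan. This is not code from the video or from the official Mamba repository: the per-channel selection, the softplus step size, and the Euler discretization of B are illustrative simplifications of the paper's formulation, and all names are made up for the example.

```python
# Minimal sketch (assumptions: simplified per-channel selective SSM, not the official kernel).
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_delta):
    """Run h_t = Abar_t * h_{t-1} + Bbar_t * x_t, y_t = C_t . h_t over a sequence.

    x: (T, D) input; A: (D, N) diagonal state matrix (stored as values);
    W_B, W_C: (D, N) make B and C input-dependent (the S6 'selection');
    W_delta: (D,) controls the per-step step size delta.
    """
    T, D = x.shape
    h = np.zeros((D, A.shape[1]))                 # hidden state, constant memory w.r.t. T
    ys = np.zeros((T, D))
    for t in range(T):
        delta = np.log1p(np.exp(x[t] * W_delta))  # softplus: positive, input-dependent step size
        B_t = x[t][:, None] * W_B                  # input-dependent B (selective)
        C_t = x[t][:, None] * W_C                  # input-dependent C (selective)
        Abar = np.exp(delta[:, None] * A)          # zero-order-hold discretization of A
        Bbar = delta[:, None] * B_t                # simplified (Euler) discretization of B
        h = Abar * h + Bbar * x[t][:, None]        # recurrent update
        ys[t] = (C_t * h).sum(axis=1)              # readout
    return ys

# Toy usage: 10 steps, 4 channels, state size 8
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
A = -np.abs(rng.normal(size=(4, 8)))               # negative values keep the recurrence stable
y = selective_ssm_scan(x, A, rng.normal(size=(4, 8)), rng.normal(size=(4, 8)), rng.normal(size=4))
print(y.shape)  # (10, 4)
```

The difference from S4 is that delta, B and C depend on the current input x_t, which is what the Mamba paper calls selection; in S4 they are fixed for the whole sequence.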
Sources
- Mamba implementation https://github.com/state-spaces/mamba/tree/2a3704fd47ba817b415627b06fd796b971fdc137
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752
- Efficiently Modeling Long Sequences with Structured State Spaces (S4) https://arxiv.org/pdf/2111.00396.pdf
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/pdf/2205.14135.pdf
- Long Range Arena: A Benchmark for Efficient Transformers https://arxiv.org/pdf/2011.04006.pdf
- Repeat After Me: Transformers are Better than State Space Models at Copying https://arxiv.org/pdf/2402.01032.pdf
- Ring Attention with Blockwise Transformers for Near-Infinite Context https://arxiv.org/pdf/2310.01889.pdf
- RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048
About me:
Site : http://visualbehavior.ai/
Github : https://github.com/thibo73800
Twitter : https://twitter.com/ThiboNeveu
Instagram : https://www.instagram.com/thibaultneveu_ai/
TikTok : https://www.tiktok.com/@thibaultneveu
SnapChat : https://www.snapchat.com/add/thiboneveu.ai
Medium : https://medium.com/@thibo73800