#mamba #transformer #rnn
In this video I explore how Mamba and SSMs (S4, S6) work, and what this means for the evolution of architectures, particularly for how information is memorized compared to a Transformer.
00:00 Introduction
01:00 Transformer
03:00 RNN
04:40 Transformer, RNN & CNN
08:30 SSM (State Space Model), S4
21:15 Mamba, S6
31:06 Reflections
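The S4 and Mamba (S6) chapters walk through the core SSM recurrence. As a companion, here is a minimal NumPy sketch of a simplified selective-SSM scan. This is not code from the video or from the official Mamba repository: the per-channel selection, the softplus step size, and the Euler discretization of B are illustrative simplifications of the paper's formulation, and all names are made up for the example.

```python
# Minimal sketch (assumptions: simplified per-channel selective SSM, not the official kernel).
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_delta):
    """Run h_t = Abar_t * h_{t-1} + Bbar_t * x_t, y_t = C_t . h_t over a sequence.

    x: (T, D) input; A: (D, N) diagonal state matrix (stored as values);
    W_B, W_C: (D, N) make B and C input-dependent (the S6 'selection');
    W_delta: (D,) controls the per-step step size delta.
    """
    T, D = x.shape
    h = np.zeros((D, A.shape[1]))                 # hidden state, constant memory w.r.t. T
    ys = np.zeros((T, D))
    for t in range(T):
        delta = np.log1p(np.exp(x[t] * W_delta))  # softplus: positive, input-dependent step size
        B_t = x[t][:, None] * W_B                  # input-dependent B (selective)
        C_t = x[t][:, None] * W_C                  # input-dependent C (selective)
        Abar = np.exp(delta[:, None] * A)          # zero-order-hold discretization of A
        Bbar = delta[:, None] * B_t                # simplified (Euler) discretization of B
        h = Abar * h + Bbar * x[t][:, None]        # recurrent update
        ys[t] = (C_t * h).sum(axis=1)              # readout
    return ys

# Toy usage: 10 steps, 4 channels, state size 8
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
A = -np.abs(rng.normal(size=(4, 8)))               # negative values keep the recurrence stable
y = selective_ssm_scan(x, A, rng.normal(size=(4, 8)), rng.normal(size=(4, 8)), rng.normal(size=4))
print(y.shape)  # (10, 4)
```

The difference from S4 is that delta, B and C depend on the current input x_t, which is what the Mamba paper calls selection; in S4 they are fixed for the whole sequence.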
Sources
- Mamba implementation https://github.com/state-spaces/mamba/tree/2a3704fd47ba817b415627b06fd796b971fdc137
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752
- Efficiently Modeling Long Sequences with Structured State Spaces (S4) https://arxiv.org/pdf/2111.00396.pdf
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/pdf/2205.14135.pdf
- Long Range Arena: A Benchmark for Efficient Transformers https://arxiv.org/pdf/2011.04006.pdf
- Repeat After Me: Transformers are Better than State Space Models at Copying https://arxiv.org/pdf/2402.01032.pdf
- Ring Attention with Blockwise Transformers for Near-Infinite Context https://arxiv.org/pdf/2310.01889.pdf
- RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048
About me:
Site : http://visualbehavior.ai/
Github : https://github.com/thibo73800
Twitter : https://twitter.com/ThiboNeveu
Instagram : https://www.instagram.com/thibaultneveu_ai/
TikTok : https://www.tiktok.com/@thibaultneveu
SnapChat : https://www.snapchat.com/add/thiboneveu.ai
Medium : https://medium.com/@thibo73800