Santosh Sawant

Santosh Sawant · 2025-01-16T15:05:15.057Z

MiniMax-01: Scaling Foundation Models with Lightning Attention

Recently, long-context LLMs have been pivotal in the further advancement of generative AI across many fields. Researchers have now introduced the MiniMax-01 series of long-context models, which includes MiniMax-Text-01 and MiniMax-VL-01.

MiniMax-Text-01 is a language model with 456 billion total parameters, of which 45.9 billion are activated per token. To unlock its long-context capabilities, it adopts a hybrid architecture that integrates Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). With parallel strategies such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), its training context length extends to 1 million tokens, and it can handle up to 4 million tokens during inference.

Building on MiniMax-Text-01, the researchers have also developed MiniMax-VL-01 for enhanced visual capabilities. It uses the “ViT-MLP-LLM” framework common in multimodal LLMs and is built from three key components: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and MiniMax-Text-01 as the base LLM.

The model features a dynamic resolution mechanism. Input images are resized according to a pre-set grid, with resolutions ranging from 336×336 to 2016×2016, and a 336×336 thumbnail is kept alongside. The resized image is split into non-overlapping patches of the same size. The patches and the thumbnail are encoded separately and then combined to form the full image representation.

MiniMax-VL-01 has achieved top-level performance on multimodal leaderboards, demonstrating its edge in complex multimodal tasks. Experiments on both standard and in-house benchmarks show that MiniMax models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering a 20-32 times longer context window.
Paper: https://lnkd.in/gVDkUeXd
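
The dynamic resolution step described in the post can be sketched in a few lines of Python. This is a minimal illustration, not MiniMax's implementation: the post gives only the tile size (336) and the resolution range (up to 2016×2016), so the rule used here for choosing the grid (round each side up to the next multiple of 336, capped at 6 tiles) and the names `plan_tiling` and `TilingPlan` are assumptions made for this example.

```python
from dataclasses import dataclass

TILE = 336      # tile and thumbnail side in pixels, as stated in the post
MAX_TILES = 6   # 6 * 336 = 2016, the largest side mentioned in the post


@dataclass(frozen=True)
class TilingPlan:
    """Hypothetical container describing how one image would be tiled."""
    resized_width: int
    resized_height: int
    boxes: list           # (left, top, right, bottom) per tile, row-major
    thumbnail_size: tuple


def _tiles_along(side: int) -> int:
    # Assumed rule: ceil(side / TILE), clamped to the range 1..MAX_TILES.
    return max(1, min(MAX_TILES, -(-side // TILE)))


def plan_tiling(width: int, height: int) -> TilingPlan:
    """Pick a grid resolution and list the non-overlapping tile boxes."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    cols, rows = _tiles_along(width), _tiles_along(height)
    boxes = [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]
    return TilingPlan(cols * TILE, rows * TILE, boxes, (TILE, TILE))


plan = plan_tiling(1000, 700)
print(plan.resized_width, plan.resized_height, len(plan.boxes))
# prints: 1008 1008 9
```

Under these assumptions a 1000×700 image would be resized to 1008×1008 and cut into nine 336×336 tiles, and the encoder would see those nine tiles plus the single 336×336 thumbnail.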

Bengaluru, Karnataka, India
8K followers 500+ connections


Philips

Visvesvaraya Technological University

About

LLM Architect learning to innovate, optimize, and scale the next generation of large…

Articles by Santosh

MLOps Platform for Edge AI

Aug 13, 2022

Before we start on Edge MLOps, let us walk through what MLOps is and what it is not. Demystifying MLOps…

Content-Based Feature Extraction and Image Retrieval using Celeb-A dataset

Jul 29, 2021

Facial attribute prediction is a computer vision (CV) task about deducing the set of attributes belonging to a face…

DETR - End to End object DEtection with TRansformer

Jun 14, 2021

"Attention Is All You Need", the paper that introduced Transformers, changed the state of NLP and has achieved great heights. Though…

Contributions

How can you use software architecture to advance your career?

Sharing your knowledge and experience is one of the best ways to advance your career. I recall countless times when people have reached out to me to review their designs. Through these exchanges I found that it not only helps me improve my existing skills but also keeps me rooted and lets me experiment with the various grounding principles on which a typical software architecture is based.

Santosh Sawant contributed 1 year ago

Activity

Some meetings are just meant to happen! 😃. Walked into Joe & The Juice Cafe and boom — there’s Roy! (Anandamoy Roychowdhary). After all these years,…

Liked by Santosh Sawant
Day 28: Quantum Wavefunction Evolution with CUDA Today, I implemented a CUDA-based simulation of the time evolution of a quantum wavefunction using…

Liked by Santosh Sawant
🚀 Just dropped a new tutorial: Build Your Own Medical Mini-DeepSeek R1 with Reinforcement Learning — for under $3 on a T4 GPU. The RL finetuned…

Liked by Santosh Sawant


Experience

Philips

Bangalore Urban, Karnataka, India

Bengaluru, Karnataka, India

Bengaluru, Karnataka, India

Bengaluru, Karnataka, India

Bangalore, India

Bangalore

Bangalore

Education

Visvesvaraya Technological University

2005 - 2007

Activities and Societies: Recipient of Merit Scholarship from VTU for academic performance.

2003 - 2005

Activities and Societies: Recipient of Merit Scholarship from BVBCET for academic performance.

Licenses & Certifications

TensorFlow Developer Certificate

Google

Issued Feb 2021 Expires Feb 2024

Credential ID 29223031

CKA: Certified Kubernetes Administrator

The Linux Foundation

Issued Dec 2020 Expires Dec 2023

CKAD: Certified Kubernetes Application Developer

The Linux Foundation

Issued Nov 2020 Expires Nov 2023

Credential ID LF-f053w8yx1o


Recommendations received

2 people have recommended Santosh: Ankur Mathur, Nagendra Jammi

More activity by Santosh

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails The rapid advancement of large language models (LLMs) has increased the…

Shared by Santosh Sawant

How do you currently deploy open LLMs? With vLLM, with Kubernetes? vLLM production-stack is a new open-source, batteries-included reference…

Liked by Santosh Sawant

Let’s dive into Group Relative Policy Optimization (GRPO), the loss function used in the RL training process by DeepSeek. 📔 Background Info GRPO is…

Liked by Santosh Sawant

To help developers securely experiment and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as an…

Liked by Santosh Sawant

DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning A typical training process for LLMs consists of…

Shared by Santosh Sawant

C++ remains one of the top choices of programming languages for mission-critical systems and software that interface with hardware. There’s already…

Liked by Santosh Sawant

Mind Evolution: Evolving Deeper LLM Thinking Recently, Google has released an evolutionary search strategy for scaling inference-time compute in…

Shared by Santosh Sawant

Don't just study how diffusion models work - train one! Sony Research released Micro Diffusion, a minimal implementation that allows training a…

Liked by Santosh Sawant

MiniMax-01: Scaling Foundation Models with Lightning Attention Recently, long-context LLMs have been pivotal in the further advancement of generative…

Shared by Santosh Sawant

Nobody will hire you to code without end goals. Solve problems: 1) Take a LM, compress its KV-Cache (choose technique). Try to retain its…

Liked by Santosh Sawant

How can AI make reading more enjoyable? What would an AI-powered reading experience look like? Over the holidays, I prototyped aireadingclub.com to…

Liked by Santosh Sawant

Separator tokens like the new line and period character seem to be quite important to LLMs. SepLLM uses this finding to create a special attention…

Liked by Santosh Sawant

Here is everything that happened in AI Agents this week 🧵 (save for later) 1/ Alex Reibman shared his vision for the modern AI Agent…

Liked by Santosh Sawant

Many people asking me how to start learning CUDA and Triton. Honestly, this is the only resource you need. I teach CUDA and Triton from scratch…

Liked by Santosh Sawant

Excited to share insights from Walmart's groundbreaking semantic search system that revolutionizes e-commerce product discovery! The team at…

Liked by Santosh Sawant

One of my favorite lectures on ML/LLMs in 2024: Hyung Won Chung from OpenAI - "Don't teach. Incentivize." - https://lnkd.in/eANKf4ND

Liked by Santosh Sawant

DeepSeek v3 is the most powerful open source AI model to be released! I read through their technical report / paper, and found some cool things: 1…

Liked by Santosh Sawant
