🚀 We just dropped 𝗦𝗺𝗼𝗹𝗗𝗼𝗰𝗹𝗶𝗻𝗴: a 𝟮𝟱𝟲𝗠… | Andrés Marafioti

AI Researcher @ Hugging Face | 9+ YOE in GenAI, MLOps, & Research | Pushing the Boundaries of Open-Source AI

🚀 We just dropped 𝗦𝗺𝗼𝗹𝗗𝗼𝗰𝗹𝗶𝗻𝗴: a 𝟮𝟱𝟲𝗠 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 𝘃𝗶𝘀𝗶𝗼𝗻 𝗟𝗠 for complete document OCR! ✨

📄 𝗘𝗻𝗱-𝘁𝗼-𝗲𝗻𝗱 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗰𝗼𝗻𝘃𝗲𝗿𝘀𝗶𝗼𝗻: no more complex pipelines, just one tiny model
⚡ 𝗙𝗮𝘀𝘁 & 𝗹𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁: processes a page in 0.35 sec on a consumer GPU with <500MB VRAM
🏆 𝗦𝗢𝗧𝗔 𝗮𝗰𝗰𝘂𝗿𝗮𝗰𝘆: outperforms models 27× larger in full-page transcription, layout detection, and code recognition
💾 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗹𝗮𝗿𝗴𝗲-𝗯𝗮𝘁𝗰𝗵 𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴: cheap and easy to run in-house
📊 𝗛𝗮𝗻𝗱𝗹𝗲𝘀 𝗮𝗹𝗹 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗲𝗹𝗲𝗺𝗲𝗻𝘁𝘀: tables, charts, code, equations, lists, and more
🔍 𝗙𝘂𝗹𝗹 𝘁𝗲𝗰𝗵 𝗿𝗲𝗽𝗼𝗿𝘁 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗹𝗲 𝘄𝗶𝘁𝗵 𝘁𝗵𝗲 𝗿𝗲𝗹𝗲𝗮𝘀𝗲

This is another example that small, optimized models can compete with much larger systems, making AI more sustainable. Go to the comments for links to the model, paper, and demo! 🚀

147 Comments

Md Zurez T.

Web Engineer / LLM Enthusiast

How does it perform with RTL languages?

Yaman S.

MBA Candidate at WBS | AI Product Manager | AI Agents & LLMs | Leading Safe Responsible AI | Driver SaaS Success with Digital Transformation, AI, Innovation & Customer-Centric Solutions

That's a wonderful addition. I don't think people agree, but SLMs are the next big thing, and if we can make a multimodal SLM, it'll be a game changer.

1 Reaction

Muhammad Bilal Shahid

I help companies become AI Powered | AI Engineer specializing in Gen AI | Building production grade AI Apps | LUMS

Can we use its outputs directly for RAG, or should we post-process them to make them more readable in plain English? What do you suggest?

Christopher Korokeyi

Product leader | Talent developer

This sounds amazing! Congrats on the incredible achievement!!!

1 Reaction

Richard K.

Hoping SmolDocling means the Docling lib has better handling of footnotes now... will test today

1 Reaction

Adjay Sagar S

Data Professional | BITS Pilani (Distinction Division)

Andrés Marafioti any idea to launch something for information extraction/NER on text data?

Greggory Elias

Build AI Agent Workforces with 1 Click | CEO of AgentsforHire.ai | AI Thought Leader | Founder | Subscribe to my weekly newsletter (5k subs) for insights on how AI news & trends affect you

Would love to see a benchmark comparison of speed vs. Mistral, MegaParse, and LlamaParse. Completely agree that a smaller specialized model for OCR is the way to go. OCR F1 and recall above 0.95 is fantastic, but I'd love to see metrics for tables and charts. Maybe use the MegaParse benchmark dataset and compare?

1 Reaction

Ed Mwanza

Director of AI R&D at RENFROE | PhD Candidate in Computer Science (AI/ML) | Driving Insurtech Innovations for AI-Powered Insurance Solutions & Digital Transformation

This is awesome!!!

1 Reaction

Efthymios Efthymiadis

Data Scientist/AI engineer | EU Projects Coordinator

Hello, is there any repository to work with it?

1 Reaction

Ramandeep Kaur

Team Lead Software Engineer

OCR, but make it fast and tiny. Love it!

1 Reaction
