🚀 We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR! ✨

📄 End-to-end document conversion: no more complex pipelines, just one tiny model
⚡ Fast & lightweight: processes a page in 0.35 sec on a consumer GPU with <500MB VRAM
🏆 SOTA accuracy: outperforms models 27× larger in full-page transcription, layout detection, and code recognition
💾 Efficient large-batch processing: cheap and easy to run in-house
📊 Handles all document elements: tables, charts, code, equations, lists, and more
🔍 Full tech report available with the release

This is another example that small, optimized models can compete with much larger systems, making AI more sustainable. Go to the comments for links to the model, paper, and demo! 🚀
That's a wonderful addition. I don't think people agree, but SLMs are the next big thing, and if we can make a multimodal SLM, it'll be a game changer.
Can we use its outputs directly for RAG, or should we post-process them to make them more transparent and English-friendly? What do you suggest?
This sounds amazing! Congrats on the incredible achievement!!!
Hoping SmolDocling means the Docling lib has better handling of footnotes now... will test today
Andrés Marafioti, any plans to launch something for information extraction/NER on text data?
Would love to see a benchmark comparison, with speed, vs. Mistral, MegaParse, and LlamaParse. Completely agree that a smaller specialized model for OCR is the way to go. OCR F1 and recall above 0.95 are fantastic, but I'd love to see metrics for tables and charts. Maybe use the MegaParse benchmark dataset and compare?
This is awesome!!!
Hello, is there any repository to work with it?
OCR, but make it fast and tiny. Love it!
How does it perform with RTL languages?