Max Anfilofyev’s Post

Chief CareBot | Scaling Patient Care 8x with AI | Chief Product Officer @ DR | Connect to scale with AI

8mo Edited

Study published in Nature doesn't conclude that LLMs are not ready for clinical decision-making https://lnkd.in/ekRZGyjf

The study highlights the limitations of some open-source large language models (LLMs) in clinical decision-making. It’s not surprising that they struggle, as these models underperform humans. Exciting times ahead: proprietary models like GPT-4, which outperform humans on medical information understanding benchmarks, should be tested next!

Chart of MedQA (USMLE) accuracy: the top evaluated model, Meditron, significantly underperforms human experts, whereas GPT-4 outperforms them. https://lnkd.in/edM8dB5H


More Relevant Posts

Philippe Küng

AI/ML Engineer, GenAI Cloud, On-Prem & Web3 Infrastructure 🤖 with ❤️ for 🧑‍🤝‍🧑
7mo
Excellent work on medical ⚕ data retrieval in particular situations. It uses a cutting-edge method (RAG) combined with sources of dependable, interlinked, peer-reviewed studies. This enables domain-specific semantic search, chunking, and context capture that improve on traditional approaches 📅 In this article, Junde Wu et al. explore the intriguing world of medicine. A massive potential to explore. At the Clinic of AI, we have used the same methodology for agriculture and wrapped it in a simple chatbot UX to give farmers 👩‍🔧 of different plants ☘ access to the latest studies and information in real time. ℹ This project was made with one of the latest additions to the team, Mert Oğul, and was an impressive feat as one of his first dives into the world of RAG-based, arXiv-interconnected bots. We might be publishing a workshop on its conception, stay tuned. Daniel Küng #agriculture #RAGbot #semanticsearch #Information #chunking #artificialintelligence #medical

Pascal Biese

Daily AI highlights for 70k+ experts 📲🤗 AI/ML Engineer
7mo

MedGraphRAG: Towards Safe Medical LLMs

Large Language Models (LLMs) have shown impressive capabilities in various domains, including healthcare. However, when it comes to handling sensitive medical data, generating reliable and evidence-based responses is crucial. Traditional methods often fall short in capturing the full context of medical information, leading to potential inaccuracies and safety concerns.

MedGraphRAG introduces a novel graph-based Retrieval-Augmented Generation (RAG) framework tailored for the medical domain. It starts by employing a hybrid static-semantic approach for document chunking, which significantly improves context capture compared to traditional methods. Extracted entities are then used to construct a three-tier hierarchical graph structure, connecting entities to foundational medical knowledge from papers and dictionaries. These entities are further linked to form meta-graphs, which are merged based on semantic similarities to create a comprehensive global graph.

To learn more about MedGraphRAG and other AI highlights, check out this week's LLM Watch: https://lnkd.in/dwm3iKc5
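To make the "hybrid static-semantic" chunking idea a bit more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not MedGraphRAG's actual implementation: the function names (`tokenize`, `jaccard`, `hybrid_chunk`), the `threshold` and `max_chars` parameters, and the sample text are all hypothetical, and a crude word-overlap (Jaccard) score stands in for whatever semantic signal the real system uses. The static step splits on blank lines; the semantic step merges a paragraph into the previous chunk only when the two look topically related.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real semantic embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap in [0, 1]; 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def hybrid_chunk(document: str, threshold: float = 0.2, max_chars: int = 400) -> list[str]:
    """Hypothetical hybrid chunker.

    Static step: split the document on blank lines.
    Semantic step: merge a paragraph into the current chunk when it is
    similar enough and the merged chunk stays within max_chars.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    chunks: list[str] = []
    for para in paragraphs:
        if chunks:
            current = chunks[-1]
            similar = jaccard(tokenize(current), tokenize(para)) >= threshold
            fits = len(current) + 1 + len(para) <= max_chars
            if similar and fits:
                chunks[-1] = current + "\n" + para
                continue
        chunks.append(para)
    return chunks


# Made-up sample text: two related paragraphs followed by an unrelated one.
example = (
    "Metformin is a first-line treatment for type 2 diabetes.\n\n"
    "Metformin treatment lowers glucose in type 2 diabetes patients.\n\n"
    "Chest X-rays can reveal pneumonia."
)
chunks = hybrid_chunk(example)
for chunk in chunks:
    print(repr(chunk))
```

With this sample, the two metformin paragraphs end up in one chunk and the chest X-ray sentence in another. A real pipeline would presumably replace the word-overlap score with embeddings or an LLM judgment, then pass each chunk on to entity extraction and graph construction.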

Fehmi Boyacioglu

Researcher | Data Enthusiast | Seeking PhD and Analytics Roles
3mo
OpenAI’s o1-preview large language model (LLM) has demonstrated superior performance in medical reasoning tasks compared to both human physicians and previous AI models. In differential diagnosis generation, o1-preview correctly included the diagnosis in 78.3% of New England Journal of Medicine cases, surpassing GPT-4’s 72.9% accuracy. Additionally, in management reasoning vignettes, o1-preview outperformed both GPT-4 and human physicians by over 40%, highlighting its potential to enhance clinical decision-making. #ArtificialIntelligence #Healthcare

Superhuman performance of a large language model on the reasoning tasks of a physician

arxiv.org

Mindful Engineer AI

1,202 followers
5mo
🔊 Researchers from the University of Oxford have introduced MedGraphRAG, designed to improve Large Language Models (LLMs) like GPT-4 in the medical field. It addresses one of the biggest challenges LLMs face: contextual accuracy in medicine. MedGraphRAG's approach connects important medical entities, such as diseases and treatments, to reliable knowledge bases. MedGraphRAG outperforms state-of-the-art models across multiple medical Q&A benchmarks. Read more: https://lnkd.in/grFKP2ep ♻️ If you found this content useful, please share it with your friends and connections. 🔔 Follow MindfulEngineer for more AI-related content. https://lnkd.in/gHdY4uiK #AIinHealthcare #MedicalAI #LLM #MachineLearning #ArtificialIntelligence
Curei.ai

82 followers
6mo Edited
On 'Hustling the physician-patient relationship', the regulatory cues and emphasis on patient-centred communication were great. I liked the principle that 'Human Supervision of AI is good practise' - it reflects on where we are now. Bias is a golden thread - to add in another paper on racial bias and LLMs, I found this a helpful read too https://lnkd.in/g-wgX85t

Unmasking and quantifying racial bias of large language models in medical report generation - Communications Medicine

nature.com
Nick Tarazona, MD
4mo
Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review
- Most frequently used LLMs: GPT-3.5, GPT-4, Bard, LLaMa/Alpaca-based models, Bing Chat.
- Top criteria for scoring LLM outputs: accuracy, completeness, appropriateness, insight, consistency.
- Need for standardized reporting of qualitative evaluation metrics to enhance research on LLMs in healthcare.
DOI: https://lnkd.in/e3iikhzJ
Link: https://lnkd.in/e2v5VSwp
Nick Tarazona, MD
4mo
Qualitative metrics from the biomedical literature for evaluating large language models in clinical decision-making: a narrative review
- Most frequently used LLMs include GPT-3.5, GPT-4, Bard, LLaMa/Alpaca-based models, and Bing Chat.
- Key evaluation criteria for LLM outputs are accuracy, completeness, appropriateness, insight, and consistency.
- There is significant variation in how studies report findings and assess LLM performance; standardized reporting metrics are needed.
Link: https://lnkd.in/e3iikhzJ
DOI: 10.1186/s12911-024-02757-z
Carrie Smedira, CPHIMS

Helping customers migrate to Azure to innovate with the latest technologies
2mo
Large Language Models (LLMs) can revolutionize how patients find the right clinical trials and healthcare professionals find the right candidates. This video by NIH National Library of Medicine shows how they applied OpenAI GPT models to find the right clinical trials quickly and efficiently.

NIH-Developed AI Algorithm Successfully Matches Potential Volunteers to Clinical Trials Release

https://www.youtube.com/
NEJM AI

15,560 followers
2mo
Volume 2, No. 2 is now available! Here are the latest articles available in the February issue of NEJM AI:

Save this post to revisit later (click the 💬 button at top right of post).

𝗘𝗱𝗶𝘁𝗼𝗿𝗶𝗮𝗹𝘀
⚖️ It’s Time to Bench the Medical Exam Benchmark https://nejm.ai/3PK8v1D

𝗣𝗲𝗿𝘀𝗽𝗲𝗰𝘁𝗶𝘃𝗲𝘀
🌍 Using Large Language Models to Promote Health Equity https://nejm.ai/40gbwvn
📝 The Burden of Reviewing LLM-Generated Content https://nejm.ai/3WnJD3r
🌟 AI Grand Rounds: Cultivating Health Care’s AI Future https://nejm.ai/4aurOW2

𝗢𝗿𝗶𝗴𝗶𝗻𝗮𝗹 𝗔𝗿𝘁𝗶𝗰𝗹𝗲𝘀
🗣️ A Cross-Sectional Study of GPT-4–Based Plain Language Translation of Clinical Notes to Improve Patient Comprehension of Disease Course and Management https://nejm.ai/4avSdTw
👁️ Developing ICU Clinical Behavioral Atlas Using Ambient Intelligence and Computer Vision https://nejm.ai/4g8D23N
🔬 Multicenter Double-Blind Study Evaluating AI-Driven Detection of Proximal Deep Vein Thrombosis https://nejm.ai/4awOqFj

𝗣𝗼𝗹𝗶𝗰𝘆 𝗖𝗼𝗿𝗻𝗲𝗿
🤖 Disclosure, Humanizing, and Contextual Vulnerability of Generative AI Chatbots https://nejm.ai/40rPlSY

𝗥𝗲𝘃𝗶𝗲𝘄 𝗔𝗿𝘁𝗶𝗰𝗹𝗲
⚠️ Not All Clinical AI Monitoring Systems Are Created Equal: Review and Recommendations https://nejm.ai/40J53KJ

Visit http://ai.nejm.org to read all the latest articles on AI and machine learning in clinical medicine. #ArtificialIntelligence #AIinMedicine
Rangaswami B.

Founder, Director and CEO | Treasury Management & Lease Accounting SaaS
3mo
This is quite intriguing. Following emerging evidence in recent months that some large language models (LLMs) are beating experienced physicians in the quality of diagnoses, a paper authored by medical faculties claims to have detected cognitive impairment in LLMs as they age - a very human-like degradation of AI 'thinking' ability. Would be very interesting to see if this finding can be replicated.