Lucija Gregov
London, England, United Kingdom
2K followers
500+ connections
Other similar profiles
- Lefteris Vazaios (London)
- Gabriele Lillacci (London)
- Thomas Richardson (London)
- Puneeth Nikin (London Area, United Kingdom)
- Zongyi Liu (Palo Alto, CA)
- Christopher Hadley (London)
- Marco Bertetti (Dubai, United Arab Emirates)
- Ekaterina Gorbunova (London)
- Imdadul Haque Milon (London Area, United Kingdom)
- Federico Armata, PhD (London)
- Vasileios Vasileiou (Athens)
- Colin G., Lead Data Scientist at Jagex (United Kingdom)
- Ritvik Sharma (Gurugram)
- Sam Barrows (San Francisco, CA)
- Phil McParlane (United Kingdom)
- Letisya Aliciyan (Winchester)
- David Pryce-Compson, PhD (London)
- Saeid Masoumzadeh (London Area, United Kingdom)
- Dr Sriharsha Ramaraju, Ph.D (Cardiff)
- Antreas Oikonomou (London)
Explore more posts
-
Harsh Singhal
This is really awesome. I used NotebookLM to summarise a recent paper that shows how an o1-like model can be reproduced. After providing the PDF from arXiv, I noticed that the generated podcast was almost 49 minutes long, very different from the earlier 10-12 minute version. Another impressive feature was the interactive mode, where you can interject the two hosts and ask a question, which they very graciously respond to. I wanted to share the audio summary, but YouTube didn't allow me to upload the WAV file; it expects a video MP4 file. So I found free videos from Pexels and stitched them together with ffmpeg. The videos are meditative aquarium shots of fish swimming around, and with the audio discussing reproducing o1 as an engaging podcast episode, the video keeps distraction to a minimum. The paper itself is a great read, but if you have a 30-40 minute commute, or want to watch some cool fish swimming around and learn about o1, check it out.
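A minimal sketch of that stitching step in Python, assuming ffmpeg is installed and on PATH; the clip and audio filenames are hypothetical stand-ins for the Pexels downloads and the NotebookLM WAV:

```python
# Sketch: concatenate stock clips with ffmpeg's concat demuxer, then mux in
# the podcast audio. Filenames are placeholders.
import subprocess

clips = ["fish1.mp4", "fish2.mp4", "fish3.mp4"]  # downloaded from Pexels

# The concat demuxer reads a text file listing the inputs.
with open("clips.txt", "w") as f:
    for clip in clips:
        f.write(f"file '{clip}'\n")

# 1) Stitch the clips without re-encoding (they must share codec/resolution).
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
     "-i", "clips.txt", "-c", "copy", "video.mp4"],
    check=True,
)

# 2) Pair the video with the NotebookLM WAV, encoding the audio to AAC so the
#    result is a YouTube-friendly mp4; -shortest trims to the shorter stream.
subprocess.run(
    ["ffmpeg", "-y", "-i", "video.mp4", "-i", "podcast.wav",
     "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-c:a", "aac",
     "-shortest", "final.mp4"],
    check=True,
)
```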
-
Harsh Singhal
Reflecting on this and iterating a few more times, here are some features I'd like to build:
- Understand the montage videos and summarize them: describe the scenes, color palette, how busy they are, etc.
- Have an agent create a podcast that is less "fun" and removes the aahs and ums from the NotebookLM audio summary.
- Have an agent find the best collection of videos to go with the podcast.

Next version:
- Figure out how to generate slides that provide key takeaways.
- Extract key images from the paper, such as tables, charts, and even pseudocode figures, for the slides.
- Generate the podcast audio to track the slides.

I feel a lot of the information and insights in papers could be quickly skimmed if they were available as YouTube videos or audio podcasts. Sure, one cannot avoid reading these papers, but one can sample them via a video/podcast format and then choose which papers to dive into. Also, many papers require background and some prerequisite knowledge to really appreciate. This can also be generated as part of a longer-form video that takes the audience into a few asides, or as a series of videos that can be consumed as course material. Colleges with AI/ML programs are struggling with pedagogical approaches to bring students up to speed and then deliver coursework on GenAI, and they often fall short on time, on the ability to distill the information, or on the educators' knowledge and experience in these areas. There is no escaping the hard work of reading papers multiple times and going off to research references and so on; that is what master's and doctoral candidates are doing. But undergrads, and even high schoolers who want more, are juggling a lot of breadth in their curriculum and often ignore seminal papers in the AI field. And this applies to any field, for that matter. I'll be summarising interesting papers I've read with NotebookLM so I can spot any issues, and I'll continue to build a workflow where I can generate summaries, correct them to the best of my knowledge, and share them with the community. I hope you find the time to provide feedback.
-
Pratyush Lohumi
🔍 Google's Gemma Scope: Illuminating the Inner Workings of Large Language Models 🧠 Google researchers have developed a powerful new tool called Gemma Scope, designed to shed light on how each layer in Gemma 2 large language models responds to input tokens. 📊 Key insights: - Sparse autoencoders (SAEs) can transform embeddings into interpretable representations - Each index in the transformed embedding corresponds to a distinct concept - SAE weights indicate the strength of each concept in the input Gemma Scope enables: - Manual and automatic labeling of concepts in each layer - Steering the model by adjusting SAE outputs to generate concept-specific text This groundbreaking tool paves the way for answering critical questions about LLMs, such as how fine-tuning and chain-of-thought prompting influence a model's internal representations. 🤔 #GemmaScope #LargeLanguageModels #TransformerInterpretability #GoogleAI #MachineLearningResearch
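For intuition about what an SAE does here, a minimal numpy sketch (illustrative dimensions, not Gemma Scope's implementation): an embedding is mapped into a wider, mostly-zero vector whose active indices read as concepts:

```python
# Minimal sparse autoencoder (SAE) sketch: encode an embedding into an
# overcomplete, mostly-zero vector of "concept" activations, then decode.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 512          # hidden size -> wider concept space

W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def encode(x):
    # ReLU keeps activations non-negative; trained with an L1 penalty,
    # most entries end up exactly zero, each surviving index a "concept".
    return np.maximum(0.0, x @ W_enc + b_enc)

def decode(f):
    return f @ W_dec + b_dec

x = rng.normal(size=d_model)       # stand-in for one token's embedding
f = encode(x)
x_hat = decode(f)

# Training would minimise reconstruction error plus a sparsity penalty:
#   loss = ||x - x_hat||^2 + lambda * ||f||_1
print("active concepts:", np.count_nonzero(f),
      "reconstruction error:", np.linalg.norm(x - x_hat))
```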
-
Jessica Xin Dong
The most brilliant data scientists I've encountered share a vital trait: 🔍 They're creative problem-solvers. It's easy to get hung up on the analytical bits of data science and miss the forest for the trees. But honestly, what's the point of being able to craft an algorithm from the ground up if you're stumped by the quirky, out-of-the-box challenges? Creativity isn't just a 'nice-to-have'; it's essential. And while some of it might be wired into us, there's plenty you can do to give your creative muscle a good workout: 📚 **Read**: Dive into a novel. Fiction has this magic of constructing entire universes in your mind, offering a slightly richer experience than audiobooks. 🎨 **Create**: Whether it's journaling, sketching, sewing, or colouring, engaging in any form of creative expression can do wonders. 🏃‍♀️ **Move**: Your body isn't built for endless hours at the desk. Cycle, walk, yoga, or skate your way to clarity and inspiration. 🗣 **Communicate**: Sometimes, a good chat is all you need to untangle a tricky problem. Do you think creativity is the backbone of data science? How do you keep your creative juices flowing? Share your thoughts below! 🧠✨
-
Massimiliano Marchesiello
Why Most Cross-Validation Visualizations Are Wrong (And How to Fix Them) https://ift.tt/3Im8iwz

MODEL VALIDATION & OPTIMIZATION

Stop using moving boxes to explain cross-validation! You know those cross-validation diagrams in every data science tutorial? The ones showing boxes in different colors moving around to explain how we split data for training and testing? I've seen them one too many times. These diagrams are common; they've become the go-to way to explain cross-validation. But here's something interesting I noticed while looking at them as both a designer and a data scientist.

When we look at a yellow box moving to different spots, our brain automatically sees it as one box moving around. It's just how our brains work: when we see something similar move to a new spot, we think it's the same thing. (This is actually why cartoons and animations work!) You might think the animated version is better, but now you can't help following the blue box, and you start to forget what the diagram is supposed to say about cross-validation.

But here's the thing: in these diagrams, each box in a new position is supposed to show a different chunk of data. So while our brain naturally wants to track the boxes, we have to tell our brain, "No, no, that's not one box moving; they're different boxes!" It's like we're fighting against how our brain naturally works, just to understand what the diagram means. Looking at this as someone who works with both design and data, I started thinking: maybe there's a better way? What if we could show cross-validation in a way that actually works with how our brain processes information?

What's Cross-Validation Really About?

Cross-validation is about making sure machine learning models work well in the real world. Instead of testing a model once, we test it multiple times using different parts of our data. This helps us understand how the model will perform with new, unseen data. Here's what happens:

- We take our data
- Divide it into groups
- Use some groups for training, others for testing
- Repeat this process with different groupings

The goal is to get a reliable understanding of our model's performance. That's the core idea: simple and practical. (Note: we'll discuss different validation techniques and their applications in another article. For now, let's focus on understanding the basic concept and why current visualization methods need improvement.)

What's Wrong with Current Cross-Validation Diagrams?

Open up any machine learning tutorial, and you'll probably see these types of diagrams:

- Long boxes split into different sections
- Arrows showing parts moving around
- Different colors showing training and testing data
- Multiple versions of the same diagram side by side

Currently, this is similar to the first image you'll see if...
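To ground those steps in something concrete, here is a standard scikit-learn sketch of 5-fold cross-validation (not the article's visualization; the dataset and model are illustrative):

```python
# The idea behind the diagrams, as code: each fold holds out a *different*
# chunk of rows, rather than one box sliding along.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, random_state=0)

scores = []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
    print(f"fold {fold}: held-out rows {test_idx[:5]}... "
          f"-> accuracy {scores[-1]:.2f}")

print("mean accuracy:", np.mean(scores))
```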
-
Anastassia Kornilova
Last month, DeepMind released FACTS Grounding, a benchmark for evaluating the factuality and quality of LLM outputs... and they avoided most of the common benchmark pitfalls that I check for! (https://lnkd.in/eT88wNHj) - ✅ *Reproducibility*: Official evaluation code is available (https://lnkd.in/exp3kP2c) - ✅ *Test Data Leakage*: About half of the evaluation examples are private; they cannot accidentally be scraped or included in an instruction tuning dataset (unlike GSM8K - https://lnkd.in/ec_a87ZJ) - ✅ *Construct Validity*: The benchmark design includes a step to check whether the output is "eligible" (i.e. that it contains the relevant information); this discredits responses that are "factual" but contain no useful information. The developers also did not include examples involving creativity, mathematics, or complex reasoning, in order to target factuality exclusively. - ✅ *Saturation*: Over time, SotA models hit a ceiling on a benchmark's performance, and it no longer represents a useful proxy for capability. The top models currently score in the low 80s on this benchmark, which shows there is still room for improvement. - ✅ *Process and Limitation Documentation*: The benchmark developers clearly document the process for assembling the benchmark and explain how they handled limitations. For example: evaluation is done using LLM-as-a-Judge, and "judges" are known to favor outputs from the same model, so the evaluation uses an ensemble of 3 different models to reduce this bias. - 🚫 *Variance*: Each input prompt is only run once for every model. Given that LLMs produce different outputs over multiple runs of the same input, I would have liked to see that element incorporated in the evaluation. However, given that LLM-as-a-Judge methods are used during grading, the one-run outputs may be fairly reliable. I am still finalizing my checklist of common benchmark pitfalls, so stay tuned.
-
Ben Feuer
Cynical attacks on science like this recent X post from Noam Brown reflect very poorly on #OpenAI. With trust in the scientific process degrading in society at large, bad faith arguments from people who know better are irresponsible and potentially dangerous. OpenAI's (inflated) market valuation is built on the foundations of open science. The innovations in their latest o1 model were taken from the open science community, in many cases without attribution. Perhaps comments like this are part of the reason that #Apple is rumored to have pulled out of the latest OpenAI funding round, and so many talented employees are making for the exits? OpenAI needs open science. But who needs OpenAI?
-
Andrew Jones
I’ve never thought of Data Science as just training ML models - to me it’s always been so much more. For me, a broad description would be: “Using data to help businesses make better strategic decisions, to understand their customers, and to predict what is likely to happen next” There is so much value in this, and it will always be in high demand - regardless of what tools help you do it.
-
Manu D.
Just read a fascinating paper on a new prompting technique called Buffer of Thoughts (BoT). Can this be a chain-of-thought alternative? It's a game-changer for complex reasoning with Large Language Models (LLMs). 🧠💡 Key points: - BoT implements a "meta-buffer", a dynamic repository of high-level thought templates, providing LLMs with adaptable problem-solving strategies. - Input problems undergo a distillation process, extracting essential elements and constraints. This distilled information is then used to query the meta-buffer for the most relevant thought template. - The system employs a continuous learning mechanism. The buffer manager actively updates the template repository based on new insights and outcomes, ensuring the system evolves with each task. - Performance metrics are notable. The paper reports that "Llama3-8B+BoT has the potential to surpass Llama3-70B model," indicating significant improvements in model efficiency. - BoT demonstrates superior efficiency compared to methods like Tree of Thought (ToT), reducing computational costs by approximately 88% while maintaining or improving task performance. - The framework utilizes embedding similarity calculations to retrieve the most appropriate thought template, with a similarity threshold (δ) typically set between 0.5-0.7. - For tasks without a closely matching template, BoT employs a general thought template designed to address a broad spectrum of problems. What excites me most is the potential for smaller models to punch above their weight. If BoT can help an 8B parameter model compete with a 70B one, we're looking at significant cost and compute savings. Curious to hear your thoughts. Have you experimented with BoT or similar techniques? #ai #machinelearning #etech #ML Github: https://lnkd.in/dV28he2y Paper: https://lnkd.in/dGsu3QtC
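For intuition, here is a toy sketch of the retrieval step described above: embed the distilled problem, pick the meta-buffer template with the highest cosine similarity if it clears the threshold δ, and otherwise fall back to the general template. The template names and embeddings are made-up stand-ins, not the paper's code:

```python
# BoT-style template retrieval sketch: cosine similarity against a
# "meta-buffer" of thought templates, with a threshold and a general fallback.
import numpy as np

rng = np.random.default_rng(1)
templates = {
    "algebraic_manipulation": rng.normal(size=128),
    "case_analysis": rng.normal(size=128),
    "code_simulation": rng.normal(size=128),
}
GENERAL_TEMPLATE = "general problem-solving template"
DELTA = 0.6  # the paper reports thresholds around 0.5-0.7

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve(problem_embedding):
    # Best-matching template, or the general one if nothing clears delta.
    name, sim = max(
        ((n, cosine(problem_embedding, v)) for n, v in templates.items()),
        key=lambda pair: pair[1],
    )
    return name if sim >= DELTA else GENERAL_TEMPLATE

query = rng.normal(size=128)  # stand-in for an encoded, distilled problem
print(retrieve(query))
```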
-
Dr Timothy Bednall
This is one for all of the statisticians and psychometricians! This paper has been a bit of a solo labour of love for the past couple of years, and I have finally finished the first draft of it. Most statistics users will have heard of R-squared as a way of evaluating how well a model explains the variability in some kind of outcome. If you're into psychological measurement like me, you will probably have heard of Cronbach's alpha, and if you're a real nerd, you might even have heard of something called coefficient omega. And if you've ever had to present results to a non-statistical audience, you might have used an approach called relative importance analysis to convert your results into a percentage. It turns out that there is a common matrix formula solution underlying all of these techniques. So I've written a short paper explaining how to apply this framework to a variety of different models, and have provided R code and Excel worksheets as supplementary materials. This paper has not yet been peer reviewed, but I'm releasing it to my network via a pre-print server for those interested enough to give feedback. It's a rather technical read, but it's one that may be of interest to fellow data professionals. https://lnkd.in/gUPfXPdF
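The paper's unified matrix formulation isn't reproduced in the post, but for orientation, here is the textbook Cronbach's alpha (one of the coefficients it covers) computed directly on simulated scale data:

```python
# Textbook Cronbach's alpha for a k-item scale:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
import numpy as np

rng = np.random.default_rng(0)
true_score = rng.normal(size=(200, 1))
items = true_score + rng.normal(scale=0.8, size=(200, 5))  # 5 noisy items

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")
```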
-
Engr. Jalal Saleem
OpenCoder doesn't get enough love They open-sourced the entire pipeline to create QwenCoder-level code models. This includes: - Large datasets - High-quality models - Eval framework Tons of great lessons and observations in the paper 📝 Paper: arxiv.org/abs/2411.04905 #special #linkedin #Family #Pakistan #skills #islam #government #AI #ML #MachineLearning #tesla #microsoft #openai
-
Jagadish Venkataraman
Kolmogorov–Arnold Networks seem like a paradigm shift in ML architectures. Curious to see how they evolve. "While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. " https://lnkd.in/gdPk_qCQ
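A heavily simplified sketch of the quoted idea: every edge carries its own learnable univariate function, and each output is just a sum of edge functions, with no linear weight matrix. Gaussian basis bumps stand in here for the paper's B-spline parametrization:

```python
# Simplified KAN-layer sketch: one learnable 1-D function per edge,
# parametrized as a sum of fixed Gaussian bumps with learnable coefficients
# (the paper uses B-splines; this is illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_basis = 3, 2, 8
centers = np.linspace(-2, 2, n_basis)              # fixed basis grid
coef = rng.normal(0, 0.1, (n_out, n_in, n_basis))  # learnable, per edge

def edge_fn(x, c):
    # phi(x) = sum_j c_j * exp(-(x - t_j)^2): one smooth function per edge
    return (c * np.exp(-(x - centers) ** 2)).sum()

def kan_layer(x):
    # output j = sum over inputs i of phi_{j,i}(x_i); no linear weights
    return np.array([
        sum(edge_fn(x[i], coef[j, i]) for i in range(n_in))
        for j in range(n_out)
    ])

print(kan_layer(rng.normal(size=n_in)))
```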
-
Michal Rachtan
What does "open source" really mean for Large Language Models (LLMs)? This recent study reveals 14 different shades of LLMs' "openness" 😉, ranging from access to code, model weights, and training data to unrestricted use and distribution. The authors surveyed over 40 so-called "open" LLMs across 14 parameters, including code availability, training data access, documentation quality, and ease of access. Not surprisingly, most models are "open weight" at best, withholding key details related to the IP from the training phase. Check out the full paper here 👉 https://lnkd.in/dGFxHD4K #GenerativeAI #OpenSource #LLM
-
Zak Jost
Recently I've been playing with "Swarm Intelligence", wondering in the back of my mind how I might apply it to my "day job" and merge it with modern methods in the age of transformers, etc. Today I came across a paper (https://lnkd.in/g_bUdGB7) in the TL;DR newsletter that uses many inferences from small models to generate much better solutions than individual attempts, and at 3x lower cost than using a large model. And it is merely generating several samples independently, without any interaction among the models. What if we instead applied a swarm intelligence algorithm to this approach, so that inference N was informed by all the prior inferences? Would we better navigate the exploit/explore trade-off? This is the sort of place my video series will be heading. Relevant quote from the abstract: "...when it comes to inference, we often limit the amount of compute to only one attempt per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. Across multiple tasks and models, we observe that coverage - the fraction of problems solved by any attempt - scales with the number of samples over four orders of magnitude. In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance. When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-V2-Coder-Instruct increases from 15.9% with one sample to 56% with 250 samples, outperforming the single-attempt state-of-the-art of 43% which uses more capable frontier models."
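The coverage-scaling effect in the quote is easy to see in a toy calculation: if one sample solves a problem with probability p, the chance that any of k independent samples solves it is 1 - (1 - p)^k. With made-up per-problem solve rates:

```python
# Toy illustration: coverage (solved by *any* of k independent samples)
# compounds quickly even when per-sample solve rates are weak.
import numpy as np

rng = np.random.default_rng(0)
p_solve = rng.uniform(0.01, 0.3, size=200)  # invented single-sample rates

for k in [1, 10, 100, 250]:
    coverage = (1 - (1 - p_solve) ** k).mean()
    print(f"{k:>4} samples -> expected coverage {coverage:.1%}")
```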
-
Gaurav Sen
The MCTS algorithm makes a comeback with Large Language Models LLMs are known to perform poorly on algorithmic tasks. Previous attempts to solve this with Chain of Thought and Graph of Thought have been promising. A new algorithm suggests using Monte Carlo Tree Search, a game tree search algorithm that probabilistically finds optimal paths using simulations. #SystemDesign #MCTS #LLMs
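For readers new to MCTS, here is a minimal UCT implementation on a toy bit-guessing task; it shows the select / expand / simulate / backpropagate loop that the LLM-guided variants build on (the game and constants are illustrative):

```python
# Minimal UCT sketch: choose 6 bits, reward = fraction matching a hidden
# target. The four MCTS phases are marked below.
import math
import random

TARGET = [1, 0, 1, 1, 0, 1]

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def reward(state):
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def mcts(iterations=2000):
    root = Node([])
    for _ in range(iterations):
        node = root
        # 1) selection: descend via UCB while fully expanded and non-terminal
        while len(node.children) == 2 and len(node.state) < len(TARGET):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # 2) expansion: add one untried child
        if len(node.state) < len(TARGET):
            bit = 0 if 0 not in node.children else 1
            node.children[bit] = Node(node.state + [bit], node)
            node = node.children[bit]
        # 3) simulation: random rollout to a terminal state
        rollout = node.state + [random.randint(0, 1)
                                for _ in range(len(TARGET) - len(node.state))]
        r = reward(rollout)
        # 4) backpropagation: update statistics up to the root
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    # greedy decode: follow the most-visited children
    best, node = [], root
    while node.children:
        bit, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        best.append(bit)
    return best

print(mcts())  # converges toward TARGET
```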
-
Hadia Hameed
An interesting post on how Skyscanner uses the combination of Dr Jekyll and its in-house statistical tool called WISE to ship experiment variants to production. Instead of using a frequentist approach, as most A/B experiments do, this framework uses a Bayesian approach to quantify the uncertainty in the final estimates, reporting credible intervals instead of confidence intervals. Summary: - Example of an experiment to test whether a pop-up mentioning that 'Skyscanner never takes a cut' increases the 'redirector rate' or not. - If sufficient data has not yet been collected, WISE's results say 'keep testing'. - Otherwise, WISE provides a recommendation on the variant to ship. - WISE uses the Bayesian approach: beliefs about the underlying parameter values are modeled as probability distributions, which are updated as new data is observed. - For conversion metrics, it uses a beta-binomial model. - For revenue-per-user metrics, it uses a hurdle gamma-exponential model. - Stopping criterion: through Monte Carlo simulations of thousands of experiments, they determined a set of bespoke target 90% HDI widths to serve as their stopping criterion, which ensured an appropriate balance between the duration of experiments and the correct decision rate. https://lnkd.in/dv7BEHXe #statistics #analytics #rstats #python #peopleanalytics #experimentation
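A minimal beta-binomial sketch of the machinery described above (not WISE itself; the counts are made up, and a central 90% credible interval stands in for WISE's HDI-based stopping rule):

```python
# Beta-binomial Bayesian A/B sketch: update a Beta posterior per variant,
# report a 90% credible interval, and "keep testing" while it is too wide.
from scipy import stats

def posterior(successes, trials, a0=1, b0=1):
    # Beta(1, 1) prior updated with observed conversions
    return stats.beta(a0 + successes, b0 + trials - successes)

control = posterior(successes=420, trials=10_000)  # invented counts
variant = posterior(successes=468, trials=10_000)

lo, hi = variant.ppf(0.05), variant.ppf(0.95)      # central 90% interval
print(f"variant rate: 90% credible interval [{lo:.4f}, {hi:.4f}]")

# Decision sketch: stop once the interval is narrow enough, then compare.
if hi - lo > 0.002:
    print("keep testing")
else:
    samples_c = control.rvs(100_000, random_state=0)
    samples_v = variant.rvs(100_000, random_state=1)
    print("P(variant > control) =", (samples_v > samples_c).mean())
```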
-
Tolgahan Cakaloglu, Ph.D.
Today's reading is Memory Layers in Large Language Models 🧠 This research shows how memory layers can take LLM performance to the next level: 🔑 Trainable key-value lookup mechanism: adds parameters without increasing computational cost, boosting factual accuracy and task performance. 📈 Big results, small cost: outperforms models with much larger computational budgets and even beats mixture-of-experts models! ⚙️ Scales seamlessly with up to 128 billion memory parameters. The authors from Meta highlight how memory layers are a game-changing addition to future AI architectures, offering smarter, faster, and more accurate solutions. You may want to check out the paper at: https://lnkd.in/gzuqUfYe #AI #LLM #MemoryLayers #Innovation #MachineLearning #FactualAI #FutureOfAI #Meta
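The core mechanism is a trainable key-value lookup where only the top-k keys contribute, so parameter count grows without growing per-token compute. A toy numpy sketch (the paper's product-key factorization and other optimizations are omitted):

```python
# Sketch of a memory layer's key-value lookup (toy sizes): score all keys,
# keep only the top-k, and return a sparse softmax-weighted sum of values.
import numpy as np

rng = np.random.default_rng(0)
n_mem, d, k = 10_000, 64, 8
keys = rng.normal(size=(n_mem, d))    # trainable parameters
values = rng.normal(size=(n_mem, d))  # trainable parameters

def memory_lookup(query):
    scores = keys @ query                   # similarity to every key
    top = np.argpartition(scores, -k)[-k:]  # indices of the k best keys
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                            # softmax over the top-k only
    return w @ values[top]                  # sparse weighted value sum

print(memory_lookup(rng.normal(size=d)).shape)  # (64,)
```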
-
Charlie ✦ Greenman
Love to see it. New elastic index spun up this lovely morning. It will allow users to: -> search across tens of thousands of datasets -> then cross-reference with vector search, so they can zero in on the individual items within a dataset they're looking for ---- Not to brag, but if Spotify employed this architecture you could find playlists based on individual songs within a playlist (something they currently don't do). [Spotify could even start charging more for a more premium search #businessIdea]
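A toy in-memory sketch of that two-stage idea: filter datasets by keyword first, then vector-search the items inside the matches. The real system uses an Elasticsearch index; the names and data here are invented:

```python
# Two-stage search sketch: (1) keyword filter over datasets,
# (2) cosine-similarity vector search over items within the matches.
import numpy as np

rng = np.random.default_rng(0)
datasets = {
    "ocean-sounds": {"tags": {"ambient", "ocean"},
                     "items": rng.normal(size=(50, 32))},
    "road-trip":    {"tags": {"rock", "driving"},
                     "items": rng.normal(size=(50, 32))},
}

def search(tag, query_vec, top_n=3):
    hits = []
    for name, ds in datasets.items():
        if tag not in ds["tags"]:   # stage 1: keyword filter
            continue
        items = ds["items"]         # stage 2: vector search within the match
        sims = items @ query_vec / (
            np.linalg.norm(items, axis=1) * np.linalg.norm(query_vec))
        for idx in np.argsort(sims)[-top_n:][::-1]:
            hits.append((name, int(idx), float(sims[idx])))
    return hits

print(search("ocean", rng.normal(size=32)))
```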
-
Massimiliano Marchesiello
Introducing the New Anthropic PDF Processing API https://ift.tt/TxFMLby

Anthropic Claude 3.5 now understands PDF input.

In the last few weeks, Anthropic has released some exciting beta features that have largely gone under the radar. One of these was its new token-counting API; I have already written an article on this, which you can read by clicking the link below: Introducing the New Anthropic Token Counting API.

The other exciting feature, and the subject of this article, is that Claude 3.5 can now process PDFs and understand both text and visual content within PDF documents.

PDF Capabilities

Claude works with any standard PDF file, allowing you to inquire about text, images, charts, and tables within your documents. Here are some common use cases:

- Analyzing financial reports, interpreting charts and tables
- Extracting key information from legal documents
- Assisting with document translations
- Converting document content into structured formats

Limitations

Because this is still a beta release, there are a few limitations to its use. Right now, it can handle a maximum file size of 32MB, and the number of pages in any one document is limited to 100.

Supported Platforms and Models

PDF support is currently available on the latest Claude 3.5 Sonnet model (claude-3-5-sonnet-20241022) through direct API access.

Calculate Expected Token Usage

The token count for a PDF file is determined by the amount of text extracted and the total number of pages. Each page is converted to an image, and token costs are calculated accordingly. Depending on content density, each page typically requires between 1,500 and 3,000 tokens. Standard input token pricing applies, with no extra fees for PDF processing. You can also use token counting (see the story linked above) to calculate the number of tokens for a message that includes PDFs.

Okay, let's get started. First, I'm developing using Windows WSL2 Ubuntu. If you're a Windows user, I have a comprehensive guide on installing WSL2, which you can find here.

Setting up a dev environment

Before we start coding, let's set up a separate development environment. That way, all our projects will be siloed and won't interfere with each other. I use conda for this, but use whichever tool you're familiar with.

(base) $ conda create -n claude_pdf python=3.10 -y
(base) $ conda activate claude_pdf
# Install required libraries
(claude_pdf) $ pip install anthropic jupyter

Getting an Anthropic API key

You'll need an Anthropic API key if you don't already have one. You can get it from the Anthropic Console. Register or sign in, click the Get API Keys button, and follow the instructions from there. Take note of your key and set the environment variable ANTHROPIC_API_KEY to it.

The code

For my input PDF, I'll use a copy of Tesla's 10-Q September 2023 quarterly submission to the Securities and Exchange Commission...
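To round out the truncated walkthrough, here is a minimal PDF question-answering call matching the beta API as documented at the time of the post; the beta flag, document-block shape, and filenames are assumptions and may have changed since:

```python
# Sketch of a PDF question-answer call against the beta API (flag and block
# shape as documented at the time; treat as an assumption, not gospel).
# Assumes ANTHROPIC_API_KEY is set and tesla_10q.pdf exists; both placeholders.
import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("tesla_10q.pdf", "rb") as f:
    pdf_b64 = base64.standard_b64encode(f.read()).decode()

message = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    betas=["pdfs-2024-09-25"],  # beta identifier from the docs of that era
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "document",
             "source": {"type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_b64}},
            {"type": "text",
             "text": "Summarize total revenue for the quarter."},
        ],
    }],
)
print(message.content[0].text)
```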