April 22, 2025, 12:00–13:00 (America/New_York)
CSAIL Forum with Prof Yoon Kim: Efficient and Expressive Architectures for Language Modeling
Speaker: Yoon Kim, Assistant Professor, CSAIL
Tuesday 12:00-1:00 EDT, April 22, 2025
Live stream via Zoom: registration required

Abstract: Transformers are the dominant architecture for language modeling (and generative AI more broadly). The attention mechanism in Transformers is considered core to the architecture and enables accurate sequence modeling at scale. However, the complexity of attention is quadratic in input length, which makes it difficult to apply Transformers to long sequences. Moreover, Transformers have theoretical limitations on the class of problems they can solve, which prevents them from modeling certain kinds of phenomena such as state tracking. This talk will describe some recent work on efficient alternatives to Transformers that can overcome these limitations.

Bio: Yoon Kim is an assistant professor at MIT EECS and a principal investigator at CSAIL, where he works on natural language processing and machine learning. He obtained his Ph.D. in computer science from Harvard University.
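As a rough illustration of the efficiency gap the abstract describes, here is a minimal NumPy sketch (my own, not from the talk; all shapes and names are illustrative) contrasting quadratic-cost attention with the kind of linear-time recurrent update that efficient alternatives build on.

```python
import numpy as np

def attention(Q, K, V):
    # Scores form an (n x n) matrix: O(n^2) time and memory in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_recurrence(x, A, B):
    # A state-space-style scan: one fixed-size state update per token, O(n) total.
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:                 # n steps, constant work per step
        h = A @ h + B @ x_t
        outputs.append(h.copy())
    return np.stack(outputs)

rng = np.random.default_rng(0)
n, d, m = 8, 4, 3                 # toy sizes, purely illustrative
Q = K = V = rng.normal(size=(n, d))
out_attn = attention(Q, K, V)     # cost grows as n^2
out_rec = linear_recurrence(Q, 0.9 * np.eye(m), rng.normal(size=(m, d)))  # cost grows as n
```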
TBD
Events
April 19, 2025
No events scheduled
April 22, 2025
Thesis Defense: Pratyusha Sharma, "Discovering and Engineering the Computation Underlying Large Intelligent Agents"
Pratyusha Sharma
CSAIL
April 22, 2025, 14:00–15:00 (America/New_York)
The richness of language and intelligent behavior has often been attributed to latent compositional structure. Can we build tools for discovering how deep networks learn and represent this latent structure implicitly? And, more importantly, can we use this knowledge to improve generalization in largely structure-less, general-purpose models, or to refine our understanding of the world they describe? In this talk, I present three perspectives on these questions. I will discuss experimental methods to functionally characterize the space of learned solutions in LLMs and demonstrate how this understanding can be used to improve their empirical generalization in a gradient-free manner, sometimes by as much as 30 percentage points on language understanding benchmarks. Following that, I show how to decipher the structure of another (black-box) language-like system, the naturally arising communication system of sperm whales in the wild, discovering for the first time a unique combinatorial communication system. Finally, I apply insights from these results to equip embodied agents with a latent language of thought, hierarchical and compositional, and show how it can enable long-horizon reasoning and planning in these systems.

Thesis Committee: Antonio Torralba, Jacob Andreas, Daniela Rus, Yejin Choi
TBD
HCI Seminar - Cindy Bennett - Accessibility and Disability Considerations for Responsible AI
Cindy Bennett
Google Research
April 22, 2025, 16:00–17:00 (America/New_York)
Abstract: Generative (gen) AI is widely considered to have the potential to scale accessibility solutions. For example, users can turn on AI-generated captions on most virtual conference platforms, and blind and low-vision users can receive detailed image and video descriptions on demand, capabilities and scales unheard of until recently. However, the broader responsible AI literature shows how gen AI applied in particular domains (e.g., creative work) is transforming professions and threatening workers, such as artists, who frequently work in precarious conditions. Further, gen AI exhibits bias in how it represents various groups of people. In this talk I will share two projects addressing these topics: (1) how disabled artists make their workflows accessible and negotiate recent gen AI advancements, and (2) representational tropes that participants with disabilities identified in AI-generated images. Through these projects I will show how gen AI has the potential to enhance disabled people's work by relieving them of certain access barriers and taking over undesired administrative labor, but that the broader preference for it over hiring artists raises concerns about the cost of leveraging it as an accessibility tool. Further, AI-generated images represented people with disabilities extremely poorly, amplifying longstanding stereotypes that disabled advocates have countered for decades. I will argue that these limitations cannot be read separately from AI applied to solve perennial digital inaccessibility; rather, they must motivate multi-pronged approaches to responsible AI development.

Bio: Dr. Cynthia Bennett is a senior research scientist at Google Research. She researches making technology-mediated experiences, such as those leveraging generative AI, accessible to and representative of people with disabilities, while mitigating harmful applications. Previously, Bennett was a researcher at Apple and a postdoctoral research fellow at Carnegie Mellon University, after receiving her Ph.D. in Human Centered Design and Engineering from the University of Washington. Bennett's research has been recognized with awards from top scientific publication venues and funding agencies in her field. She is also a disabled woman scholar committed to raising the participation of people with disabilities in the tech industry.

This talk will also be streamed over Zoom: https://mit.zoom.us/j/96239100489.
TBD
How to Securely Implement Cryptography in Deep Neural Networks
Weizmann Institute of Science
April 22, 2025, 16:15–17:15 (America/New_York)
The wide adoption of deep neural networks (DNNs) raises the question of how we can equip them with a desired cryptographic functionality (e.g., to decrypt an encrypted input, to verify that the input is authorized, or to hide a secure watermark in the output). The problem is that cryptographic primitives are typically designed to run on digital computers that use Boolean gates to map sequences of bits to sequences of bits, whereas DNNs are a special type of analog computer that uses linear mappings and ReLUs to map vectors of real numbers to vectors of real numbers. In the past, this discrepancy between the discrete and continuous computational models has led to many interesting side-channel attacks.

In this talk I will describe a new theory of security for digital cryptographic primitives implemented as ReLU-based DNNs. I will first demonstrate the existence of a provable exponential gap between the complexities of solving a simple search problem in the two computational models. I will then show that natural implementations of block ciphers as DNNs can be broken in linear time by using nonstandard inputs whose "bits" are real numbers. Finally, I will develop a new and completely practical method for implementing any desired cryptographic functionality as a standard ReLU-based DNN in a provably secure and correct way.
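To make the discrete/continuous mismatch concrete, here is a toy sketch (my construction, not the speaker's) of Boolean gates built from ReLUs: they are exact on bit inputs, but "bits" that are real numbers already behave strangely, hinting at the nonstandard-input attacks the talk describes.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

def AND(a, b):
    return relu(a + b - 1.0)   # exact on {0, 1} inputs

def XOR(a, b):
    return relu(a + b - 2.0 * AND(a, b))

assert XOR(0, 1) == 1 and XOR(1, 1) == 0  # digital behavior on true bits
print(AND(0.6, 0.7))  # 0.3 -- a real-valued "bit" leaks a non-Boolean output
```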
TBD
April 23, 2025
EECS Special Seminar: Tijana Zrnic, "AI-Assisted Approaches to Data Collection and Inference"
Tijana Zrnic
Stanford University
April 23, 2025, 11:00–12:00 (America/New_York)
Abstract: Recent breakthroughs in AI offer tremendous potential to reduce the costs of data collection. For example, there is growing interest in leveraging large language models (LLMs) as efficient substitutes for human judgment in tasks such as model evaluation and survey research. However, AI systems are not without flaws: generative language models often lack factual accuracy, and predictive models remain vulnerable to subtle perturbations. These issues are particularly concerning when critical decisions, such as scientific discoveries or policy choices, rely on AI-generated outputs. In this talk, I will present recent and ongoing work on AI-assisted approaches to data collection and inference. Rather than treating AI as a replacement for data collection, our methods leverage AI to strategically guide data collection and improve the power of subsequent inferences, all while retaining provable validity guarantees. I will demonstrate the benefits of this methodology through examples from computational social science and beyond.

Bio: Tijana Zrnic is a Ram and Vijay Shriram Postdoctoral Fellow at Stanford University, affiliated with Stanford Data Science and the Department of Statistics. Tijana obtained her PhD in Electrical Engineering and Computer Sciences at UC Berkeley and a BEng in Electrical and Computer Engineering at the University of Novi Sad in Serbia. Her research establishes foundations to ensure data-driven technologies have a positive impact; she has worked on topics such as AI-assisted statistical inference, performative prediction, and mitigating selection bias.
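The abstract does not name a specific method, but a minimal sketch in the spirit of prediction-powered inference (one line of the speaker's prior work) shows the flavor: cheap AI predictions are debiased with a small labeled sample, so the final estimate remains valid even when the predictor is systematically wrong. All names and numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
predict = lambda x: x + 0.5               # hypothetical, systematically biased AI predictor
x_unlabeled = rng.normal(size=10_000)     # cheap: predictions only
x_labeled = rng.normal(size=100)          # expensive: also has ground-truth labels
y_labeled = x_labeled                     # true outcomes (identity, for the toy)

naive = predict(x_unlabeled).mean()                   # inherits the +0.5 bias
rectifier = (y_labeled - predict(x_labeled)).mean()   # labeled data estimates the bias
estimate = predict(x_unlabeled).mean() + rectifier    # debiased estimate of the mean
print(f"naive={naive:.3f}  debiased={estimate:.3f}")  # true mean is ~0
```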
TBD
April 23, 2025, 16:00–17:00 (America/New_York)
ML Tea: Do Large Language Model Benchmarks Test Reliability?
Speaker: Josh Vendrow

Abstract: When deploying large language models (LLMs), it is important to ensure that these models are not only capable but also reliable. Many benchmarks have been created to track LLMs' growing capabilities; however, there has been no similar focus on measuring their reliability. To understand the potential ramifications of this gap, we investigate how well current benchmarks quantify model reliability. We find that pervasive label errors can compromise these evaluations, obscuring lingering model failures and hiding unreliable behavior. Motivated by this gap in the evaluation of reliability, we propose the concept of platinum benchmarks: benchmarks carefully curated to minimize label errors and ambiguity. As a first attempt at constructing such benchmarks, we revise examples from fifteen existing popular benchmarks. We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks such as elementary-level math word problems. Analyzing these failures further reveals previously unidentified patterns of problems on which frontier models consistently struggle.

Bio: Josh is a third-year PhD student working with Aleksander Madry. Josh's research focuses on building machine learning models that are safe and robust when deployed in the real world.
TBD
April 24, 2025
Revisiting Keyed-Verification Anonymous Credentials
Michele Orrù
CNRS
April 24, 2025, 12:00–13:00 (America/New_York)
Abstract: Keyed-verification anonymous credentials are widely recognized as among the most efficient tools for anonymous authentication. In this work, we revisit two prominent credential systems: the scheme by Chase et al. (CCS 2014), commonly referred to as CMZ or PS MAC, and the scheme by Barki et al. (SAC 2016), known as BBDT or BBS MAC. We show how to make CMZ statistically anonymous and BBDT compatible with the BBS RFC draft. We provide a comprehensive security analysis for strong(er) properties of unforgeability and anonymity; these properties allow the schemes to be composed with extensions that users can pick and choose. We show that simpler variants satisfying one-more unforgeability can still serve as anonymous tokens (Kreuter et al., CRYPTO 2020). To enable faster proofs for complex presentations, we present a compiler that uses an interactive oracle proof and a designated-verifier polynomial commitment to construct a designated-verifier non-interactive argument. For keyed-verification anonymous credentials, designated-verifier proofs suffice, since the verifier is known in advance. We explore extensions that could benefit from this approach.
TBD
Embodied Intelligence (EI) Joint Seminar Presentation
Hongyin Luo & Yung-Sung Chuang & Philip Schroeder
MIT CSAIL
April 24, 2025, 16:00–17:00 (America/New_York)
There will be a joint presentation this week by three MIT CSAIL members from the Spoken Language Systems group.

Title: Quantifying Generalization Complexity for Large Language Models
Abstract: LLMs have shown remarkable performance on a range of complex tasks, but how well do they generalize beyond their training data distribution, and how do we quantitatively measure such generalization? This talk presents our recent ICLR work on SCYLLA, an evaluation framework that disentangles generalization from memorization in LLMs. Using a dynamic evaluation approach, SCYLLA quantifies the generalization capabilities of LLMs across complexity levels, revealing key insights into their performance gaps between in-distribution (ID) and out-of-distribution (OOD) data. We will explore findings like the generalization valley, a non-monotonic relationship between task complexity and performance, which suggests a critical threshold where LLMs' reliance on non-generalizable behavior peaks. Additionally, we'll discuss critical complexity, which shifts as model size increases, suggesting that larger models can tackle more complex reasoning tasks before they begin to over-rely on memorization. The talk will also cover our benchmarking results across 28 popular LLMs, including both open-source models (e.g., LLaMA, Qwen) and closed models (e.g., Claude, GPT). The aim is to provide a clearer understanding of their generalization capabilities and help foster more robust methods for evaluating and augmenting LLMs.
Bio: Hongyin Luo is a research scientist at MIT CSAIL, working with Dr. James Glass. Hongyin focuses on improving the efficiency and transparency of language model reasoning with structured and symbolic inference frameworks.

Title: Reducing Hallucinations in LLMs via Decoding, Detection, and Citation
Abstract: Large language models (LLMs) often produce hallucinations: content that is factually incorrect or unsupported by real-world facts or the input context. This talk presents three approaches that address this challenge from complementary perspectives.
1. DoLa is a decoding method that improves truthfulness by contrasting output distributions from earlier and final transformer layers, leveraging observations of the layer-wise localization of factual knowledge. https://arxiv.org/abs/2309.03883
2. Lookback Lens detects contextual hallucinations using only information from the attention maps, and transfers well across tasks and model sizes. https://arxiv.org/abs/2407.07071
3. SelfCite introduces a self-supervised framework for aligning LLMs to generate fine-grained citations, using context ablation to provide a simple but effective reward for the necessity and sufficiency of a citation, achieving performance comparable to Claude Citations with only an 8B model. https://arxiv.org/abs/2502.09604
Together, these techniques offer lightweight and scalable solutions for improving the factual reliability and verifiability of LLM outputs. A toy sketch of the layer-contrast idea behind DoLa appears after this listing.
Bio: Yung-Sung Chuang is a fourth-year PhD student at MIT CSAIL, working with Dr. James Glass. His research focuses on improving the reliability and factuality of large language models.

Title: THREAD: Thinking Deeper with Recursive Spawning
Abstract: Large language models have shown impressive capabilities across diverse settings, but still struggle as the length and complexity of the context increase. To address this challenge, we introduce a new framework: Thinking Recursively and Dynamically (ThReaD). THREAD frames model generation as a thread of execution that, based on the context, can run to completion or dynamically spawn new threads in a recursive fashion. By spawning, threads can offload work (e.g., reasoning, retrieving information, analyzing data) to child threads, which return only the tokens the parent thread needs to do its work. We show significant performance gains with THREAD in the settings of LLM task solving and question answering, where the dynamic threading allows the model to recursively decompose the given task or question into progressively simpler sub-problems that can be solved by separate child threads. In an extension of this work, we also demonstrate how a THREAD-based framework can improve reasoning over videos with vision-language models.
Bio: Philip Schroeder is a PhD student at MIT CSAIL, advised by Dr. Jim Glass, in the Spoken Language Systems Group. His work focuses on advancing the reasoning capabilities of LLMs and VLMs through embodied interaction with external environments, both virtual and real.
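As referenced above, here is a minimal sketch of the layer-contrast idea behind DoLa (my simplification, not the paper's exact algorithm): score next tokens by how much the final layer's distribution diverges from an earlier layer's, favoring tokens whose probability grows with depth.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def contrast_scores(early_logits, final_logits):
    # Favor tokens whose log-probability increases between the two layers.
    return log_softmax(final_logits) - log_softmax(early_logits)

early = np.array([2.0, 1.0, 0.5])   # early-layer logits over a toy vocabulary
final = np.array([2.0, 3.0, 0.5])   # final-layer logits
next_token = int(np.argmax(contrast_scores(early, final)))  # token 1: grew with depth
```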
TBD
Learning, engineering, and targeting cell states in cancer
Ava Amini
Microsoft Research in Cambridge, MA
April 24, 2025, 19:00–20:00 (America/New_York)
Boston Chapter of IEEE Computer Society and GBC/ACM
7:00 PM, Thursday, 24 April 2025
MIT Room 32-G449 (Kiva) and online via Zoom

Learning, engineering, and targeting cell states in cancer
Ava Amini

Please register in advance for this seminar even if you plan to attend in person at:
https://acm-org.zoom.us/webinar/register/WN_Msf8F_LXTcSD2mWpDeVx5A
After registering, you will receive a confirmation email containing information about joining the webinar. Indicate on the registration form if you plan to attend in person; this will help us determine whether the room is close to reaching capacity. We plan to serve light refreshments (probably pizza) before the talk, starting at around 6:30 pm, so letting us know you will come in person will help us determine how much pizza to order. We may make some auxiliary material, such as slides and access to the recording, available after the seminar to people who have registered.

Abstract: Cancer is often treated using a reductionist approach: distilled to an individual subtype, mutation, or phenotype. But fundamentally, cancers are complex ecosystems that necessitate systems-level understanding and intervention. Addressing this problem is equal parts biology and computer science. In Project Ex Vivo, a joint cancer research collaboration between Microsoft Research and the Broad Institute, we are envisioning a new, constructionist paradigm for precision oncology, one powered by the bottom-up integration of computation and experimentation to understand the complexity of cell-state ecosystems in cancer. In this talk I will share our recent efforts to build AI models to better define, model, and therapeutically target cell states in cancer.

Bio: Ava Amini is a Principal Researcher at Microsoft Research in Cambridge, MA. Her research focuses on developing new AI methods to understand and design biology, with the ultimate aim of realizing precision biomedicines that improve human health. She is a co-lead of Project Ex Vivo, a collaborative effort between Microsoft and the Broad Institute focused on defining, engineering, and targeting cell states in cancer. In addition to research, Ava is passionate about AI education and outreach: she is a lead organizer and instructor for MIT Introduction to Deep Learning, an in-person and global course on the fundamentals of deep learning. Ava completed her PhD in Biophysics at Harvard University and the Massachusetts Institute of Technology (MIT), where she was advised by Sangeeta Bhatia at the Koch Institute for Integrative Cancer Research and supported by the NSF Graduate Research Fellowship. Ava received her Bachelor of Science in Computer Science and Molecular Biology from MIT.

Directions to 32-G449, MIT Stata Center, 32 Vassar Street, Cambridge, MA: Please use the main entrance to the Stata Center at 32 Vassar Street (the entrance closest to Main Street), as those doors will be unlocked. Upon entering, proceed to the elevators, which will be on the right after passing a large set of stairs and a MITAC kiosk. Take the elevator to the 4th floor and turn right, following the hall to an open area; 32-G449 will be on the left.

This joint meeting of the Boston Chapter of the IEEE Computer Society and GBC/ACM will be hybrid (in person and online). Up-to-date information about this and other talks is available online at https://ewh.ieee.org/r1/boston/computer/. You can sign up to receive updated status information about this talk and informational emails about future talks at https://mailman.mit.edu/mailman/listinfo/ieee-cs, our self-administered mailing list.
TBD
April 25, 2025
April 25, 2025, 11:00–12:15 (America/New_York)
Foundations in Multimodal Mechanistic Interpretability (William Rudman, Brown University)
Abstract: Mechanistic interpretability has been instrumental in understanding large language models, yet it remains underexplored in multimodal models, owing to a lack of the effective image-corruption methods needed for causal analysis. The first part of this talk introduces NOTICE, a novel corruption scheme designed for multimodal large language models (MLLMs) that enables causal mediation analysis. Next, we examine the reasoning capabilities of MLLMs and find that they are shape-blind: vision encoders in MLLMs embed geometrically dissimilar objects into the same regions of their representation space. We construct a side-counting dataset of abstract shapes, showing that current MLLMs achieve near-zero accuracy on a task that is trivial for humans. Finally, we present ongoing work on VisualCounterfact, a dataset designed to investigate the relationship between counterfactual visual inputs and world knowledge. VisualCounterfact consists of tuples that alter specific visual properties (color, size, and texture) of common objects. For instance, given (banana, color, yellow), we create a counterfact image (banana, color, purple) by modifying the object's pixels. Using VisualCounterfact, we locate a mechanism for reliably controlling whether a model will answer with the counterfactual property present in the image or retrieve the world-knowledge answer from its weights.
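For readers unfamiliar with causal mediation analysis, the toy sketch below (mine; NOTICE's actual corruption scheme is the talk's contribution) shows the basic recipe: run a clean and a corrupted input, patch one hidden activation from the clean run into the corrupted run, and measure how much of the output is restored.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))  # toy two-layer network

def forward(x, patch=None):
    h = np.maximum(W1 @ x, 0)   # hidden activations
    if patch is not None:
        idx, value = patch
        h[idx] = value          # intervene on a single hidden unit
    return W2 @ h

clean = np.ones(3)
corrupted = np.zeros(3)         # stand-in corruption; NOTICE contributes a principled one
h_clean = np.maximum(W1 @ clean, 0)
for i in range(4):
    restored = forward(corrupted, patch=(i, h_clean[i])) - forward(corrupted)
    print(f"unit {i}: restored effect {np.abs(restored).sum():.3f}")  # large => unit mediates
```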
TBD
April 26, 2025
April 26, 2025, 9:45–16:45 (America/New_York)
New England Symposium on Graphics (NESG)
The New England Symposium on Graphics (NESG) is back! NESG is a one-day informal get-together for students, postdocs, and faculty in the area studying computer graphics and adjacent fields (computer-aided design, geometry, fabrication, vision, photography, etc.). The program consists of invited talks, a poster session, and plenty of time to catch up with collaborators. For more information, please check the event’s website https://nesg.graphics
TBD
April 28, 2025
April 28, 2025, 14:00–16:00 (America/New_York)
Industry Innovators Expo
On Monday, April 28, from 2-4PM, CSAIL Alliances will host an Industry Innovators Expo where our member companies will have the opportunity to demonstrate their latest technology, talk about their technical progress and challenges, and give away swag. There will be light refreshments and opportunities for networking, so please join us in the R&D Commons on the 4th floor.
TBD
April 29, 2025
April 29, 2025, 17:00–18:30 (America/New_York)
CSAIL Alliances Student Poster Session
Come learn about the groundbreaking work happening in Stata on Tuesday, April 29, from 5-6:30PM at the CSAIL Student Poster Session. This is a chance for CSAIL students and postdocs to highlight their research, talk about its implications, and engage with industry members about its business applications. Meet your peers in the R&D Commons on the 4th floor—there will be refreshments!
TBD
April 30, 2025
Managing Exploratory AI
University of Illinois at Urbana-Champaign
April 30, 2025, 13:00–14:00 (America/New_York)
Abstract: Today's data science systems, ranging from batch jobs to interactive interfaces, are surprisingly fragile. Data scientists typically use dozens of libraries, and a single bug in any of them can destroy hours or even days of computation, causing significant pain. This issue has been widely discussed in the data science community and academic literature, yet no principled mechanisms have been proposed to address it. This may be puzzling to database researchers: existing databases implement checkpointing to periodically save changes for future recovery. Why haven't data science systems adopted it? Are there unique properties that challenge its adoption? In this talk, I will first identify a core challenge: the lack of mechanisms for detecting data changes, a key premise of checkpointing. Unlike databases with centralized buffer pools, data science systems intentionally omit centralized data spaces, allowing individual libraries to use shared memory, GPUs, and remote machines. Changes across these diverse locations must still be identified. To address this, we are making exciting progress around one central theme: a nonintrusive state manager that behaves like conventional buffer pools without requiring data to be placed in a central location. The key idea is to construct a mathematical map of library-managed data, including data dependencies, using graphs. These graphs enable new algorithms to detect changes, save them incrementally, and restore states correctly. We are actively developing an open-source system, Kishu, to benefit all data practitioners.

Bio: Yongjoo Park is an Assistant Professor in the School of Computing and Data Science at the University of Illinois at Urbana-Champaign. His research focuses on systems for data-intensive AI. Yongjoo is also Chief Scientist of Keebo, a start-up company he co-founded based on his Ph.D. research. Yongjoo obtained a Ph.D. in Computer Science and Engineering from the University of Michigan, Ann Arbor. He is a recipient of the 2018 SIGMOD Jim Gray Dissertation Honorable Mention and an ACM SIGMOD 2023 Best Artifact Award Honorable Mention.

For the Zoom passcode, contact the organizer at [email protected]
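As a minimal illustration of change detection, the key premise the abstract identifies, here is a sketch (my own simplification, not Kishu's actual design) that fingerprints library-managed objects and compares fingerprints across checkpoints so that only changed state needs saving.

```python
import hashlib
import pickle

def fingerprint(obj) -> str:
    # Hash a stable serialization of the object; a real system must handle
    # GPU tensors, shared memory, and unpicklable state far more carefully.
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()

class StateManager:
    def __init__(self):
        self._seen = {}  # object name -> last checkpointed fingerprint

    def changed(self, name, obj) -> bool:
        h = fingerprint(obj)
        dirty = self._seen.get(name) != h
        self._seen[name] = h
        return dirty

mgr = StateManager()
data = [1, 2, 3]
assert mgr.changed("data", data)      # first sighting: needs checkpointing
assert not mgr.changed("data", data)  # unchanged: skip the save
data.append(4)
assert mgr.changed("data", data)      # mutation detected via new fingerprint
```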
TBD
May 06, 2025
May 6, 2025, 12:00–13:00 (America/New_York)
CSAIL Forum with Manish Raghavan: The role of information diversity in AI systems
Registration required: https://mit.zoom.us/meeting/register/GP_RXB5BSTy_Ubf3wNJwxQ

Bio: Manish Raghavan is the Drew Houston (2005) Career Development Professor at the MIT Sloan School of Management and the Department of Electrical Engineering and Computer Science. Before that, he was a postdoctoral fellow at the Harvard Center for Research on Computation and Society (CRCS). His research centers on the societal impacts of algorithms and AI.
TBD
[Thesis Defense] Learning to infer causal structure with applications to molecular biology
Menghua (Rachel) Wu
May 6, 2025, 14:00–15:00 (America/New_York)
TBD
TBA
CSAIL, EECS
May 6, 2025, 16:15–17:15 (America/New_York)
TBA
TBD
May 14, 2025
Scalable Image AI via Self-designing Storage
Harvard University
May 14, 2025, 13:00–14:00 (America/New_York)
Abstract: Image AI has the potential to improve every aspect of human life, providing more productive and safer services and tools. Image AI is, however, very expensive: it costs millions of dollars to train and deploy a single model, with hundreds of thousands of pounds of carbon emissions. We identify that the root cause of the problem is a long-overlooked and largely unexplored dimension: storage. The majority of the cost of reading and processing an image depends on how it is stored on disk and in main memory, as storage determines how much data is moved and processed. Most images today are stored as JPEG files. But JPEG is designed for the human eye; it aims to maximally compress images with minimal loss in visual quality. In contrast, we observe that during image AI it is AI algorithms that "see" the images, rather than humans. Furthermore, JPEG is a single design, whereas AI problems are diverse; every problem is unique in terms of how data should be stored and processed. Using a fixed design, such as JPEG, for all problems results in excessive data movement and wasteful image AI pipelines.

This talk presents Image Calculator, a self-designing storage system that creates and manages storage for image AI tasks. Unlike the state of the art, which uses a fixed storage format, Image Calculator builds a design space of thousands of storage formats, each capable of storing and representing data differently, at different training and inference speed, accuracy, and space trade-offs. Given an AI task, Image Calculator searches for and finds the optimal storage format, minimizing training and inference times and maximizing accuracy. Image Calculator consists of two main components: (i) search & training, and (ii) model-serving. Search & training efficiently searches within the design space by using locality among storage formats: formats that have similar features also perform similarly. It clusters information-dense formats and quickly identifies high-quality candidates with scalable search time. Model-serving deploys storage formats at inference servers. It exploits the inherent frequency structure in image data: it breaks images into pieces, i.e., frequency components, and processes images frequency by frequency instead of image by image, as conventionally done. This dramatically reduces data communication between clients and servers, enabling fast and efficient inference.

We evaluate Image Calculator across a diverse set of datasets, tasks, models, and hardware. We show that Image Calculator can generate storage formats that reduce end-to-end inference and training time by up to 14.2x and space consumption by up to 8.2x, with little or no loss in accuracy, for image classification, object detection, and instance segmentation, compared to state-of-the-art image storage formats such as JPEG and its recent variants. Image Calculator's storage formats reduce individual time components, such as PCIe time, by up to 271x. Image Calculator is even more successful on small hardware devices, providing sub-millisecond inference on CPUs and making inference scalable and cheap on commodity hardware. Tailoring storage to AI tasks results in heavily compressed and specialized data; despite that, we show that Image Calculator is able to reconstruct images with high visual quality. Image Calculator's incremental computation scheme moves just enough data for every image, further reducing data movement cost and providing fast, scalable inference serving.

This talk will mainly be based on the following two papers:
[1] The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Utku Sirin, Stratos Idreos. SIGMOD, 2024.
[2] Frequency-Store: Scaling Image AI by A Column-Store for Images. Utku Sirin, Victoria Kauffman, Aadit Saluja, Florian Klein, Jeremy Hsu, Stratos Idreos. CIDR, 2025.

Bio: Utku Sirin is a postdoctoral researcher at the Data Systems Lab at Harvard University, advised by Stratos Idreos. Utku's work on the Image Calculator reimagines image AI through self-designing AI storage, which always takes the best shape given the AI context and goals, bringing a 10x end-to-end speedup. Utku was awarded the Microsoft Research PhD Fellowship in 2017 and the Swiss National Science Foundation Postdoctoral Fellowship in 2021 and 2023. Utku has also been a winner of the ACM SIGMOD Student Research Competition and a recipient of an IEEE ICDE best reviewer award. Before joining Harvard, Utku obtained his PhD from the Data-Intensive Applications and Systems Lab at EPFL, advised by Anastasia Ailamaki, working on hardware-conscious data systems.

For the Zoom passcode, contact the organizer at [email protected]
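As a rough sketch of the frequency-by-frequency idea described above (assumptions mine; this is not the paper's storage format), the code below splits an image into low- and high-frequency components with a DCT, so a task that is confident on the coarse component never needs the fine one moved at all.

```python
import numpy as np
from scipy.fft import dctn, idctn

def frequency_components(image, cutoff):
    coeffs = dctn(image, norm="ortho")
    low = np.zeros_like(coeffs)
    low[:cutoff, :cutoff] = coeffs[:cutoff, :cutoff]   # coarse content only
    high = coeffs - low                                # fine detail
    return idctn(low, norm="ortho"), idctn(high, norm="ortho")

image = np.random.rand(64, 64)
coarse, detail = frequency_components(image, cutoff=8)
# A model confident on `coarse` can skip moving `detail` entirely;
# by linearity of the DCT, the two components sum back to the image.
np.testing.assert_allclose(coarse + detail, image, atol=1e-10)
```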
TBD
- Dertouzos Distinguished Lecture
- CSAIL Forum
- Hot Topics in Computing
- ML Tea
- EECS Special Seminar
- Algorithms and Complexity (A&C) Seminar 2024 - 2025
- Biomedical Imaging and Analysis 2024 - 2025
- Boston IEEE/ACM 2024 - 2025
- Brains, Minds and Machines 2024 - 2025
- CIS Seminar 2024 - 2025
- CSAIL Security Seminar 2024 - 2025
- Embodied Intelligence 2024-2025
- Theory of Computation (ToC) 2024 - 2025
- Theory of Computation (ToC) Seminar 2024
- HCI Seminar Series 2024
- Brains, Minds and Machines Seminar Series 2023 - 2024
- Boston IEEE/ACM Joint Seminar Series 2023 - 2024
- CIS Seminar Series 2023 - 2024
- Theory of Computation (ToC) Seminar 2023
- Biomedical Imaging and Analysis 2023 - 2024
- Bioinformatics Seminar Series 2023
- Machine Learning and Health Seminar Series, Fall 2023
- CSAIL Security Seminar Series 2023 - 2024
- Algorithms and Complexity Seminar 2023
- Brains, Minds and Machines Seminar Series 2022 - 2023
- Biomedical Imaging and Analysis 2022 - 2023
- Boston IEEE/ACM Joint Seminar Series 2022 - 2023
- CSAIL Security Seminar Series 2022-2023
- Cryptography and Information (CIS) Seminar 2022
- HCI Seminar Series 2022 - 2023
- CSAIL Security Seminar Series 2020
- IEEE Computer Society and GBC/ACM 2019-2020
- Brains, Minds and Machines Seminar Series 2019 - 2020
- Algorithms and Complexity Seminar 2019-2020
- Biomedical Imaging and Analysis 2019 - 2020
- Fast Code Seminar 2019
- Machine Learning Seminar Series 2019
- Robotics@MIT Seminar Series 2019
- CSAIL Security Seminar Series 2019
- EECS Special Seminar Series 2019
- Bioinformatics Seminar Series 2019
- HCI Seminar Series 2019
- Theory of Computation Seminar (ToC) 2019
- Cryptography and Information Security (CIS) Seminar 2019
- CSAIL Alliances Tech Talk 2018 - 2019
- Programming Languages & Software Engineering Seminar 2018-2019
- HCI Seminar Series 2018
- Algorithms & Complexity Seminars 2018-2019
- Biomedical Imaging and Analysis 2018 - 2019
- IEEE Computer Society and GBC/ACM 2018-2019
- Brains, Minds and Machines 2018/2019
- Thesis Defense
- Machine Learning Seminar Series 2018
- Theory and Beyond
- CSAIL Security Seminar 2018/2019
- Robotics@MIT Seminar Series 2018
- Bioinformatics Seminar Series 2018
- Theory of Computation (TOC) 2018
- Cryptography and Information Seminar (CIS) 2018
- Brains, Minds and Machines Seminar Series 2017/2018
- IEEE Computer Society and GBC/ACM 2017/2018
- Machine Learning Seminar Series
- CSAIL Security Seminar 2017/2018
- Algorithms and Complexity Seminar Series 2017/2018
- Biomedical Imaging and Analysis 2017/2018
- Brains, Minds and Machines Seminar Series 2017
- Machine Learning Seminar Series
- Vision Seminar Series 2017
- Robotics@MIT Seminar Series 2017
- Bioinformatics Seminar Series 2017
- EECS Special Seminar Series 2017
- Cryptography and Information Seminar (CIS) 2017
- Theory of Computation (TOC) 2017
- HCI Seminar Series
- Biomedical Imaging and Analysis 2016/2017
- PL/SE Seminar Series 2016/2017
- Algorithms and Complexity Seminar Series 2016/2017
- CSAIL Security Seminar 2016/2017
- Boston IEEE/ACM Joint Seminar Series 2016/2017
- Robotics@MIT Seminar Series 2016