Hi everyone! I'm an incoming PhD student at the Department of Computer Science at the University of Toronto, where I'll be working with Professor Ashton Anderson. My research focuses on generative models, human-AI alignment, and mechanistic interpretability. I am interested in developing AI systems that are both effective and interpretable to humans. Outside of research, I am deeply enthusiastic about Go, basketball, and tennis. Looking forward to connecting with you!
Among the many tasks that Large Language Models (LLMs) have revolutionized is text classification. Current text classification paradigms, however, rely solely on the output of the final layer in the LLM, with the rich information contained in internal neurons largely untapped. In this study, we present SPIN: a model-agnostic framework that sparsifies and integrates internal neurons of intermediate layers of LLMs for text classification. Specifically, SPIN sparsifies internal neurons by linear probing-based salient neuron selection layer by layer, avoiding noise from unrelated neurons and ensuring efficiency. The cross-layer salient neurons are then integrated to serve as multi-layered features for the classification head. Extensive experimental results show our proposed SPIN significantly improves text classification accuracy, efficiency, and interpretability.
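The layer-wise probing idea in SPIN can be illustrated with a minimal sketch: fit a linear probe on each layer's activations, keep the neurons with the largest probe weights, and concatenate the selected neurons across layers as features for a classification head. This is an illustrative toy on synthetic data, not the paper's implementation; all names (`select_salient`, the `hidden_states` layout, the choice of `k`) are assumptions.

```python
# Toy sketch of SPIN-style salient neuron selection (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for per-layer LLM hidden states:
# shape (n_layers, n_samples, n_neurons).
n_layers, n_samples, n_neurons, k = 3, 200, 64, 8
labels = rng.integers(0, 2, size=n_samples)
hidden_states = rng.normal(size=(n_layers, n_samples, n_neurons))
# Make a few neurons per layer predictive so the probes can find them.
for layer in range(n_layers):
    hidden_states[layer, :, :4] += labels[:, None] * 1.5

def select_salient(layer_acts, y, k):
    """Fit a linear probe on one layer; keep the k highest-|weight| neurons."""
    probe = LogisticRegression(max_iter=1000).fit(layer_acts, y)
    return np.argsort(np.abs(probe.coef_[0]))[-k:]

# Sparsify each layer, then integrate the selected neurons across layers
# to form multi-layered features for the classification head.
features = np.concatenate(
    [hidden_states[l][:, select_salient(hidden_states[l], labels, k)]
     for l in range(n_layers)],
    axis=1,
)
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(features.shape)  # (200, 24): k = 8 neurons from each of 3 layers
```

Because only `k` neurons per layer survive the probe, the final classifier sees a small, denoised feature set rather than the full hidden state, which is the source of the efficiency and interpretability gains the abstract describes.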
Zhenwei Tang, Difan Jiao, Reid McIlroy-Young, Jon Kleinberg, Siddhartha Sen, Ashton Anderson
There are an increasing number of domains in which artificial intelligence (AI) systems both surpass human ability and accurately model human behavior. This introduces the possibility of algorithmically-informed teaching in these domains through more relatable AI partners and deeper insights into human decision-making. Critical to achieving this goal, however, is coherently modeling human behavior at various skill levels. Chess is an ideal model system for conducting research into this kind of human-AI alignment, with its rich history as a pivotal testbed for AI research, mature superhuman AI systems like AlphaZero, and precise measurements of skill via chess rating systems.
Zhenwei Tang, Difan Jiao, Blair Yang, Ashton Anderson
The rapid advancement of large vision-language models (VLMs) has introduced challenges in evaluating their reasoning across multiple modalities. Existing benchmarks provide limited insights into how models understand and reason over semantically equivalent information across modalities, which is crucial because a robust model should demonstrate consistent comprehension regardless of how information is represented. To address this gap, we introduce SEAM, a benchmark dataset for cross-modal reasoning that ensures semantically equivalent inputs are presented in distinct and standardized notations. By employing fundamentally distinct notation systems across modalities, in contrast to OCR-based image-text pairing, our benchmark provides a rigorous assessment of the textual-symbolic versus visual-spatial reasoning capabilities of VLMs.
[Under Review] Understanding Mechanisms of Skill Adaptation in Generative Models: Chess as a Model System
Difan Jiao, George Eilender, Zhenwei Tang, Ashton Anderson
TMLR 2025 Submission
Generative models exhibit a remarkable ability to adapt their outputs to different skill levels, ranging from beginner to expert in various domains. However, understanding the mechanisms behind skill adaptation remains an open challenge. We address this gap by introducing chess as a model system, leveraging its well-defined structure and clear strength gradients to investigate how Maia-2, a chess model that generates human-like next moves across varying strengths, internally represents and adapts to different skill levels. We start by proposing two possible but mutually exclusive skill adaptation mechanisms: the model dynamically adjusts its internal concept understanding to match different skill levels, or the model maintains a consistent internal understanding while modulating how it externalizes that understanding.
[Under Review] Learning to Imitate with Less: Efficient Individual Behavior Modeling in Chess
Zhenwei Tang, Difan Jiao, Eric Xue, Reid McIlroy-Young, Jon Kleinberg, Siddhartha Sen, Ashton Anderson
TMLR 2025 Submission
As humans seek to collaborate with, learn from, and better understand artificial intelligence systems, developing AIs that can accurately emulate individual decision-making becomes increasingly important. Chess, a long-standing AI benchmark with precise skill measurement, offers an ideal testbed for human-AI alignment. However, existing approaches to modeling human behavior require large amounts of data from each individual, making them impractical for new or sparsely represented users. In this work, we introduce Maia4All, a framework designed to learn and adapt to individual decision-making styles efficiently, even with limited data. Maia4All achieves this through a two-stage optimization process: (1) the enrichment step, which bridges population and individual-level human behavior modeling with a prototype-enriched model, and (2) the democratization step, which leverages strengths or prototypes to initialize and refine individual embeddings with minimal data.