Sunday, March 15, 2026

Daily picks

13 articles scored

#1 GOLD · Research · Anthropic Research

Interpretability · Mar 27, 2025 · Tracing the thoughts of a large language model

Circuit tracing lets us watch Claude think, uncovering a shared conceptual space where reasoning happens before being translated into language, suggesting the model can learn something in one language and apply it in another.

  • AI models like Claude aren't explicitly programmed—they learn strategies from massive amounts of training data, but those strategies are hidden in billions of computations that even their creators can't fully understand.
  • Researchers are building an "AI microscope", inspired by how neuroscientists study brain activity, to peek inside how these models actually work and understand what they are really doing.
  • Claude appears to think in a shared conceptual space that spans all the languages it knows, rather than translating between them: feed it the same sentence in different languages and it processes them in overlapping ways (a toy sketch of this measurement follows the list).
  • Even though Claude generates one word at a time, it plans multiple words ahead (like settling on a rhyming word before writing the rest of a poem's line), operating on longer horizons than its word-by-word training objective would imply.
  • Claude sometimes generates plausible-sounding explanations that are designed to agree with you rather than following actual logical steps, which researchers discovered by giving it tricky math problems and watching what happened inside the model.
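
The cross-language finding above is, at heart, a similarity measurement over internal activations: translations of the same sentence should land in overlapping internal states. Here is a minimal numpy sketch of that idea; the vectors are synthetic stand-ins (Claude's real activations aren't public), so this illustrates the logic of the test, not the actual method.

    import numpy as np

    # Toy illustration: if translations share one conceptual space, their
    # activation vectors should be far more similar to each other than to
    # an unrelated sentence. All vectors here are made up.

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(0)
    concept = rng.normal(size=256)  # shared "meaning" direction (hypothetical)

    h_english = concept + rng.normal(scale=0.3, size=256)  # "the opposite of small is big"
    h_french = concept + rng.normal(scale=0.3, size=256)   # same sentence, in French
    h_unrelated = rng.normal(size=256)                     # a different sentence

    print(f"English vs French:    {cosine(h_english, h_french):.2f}")     # close to 1
    print(f"English vs unrelated: {cosine(h_english, h_unrelated):.2f}")  # close to 0
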
#2 SILVER · Research · Anthropic Research

Interpretability · Oct 29, 2025 · Signs of introspection in large language models

Can Claude access and report on its own internal states? This research finds evidence for a limited but functional ability to introspect, a step toward understanding what's actually happening inside these models.

  • Researchers found evidence that AI models like Claude can introspect to some degree, meaning they can report on their own internal thoughts and reasoning, but the ability is still unreliable and nowhere near as sophisticated as human introspection.
  • When AI models process information, they create internal activation patterns that represent concepts (like whether someone is real or whether a statement is true), and the question is whether they can accurately report on these hidden internal states when asked what they're thinking (a toy version of this kind of test is sketched after the list).
  • More capable models performed better on the introspection tests, suggesting the ability may strengthen as systems advance, which could help us understand how they make decisions and spot problems in their reasoning.
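
A rough way to picture the setup in the second bullet: plant a known "concept" direction into a hidden state, then check whether a simple readout detects it. The sketch below is entirely hypothetical, with synthetic vectors and an arbitrary threshold; the real experiments work with Claude's actual internal activations.

    import numpy as np

    # Toy concept-injection test: add a known concept direction to a hidden
    # state, then see whether projecting onto that direction detects it.
    # Synthetic vectors throughout; this illustrates the logic only.

    rng = np.random.default_rng(1)
    dim = 512
    concept = rng.normal(size=dim)
    concept /= np.linalg.norm(concept)    # unit "concept" direction

    baseline = rng.normal(size=dim)       # ordinary hidden state
    injected = baseline + 6.0 * concept   # state with the concept planted

    threshold = 3.0                       # arbitrary cutoff for this toy setup
    for name, state in [("baseline", baseline), ("injected", injected)]:
        score = float(state @ concept)    # projection acts as a stand-in probe
        print(f"{name}: score={score:+.2f}, detected={score > threshold}")
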
#3 BRONZE · Announcement · Reddit r/ClaudeCode

They really are making me into a crazy person. Thank you?!

  • Claude is giving users 2x extra usage outside peak hours: on weekdays before 5am or after 11am PT, and all day on weekends, automatically applied to everyone regardless of plan (the window logic is sketched below).
  • The bonus applies everywhere you use Claude, including Claude Code, so you get substantially more usage if you're willing to work at odd hours or on weekends.
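
Taking the post's numbers at face value (they aren't confirmed here and could change), the bonus window reduces to a small time check. A minimal Python sketch:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    def in_bonus_window(now: datetime) -> bool:
        """True if `now` falls in the 2x off-peak window described above."""
        pt = now.astimezone(ZoneInfo("America/Los_Angeles"))
        if pt.weekday() >= 5:                # Saturday (5) or Sunday (6)
            return True
        return pt.hour < 5 or pt.hour >= 11  # weekday off-peak hours, PT

    print(in_bonus_window(datetime.now(ZoneInfo("UTC"))))
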

Made with passive-aggressive love by manoga.digital. Powered by Claude.