Hello! This is where I keep general thoughts and mini explorations on different interpretability research.

Bau Lab May Mech Interp Puzzle

Analysis of a two-layer transformer trained to count the unique tokens in a sequence.
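
To make the task concrete, here is a rough sketch of the target function the model would need to learn. This is my own illustration, not the puzzle's actual data pipeline: the function names are hypothetical, and I am assuming tokens are plain integer ids. I also do not know whether the model is scored on a single count for the whole sequence or on a running count at each position, so both variants are shown.

```python
def count_unique(tokens):
    """Number of distinct token ids in the whole sequence (assumed target)."""
    return len(set(tokens))


def running_unique_counts(tokens):
    """Distinct-token count after each position (alternative assumed target)."""
    seen = set()
    counts = []
    for token in tokens:
        seen.add(token)          # re-adding a repeated token leaves the set unchanged
        counts.append(len(seen))
    return counts


if __name__ == "__main__":
    example = [3, 1, 3, 7, 1]    # hypothetical token ids
    print(count_unique(example))
    print(running_unique_counts(example))
```

Either way, the interesting part for interpretability is that the model has to detect repeats, i.e. work out whether the current token already appeared earlier in the sequence.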