Circuit Tracer: Anthropic's open-source tool for tracing the internal decisions of AI models

What is Circuit Tracer

Circuit Tracer is an open-source tool released by Anthropic for studying the internal workings of large language models. It generates attribution graphs that reveal the intermediate steps a model takes to produce a specific output. These graphs let researchers trace the model's decision-making process, visualize relationships between features, and test hypotheses. Circuit Tracer supports popular open-weight models such as Gemma and Llama, and its graphs can be explored in an interactive visualization interface on Neuronpedia, which makes model behavior easy to inspect and analyze.

Main Features of Circuit Tracer

Generate Attribution Graphs: Reveals the paths a model follows to reach a decision, showing how features and nodes influence one another.
Visualization and Interaction: Provides an interactive interface for viewing, editing, and sharing attribution graphs.
Model Intervention: Lets users modify feature values and observe how the output changes, to test hypotheses about model behavior.
Support for Multiple Models: Compatible with mainstream open models such as Gemma and Llama, enabling comparative research.
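The idea behind model intervention can be illustrated with a toy model in plain Python. This is a minimal sketch, not Circuit Tracer's API: the one-layer model, its weights, and the names `forward` and `clamp` are hypothetical. It forces one feature to a chosen value and compares the resulting logits with an unmodified run.

```python
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def forward(x: Vector, w_enc: Matrix, w_dec: Matrix, w_unembed: Matrix,
            clamp: Optional[Dict[int, float]] = None) -> Vector:
    """Toy one-layer model (hypothetical): features read x, write back, then unembed.

    `clamp` maps a feature index to the value it is forced to (the intervention).
    """
    features = [max(0.0, z) for z in matvec(w_enc, x)]  # ReLU feature activations
    for idx, value in (clamp or {}).items():
        features[idx] = value                            # overwrite the chosen features
    resid = list(x)
    for a, direction in zip(features, w_dec):            # each feature adds a * direction
        resid = [r + a * d for r, d in zip(resid, direction)]
    return matvec(w_unembed, resid)                      # logits


# Hypothetical weights: 2-dimensional residual stream, 2 features.
W_ENC = [[1.0, 0.0], [0.0, 1.0]]
W_DEC = [[0.0, 2.0], [1.0, 0.0]]
W_UNEMBED = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]

baseline = forward(x, W_ENC, W_DEC, W_UNEMBED)
ablated = forward(x, W_ENC, W_DEC, W_UNEMBED, clamp={0: 0.0})  # switch feature 0 off
boosted = forward(x, W_ENC, W_DEC, W_UNEMBED, clamp={0: 2.0})  # double feature 0
print(baseline, ablated, boosted)  # [1.0, 2.0] [1.0, 0.0] [1.0, 4.0]
```

Switching feature 0 off removes its contribution to the second logit, while doubling it doubles that contribution. In a real model, an output change of this kind is evidence that the feature plays a causal role in the behavior being studied.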

Technical Principles of Circuit Tracer

Transcoders: Uses pre-trained transcoders to generate attribution graphs. A transcoder is a neural network trained to approximate a model's MLP layers through a wide, sparsely activating hidden layer; the units of that layer (features) are typically easier to interpret than individual neurons. By standing in for the original MLPs, transcoders let Circuit Tracer describe the model's computation in terms of features and the relationships between them.
Direct Effect Computation: Circuit Tracer calculates the direct impact of each non-zero transcoder feature, transcoder error node, and input token on other non-zero transcoder features and output logits.
Graph Pruning: Because the full graph is usually too large to inspect, Circuit Tracer prunes it, removing nodes and edges with minimal influence and retaining only those that significantly affect the model's output. Users set the pruning parameters (such as node and edge thresholds) to control the graph's size and readability.
Interactive Visualization Interface: Provides a web-based interactive visualization interface, allowing users to view and manipulate attribution graphs directly in the browser. The interface supports node labeling, grouping, and annotation, enabling users to more intuitively understand and analyze the model’s internal mechanisms.
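The direct-effect computation can be sketched in a toy linear setting. This is an illustrative assumption, not the library's implementation: the model below has no attention or normalization, so anything a node writes into the residual stream reaches every later layer unchanged, and the edge weight from a source to a target is the dot product of what the source writes with the direction the target reads. The names `Source`, `Target`, and `direct_effects`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


@dataclass
class Source:
    name: str
    layer: int     # writes into the residual stream after this layer (-1 = embedding)
    write: Vector  # activation * decoder row, error vector, or token embedding


@dataclass
class Target:
    name: str
    layer: int     # reads the residual stream entering this layer
    read: Vector   # transcoder encoder row, or unembedding row for a logit


def direct_effects(sources: List[Source], targets: List[Target]) -> Dict[Tuple[str, str], float]:
    """Edge weight = (what the source writes) . (what the target reads).

    Valid only in this toy: residual connections, no attention, no normalization.
    """
    edges = {}
    for s in sources:
        for t in targets:
            if s.layer < t.layer:  # a node can only influence strictly later reads
                edges[(s.name, t.name)] = dot(s.write, t.read)
    return edges


# Toy two-layer model with a 2-dimensional residual stream (made-up numbers).
embedding = [1.0, 0.0]
enc0, dec0 = [1.0, 0.0], [0.0, 1.0]                 # one layer-0 transcoder feature
a0 = max(0.0, dot(embedding, enc0))                 # its activation
recon0 = [a0 * v for v in dec0]                     # transcoder reconstruction
mlp0_true = [0.5, 1.0]                              # hypothetical true MLP output
err0 = [m - r for m, r in zip(mlp0_true, recon0)]   # error node: what the transcoder misses
resid1 = [e + m for e, m in zip(embedding, mlp0_true)]
enc1, dec1 = [1.0, 1.0], [1.0, -1.0]                # one layer-1 transcoder feature
a1 = max(0.0, dot(resid1, enc1))                    # its activation

sources = [
    Source("token", -1, embedding),
    Source("L0.f0", 0, recon0),
    Source("L0.error", 0, err0),
    Source("L1.f0", 1, [a1 * v for v in dec1]),
]
targets = [
    Target("L0.f0", 0, enc0),
    Target("L1.f0", 1, enc1),
    Target("logit", 2, [0.0, 1.0]),
]

edges = direct_effects(sources, targets)
for (s, t), w in sorted(edges.items()):
    print(f"{s:>9} -> {t:<6} {w:+.2f}")
```

Because the layer-1 feature has no bias, the three edges into it (1.00 from the token, 1.00 from the layer-0 feature, 0.50 from the error node) add up exactly to its activation of 2.5. The error node is what makes this bookkeeping close: it carries the part of the true MLP output that the transcoder fails to reconstruct.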
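Threshold-based pruning can be sketched in the same spirit. This simplified version is an assumption rather than Circuit Tracer's exact procedure: each node's incoming edge weights are normalized, a node's influence is the sum over all paths to the output logits of the products of normalized weights, and the pruner keeps the smallest set of most influential nodes (and then edges) whose combined influence reaches a user-chosen fraction. The graph, the names `prune`, `node_threshold`, and `edge_threshold`, and the default values are illustrative.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edges = Dict[Tuple[str, str], float]  # (source, target) -> direct effect


def normalize(edges: Edges) -> Edges:
    """Rescale |weights| so that each target's incoming edges sum to 1."""
    totals: Dict[str, float] = {}
    for (_, target), w in edges.items():
        totals[target] = totals.get(target, 0.0) + abs(w)
    return {(s, t): abs(w) / totals[t] for (s, t), w in edges.items() if totals[t] > 0}


def influence(norm: Edges, order: List[str], outputs: Set[str]) -> Dict[str, float]:
    """Sum over all paths from each node to the outputs of the normalized-weight products.

    `order` lists every node in topological order (inputs first, outputs last).
    """
    infl = {n: (1.0 if n in outputs else 0.0) for n in order}
    for node in reversed(order):  # visit downstream nodes before their sources
        if node not in outputs:
            infl[node] = sum(w * infl[t] for (s, t), w in norm.items() if s == node)
    return infl


def keep_top(scores: Dict[Hashable, float], fraction: float) -> Set[Hashable]:
    """Smallest set of top-scoring items whose scores reach `fraction` of the total."""
    total = sum(scores.values())
    kept: Set[Hashable] = set()
    running = 0.0
    for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if running >= fraction * total:
            break
        kept.add(item)
        running += score
    return kept


def prune(edges: Edges, order: List[str], outputs: Set[str],
          node_threshold: float = 0.8, edge_threshold: float = 0.98) -> Tuple[Set[str], Edges]:
    norm = normalize(edges)
    infl = influence(norm, order, outputs)
    # Output nodes are always kept; all other nodes compete on influence.
    nodes = keep_top({n: v for n, v in infl.items() if n not in outputs}, node_threshold) | outputs
    # An edge's score is its normalized weight times its target's influence.
    scores = {e: w * infl[e[1]] for e, w in norm.items() if e[0] in nodes and e[1] in nodes}
    kept_edges = keep_top(scores, edge_threshold)
    return nodes, {e: edges[e] for e in kept_edges}


# Hypothetical graph: one input token, three features, one output logit.
EDGES = {
    ("tok", "f1"): 2.0, ("tok", "f2"): 0.1,
    ("f1", "f3"): 1.0, ("f2", "f3"): 0.05,
    ("f1", "logit"): 1.0, ("f2", "logit"): 0.02, ("f3", "logit"): 3.0,
}
ORDER = ["tok", "f1", "f2", "f3", "logit"]  # topological order
nodes, kept = prune(EDGES, ORDER, {"logit"})
print(sorted(nodes))  # ['f1', 'f3', 'logit', 'tok']
```

With `node_threshold=0.8`, the weakly connected feature `f2` is dropped along with every edge that touches it, while the dominant path from the token through `f1` and `f3` to the logit survives.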

Project Address of Circuit Tracer

Project Website: https://www.anthropic.com/research/open-source-circuit-tracing
GitHub Repository: https://github.com/safety-research/circuit-tracer

Application Scenarios of Circuit Tracer

Model Behavior Research: Analyzes the model’s decision-making process using attribution graphs to understand the internal logic behind specific outputs.
Multilingual Model Analysis: Studies the internal representations of multilingual models (e.g., Llama), exploring cross-language processing mechanisms.
Multi-Step Reasoning Research: Analyzes model behavior in multi-step reasoning tasks, revealing the process and logic of step-by-step reasoning.
Model Optimization and Improvement: Uses feature interventions to test hypotheses and check whether specific model behaviors match expectations, which can inform efforts to improve the model.
Education and Sharing: Uses the interactive visualization interface to clearly present complex model decision processes to others, facilitating teaching and communication.
