# BERT Attention Visualiser
Loads `bert-base-uncased` and visualises attention weights across all 12 layers and 12 heads.
## What it does
- Tokenises any input sentence with the BERT WordPiece tokenizer
- Runs a forward pass with `output_attentions=True`
- Plots a 3×4 grid of heatmaps (one per attention head) for any chosen layer
- Optionally zooms into a single token to show what it attends to
- Prints the top-k token pairs by attention weight
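
The forward-pass step above could be sketched roughly as follows. This is not the code in `bert_attention_viz.py`: the helper names `stack_attentions` and `load_and_run` are hypothetical, and requesting `attn_implementation="eager"` is an assumption aimed at recent `transformers` releases, where fused attention kernels may not return the weights.

```python
import torch
from transformers import AutoModel, AutoTokenizer


def stack_attentions(model, inputs):
    """Run one forward pass and return attention weights for a single sentence.

    The result has shape (layers, heads, seq_len, seq_len).
    """
    model.eval()  # disable dropout so each attention row sums to 1
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    # outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
    # Stacking gives (layers, batch, heads, seq, seq); batch size is assumed to be 1.
    return torch.stack(outputs.attentions).squeeze(1)


def load_and_run(sentence, model_name="bert-base-uncased"):
    """Hypothetical helper: tokenise `sentence` and return (tokens, attentions)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: eager attention is requested so the weights are materialised.
    # Drop the argument if your transformers version does not accept it.
    model = AutoModel.from_pretrained(model_name, attn_implementation="eager")
    inputs = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, stack_attentions(model, inputs)
```

For `bert-base-uncased`, the returned tensor would have 12 layers and 12 heads, with `seq_len` equal to the number of WordPiece tokens including `[CLS]` and `[SEP]`.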
## Requirements
```bash
pip install transformers torch matplotlib seaborn
```
## Usage
```bash
python bert_attention_viz.py
```
The script writes PNG files to the current directory:
- `attention_layer5.png` — all 12 heads for layer 5
- `token_attention_l5_h3.png` — what "bank" (first occurrence) attends to in layer 5, head 3
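
A file like `attention_layer5.png` could be produced along these lines. This is a minimal sketch, not the script's actual plotting code: `plot_head_grid` is a hypothetical name, it uses plain matplotlib `imshow` rather than seaborn's `heatmap`, and it numbers heads from 0, which may differ from the script's own convention.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt


def plot_head_grid(layer_attn, tokens, out_path, rows=3, cols=4):
    """Save a rows x cols grid of heatmaps, one per attention head.

    layer_attn: array-like of shape (heads, seq_len, seq_len) for one layer,
                with at least rows * cols heads.
    tokens:     list of token strings used as axis labels.
    """
    fig, axes = plt.subplots(rows, cols, figsize=(4 * cols, 4 * rows), squeeze=False)
    positions = range(len(tokens))
    for head, ax in enumerate(axes.flat):
        # Row i of the matrix holds the weights token i assigns to every token.
        ax.imshow(layer_attn[head], cmap="viridis", vmin=0.0, vmax=1.0)
        ax.set_title(f"head {head}")
        ax.set_xticks(positions)
        ax.set_yticks(positions)
        ax.set_xticklabels(tokens, rotation=90, fontsize=6)
        ax.set_yticklabels(tokens, fontsize=6)
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # free the figure's memory
```

Here `layer_attn` would be one layer's slice of the attention weights, converted to a NumPy array or nested list before plotting.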
## Try your own sentence
Edit the `SENTENCE` variable at the bottom of the script:
```python
SENTENCE = "The animal didn't cross the street because it was too tired."
```
This sentence is a useful test of coreference resolution: check which token "it" attends to most strongly.
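
One way to answer that question in code is to rank a single row of the attention matrix. The helper below is a hypothetical sketch (`top_attended` is not a function from the script) and assumes you already have the token list and one layer/head attention matrix as a nested list or array.

```python
def top_attended(tokens, attn, query, k=3):
    """Return the k tokens that the first occurrence of `query` attends to most.

    tokens: list of WordPiece strings, e.g. ["[CLS]", "the", ..., "[SEP]"].
    attn:   seq_len x seq_len matrix for one layer and head; row i holds the
            weights token i assigns to every token in the sentence.
    """
    i = tokens.index(query)  # first occurrence; raises ValueError if absent
    row = [float(w) for w in attn[i]]
    # Sort column indices by descending weight; ties keep their original order.
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return [(tokens[j], row[j]) for j in ranked[:k]]
```

Because the tokenizer is uncased, `query` should be lowercase (for example `"it"`). Special tokens such as `[CLS]` and `[SEP]` are included in the ranking; filter them out first if they are not of interest.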