
BERT Attention Visualiser

Loads bert-base-uncased and extracts per-layer, per-head attention weights for any sentence. Includes matplotlib heatmaps showing how token pairs attend to each other in any of the 12 layers.

nlp · bert · transformers · python

Python script that loads bert-base-uncased, runs a forward pass with output_attentions=True, and plots a 3×4 grid of attention heatmaps, one per head, for a chosen layer.

Source code
# BERT Attention Visualiser

Loads `bert-base-uncased` and visualises attention weights across all 12 layers and 12 heads.

## What it does

- Tokenises any input sentence with the BERT WordPiece tokenizer
- Runs a forward pass with `output_attentions=True`
- Plots a 3×4 grid of heatmaps (one per attention head) for any chosen layer
- Optionally zooms into a single token to show what it attends to
- Prints the top-k token pairs by attention weight
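The tokenise, forward-pass and top-k steps above can be sketched roughly as follows. This is a minimal sketch, not the script's own code: `get_attentions`, `stack_attentions` and `top_k_pairs` are hypothetical names, and it assumes the Hugging Face `AutoTokenizer` and `AutoModel` loaders. With `output_attentions=True`, the model returns one tensor per layer of shape `(batch, heads, seq_len, seq_len)`; the sketch stacks them into a single `(layers, heads, seq_len, seq_len)` NumPy array so any layer/head pair can be indexed directly.

```python
import numpy as np


def stack_attentions(attentions):
    """Stack the per-layer attention tuple into a (layers, heads, seq, seq) array.

    Each element is expected to have shape (batch, heads, seq, seq); only the
    first batch item is kept.
    """
    return np.stack([np.asarray(layer)[0] for layer in attentions])


def get_attentions(sentence, model_name="bert-base-uncased"):
    """Tokenise `sentence`, run one forward pass and return (tokens, attention)."""
    # Imported inside the function so the NumPy helpers work without torch installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(sentence, return_tensors="pt")  # adds [CLS] and [SEP]
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return tokens, stack_attentions(outputs.attentions)


def top_k_pairs(attention, tokens, layer, head, k=5):
    """Return the k (query, key, weight) triples with the largest weights for one head."""
    weights = attention[layer, head]  # (seq, seq); row i = what token i attends to
    flat = np.argsort(weights, axis=None)[::-1][:k]  # flattened indices, largest first
    rows, cols = np.unravel_index(flat, weights.shape)
    return [(tokens[i], tokens[j], float(weights[i, j])) for i, j in zip(rows, cols)]
```

Usage would look like `tokens, attention = get_attentions("I sat on the river bank.")` followed by `top_k_pairs(attention, tokens, layer=5, head=3)`; the first call downloads the model weights.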

## Requirements

```bash
pip install transformers torch matplotlib seaborn
```

## Usage

```bash
python bert_attention_viz.py
```

Writes two PNG files to the current directory:

- `attention_layer5.png` — all 12 heads for layer 5
- `token_attention_l5_h3.png` — what "bank" (first occurrence) attends to in layer 5, head 3
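Files like these could be produced roughly as below. This is a hedged sketch rather than the script's code: `plot_layer` and `plot_token` are hypothetical names, `attention` is assumed to be a NumPy array of shape `(layers, heads, seq_len, seq_len)` (the per-layer tensors BERT returns, stacked), layer and head indices are assumed to be zero-based, and it draws with plain `imshow` and a bar chart instead of seaborn.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np


def plot_layer(attention, tokens, layer, path=None):
    """Save a 3x4 grid of heatmaps, one panel per head, for a single layer."""
    n_heads = attention.shape[1]
    fig, axes = plt.subplots(3, 4, figsize=(16, 12))  # 12 panels for BERT-base's 12 heads
    for head, ax in enumerate(axes.flat[:n_heads]):
        # Rows are query tokens, columns are the tokens they attend to.
        ax.imshow(attention[layer, head], cmap="viridis", vmin=0.0, vmax=1.0)
        ax.set_title(f"Head {head}")
        ax.set_xticks(range(len(tokens)))
        ax.set_xticklabels(tokens, rotation=90, fontsize=6)
        ax.set_yticks(range(len(tokens)))
        ax.set_yticklabels(tokens, fontsize=6)
    fig.suptitle(f"Layer {layer}")
    fig.tight_layout()
    path = path or f"attention_layer{layer}.png"
    fig.savefig(path, dpi=100)
    plt.close(fig)
    return path


def plot_token(attention, tokens, word, layer, head, path=None):
    """Save a bar chart of what the first occurrence of `word` attends to."""
    query = tokens.index(word)  # raises ValueError if `word` is not a token
    weights = attention[layer, head, query]  # one row: attention from `word`
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(range(len(tokens)), weights)
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=90)
    ax.set_ylabel("attention weight")
    ax.set_title(f'"{word}" in layer {layer}, head {head}')
    fig.tight_layout()
    path = path or f"token_attention_l{layer}_h{head}.png"
    fig.savefig(path, dpi=100)
    plt.close(fig)
    return path
```

Fixing `vmin=0.0, vmax=1.0` keeps the colour scale comparable across heads, at the cost of washing out heads whose weights are spread thinly.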

## Try your own sentence

Edit the `SENTENCE` variable at the bottom of the script, for example:

```python
SENTENCE = "The animal didn't cross the street because it was too tired."
```

This sentence is useful for probing coreference: which token does "it" attend to most strongly, "animal" or "street"?
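To answer that question numerically rather than by eye, a sketch like the following picks the strongest target of "it" for one head, then tallies the winner across every layer and head. The names `strongest_target` and `vote_over_heads` are hypothetical, the `(layers, heads, seq_len, seq_len)` array layout is assumed, and excluding `[CLS]`, `[SEP]` and the query token itself is a design choice: many BERT heads put much of their weight on special tokens, which would otherwise win most votes.

```python
import numpy as np

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def strongest_target(attention, tokens, word, layer, head, ignore=SPECIAL_TOKENS):
    """Return (token, weight) that the first occurrence of `word` attends to most.

    Tokens in `ignore`, and `word` itself, are excluded from the comparison.
    """
    query = tokens.index(word)
    # Copy the row so masking does not modify the caller's array.
    weights = np.array(attention[layer, head, query], dtype=float)
    for j, tok in enumerate(tokens):
        if tok in ignore or j == query:
            weights[j] = -np.inf
    best = int(np.argmax(weights))
    return tokens[best], float(weights[best])


def vote_over_heads(attention, tokens, word, ignore=SPECIAL_TOKENS):
    """Count how often each token is the strongest target across all layers and heads."""
    counts = {}
    n_layers, n_heads = attention.shape[:2]
    for layer in range(n_layers):
        for head in range(n_heads):
            target, _ = strongest_target(attention, tokens, word, layer, head, ignore)
            counts[target] = counts.get(target, 0) + 1
    # Most frequent winner first.
    return sorted(counts.items(), key=lambda kv: -kv[1])
```

A high vote for "animal" is suggestive, not proof that the model resolves the coreference; attention weights are only one view of what a head does.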