TMC #0008: Memex Simulator: Associative Trails vs. Keyword Search
Python implementation of Vannevar Bush's Memex concept from 'As We May Think' (1945). Demonstrates associative trail navigation, document linking, and the contrast between index-based keyword search and curated associative browsing: the logical precursor to hypertext and the web.
memex, vannevar-bush, hypertext, information-retrieval, associative-trails, python
Python simulator of the Memex: Vannevar Bush's 1945 vision of a mechanised personal library with associative trails, the logical forerunner of hypertext and the World Wide Web.
What's in the code
memex.py: single self-contained file, no dependencies:
- Document: dataclass with id, title, author, year, body, tags. The unit of storage in the Memex, analogous to a microfilm reel.
- TrailStep: a (doc_id, annotation) pair. The annotation is Bush's "marginal note": the researcher's comment on why this document is in the trail at this position.
- Trail: a named, ordered sequence of TrailSteps. Bush's key invention: a reusable, shareable record of an associative reading path.
- Memex: the main class:
  - add_doc(doc): add a document to the collection
  - link(id_a, id_b, reason): create a bidirectional associative link between two documents with a stated reason (maps to Bush's "association code")
  - create_trail(name, doc_ids, annotations): record a named associative path
  - follow_trail(name): walk a trail step by step, printing each document summary and its annotation
  - search(query): keyword search across title, body, and tags (returns up to top_n documents ranked by match count)
  - export_trail(name): serialise a trail to a shareable dict (Bush's "trail publishing" concept)
- build_demo_memex(): loads twelve excerpts from 1936–1945 scientific papers and records ten associative links between them.
- demo_search_vs_trail(m): builds two trails, then runs a keyword search for "computation brain" and walks both trails over the same corpus to show the structural difference. Search returns a ranked list; the trail returns a curated argument.
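The bidirectional links form an undirected graph, so one natural question is which documents can be reached from a starting point by association alone. The following is a minimal sketch, assuming a plain adjacency dict in the same `id → [(id, reason)]` shape that `link()` maintains. The `reachable` function and the sample `links` table are hypothetical and not part of memex.py.

```python
from collections import deque

# Hypothetical link table in the shape Memex keeps internally:
# doc id -> list of (linked doc id, reason), stored in both directions.
links: dict[str, list[tuple[str, str]]] = {
    "turing-1936": [("shannon-1937", "logic in hardware")],
    "shannon-1937": [
        ("turing-1936", "logic in hardware"),
        ("von-neumann-1945", "switching logic"),
    ],
    "von-neumann-1945": [("shannon-1937", "switching logic")],
    "kolmogorov-1941": [],  # no associations recorded
}


def reachable(links: dict[str, list[tuple[str, str]]], start: str) -> list[str]:
    """Breadth-first walk over associative links, nearest documents first."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour, _reason in links.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(reachable(links, "turing-1936"))
```

Unlike a trail, this walk has no author: it lists everything connected to the start, without the ordering or annotations that make a trail an argument.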
Running it
python3 memex.py
No dependencies. Python 3.10+. Outputs five sections:
- Corpus summary: the twelve documents loaded
- Keyword search: results for "computation brain" ranked by match count
- Trail navigation: "computation-to-cognition", an annotated path from Turing's computability through Shannon, McCulloch-Pitts, and Craik to Wiener's feedback, followed by "1945-turning-point"
- Trail export: the "1945-turning-point" trail serialised as a shareable structure (Bush's "trail publishing")
- Summary: a closing comparison of keyword search and associative trails
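Because an exported trail is a plain dict of strings and lists, it can be written to JSON and read back without a custom encoder. The following is a sketch, assuming the dict layout that `export_trail()` returns. The `exported` literal and the `rebuild_steps` helper are illustrative and not part of memex.py.

```python
import json

# Illustrative dict in the layout export_trail() returns.
exported = {
    "trail": "1945-turning-point",
    "steps": [
        {"doc_id": "von-neumann-1945", "annotation": "how to compute"},
        {"doc_id": "bush-1945", "annotation": "how to navigate knowledge"},
    ],
}


def rebuild_steps(payload: str) -> list[tuple[str, str]]:
    """Parse a JSON trail and return (doc_id, annotation) pairs in order."""
    data = json.loads(payload)
    return [(s["doc_id"], s.get("annotation", "")) for s in data["steps"]]


payload = json.dumps(exported, indent=2)  # the shareable text form
steps = rebuild_steps(payload)
print(steps[0])
```

The export carries only document ids and annotations, so a colleague importing the trail would still need the referenced documents in their own collection.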
Source code
#!/usr/bin/env python3
"""
memex.py
========
A simulator of Vannevar Bush's Memex concept.
Based on: Vannevar Bush, "As We May Think"
The Atlantic Monthly, July 1945.
The Memex is a hypothetical device — a mechanised personal library — with
one key invention that separates it from all prior filing systems:
ASSOCIATIVE TRAILS. A trail is a named, ordered, annotated sequence of
documents linked by the researcher's own judgment about how ideas connect.
Trails can be saved, shared, and published alongside research papers.
This simulator implements all five core Memex operations:
1. Store documents
2. Link documents associatively (Bush's "association codes")
3. Create named trails
4. Follow (navigate) trails step by step
5. Search by keyword (to contrast with trail navigation)
The central demo shows the difference between:
- Keyword search for "computation brain" → ranked list, no context
- Trail "computation-to-cognition" → curated argument, annotated path
"""
from __future__ import annotations
import re
from dataclasses import dataclass, field
from typing import NamedTuple
# ── Data structures ────────────────────────────────────────────────────────────
@dataclass
class Document:
    """
    A single document in the Memex.

    Represents one microfilm entry: a paper, article, book chapter, or note.
    In Bush's design, these were stored on reels of microfilm. Here they are
    plain Python objects.
    """

    id: str
    title: str
    author: str
    year: int
    body: str
    tags: list[str] = field(default_factory=list)

    def summary(self, width: int = 72) -> str:
        """Two-line summary for display: citation, then a body snippet.

        The width argument is accepted but currently unused.
        """
        snippet = self.body[:120].replace("\n", " ")
        if len(self.body) > 120:
            snippet += "..."
        return f"[{self.id}] {self.title} ({self.author}, {self.year})\n {snippet}"

class TrailStep(NamedTuple):
    """One step in an associative trail: a document + the researcher's note."""

    doc_id: str
    annotation: str = ""

@dataclass
class Trail:
    """
    A named, ordered sequence of associative steps.

    Bush (1945): 'Thus he builds a trail of his interest through the
    maze of materials available to him.' And of a finished trail: 'It is
    exactly as though the physical items had been gathered together from
    widely separated sources and bound together to form a new book.'
    """

    name: str
    steps: list[TrailStep] = field(default_factory=list)

    def export(self) -> dict:
        """Serialise to a shareable structure (Bush's 'trail publishing')."""
        return {
            "trail": self.name,
            "steps": [
                {"doc_id": s.doc_id, "annotation": s.annotation} for s in self.steps
            ],
        }

# ── Memex ──────────────────────────────────────────────────────────────────────
class Memex:
    """
    A Memex: a personal library with documents, associative links, and trails.

    Bush imagined this as a physical desk with microfilm reels and projection
    screens. This implementation uses Python dicts. The logical structure
    follows what Bush described.
    """

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._docs: dict[str, Document] = {}
        self._links: dict[str, list[tuple[str, str]]] = {}  # id → [(id, reason)]
        self._trails: dict[str, Trail] = {}

    # ── Storage ────────────────────────────────────────────────────────────
    def add_doc(self, doc: Document) -> None:
        """Add a document to the Memex collection."""
        self._docs[doc.id] = doc
        self._links.setdefault(doc.id, [])

    # ── Association ────────────────────────────────────────────────────────
    def link(self, id_a: str, id_b: str, reason: str = "") -> None:
        """
        Create a bidirectional associative link between two documents.

        Bush called these 'association codes': permanent marginal marks
        that let the Memex snap directly from one document to the associated
        one. This is the logical predecessor of the hyperlink.
        """
        # Validate both ends first so a bad id cannot leave a one-way link.
        for doc_id in (id_a, id_b):
            if doc_id not in self._docs:
                raise KeyError(f"Document '{doc_id}' not in Memex")
        for x, y in ((id_a, id_b), (id_b, id_a)):
            self._links.setdefault(x, [])
            # avoid duplicate links
            if not any(existing_id == y for existing_id, _ in self._links[x]):
                self._links[x].append((y, reason))

    def get_links(self, doc_id: str) -> list[tuple[str, str]]:
        """Return all documents associated with doc_id, with reasons."""
        return self._links.get(doc_id, [])

    # ── Trails ─────────────────────────────────────────────────────────────
    def create_trail(
        self,
        name: str,
        doc_ids: list[str],
        annotations: list[str] | None = None,
    ) -> Trail:
        """
        Build a named associative trail.

        Bush (1945): 'When numerous items have been thus joined together to
        form a trail, they can be reviewed in turn, rapidly or slowly...'
        He also foresaw that 'Wholly new forms of encyclopedias will appear,
        ready made with a mesh of associative trails running through them.'

        Unlike keyword search (which returns a ranked list), a trail is a
        *curated sequence*: a preserved argument about how ideas connect.
        Steps without a matching annotation get an empty one.
        """
        steps = []
        for i, doc_id in enumerate(doc_ids):
            if doc_id not in self._docs:
                raise KeyError(f"Document '{doc_id}' not found in Memex")
            annotation = annotations[i] if annotations and i < len(annotations) else ""
            steps.append(TrailStep(doc_id=doc_id, annotation=annotation))
        trail = Trail(name=name, steps=steps)
        self._trails[name] = trail
        return trail

    def follow_trail(self, name: str) -> None:
        """
        Walk a trail step by step, printing each document and its annotation.

        This is the Memex's primary reading mode: associative navigation
        rather than index lookup.
        """
        if name not in self._trails:
            raise KeyError(f"Trail '{name}' not found")
        trail = self._trails[name]
        print(f'\n Trail: "{trail.name}" ({len(trail.steps)} steps)')
        print(" " + "─" * 60)
        for i, step in enumerate(trail.steps, 1):
            doc = self._docs[step.doc_id]
            print(f"\n Step {i}/{len(trail.steps)}")
            print(f" {doc.title}")
            print(f" {doc.author}, {doc.year}")
            snippet = doc.body[:160].replace("\n", " ")
            if len(doc.body) > 160:
                snippet += "..."
            print(f' "{snippet}"')
            if step.annotation:
                print(f"\n >> Researcher's note: {step.annotation}")

    def export_trail(self, name: str) -> dict:
        """Serialise a trail to a shareable dict (Bush's 'trail publishing')."""
        if name not in self._trails:
            raise KeyError(f"Trail '{name}' not found")
        return self._trails[name].export()

    # ── Search ─────────────────────────────────────────────────────────────
    def search(self, query: str, top_n: int = 5) -> list[tuple[Document, int]]:
        """
        Keyword search across title, body, and tags.

        Returns up to top_n documents ranked by how often the query terms
        occur. Occurrences are counted as substrings, so the term
        'computation' also matches 'computational'.

        This is the INDEX-BASED approach Bush contrasted with trail
        navigation. Search is fast and comprehensive but returns a flat
        ranked list with no associative context. Compare with follow_trail()
        to see the structural difference.
        """
        tokens = set(re.findall(r"\w+", query.lower()))
        results: list[tuple[Document, int]] = []
        for doc in self._docs.values():
            text = (doc.title + " " + doc.body + " " + " ".join(doc.tags)).lower()
            score = sum(text.count(tok) for tok in tokens)
            if score > 0:
                results.append((doc, score))
        results.sort(key=lambda x: x[1], reverse=True)
        return results[:top_n]

    # ── Reporting ──────────────────────────────────────────────────────────
    def corpus_summary(self) -> None:
        """Print a summary of all documents in the Memex."""
        print(
            f"\n Memex of {self.owner}: {len(self._docs)} documents, "
            f"{len(self._trails)} trails"
        )
        print(" " + "─" * 60)
        for doc in self._docs.values():
            print(
                f" [{doc.id:8s}] {doc.year} {doc.title[:52]:<52s} "
                f"{doc.author.split()[-1]}"
            )

# ── Demo corpus ────────────────────────────────────────────────────────────────
# Twelve real 1936–1945 papers/articles summarised as excerpts.
# These are the kinds of documents a scientist's Memex would hold in 1945.
CORPUS: list[Document] = [
    Document(
        id="turing-1936",
        title="On Computable Numbers, with an Application to the Entscheidungsproblem",
        author="Alan M. Turing",
        year=1936,
        body=(
            "Turing proposes the concept of a 'computing machine': an abstract device with "
            "a finite set of states, an infinite tape, and a read/write head. He shows that "
            "some problems are not computable: no machine can decide them. The notion of a "
            "universal machine that can simulate any other machine by reading its description "
            "from the tape is introduced. Computation is formalised for the first time."
        ),
        tags=["computation", "theory", "turing-machine", "decidability"],
    ),
    Document(
        id="shannon-1937",
        title="A Symbolic Analysis of Relay and Switching Circuits",
        author="Claude E. Shannon",
        year=1937,
        body=(
            "Shannon's MIT master's thesis shows that Boolean algebra (the two-valued "
            "logic of AND, OR, NOT) maps directly onto electrical switching circuits. "
            "A circuit can implement any Boolean function. The equivalence between logic "
            "and electricity is established. This is the theoretical basis for all digital "
            "computers."
        ),
        tags=["boolean-logic", "circuits", "switching", "computation"],
    ),
    Document(
        id="mcculloch-pitts-1943",
        title="A Logical Calculus of the Ideas Immanent in Nervous Activity",
        author="Warren S. McCulloch and Walter Pitts",
        year=1943,
        body=(
            "The neuron is modelled as a threshold logic gate: it fires if and only if the "
            "sum of its excitatory inputs meets or exceeds a threshold, and no inhibitory "
            "input is active. Networks of such neurons can compute any Boolean function. "
            "The brain is recast as a logical machine. Neural networks are connected to "
            "formal logic and, implicitly, to Turing's computation."
        ),
        tags=["neurons", "logic", "computation", "brain", "neural-networks"],
    ),
    Document(
        id="bush-1945",
        title="As We May Think",
        author="Vannevar Bush",
        year=1945,
        body=(
            "Bush proposes the Memex: a mechanised personal library based on microfilm, "
            "with associative trails linking documents. The central insight is that human "
            "thought works by association, not by index. Trails are named sequences of "
            "linked documents, annotated by the researcher, shareable with colleagues. "
            "The Memex would extend the human memory and support creative thinking across "
            "the growing mass of scientific literature."
        ),
        tags=[
            "memex",
            "information-retrieval",
            "associative-trails",
            "knowledge",
            "hypertext",
        ],
    ),
    Document(
        id="wiener-1943",
        title="Behavior, Purpose and Teleology",
        author="Arturo Rosenblueth, Norbert Wiener, Julian Bigelow",
        year=1943,
        body=(
            "Rosenblueth, Wiener, and Bigelow propose that purposeful behaviour, in animals "
            "and machines alike, can be understood as goal-directed feedback. A system with "
            "a goal measures the gap between its current state and the desired state and "
            "acts to reduce it. This is the foundational paper of cybernetics: the science "
            "of communication and control in animals and machines."
        ),
        tags=["cybernetics", "feedback", "control", "purpose", "teleology"],
    ),
    Document(
        id="von-neumann-1945",
        title="First Draft of a Report on the EDVAC",
        author="John von Neumann",
        year=1945,
        body=(
            "Von Neumann describes a stored-program computer architecture with five organs: "
            "central arithmetic, central control, memory, input, and output. The key "
            "innovation is that instructions and data occupy the same memory. A program "
            "is just numbers stored in memory. Loading a new program means writing new "
            "numbers, with no rewiring required. The fetch-decode-execute cycle is defined."
        ),
        tags=[
            "architecture",
            "stored-program",
            "computation",
            "hardware",
            "von-neumann",
        ],
    ),
    Document(
        id="rosenblueth-1945",
        title="The Role of Models in Science",
        author="Arturo Rosenblueth and Norbert Wiener",
        year=1945,
        body=(
            "Rosenblueth and Wiener argue that models (simplified representations of "
            "complex systems) are the central tool of science. A good model captures the "
            "essential behaviour of a system while omitting irrelevant complexity. The "
            "brain-as-computer metaphor is itself a model in this sense. The relationship "
            "between mathematical models and physical reality is examined."
        ),
        tags=["models", "science", "epistemology", "cybernetics"],
    ),
    Document(
        id="bush-sci-frontiers-1945",
        title="Science, the Endless Frontier",
        author="Vannevar Bush",
        year=1945,
        body=(
            "Bush's report to President Truman proposing the creation of a National "
            "Science Foundation. The argument: basic research is the seed corn of applied "
            "science and technology. Government should fund research with no immediate "
            "application because it creates the knowledge base from which practical "
            "advances eventually spring. This report shaped American science policy for "
            "decades."
        ),
        tags=["science-policy", "basic-research", "nsf", "bush", "government"],
    ),
    Document(
        id="kolmogorov-1941",
        title="The Local Structure of Turbulence in Incompressible Viscous Fluid",
        author="Andrei N. Kolmogorov",
        year=1941,
        body=(
            "Kolmogorov derives the scaling laws of turbulence from first principles. "
            "At intermediate length scales (the 'inertial subrange'), the energy spectrum "
            "of turbulent flow follows a power law E(k) ~ k^{-5/3}. The result is "
            "universal: independent of the large-scale forcing and small-scale dissipation "
            "mechanisms. This is one of the foundational results of fluid mechanics."
        ),
        tags=["turbulence", "fluid-mechanics", "physics", "scaling-laws"],
    ),
    Document(
        id="penrose-1945",
        title="The Elementary Statistics of Majority Voting",
        author="Lionel S. Penrose",
        year=1945,
        body=(
            "Penrose analyses the mathematics of majority voting in bodies where members "
            "represent groups of different sizes. The optimal voting weight for a member "
            "is proportional to the square root of the population they represent, a "
            "result now called the Penrose square root law. This is a foundational result "
            "in the theory of weighted voting and collective decision-making."
        ),
        tags=["voting", "mathematics", "statistics", "decision-theory"],
    ),
    Document(
        id="craik-1943",
        title="The Nature of Explanation",
        author="Kenneth Craik",
        year=1943,
        body=(
            "Craik proposes that the human mind works by constructing internal models of "
            "the world and running simulations on them. Thought is a process of model "
            "manipulation. Prediction is possible because the model captures the causal "
            "structure of the domain. This anticipates the concept of mental models in "
            "cognitive science and connects to the computational theory of mind."
        ),
        tags=["cognition", "mental-models", "simulation", "brain", "computation"],
    ),
    Document(
        id="post-1944",
        title="Recursively Enumerable Sets and Their Decision Problems",
        author="Emil L. Post",
        year=1944,
        body=(
            "Post, who in 1936 had independently proposed a model of computation equivalent "
            "to Turing's, here develops the theory of recursively enumerable sets and their "
            "decision problems. He studies reducibility between such sets and asks whether "
            "every recursively enumerable set that is not decidable has the same degree of "
            "unsolvability, a question now known as Post's problem."
        ),
        tags=["computation", "decidability", "recursion", "theory", "post"],
    ),
]
# ── Demo ───────────────────────────────────────────────────────────────────────
def build_demo_memex() -> Memex:
    """Construct a Memex loaded with twelve 1936–1945 papers."""
    m = Memex(owner="Vannevar Bush (simulated)")
    for doc in CORPUS:
        m.add_doc(doc)

    # Associative links (Bush's "association codes")
    m.link(
        "turing-1936",
        "shannon-1937",
        "Shannon's circuits implement Turing's Boolean operations in hardware",
    )
    m.link(
        "turing-1936",
        "mcculloch-pitts-1943",
        "MP neurons compute the same Boolean functions as Turing machines",
    )
    m.link(
        "turing-1936",
        "post-1944",
        "Post and Turing independently proved the same limits of computability",
    )
    m.link(
        "shannon-1937",
        "von-neumann-1945",
        "Von Neumann's stored-program machine is built from Shannon's switching logic",
    )
    m.link(
        "mcculloch-pitts-1943",
        "wiener-1943",
        "Wiener's feedback and MP neurons both appeared in 1943; Wiener influenced both",
    )
    m.link(
        "mcculloch-pitts-1943",
        "craik-1943",
        "Craik's mental models and MP neurons are parallel 1943 theories of mind as mechanism",
    )
    m.link(
        "wiener-1943",
        "rosenblueth-1945",
        "Rosenblueth was Wiener's closest collaborator; these papers form the core of cybernetics",
    )
    m.link(
        "bush-1945",
        "bush-sci-frontiers-1945",
        "Both 1945 Bush papers: one on organising knowledge, one on funding its creation",
    )
    m.link(
        "von-neumann-1945",
        "mcculloch-pitts-1943",
        "Von Neumann cited McCulloch-Pitts; the EDVAC report uses neural-network language",
    )
    m.link(
        "bush-1945",
        "von-neumann-1945",
        "Weeks apart in mid-1945: von Neumann (June) on computing knowledge, Bush (July) on navigating it",
    )
    return m

def demo_search_vs_trail(m: Memex) -> None:
    """
    The central contrast: keyword search vs. associative trail navigation.

    Both operate on the same corpus. Search returns a ranked list.
    The trail returns a curated argument.
    """
    # ── Trail 1: computation-to-cognition ─────────────────────────────────
    m.create_trail(
        name="computation-to-cognition",
        doc_ids=[
            "turing-1936",
            "shannon-1937",
            "mcculloch-pitts-1943",
            "craik-1943",
            "wiener-1943",
        ],
        annotations=[
            "Start here: Turing formalises computation abstractly, proves limits",
            "Shannon shows that Boolean logic can be wired in relay circuits",
            "McCulloch and Pitts show the same logic describes neurons",
            "Craik independently argues the mind runs simulations on internal models",
            "Wiener unifies: feedback, purpose, and control in brain and machine alike",
        ],
    )
    # ── Trail 2: the 1945 turning point ───────────────────────────────────
    m.create_trail(
        name="1945-turning-point",
        doc_ids=["von-neumann-1945", "bush-1945", "bush-sci-frontiers-1945"],
        annotations=[
            "June 1945: von Neumann describes the stored-program machine: how to compute",
            "July 1945: Bush describes the Memex: how to organise and navigate knowledge",
            "July 1945: Bush tells Truman who should fund it all",
        ],
    )
    # ── Keyword search ────────────────────────────────────────────────────
    query = "computation brain"
    print(f'\n KEYWORD SEARCH for "{query}"')
    print(" (Returns a ranked list: fast, but no associative context)")
    print(" " + "─" * 60)
    results = m.search(query, top_n=5)
    for rank, (doc, score) in enumerate(results, 1):
        print(f" #{rank} (score {score:3d}) [{doc.id}] {doc.title[:55]}")
    # ── Trail navigation ──────────────────────────────────────────────────
    print('\n\n TRAIL NAVIGATION: "computation-to-cognition"')
    print(" (Returns a curated argument: Bush's key invention)")
    m.follow_trail("computation-to-cognition")
    print('\n\n TRAIL NAVIGATION: "1945-turning-point"')
    m.follow_trail("1945-turning-point")
    # ── Trail export ──────────────────────────────────────────────────────
    print("\n\n TRAIL EXPORT (Bush's 'trail publishing': a shareable structure):")
    print(" " + "─" * 60)
    exported = m.export_trail("1945-turning-point")
    print(f' Trail: "{exported["trail"]}"')
    for step in exported["steps"]:
        print(f" → {step['doc_id']}: {step['annotation'][:60]}")

def main() -> None:
    print("=" * 64)
    print("Memex Simulator")
    print("Based on: Vannevar Bush, 'As We May Think' (1945)")
    print("=" * 64)
    m = build_demo_memex()
    print("\n── Corpus ───────────────────────────────────────────────────")
    m.corpus_summary()
    print("\n── Search vs. Trail Navigation ──────────────────────────────")
    print("Bush's central argument: trails preserve the associative")
    print("reasoning that keyword search destroys.")
    demo_search_vs_trail(m)
    print()
    print("── Summary ──────────────────────────────────────────────────")
    print("Keyword search: fast, comprehensive, returns ranked list.")
    print("Associative trail: slow (human-curated), contextual, preserves")
    print("the argument. Bush believed trails were the superior form of")
    print("knowledge transmission. In 1989, Tim Berners-Lee proposed the")
    print("World Wide Web, which made associative links between documents")
    print("an everyday part of the internet as hyperlinks.")


if __name__ == "__main__":
    main()
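`search()` scores documents with `str.count`, which counts substrings, so the query term "computation" also matches inside "computational". One possible alternative, sketched here as a standalone function rather than a change to memex.py, tokenises the text first and counts whole-word matches only. The name `whole_word_score` is hypothetical.

```python
import re
from collections import Counter


def whole_word_score(query: str, text: str) -> int:
    """Count occurrences of each distinct query term as a whole word in text."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = Counter(re.findall(r"\w+", text.lower()))
    return sum(words[t] for t in terms)


text = "The computational theory of mind treats thought as computation."
substring_score = text.lower().count("computation")  # what str.count sees
print(substring_score, whole_word_score("computation", text))
```

The `\w+` tokeniser splits hyphenated tags such as "neural-networks" into two words, so whether whole-word matching ranks the demo corpus better than substring counting depends on how tags should be treated.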