Relational Database Design with MySQL: Getting the Schema Right

MySQL is making serious inroads into production systems. It is fast, it is free and the engine is stable enough for real workloads. At Motorola we are evaluating it for our internal reporting and device inventory systems alongside Oracle and Sybase.

What I want to write about here is not MySQL itself but relational database design — the schema decisions that determine whether your database is fast and maintainable or slow and fragile. A good schema matters more than which database engine you pick.

Normalisation: The Foundation

Normalisation is the process of structuring your schema to eliminate redundancy and ensure data integrity. The normal forms give you a vocabulary for this, but the practical intuition is: each fact should be stored in exactly one place.

Consider a naive device inventory table:

CREATE TABLE devices (
    id          INT,
    device_name VARCHAR(64),
    device_type VARCHAR(32),
    vendor_name VARCHAR(64),
    vendor_contact VARCHAR(128),
    location    VARCHAR(64),
    ip_address  VARCHAR(15)
);

Insert 50 Cisco routers and vendor_name = 'Cisco' appears 50 times. If Cisco changes their support contact, you update 50 rows. Miss one and the data is inconsistent. This is a third normal form violation — vendor_contact depends on vendor_name, not on id.

Normalised:

CREATE TABLE vendors (
    id      INT PRIMARY KEY AUTO_INCREMENT,
    name    VARCHAR(64) NOT NULL,
    contact VARCHAR(128)
);

CREATE TABLE device_types (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(32) NOT NULL
);

CREATE TABLE devices (
    id             INT PRIMARY KEY AUTO_INCREMENT,
    device_name    VARCHAR(64) NOT NULL,
    device_type_id INT NOT NULL,
    vendor_id      INT NOT NULL,
    location_id    INT NOT NULL,
    ip_address     VARCHAR(15) NOT NULL UNIQUE,
    FOREIGN KEY (device_type_id) REFERENCES device_types(id),
    FOREIGN KEY (vendor_id)      REFERENCES vendors(id),
    FOREIGN KEY (location_id)    REFERENCES locations(id)
);

Now vendor_contact lives in one row. Updates are trivial. Queries that need vendor information join to the vendors table.

Indexes: Where Performance Actually Comes From

MySQL is fast but indexes are what make queries fast at scale. A table scan on 100,000 rows is slow regardless of how fast the disk is. An index lookup is fast regardless of table size.

The B-tree index is the default. MySQL maintains a sorted tree structure over the indexed column. Queries that filter or sort on that column use the tree rather than scanning every row.

-- This query without an index scans every row in devices
SELECT * FROM devices WHERE ip_address = '192.168.1.50';

-- With an index, it is a tree lookup
CREATE INDEX idx_devices_ip ON devices(ip_address);

The rules for effective indexing:

Index columns you filter on frequently. WHERE device_type_id = 5 is fast if device_type_id is indexed.

Index columns you join on. Foreign key columns should almost always be indexed. MySQL does not automatically index foreign keys.

Index columns you order by. ORDER BY last_seen DESC uses an index on last_seen.

Composite indexes match left-to-right. An index on (vendor_id, device_type_id) supports queries filtering on vendor_id alone OR on vendor_id AND device_type_id. It does not support filtering on device_type_id alone.

Do not over-index. Every index slows down INSERT, UPDATE and DELETE because MySQL must maintain the index alongside the data. For read-heavy reporting tables, indexes are almost always worth it. For write-heavy event logs, be selective.

The EXPLAIN Command

MySQL's EXPLAIN shows you how the query planner executes a query. This is essential for diagnosing slow queries:

EXPLAIN SELECT d.device_name, v.name AS vendor, l.name AS location
FROM devices d
JOIN vendors  v ON d.vendor_id   = v.id
JOIN locations l ON d.location_id = l.id
WHERE d.device_type_id = 3
ORDER BY d.device_name;

EXPLAIN output:

+----+--------+-------+------+---------------------+---------------------+
| id | table  | type  | rows | key                 | Extra               |
+----+--------+-------+------+---------------------+---------------------+
|  1 | d      | ref   |  47  | idx_devices_type    | Using filesort      |
|  1 | v      | eq_ref|   1  | PRIMARY             |                     |
|  1 | l      | eq_ref|   1  | PRIMARY             |                     |
+----+--------+-------+------+---------------------+---------------------+

The type column tells you how the join is performed:

ALL — full table scan. Generally bad. Add an index.
ref — index lookup on a non-unique key. Good.
eq_ref — index lookup on a unique/primary key. Best for joins.

Using filesort in Extra means MySQL sorts the results in a temporary file because no index matches the ORDER BY. Adding an index on device_name may eliminate this.

Transactions

MySQL with the InnoDB storage engine supports transactions. MyISAM (the older default engine) does not. For any data that needs consistency guarantees, InnoDB is the right choice.

-- InnoDB supports proper transactions
CREATE TABLE events (
    id         INT PRIMARY KEY AUTO_INCREMENT,
    device_id  INT NOT NULL,
    event_type VARCHAR(32) NOT NULL,
    occurred   DATETIME NOT NULL,
    FOREIGN KEY (device_id) REFERENCES devices(id)
) ENGINE=InnoDB;

A transaction example — decommissioning a device should remove it from inventory and record the event atomically:

START TRANSACTION;

INSERT INTO events (device_id, event_type, occurred)
VALUES (47, 'DECOMMISSIONED', NOW());

DELETE FROM device_assignments WHERE device_id = 47;

DELETE FROM devices WHERE id = 47;

COMMIT;

If any statement fails, ROLLBACK reverts all three changes. Without transactions, a failure halfway through leaves the database in an inconsistent state — device deleted but event not recorded, or vice versa.

MyISAM is faster for read-heavy workloads because it does not carry transaction overhead. If you are building a read-mostly reporting table and do not need transactions, MyISAM is a reasonable choice. For anything where writes need to be atomic, use InnoDB.

A Practical Schema: Device Inventory

Pulling the above together into a working schema for the system we are building:

CREATE TABLE locations (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(64) NOT NULL,
    site VARCHAR(64) NOT NULL
);

CREATE TABLE vendors (
    id      INT PRIMARY KEY AUTO_INCREMENT,
    name    VARCHAR(64) NOT NULL,
    support_contact VARCHAR(128)
);

CREATE TABLE device_types (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    name        VARCHAR(32) NOT NULL,
    description VARCHAR(256)
);

CREATE TABLE devices (
    id             INT PRIMARY KEY AUTO_INCREMENT,
    hostname       VARCHAR(64) NOT NULL,
    ip_address     VARCHAR(15) NOT NULL,
    device_type_id INT NOT NULL,
    vendor_id      INT NOT NULL,
    location_id    INT NOT NULL,
    added          DATETIME NOT NULL DEFAULT NOW(),
    last_seen      DATETIME,
    notes          TEXT,
    UNIQUE KEY unique_ip (ip_address),
    FOREIGN KEY (device_type_id) REFERENCES device_types(id),
    FOREIGN KEY (vendor_id)      REFERENCES vendors(id),
    FOREIGN KEY (location_id)    REFERENCES locations(id)
) ENGINE=InnoDB;

CREATE INDEX idx_devices_last_seen    ON devices(last_seen);
CREATE INDEX idx_devices_location     ON devices(location_id);
CREATE INDEX idx_devices_vendor_type  ON devices(vendor_id, device_type_id);

CREATE TABLE events (
    id         INT PRIMARY KEY AUTO_INCREMENT,
    device_id  INT NOT NULL,
    event_type VARCHAR(32) NOT NULL,
    severity   TINYINT NOT NULL DEFAULT 0,
    occurred   DATETIME NOT NULL,
    detail     TEXT,
    FOREIGN KEY (device_id) REFERENCES devices(id)
) ENGINE=InnoDB;

CREATE INDEX idx_events_device   ON events(device_id);
CREATE INDEX idx_events_occurred ON events(occurred);
CREATE INDEX idx_events_severity ON events(severity, occurred);

The composite index on (severity, occurred) supports the common query pattern: give me all HIGH severity events in the last 24 hours, ordered by time.

A Note on Data Types

MySQL is permissive about data types in a way that can mask bugs. VARCHAR(15) for an IP address stores a string, which means nothing prevents inserting 'not an ip'. For small teams with discipline this is fine. For larger systems consider adding a CHECK constraint or validating in application code before inserting.

DATETIME stores date and time without timezone information. All our monitoring timestamps are in UTC. Mixing timezones in the same column is a debugging problem best avoided — pick UTC and stick to it everywhere.

Where MySQL Is Right Now

MySQL 3.22 is the current stable release. It is genuinely fast — faster than Oracle for simple read-heavy workloads in our benchmarks. The query planner is less sophisticated than Oracle's but for straightforward queries with proper indexes the performance is excellent.

The missing features versus Oracle: no stored procedures, no views, no subqueries in MySQL 3.x. These are coming in future versions but if you need them now, that is a constraint to factor in. We work around the lack of subqueries by using temporary tables and staging the logic in the application layer.

For systems where budget matters and the query patterns are relatively straightforward, MySQL is a serious production option. Get the schema right and it is very fast.