What is a Primary Key in a Database? Types, Rules & More

If you are building a database for the first time or setting up your first table relationships, the primary key is the most important decision you will make.

Get it wrong, and six months from now you'll be running painful migrations and wondering why your joins are slow and your data has duplicates.

I'm Jonathan, co-founder of ChartDB, a database diagram and data modeling tool. I spend most of my time building database schemas, and the thing I see broken more than anything else is the primary key. Sometimes it's missing, chosen wrong, sized wrong, or reused in ways that quietly corrupt data months later.

That's also what led me to build a product that makes schema visualization easier for developers.

In this guide I'll walk you through what a primary key actually is, the two rules it has to follow, when to use a natural key vs a surrogate key, how it directly affects query performance, and the exact SQL to set one up in PostgreSQL, MySQL, and SQL Server.

I've also covered the mistakes I see teams make repeatedly so you don't have to learn them the hard way.

What is a Primary Key in a Database?

In a database, a primary key is a column (or a combination of columns) that gives every row a unique identifier. You can think of it like a Social Security Number for a person or the Vehicle Identification Number (VIN) on a car. It's a one-of-a-kind label that guarantees you can find and work with a specific record without any confusion.

This identifier is what makes a row distinct from all others. It must be unique, and it can never be empty.

The two unbreakable rules

Every primary key lives by two simple, non-negotiable rules. These constraints are the bedrock of data integrity and are enforced automatically by the database system itself.

Property	Description	Why it matters
Uniqueness	No two rows in the same table can ever share the same primary key value.	This is your first line of defense against duplicate data. Every record is one-of-a-kind.
Non-Null	A primary key column cannot contain any empty or missing (NULL) values.	Every single row has an identifier. No record is untraceable or "anonymous."

These rules are not suggestions. They are how modern relational databases are built. Well-designed production databases on platforms like PostgreSQL, MySQL, and SQL Server rely on primary keys to maintain data integrity.
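To see both rules enforced, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module as a stand-in for a production database, and the vehicles table and try_insert helper are hypothetical names made up for the demo. One SQLite quirk to be aware of: it does not imply NOT NULL for non-integer primary keys the way PostgreSQL, MySQL, and SQL Server do, so the sketch spells it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written explicitly because SQLite does not imply it for
# non-integer primary keys (PostgreSQL, MySQL, and SQL Server do).
conn.execute("CREATE TABLE vehicles (vin TEXT NOT NULL PRIMARY KEY, model TEXT)")
conn.execute("INSERT INTO vehicles VALUES ('VIN001', 'Sedan')")

def try_insert(vin, model):
    """Return 'ok', or the database's error message if the insert is rejected."""
    try:
        conn.execute("INSERT INTO vehicles VALUES (?, ?)", (vin, model))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

duplicate_result = try_insert("VIN001", "Coupe")   # same key as an existing row
null_result = try_insert(None, "Truck")            # missing key
fresh_result = try_insert("VIN002", "Hatchback")   # new, non-null key

print(duplicate_result, null_result, fresh_result, sep="\n")
```

The first two inserts are rejected by the database itself, not by application code, which is the point of declaring the constraint.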

A primary key is the column that makes every row special. Without it, your data is just a crowd; with it, every record has a name and an address. These relationships between rows and tables are exactly what an ER diagram maps out visually.

Choosing your identifier: natural vs. surrogate keys

When you're designing a database, one of the first decisions you'll make is how to identify your data. This brings you to a fork in the road: do you use a natural key or a surrogate key? This choice has long-lasting effects on your system's stability and flexibility.

A natural key is an identifier that already exists in the real world and has business meaning. Think of a product's SKU, a book's ISBN, or a user's email address. These values are inherently unique to the data they describe.

A surrogate key is completely artificial. It's a value generated by the database with no business meaning whatsoever. Its sole purpose is to give a row a unique ID. The most common examples are auto-incrementing integers (1, 2, 3...) or a universally unique identifier (UUID).

The stability problem

At first glance, using a natural key feels intuitive. Why not use an email address as the primary key for a Users table? It's already unique, right? The problem is that real-world data changes. What happens when a user updates their email address?

If that email is your primary key, changing it sets off a dangerous chain reaction. Every other table that references that user (Orders, LoginHistory, SupportTickets) is now pointing to an old, non-existent key. You'd have to perform a series of risky cascading updates across your entire database. It's a maintenance nightmare waiting to happen.

This flowchart lays out the rules for what can be considered for a primary key.

As the diagram shows, a column must pass two tests to qualify: it must be unique for every row, and it can never be empty (null).

Why surrogate keys usually win

This is why surrogate keys have become the standard in modern database design. A system-generated user_id is immutable; it's set once and never changes. If a user updates their email, you simply change the value in the email column of the Users table. Every relationship built on user_id stays intact.
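The email-change scenario can be sketched as follows. This is an illustration under assumptions, not a production schema: it uses SQLite via Python's sqlite3 module, and the users and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,      -- surrogate key, never changes
        email   TEXT NOT NULL UNIQUE      -- business data, free to change
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        total    REAL NOT NULL
    );
    INSERT INTO users (user_id, email) VALUES (1, 'old@example.com');
    INSERT INTO orders (user_id, total) VALUES (1, 19.99), (1, 5.00);
""")

# The user changes their email: one row in one table is touched.
conn.execute("UPDATE users SET email = 'new@example.com' WHERE user_id = 1")

# Both orders still resolve to the same user through user_id.
rows = conn.execute("""
    SELECT u.email, COUNT(o.order_id)
    FROM users u JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.user_id
""").fetchall()
print(rows)
```

Had email been the primary key, the same change would have required updating every row in orders that referenced the old address.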

The debate over which key type is "better" is a classic one in database circles. In practice, most modern production systems favor surrogate keys because of the stability they provide.

Here's a direct comparison of the two approaches.

Natural keys vs surrogate keys: a comparison

Attribute	Natural key	Surrogate key
Origin	Comes from real-world business data (e.g., email, SSN).	Generated by the database system (e.g., auto-increment integer, UUID).
Meaning	Has intrinsic business meaning.	Has no business meaning; it's just an identifier.
Stability	Can change if the business data changes, causing update complexities.	Immutable. It never changes once created.
Performance	Can be inefficient if the key is large (e.g., a long string).	Typically, a small integer, which means faster joins and indexing.
Use case	Good for static, rarely changing data like country codes.	The standard for most tables, especially those with user-generated or mutable data.

Choosing a surrogate key means you're decoupling your database's internal structure from the unpredictable nature of business data.

A natural key is tied to the real world, which is messy and subject to change. A surrogate key provides a stable, internal anchor that is immune to outside volatility.

By separating the identifier from the business data itself, you build a more maintainable system. Understanding these trade-offs is a core principle of effective data modeling.

How primary keys drive database performance

Primary keys are more than a way to keep your data organized. They directly affect how fast your database runs. When you define a primary key, the database automatically creates a special structure called an index on that column. This is one of the biggest performance gains available to you.

Think of it like the index at the back of a thick textbook. If you wanted to find a specific topic, you wouldn't read the book from cover to cover. You'd flip to the index, find the term, and get the exact page number. A primary key index does the same thing for your data, letting the database find any row almost instantly.

What happens without a primary key

Without a primary key and its index (and with no other index on the column you filter by), the database is forced to do a full table scan. This is the digital equivalent of reading every single page of that textbook just to find one piece of information.

On a small table with a few hundred rows, you might not notice the delay. But on a table with millions or billions of records, a full table scan becomes painfully slow, causing application timeouts and a bad user experience.

Faster lookups and joins

The primary key's index pays off most during data lookups. When you run a query like SELECT * FROM Orders WHERE OrderID = 12345;, the database doesn't scan the whole table. It uses the index to pinpoint the exact location of that record and retrieve it in milliseconds.
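You can ask the database which strategy it picks. The sketch below assumes SQLite via Python's sqlite3 module, and the two table names are hypothetical. PostgreSQL, MySQL, and SQL Server expose the same information through their own EXPLAIN or execution-plan tools, with different output formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_with_pk (order_id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE orders_no_pk   (order_id INTEGER, total REAL);
""")

def plan(table):
    """Return SQLite's plan description for a lookup by order_id."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM {table} WHERE order_id = 12345"
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

plan_with_pk = plan("orders_with_pk")
plan_no_pk = plan("orders_no_pk")

print(plan_with_pk)
print(plan_no_pk)
```

In SQLite's output, a plan starting with SEARCH means the engine jumps straight to the row through the key, while SCAN means it walks the whole table.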

This speed matters most for JOIN operations. When you link an Orders table to a Customers table on the CustomerID primary key, the database can match rows between them efficiently. This is also why surrogate keys (like simple integers) are preferred for performance: it's far faster for a computer to compare numbers than to compare long text strings.

If you need to convert the same primary key syntax between databases, the SQL Dialect Converter handles that instantly.

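On large tables, a well-designed primary key strategy can improve lookup and join performance by an order of magnitude or more compared with tables that have no primary key, because an index lookup reads a handful of pages while a full scan reads all of them.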

This automatic indexing is a core concept in database design. To go deeper into how different indexes work, check out our guide on database indexes and how to find missing ones. Getting your primary keys right is a core principle for building fast, scalable applications.

Implementing primary keys in SQL with code examples

Now for the practical part. The PRIMARY KEY constraint itself is written almost identically in PostgreSQL, MySQL, and SQL Server, so what you learn here applies broadly. What differs is the keyword each database uses to auto-generate key values.

If you already have a schema written in SQL and want to visualize it, you can also generate an ERD directly from your SQL.

Defining a primary key during table creation

The most common time to set up a primary key is when you create the table. Using the CREATE TABLE statement, you define your primary key up front so every record is uniquely identified from the start.

Let's build a simple Users table with a surrogate key, an integer that automatically counts up with each new entry.

PostgreSQL / SQL Server Example (Auto-Incrementing Integer)
CREATE TABLE Users (
    -- PostgreSQL syntax.
    -- For SQL Server, use: user_id INT PRIMARY KEY IDENTITY(1,1),
    user_id INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

MySQL Example (Auto-Incrementing Integer)
CREATE TABLE Users (
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

In both examples, user_id is the primary key. AUTO_INCREMENT (MySQL) or IDENTITY (SQL Server/PostgreSQL) tells the database to generate a fresh, unique ID for every new user. This saves you the headache of manually assigning and tracking IDs.

Watch out for the INT limit on high-volume tables. A signed 4-byte integer maxes out at 2,147,483,647. One company hit that limit and suffered 22 hours of downtime. For tables that could grow large, BIGINT is a safer choice.
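A quick back-of-the-envelope calculation shows how much headroom each type gives you. The insert rate of 100 rows per second is an assumed figure for illustration only, and years_until_exhausted is a hypothetical helper.

```python
INT_MAX = 2**31 - 1      # largest signed 4-byte integer
BIGINT_MAX = 2**63 - 1   # largest signed 8-byte integer

SECONDS_PER_YEAR = 365 * 24 * 3600

def years_until_exhausted(max_id, inserts_per_second):
    """Years of continuous inserts before the ID range runs out."""
    return max_id / inserts_per_second / SECONDS_PER_YEAR

# Assumed sustained load: 100 new rows per second.
int_years = years_until_exhausted(INT_MAX, 100)
bigint_years = years_until_exhausted(BIGINT_MAX, 100)

print(f"INT lasts about {int_years:.2f} years")
print(f"BIGINT lasts about {bigint_years:.2e} years")
```

At that assumed rate, INT runs out in under a year, while BIGINT outlasts any realistic system lifetime. Plug in your own projected insert rate before deciding.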

Adding a primary key to an existing table

If you inherit a table that was built without a primary key, you can add one using the ALTER TABLE statement.

Not sure how to write it? The Text to SQL tool lets you describe what you need in plain English and generates the exact query for you.

Before you run the command, check that the column you want to use already contains unique, non-null values. If it doesn't, the database will throw an error.

ALTER TABLE Orders
ADD PRIMARY KEY (order_id);

This modifies the Orders table and promotes order_id to its primary key. The database will automatically create an index on the column, speeding up any queries that look up orders by their ID.
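The pre-check mentioned above (unique, non-null values) can be sketched as two queries. This example assumes SQLite via Python's sqlite3 module and a hypothetical orders table seeded with one duplicate and one NULL so there is something to find. The two SELECT statements themselves are standard SQL and should carry over to other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, total REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 20.0), (2, 25.0), (NULL, 5.0);
""")

# Rule 1: no NULLs in the candidate column.
null_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE order_id IS NULL"
).fetchone()[0]

# Rule 2: no value appears more than once.
duplicates = conn.execute("""
    SELECT order_id, COUNT(*) AS copies
    FROM orders
    WHERE order_id IS NOT NULL
    GROUP BY order_id
    HAVING COUNT(*) > 1
""").fetchall()

safe_to_promote = null_count == 0 and not duplicates
print(null_count, duplicates, safe_to_promote)
```

If either query returns something, clean up those rows first. Otherwise the ALTER TABLE statement will fail.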

Creating a composite primary key

Sometimes a single column isn't enough to uniquely identify a record. A composite primary key solves this by combining two or more columns. You'll see these in "junction" or "linking" tables, which connect two other tables in a many-to-many relationship. To learn more about this, I have created a database relationship guide.

Take a Product_Categories table that links products to categories. A single product might belong to several categories (e.g., "Electronics" and "Laptops"), and a single category contains many products.

CREATE TABLE Product_Categories (
    product_id INT,
    category_id INT,
    -- Other columns can go here
    PRIMARY KEY (product_id, category_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id),
    FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);

Here, neither product_id nor category_id is unique on its own. But the combination of the two is. This composite key enforces the rule that you can't assign the same product to the same category more than once. If you want to understand how the parent-child side of this works visually, see how to create a one-to-many relationship.

Common pitfalls and best practices

Knowing what a primary key is and using it effectively are two different things. A few missteps early on can snowball into performance problems and data integrity nightmares down the road. Here are the most common mistakes.

The biggest trap is picking a natural key that seems stable but isn't. Building a database with a customer's email address or name as the primary key works until that person gets married or changes email providers. Then you're stuck performing a risky cascade of updates across your entire system.

Another common mistake is using a bulky data type for your key. A long string (VARCHAR) bloats your indexes and slows down every join and lookup. An integer or UUID is much faster to compare.

Primary Key Design: Best Practices to Avoid Common Pitfalls

Prefer stable surrogate keys. Use an auto-incrementing integer or a UUID for most tables. A BIGINT is safer than a standard INT to avoid future capacity issues. These keys have no real-world meaning, so they're immune to changes in your business data.
Keep keys compact. Pick the smallest data type that can handle your table's projected growth. Smaller keys mean smaller indexes, faster queries, and lower memory use.
Never reuse primary key values. When you delete a row, its primary key should be retired permanently. Reusing old keys can corrupt historical data and create confusing "ghost" records.
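Whether a deleted key can come back depends on how the database generates IDs. The sketch below shows one concrete case, assuming SQLite via Python's sqlite3 module. The table names and the id_after_delete helper are hypothetical. Other engines have their own rules, so check the documentation for the one you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Plain INTEGER PRIMARY KEY: SQLite picks max(id) + 1 for new rows.
    CREATE TABLE reusing  (id INTEGER PRIMARY KEY, name TEXT);
    -- AUTOINCREMENT: SQLite remembers the highest id ever issued.
    CREATE TABLE retiring (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
""")

def id_after_delete(table):
    """Insert three rows, delete the newest, insert again, return the new id."""
    for name in ("a", "b", "c"):
        conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    conn.execute(f"DELETE FROM {table} WHERE id = 3")
    return conn.execute(f"INSERT INTO {table} (name) VALUES ('d')").lastrowid

reused_id = id_after_delete("reusing")
fresh_id = id_after_delete("retiring")

print(reused_id, fresh_id)
```

In the first table the deleted ID is handed out again to a different row. In the second it is retired, so anything that still references the old ID (logs, exports, caches) cannot be confused with the new row.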

Following these guidelines is solid architecture. When laying out your schema, sticking to database design best practices matters for data integrity and long-term performance.

By implementing these practices, you'll avoid many of the common database design mistakes that plague development teams. You're not just defining keys; you're building a reliable foundation.

Frequently asked questions about primary keys

Can a table have more than one primary key?

No. A table can only have one primary key.

The entire point of a primary key is to be the single source of truth for identifying a row. If you had more than one, you'd introduce ambiguity, which is the opposite of what a primary key does. It's the definitive address for each record.

That said, a single primary key can be built from multiple columns. This is called a composite primary key.

When should you use a composite primary key?

Composite keys are the solution for "junction" or "linking" tables that sit in the middle of a many-to-many relationship.

Consider an Enrollments table that connects Students and Courses.

A student can sign up for many different courses.
A course will have many different students in it.

The Enrollments table links them. In this table, student_id alone isn't unique (a student takes multiple classes), and course_id alone isn't unique (a class has multiple students). But the combination of (student_id, course_id) has to be unique, because a student can only enroll in the exact same course one time. That combination becomes the composite primary key.

A composite key is the right tool when no single column can guarantee uniqueness, but a specific combination of columns can. It's a common and useful pattern in relational database design.
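Here is the Enrollments example as a minimal sketch, assuming SQLite via Python's sqlite3 module. The foreign keys to Students and Courses are left out to keep it short. In a real schema you would add them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)   -- the pair must be unique
    )
""")

# Repeating a student or a course on its own is fine.
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?)",
    [(1, 101), (1, 102), (2, 101)],
)

# Repeating the exact same pair is not.
try:
    conn.execute("INSERT INTO enrollments VALUES (1, 101)")
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
print(repeat_rejected, row_count)
```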

What is the difference between a primary key and a unique key?

Both primary keys and unique keys prevent duplicate values in a column. The differences come down to their rules and roles.

Null values: A primary key cannot contain NULLs. A unique key is more lenient. PostgreSQL and MySQL allow a unique column to hold multiple NULLs, since NULL isn't technically equal to anything, including another NULL; SQL Server allows only one.
Quantity: You get only one primary key per table. You can have as many unique keys as you need.
Purpose: The primary key is the main identifier for a row. It's what other tables use to create foreign key relationships. A unique key enforces a specific business rule (like "no two users can have the same email address") without being the table's main identifier.
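The difference is easy to see in a small sketch. It assumes SQLite via Python's sqlite3 module, and the users table is hypothetical. NULL handling in unique columns varies by engine: SQLite, used here, accepts repeated NULLs, while SQL Server accepts only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- main identifier, one per table
        email   TEXT UNIQUE            -- business rule, nullable on purpose
    )
""")

# Two users without an email: accepted by SQLite's unique constraint.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Two users with the same real email: rejected.
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

null_emails = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]
print(null_emails, duplicate_email_rejected)
```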
