By Ryan Huber
So let’s start basic…”What is a database?”
Most simply, it’s just a system to store information. A common real-world example I’m going to use today is a file cabinet.
In a file cabinet, you have physical sheets of paper with bits of information written on them. You can think of SQL as the digital equivalent to a file cabinet.
SQL stores every piece of data collected on pages that are identical in size. In your file cabinet, the pages are 8x11.
In SQL, the pages are 8 KB. Just as those physical sheets of paper end up inside the metal cabinet, SQL stores its pages in its digital cabinet, which are the disk drives.
To continue our real world example, let’s say we’re storing tax returns. On each person’s return, you’re not just randomly storing each person’s information in different spots on the paper.
You have a highly formatted structure…a name box, address box, withholding box. And each one of those fields has a strict value type for the data.
For name, only text can go into that field. For submit date, only a date can go into that field. For tax liability amount, only numbers can go into that field.
This structure that you’ve created on the paper pages is the SQL equivalent to a table. Tables have columns, each with a data type specific to the data being stored, and individual rows to delimit each item from the next.
Inside that file cabinet, those tax returns aren’t just sitting loose in the drawers (unless you’re a very disorganized person).
You likely have them in a folder. And if you have a lot of returns, multiple folders each with a tab at the top with the first letter of your clients’ last name…”A-H”, “I-K”.
And also, on the drawers of the cabinet, you might label them by year…”2016”, “2017”. If you didn’t have this order in place, finding “John Smith’s 2016” return could become a time consuming task where you’re just scanning through every return that you have.
By going to the trouble of sorting your returns by last name, placing them in folders and labeling them, and storing specific years in different drawers, you’ve just created, in SQL terms, a clustered index.
The term “clustered” simply means ordered. In this example, our data is clustered by Year, Last Name, First Name.
Moving on in our example, let’s say you have returns for clients from across the US. And, from time-to-time, you have to find them by state.
You could take all of the returns out of the drawers and folders and reorder them by state, year, name. But that’s not the way you search for them the majority of the time, so that would make your searches inefficient overall.
Instead, you decide to take a new piece of paper, write down the drawer and folder location of each person’s return, but ordered first by state.
You then tape that piece of paper to the outside of the file cabinet so that whenever you need to find all of the “Nevada” tax returns, you know exactly where they are in the cabinet to go and get them.
In SQL terms, you just created a non-clustered index. Just as that piece of paper lives outside the cabinet, the non-clustered index is a data structure outside of the table that allows you to find data inside it more efficiently.
Up next in this series: Database 101-2: Why should I care about statistics and index fragmentation?
Ryan Huber is the Senior Database Architect at SoftWriters, a Pittsburgh-based company that offers pharmacy management software solutions to pharmacies serving the long-term care community. Ryan is our in-house SQL guru, providing technical expertise in every facet of database operations, including architecture and code review, resolving performance issues for customers, and formulating a strategy for existing and future product features and enhancements.