What should you know about SQL indexes as a Java developer?
The majority of developers don’t know a lot about SQL performance tuning. I find this topic very interesting and I read a lot about it but I think that every developer should at least know a little bit of theory about SQL indexing and database caching systems.
Why do we need SQL indexes?
Imagine your database has about 50 million records. 10000
of them have Smith
as their surname. You perform the query: “select first_name,id from citizen where last_name = 'Smith'
“. How much time would it take for the database to return the result?
In this case, the database doesn’t have a choice, it has to read every record, check conditions, and evict records that don’t fit. Doing that is slow as hell. But if we add INDEX
on last_name
field, the query will perform much faster.
How indexes work?
INDEX
is just B-TREE
where only on leaf nodes do we store values. We use other nodes to find paths to leaf nodes. In the SQL index tree, every node has about 300 values. This number depends on a lot of factors but for this simple tutorial you can say that it is 300. So in our case when we have an index and our table contains 50 milion records, in 4 hops(300^3 = 27000000 and 300^4 = 8100000000, 50 million is between those 2 values) we can easily find 34 leaf nodes (10000/300=33,3333) where our records are saved. So, instead of reading 50 million records we easily found 10000 records and read just them! This can improve query execution time by 30x or 50x, depending on column selectivity. Wow, that is a great improvement!
Can we add an index on every field? Unfortunately, not.
There is a penalty when adding indexes. When you update last_name
, you have to update your index too. When you add a citizen to the database or remove a citizen from the database, you have to update your index(particularly updating and deleting can impact performance). You use your CPU to update the index instead of adding rows faster. Also, indexes are stored on the disk, meaning you need more space. Fortunately, hard disks are cheap so this problem with hard disks is not as big as it seems.
How to choose where to add an index?
Choosing which column to add an index on is hard but some simplest advice would be: “Add index on those tables which are not updated often and on a column which is often in WHERE clause”
Further reading
This was the simplest explanation where I omitted a lot of important details. If this was an interesting article for you, consider readingĀ Sql tuning. I enjoyed this book and there are a lot of useful things on how to tune your SQL queries so they can run like a charm. Also, you might like my other SQL posts. Enjoy!