B-tree or Bitmap

Recently I read a lot about databases, mostly theory but also from the design point of view. I have to admit that until recently I had no idea what a B-tree is. I mean beside that it is a kind of a data structure, but how it is build and why on average there is a constant time of finding each value in the tree. Those were completely abstract topics.

The most popular indexes are B-tree or Bitmap. The first one is most appropriate for columns of high cardinality (many different values). On the other hand, bitmap indexes are most appropriate for columns of low cardinality. Hence, you should use bitmap index for columns like yes/no indicators, or list of countries, customer’s gender, etc.

Bitmap index has big advantage over the B-tree for low cardinality columns. The idea behind this type of index is to create a bit string where every single bit indicates if a row in a table has a value or not. But that is not the only reason why you should consider implementing a bitmap index. If you have a table in which users often query 10 columns. Creating a B-tree index would be very computationally inefficient, considering other actions like INSERT or UPDATE. It would also be inefficient regarding the combination of columns in WHERE clause. With bitmap index it is straightforward independent on the conjunctive (AND, OR, XOR) and the number of columns in WHERE clause. So, whether it is one, two or ten, it does not add much to complexity. Whereas in the case of B-tree index it can be a game changer. It would most probably lead to a situation in which the index would not be used at all.

When it is good to use B-tree kind of index? In online transaction processing (OLTP) applications, where users behaviour is routine and can be easily predicted. If the data are often inserted or updated, bitmap index can slow the system a lot. As mentioned before, another situation in which B-tree index is preferred is when the column cardinality is big. Although, I do not know what does it mean big. Probably it should be related to the number of rows in a table, but it is a term you will encounter a lot if you Google it out.

Although, as I read in one of discussions at PostgreSQL mailing list you may not see a big difference in performance of the two indexes. At least not on your personal computer. But you may notice the difference in the time your database needs to create such an index. Probably also on the storage side, but that is rather not an issue these days. This might be also a key factor for highly loaded systems.

To read:

Leave a Reply