Indexing MongoDB

Indexing MongoDB

Indexing columns of a database

I was working on a game-server at the time. The server had too much of code left to be optimized. One of them was leaderboard. We had tested the server with 20-50 players and sent to beta release. It was fine until our player's growth reached 20k. After it, searching or updating a data took longer than expected. We faced many lags, the player's info wasn't updated upon game end, simple query stacked up and took longer time to get finally executed.

We viewed some logs on server, dropped down our rank calculating feature (not optimized and heavy). But we were still facing the issue, though not as frequently as before. Upon further examination and research, we adopted indexes in database, and implementing this helped us with the issue. Let's learn about that.

What is a database index ?

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. I will be explaining its use via MongoDB-Compass.

Example :

let's insert dummy data into a MongoDB. In your MongoDB Compass, you can create a database and a collection within the database, then import the following JSON.

[
	{
		"name": "apple",
		"address": "here",
		"age": 26
	},
	{
		"name": "ball",
		"address": "there",
		"age": 20
	},
	{
		"name": "cat",
		"address": "where",
		"age": 26
	},
	{
		"name": "dog",
		"address": "here",
		"age": 20
	},
	{
		"name": "eagle",
		"address": "here",
		"age": 24
	},
	{
		"name": "frog",
		"address": "there",
		"age": 22
	}
]
JSON - dummy data for my collection

An "_id" field is automatically added to your data by MongoDB. If you open the database in Compass and see under section "Indexes", you can see that "_id" is indexed by default. Now, let's move up to section "Explain Plan" and filter by replacing the object Id by one of the value from "_id" of your data. Mine is :

{_id: ObjectId('614a8f8dd2d090e4e9f88ea0')}

You can see that the output shows that 1 document was examined and returned based on index keys.

MongoDB compass - searching by indexed ObjectId

It seems good. We had 6 data under the collection but it examined only 1. Now let's filter with "address" and see what happens for filter

{address: "here"}
MongoDB Compass - searching by unindexed address

Oops ! It examined 6, that means all documents and returned 3 documents that matched our query.

Now let's move over to section "Indexes" and then create an index on the "address" field.

MongoDB Compass - creating index

And run the same filter by address query from the "Explain Plan" section. That will result in following

MongoDB Compass - searching by indexed address

It examined 3 documents by index keys and then returned those three.

Lets discuss what happened

When the address was not indexed, the query had to examine all the records in a database. We had only 6 records, so the performance is negligible. But you can populate the JSON with 50,000 records or more and perform the above steps and look for "Actual Query Execution Time" in response. You will notice a huge difference.
Then when we indexed the address, the MongoDB internally created a table with values from address and it's pointer to the respective record. Thus when we re-ran the query after indexing, it searched for the index table for the "here" values and returned only the records that had the "here" address.

If we had 50,000 records and only those 3 record with address "here" and address was indexed, the serach for "here" would search only 3 documents and return them. However, if it was not indexed, it would search through all 50,000 records and return those 3 documents.

Other features

  • you can index other columns
  • you can index multiple columns at once
  • you can index asc, desc, text, 2dsphere, etc
  • you can have other indexing options, read the docs for it

Advantages

  • their use in queries usually result in better performance
  • faster execution time for searching with WHERE on indexed column
  • can be used for sorting, eliminates a post-fetch-sort operation
  • unique indexes guarantee uniquely identifiable records in database

Disadvantages

  • takes up additional disk and memory space
  • use of too much indexes might reduce the performance
  • INSERT, UPDATE and DELETE becomes slower because on each operation the indexes must also be updated

Conclusion

This article gives a basic idea on how indexes might help you with performance. You should read the docs and more articles on indexes to move on with a large project. I haven't detailed the disadvantages more elaborately, so you better look for them.
But this little information saved my day, and it might save yours.
Happy coding !