Performance and Tools
In this last chapter, we look at a few performance topics as well as some of the tools available to MongoDB developers. We won't dive deeply into either topic, but we will examine the most important aspects of each.
Indexes
At the very beginning we saw the special system.indexes collection which contains information on all the indexes in our database. Indexes in MongoDB work a lot like indexes in a relational database: they help improve query and sorting performance. Indexes are created via ensureIndex:
// where "name" is the field name
db.unicorns.ensureIndex({name: 1});
And dropped via dropIndex:
db.unicorns.dropIndex({name: 1});
A unique index can be created by supplying a second parameter and setting unique to true:
db.unicorns.ensureIndex({name: 1},
{unique: true});
Indexes can be created on embedded fields (again, using the dot-notation) and on array fields. We can also create compound indexes:
db.unicorns.ensureIndex({name: 1,
vampires: -1});
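To illustrate the embedded-field and array-field forms mentioned above, here is a hedged sketch: the metadata.origin embedded field is hypothetical, while loves is the array field used for our unicorn documents earlier in the book:
// dot-notation on a (hypothetical) embedded field
db.unicorns.ensureIndex({'metadata.origin': 1});
// indexing an array field creates a multikey index
db.unicorns.ensureIndex({loves: 1});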
The direction of your index (1 for ascending, -1 for descending) doesn't matter for a single key index, but it can make a difference for compound indexes when you are sorting on more than one indexed field.
The indexes page has additional information on indexes.
Explain
To see whether or not your queries are using an index, you can use the explain method on a cursor:
db.unicorns.find().explain()
The output tells us that a BasicCursor was used (which means non-indexed), that 12 objects were scanned, how long it took, what index, if any, was used, as well as a few other pieces of useful information.
If we change our query to use an index, we'll see that a BtreeCursor was used, as well as the index used to fulfill the request:
db.unicorns.find({name: 'Pilot'}).explain()
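The same check works for sorts. A small sketch, assuming the compound {name: 1, vampires: -1} index created earlier in this chapter is still in place:
// the sort can be satisfied by the compound index
db.unicorns.find({name: 'Pilot'})
    .sort({vampires: -1}).explain();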
Replication
MongoDB replication works in some ways similarly to how relational database replication works. All production deployments should be replica sets, which ideally consist of three or more servers that hold the same data. Writes are sent to a single server, the primary, from which they are asynchronously replicated to every secondary. You can control whether reads are allowed on secondaries, which can help direct some special queries away from the primary, at the risk of reading slightly stale data. If the primary goes down, one of the secondaries will automatically be elected as the new primary. Again, MongoDB replication is outside the scope of this book.
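That said, for the curious, here is a hedged sketch of directing a read at a secondary from the shell; it assumes a running replica set and a shell version that supports readPref:
// ask that this query be served by a secondary if one is available
db.unicorns.find({name: 'Pilot'})
    .readPref('secondaryPreferred');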
Sharding
MongoDB supports auto-sharding. Sharding is an approach to scalability which partitions your data across multiple servers or clusters. A naive implementation might put all of the data for users with a name that starts with A-M on server 1 and the rest on server 2. Thankfully, MongoDB's sharding capabilities far exceed such a simple algorithm. Sharding is a topic well beyond the scope of this book, but you should know that it exists and that you should consider it, should your needs grow beyond a single replica set.
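For a feel of what enabling sharding looks like, here is a minimal sketch; it assumes a sharded cluster is already configured and that you are connected to a mongos router:
// allow the learn database to be sharded
sh.enableSharding('learn');
// partition the unicorns collection by name
sh.shardCollection('learn.unicorns', {name: 1});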
While replication can help performance somewhat (by isolating long-running queries to secondaries, and reducing latency for some other types of queries), its main purpose is to provide high availability. Sharding is the primary method for scaling MongoDB clusters. Combining replication with sharding is the prescribed approach to achieve scaling and high availability.
Stats
You can obtain statistics on a database by typing db.stats(). Most of the information deals with the size of your database. You can also get statistics on a collection, say unicorns, by typing db.unicorns.stats(). Most of this information relates to the size of your collection and its indexes.
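A short sketch of both calls, using the optional scale argument to report sizes in kilobytes (worth confirming against your shell version):
// database statistics, sizes reported in kilobytes
db.stats(1024);
// collection statistics for unicorns, also scaled
db.unicorns.stats(1024);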
Profiler
You enable the MongoDB profiler by executing:
db.setProfilingLevel(2);
With it enabled, we can run a command:
db.unicorns.find({weight: {$gt: 600}});
And then examine the profiler:
db.system.profile.find()
The output tells us what was run and when, how many documents were scanned, and how much data was returned.
You disable the profiler by calling setProfilingLevel again, but changing the parameter to 0. Specifying 1 as the first parameter will profile queries that take longer than 100 milliseconds. 100 milliseconds is the default threshold; you can specify a different minimum time, in milliseconds, with a second parameter:
//profile anything that takes
//more than 1 second
db.setProfilingLevel(1, 1000);
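Once the profiler has collected some data, system.profile can be queried like any other collection. The sketch below filters on the millis and ts fields it records, and then turns profiling back off:
// operations slower than 500ms, most recent first
db.system.profile.find({millis: {$gt: 500}}).sort({ts: -1});
// disable the profiler again
db.setProfilingLevel(0);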
Backups and Restore
Within the MongoDB bin
folder is a mongodump
executable. Simply executing mongodump
will connect to localhost and backup all of your databases to a dump
subfolder. You can type mongodump --help
to see additional options. Common options are --db DBNAME
to back up a specific database and --collection COLLECTIONNAME
to back up a specific collection. You can then use the mongorestore
executable, located in the same bin
folder, to restore a previously made backup. Again, the --db
and --collection
can be specified to restore a specific database and/or collection. mongodump
and mongorestore
operate on BSON, which is MongoDB's native format.
For example, to back up our learn database to a backup folder, we'd execute (this is its own executable which you run in a command/terminal window, not within the mongo shell itself):
mongodump --db learn --out backup
To restore only the unicorns collection, we could then do:
mongorestore --db learn --collection unicorns \
backup/learn/unicorns.bson
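To restore the entire learn database instead, something like the following should work; the exact flags can vary between mongorestore versions, so treat this as a sketch:
mongorestore --db learn backup/learn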
It's worth pointing out that mongoexport and mongoimport are two other executables which can be used to export and import data to and from JSON or CSV. For example, we can get a JSON output by doing:
mongoexport --db learn --collection unicorns
And a CSV output by doing:
mongoexport --db learn \
--collection unicorns \
--csv --fields name,weight,vampires
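Going the other way is similar. Here is a hedged sketch of importing a JSON export back in, assuming a unicorns.json file created from the export above (for example via --out unicorns.json):
mongoimport --db learn --collection unicorns \
    --file unicorns.json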
Note that mongoexport and mongoimport cannot always faithfully represent your data, since JSON and CSV cannot express every BSON type. Only mongodump and mongorestore should ever be used for actual backups. You can read more about your backup options in the MongoDB Manual.
Summary
In this chapter we looked at various commands, tools and performance details of using MongoDB. We haven't touched on everything, but we've looked at some of the common ones. Indexing in MongoDB is similar to indexing with relational databases, as are many of the tools. However, with MongoDB, many of these are to the point and simple to use.