Performance and Tools
In this last chapter, we look at a few performance topics as well as some of the tools available to MongoDB developers. We won't dive deeply into either topic, but we will examine the most important aspects of each.
Indexes
At the very beginning we saw the special system.indexes collection which contains information on all the indexes in our database. Indexes in MongoDB work a lot like indexes in a relational database: they help improve query and sorting performance. Indexes are created via ensureIndex:
// where "name" is the field name
db.unicorns.ensureIndex({name: 1});
And dropped via dropIndex:
db.unicorns.dropIndex({name: 1});
A unique index can be created by supplying a second parameter and setting unique to true:
db.unicorns.ensureIndex({name: 1},
{unique: true});
Indexes can be created on embedded fields (again, using the dot-notation) and on array fields. We can also create compound indexes:
db.unicorns.ensureIndex({name: 1,
vampires: -1});
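To illustrate the embedded-field and array-field forms mentioned above, here is a hedged sketch: the metadata.origin embedded field is hypothetical, while loves is the array field used for our unicorn documents earlier in the book:
// dot-notation on a (hypothetical) embedded field
db.unicorns.ensureIndex({'metadata.origin': 1});
// indexing an array field creates a multikey index
db.unicorns.ensureIndex({loves: 1});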
The direction of your index (1 for ascending, -1 for descending) doesn't matter for a single key index, but it can make a difference for compound indexes when you are sorting on more than one indexed field.
The indexes page has additional information on indexes.
Explain
To see whether or not your queries are using an index, you can use the explain method on a cursor:
db.unicorns.find().explain()
The output tells us that a BasicCursor was used (which means non-indexed), that 12 objects were scanned, how long it took, what index, if any, was used, as well as a few other pieces of useful information.
If we change our query to use an index, we'll see that a BtreeCursor was used, as well as the index used to fulfill the request:
db.unicorns.find({name: 'Pilot'}).explain()
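The same check works for sorts. A small sketch, assuming the compound {name: 1, vampires: -1} index created earlier in this chapter is still in place:
// the sort can be satisfied by the compound index
db.unicorns.find({name: 'Pilot'})
    .sort({vampires: -1}).explain();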
Replication
MongoDB replication works in some ways similarly to how relational database replication works. All production deployments should be replica sets, which ideally consist of three or more servers that hold the same data. Writes are sent to a single server, the primary, from which they are asynchronously replicated to every secondary. You can control whether reads are allowed on secondaries, which can help direct some special queries away from the primary, at the risk of reading slightly stale data. If the primary goes down, one of the secondaries will automatically be elected as the new primary. Again, MongoDB replication is outside the scope of this book.
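That said, for the curious, here is a hedged sketch of directing a read at a secondary from the shell; it assumes a running replica set and a shell version that supports readPref:
// ask that this query be served by a secondary if one is available
db.unicorns.find({name: 'Pilot'})
    .readPref('secondaryPreferred');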
Sharding
MongoDB supports auto-sharding. Sharding is an approach to scalability which partitions your data across multiple servers or clusters. A naive implementation might put all of the data for users with a name that starts with A-M on server 1 and the rest on server 2. Thankfully, MongoDB's sharding capabilities far exceed such a simple algorithm. Sharding is a topic well beyond the scope of this book, but you should know that it exists and that you should consider it, should your needs grow beyond a single replica set.
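For a feel of what enabling sharding looks like, here is a minimal sketch; it assumes a sharded cluster is already configured and that you are connected to a mongos router:
// allow the learn database to be sharded
sh.enableSharding('learn');
// partition the unicorns collection by name
sh.shardCollection('learn.unicorns', {name: 1});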
While replication can help performance somewhat (by isolating long-running queries to secondaries, and reducing latency for some other types of queries), its main purpose is to provide high availability. Sharding is the primary method for scaling MongoDB clusters. Combining replication with sharding is the prescribed approach to achieve scaling and high availability.
Stats
You can obtain statistics on a database by typing db.stats(). Most of the information deals with the size of your database. You can also get statistics on a collection, say unicorns, by typing db.unicorns.stats(). Most of this information relates to the size of your collection and its indexes.
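A short sketch of both calls, using the optional scale argument to report sizes in kilobytes (worth confirming against your shell version):
// database statistics, sizes reported in kilobytes
db.stats(1024);
// collection statistics for unicorns, also scaled
db.unicorns.stats(1024);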
Profiler
You enable the MongoDB profiler by executing:
db.setProfilingLevel(2);
With it enabled, we can run a command:
db.unicorns.find({weight: {$gt: 600}});
And then examine the profiler:
db.system.profile.find()
The output tells us what was run and when, how many documents were scanned, and how much data was returned.
You disable the profiler by calling setProfilingLevel again, but changing the parameter to 0. Specifying 1 as the first parameter will profile queries that take longer than 100 milliseconds. 100 milliseconds is the default threshold; you can specify a different minimum time, in milliseconds, with a second parameter:
//profile anything that takes
//more than 1 second
db.setProfilingLevel(1, 1000);
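Once the profiler has collected some data, system.profile can be queried like any other collection. The sketch below filters on the millis and ts fields it records, and then turns profiling back off:
// operations slower than 500ms, most recent first
db.system.profile.find({millis: {$gt: 500}}).sort({ts: -1});
// disable the profiler again
db.setProfilingLevel(0);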
Backups and Restore
Within the MongoDB bin
folder is a mongodump
executable. Simply executing mongodump
will connect to localhost and backup all of your databases to a dump
subfolder. You can type mongodump --help
to see additional options. Common options are --db DBNAME
to back up a specific database and --collection COLLECTIONNAME
to back up a specific collection. You can then use the mongorestore
executable, located in the same bin
folder, to restore a previously made backup. Again, the --db
and --collection
can be specified to restore a specific database and/or collection. mongodump
and mongorestore
operate on BSON, which is MongoDB's native format.
For example, to back up our learn database to a backup folder, we'd execute (this is its own executable which you run in a command/terminal window, not within the mongo shell itself):
mongodump --db learn --out backup
To restore only the unicorns collection, we could then do:
mongorestore --db learn --collection unicorns \
backup/learn/unicorns.bson
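To restore the entire learn database instead, something like the following should work; the exact flags can vary between mongorestore versions, so treat this as a sketch:
mongorestore --db learn backup/learn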
It's worth pointing out that mongoexport and mongoimport are two other executables which can be used to export and import data to and from JSON or CSV. For example, we can get a JSON output by doing:
mongoexport --db learn --collection unicorns
And a CSV output by doing:
mongoexport --db learn \
--collection unicorns \
--csv --fields name,weight,vampires
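Going the other way is similar. Here is a hedged sketch of importing a JSON export back in, assuming a unicorns.json file created from the export above (for example via --out unicorns.json):
mongoimport --db learn --collection unicorns \
    --file unicorns.json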
Note that mongoexport and mongoimport cannot always faithfully represent your data, since JSON and CSV cannot express every BSON type. Only mongodump and mongorestore should ever be used for actual backups. You can read more about your backup options in the MongoDB Manual.
Summary
In this chapter we looked at various commands, tools and performance details of using MongoDB. We haven't touched on everything, but we've looked at some of the common ones. Indexing in MongoDB is similar to indexing with relational databases, as are many of the tools. However, with MongoDB, many of these are to the point and simple to use.