|
By Ivan, PhD, a Java developer on Murano Software's team

This is the last post in our series about MongoDB. We have shown you the features and the drawbacks, and now we will show you how it works with Java. As mentioned before, MongoDB has drivers for a number of platforms; however, since it is mostly designed to operate in the Web's distributed environment, it pairs naturally with Java technology.

But enough words: let's jump straight into the code and see what it really looks like from the inside. To make it more exciting, we add some competition by comparing MongoDB on Java with something very common and widely used: MySQL. So, we are about to assemble a rough comparative performance test for MongoDB and MySQL with bulk inserts, selects and updates. Before we move on, I beg you not to take this performance test too seriously. It is in no way a complete, exhaustive or ultimate benchmark, and it is not intended to give a firm 'yes' or 'no' for either solution. It is merely an applied example to make your journey with MongoDB interesting and useful.

There is plenty of detailed documentation on installing and setting up both MySQL and MongoDB. Both are available as binary distributions. If you are a Linux user, I encourage you to use your distribution's repositories. Let's assume we have our environment set up and running, and we are ready to start coding.

We need a generic testing and reporting suite, plus a service that runs our insert/select/update routines for each available database provider. You can download complete code to look into and play with it using this link:

Let's take a closer look at MongoDB's database service, MongoDbOpService.java:

<code>
...
import com.mongodb.BasicDBObject;
import com.mongodb.BasicDBObjectBuilder;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import org.springframework.beans.factory.InitializingBean;

public class MongoDbOpService implements DatabaseOpService, InitializingBean {
    ...

    /** Empties the test collection and (re)creates the index on field1. */
    @Override
    public void cleanUp() {
        col.remove(BasicDBObjectBuilder.start().get());
        col.ensureIndex(BasicDBObjectBuilder.start().add("field1", 1).get());
    }

    /** Connects to MongoDB and prepares the "test" collection of the "test" database. */
    @Override
    public void afterPropertiesSet() throws Exception {
        Mongo mongo = new Mongo(host, port);
        col = mongo.getDB("test").getCollection("test");
        cleanUp();
    }

    /** Issues numOfOps queries of the form {field1: {$gt: i}}. */
    @Override
    public void select(int numOfOps) throws DatabaseOperationException {
        for (int i = 0; i < numOfOps; i++) {
            // Note: find() only returns a DBCursor; documents are fetched when the
            // cursor is iterated, which this loop does not do.
            col.find(BasicDBObjectBuilder.start()
                    .add("field1", new BasicDBObject("$gt", i)).get());
        }
    }

    /** Issues numOfOps multi-document updates that set field2 on every match. */
    @Override
    public void update(int numOfOps) throws DatabaseOperationException {
        for (int i = 0; i < numOfOps; i++) {
            col.update(new BasicDBObject("field1", new BasicDBObject("$gt", i)),
                    new BasicDBObject("$set", new BasicDBObject("field2", 1)),
                    false,  // upsert: do not insert when nothing matches
                    true);  // multi: update all matching documents
        }
    }

    /** Saves numOfOps documents with three random integer fields. */
    @Override
    public void insert(int numOfOps) throws DatabaseOperationException {
        for (int i = 0; i < numOfOps; i++) {
            BasicDBObjectBuilder builder = BasicDBObjectBuilder.start();
            builder
                    .add("field1", (int) (Math.random() * numOfOps))
                    .add("field2", (int) (Math.random() * numOfOps))
                    .add("field3", (int) (Math.random() * numOfOps));
            col.save(builder.get());
        }
    }
    ...
}
</code>

We start by creating a Mongo instance, whose constructor accepts the host address and port as parameters. Then we want a test database with a collection in it. The trick is that we don't have to worry about whether a database with the given name exists: getDB() will return the existing one or create a new one for us, and the same applies to getCollection().

Since we are going to run through our routines several times, we want a clean collection for each pass, and we get it by calling the cleanUp() service method. It removes all documents from the collection in case this is not the first run, and ensures (creates) an index on the field ('field1') that we are going to query later on.
If you did the same thing from the JavaScript console, it would look something like this:

<code>
> db.test.remove({})
> db.test.ensureIndex({"field1":1})
</code>

Here 'test' is the name of the collection, remove is given an empty JSON object as a filter argument, and ensureIndex is given an object with a single key naming the field to index, while '1' indicates ascending index order. It is worth mentioning that almost everything inside MongoDB uses JSON extensively: filtering, querying, update operations, etc. Also note that there is no such thing as a data schema for the created MongoDB collection.

The next point of interest is the insert(int N) call, which inserts N simple objects into our test collection. In JSON they look like this:

<code>
{ "field1" : random_int, "field2" : random_int, "field3" : random_int }
</code>

Now that we have N documents in our collection, we can start selecting them. The select(int N) call makes N queries against the test collection, each trying to fetch all documents whose field1 value is greater than some integer i from 0 to N-1. Note that such a query returns a cursor, so it's usually a good idea to consider paging and result limiting in your live application, especially on large result sets. Otherwise you are likely to hit a cursor timeout or some other nasty distributed-environment issue. The very same query in the JS console would look like this:

<code>
> db.test.find({"field1":{$gt:i}})
</code>

Note how we use the filter operator keyword '$gt' (greater than). There are two things to note here. First, there are plenty of such keywords in the MongoDB query syntax, used for filtering, updating and some other cool stuff. You can find all the details in the manuals. Second, and more important, we actually use a JSON object as the value for "field1." So your documents can contain any number of nested documents in field values, as well as arrays and functions.
This can lead to very interesting applications, including stored procedures. Just take a look at this example object:

<code>
{
  _id : ObjectId("4e0a078e69fea677c91d3742"),
  name : "John",
  address : {
    city : "Moscow",
    street : { name : "Lenin", type : "Square" },
    zip : 12345
  },
  sayHi : function(){print('Hello from function!');},
  some_list : [1245968390, 2859408375, 8756203941],
  some_object_list : [
    {field1:"something1",field2:1},
    {field1:"something2",field2:-1}
  ]
}
</code>

Even though it can look confusing, it turns out to be a very convenient way of making things work. Also, since this is actually an object, you can use "dot notation" to access any nested field. So if we fetch the object from the last example, we can do the following:

<code>
> a = db.test.findOne({name:"John"})
> a.address.street.type
Square
> a.sayHi()
Hello from function!
> a.some_list[0]
1245968390
</code>

Finally, the update(int N) call makes N update operations, matching objects with the same query that select() uses, but this time we change the 'field2' value of the matched objects. Note that we are using the '$set' atomic modifier here, so the update changes only that field's value instead of replacing the whole document.

Here are the results I got on my local machine. I have to mention that even this simple, quickly made test shows how ridiculously slow bulk operations can be with the traditional relational approach. I basically stopped at the 10k set because larger sets take far too much time with little difference in the results. Meanwhile, with MongoDB, I have generated millions of documents for a demo collection in a matter of minutes.
Test: MongoDB vs MySQL quick performance test (insert/select/update)

Environment: Ubuntu 11.04 Linux x86_64 2.6.38, i5-2400, RAM 8 GB

Ins, Sel and Upd are total times in ms for OpNum operations; the Avg columns are ms per operation.

| Service | OpNum | Ins | InsAvg | Sel | SelAvg | Upd | UpdAvg |
|---|---|---|---|---|---|---|---|
| MongoDb | 10 | 1 | 0.1 | 1 | 0.1 | 1 | 0.1 |
| MySQL | 10 | 533 | 53.3 | 3 | 0.3 | 599 | 59.9 |
| MongoDb | 100 | 19 | 0.19 | 0 | 0.0 | 8 | 0.08 |
| MySQL | 100 | 5145 | 51.45 | 42 | 0.42 | 5866 | 58.66 |
| MongoDb | 1000 | 79 | 0.079 | 5 | 0.0050 | 53 | 0.053 |
| MySQL | 1000 | 49695 | 49.695 | 966 | 0.966 | 52245 | 52.245 |
| MongoDb | 10000 | 435 | 0.0435 | 18 | 0.0018 | 67182 | 6.7182 |
| MySQL | 10000 | 490376 | 49.0376 | 40992 | 4.0992 | 863177 | 86.3177 |

Figure 2 - Average select time, ms/record

![clip_image002[5] clip_image002[5]](http://www.muranosoft.com/Outsourcingblog/content/binary/Windows-Live-Writer/d50d22ef1343_901E/clip_image002%5B5%5D_thumb.gif)

Figure 3 - Average update time, ms/record

![clip_image002[9] clip_image002[9]](http://www.muranosoft.com/Outsourcingblog/content/binary/Windows-Live-Writer/d50d22ef1343_901E/clip_image002%5B9%5D_thumb.gif)

Epilogue

After having a brief look at MongoDB's features and trade-offs, one can ask oneself: is this piece of technology production ready, and could I use it in my project? Well, it's up to you, but my advice is to give it a try. The test we used as an example cannot be taken as proof that either MongoDB or MySQL is better. It just shows how these two systems handle one specific test. What is absolutely exciting about MongoDB is that it's really easy to start working with: it takes about an hour to lay out some basic configuration and client code. There are huge advantages, and yet a price to pay.
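For readers who want to reproduce the total and average columns of the table above: the testing and reporting suite itself is not shown in this post, so the following is only a minimal sketch of how such numbers could be collected. The names OpTimer, BulkOp and Result are hypothetical and are not taken from the downloadable code.

```java
/** Hypothetical timing helper: runs one bulk operation and reports total and per-operation time. */
final class OpTimer {

    /** A bulk routine such as insert(n), select(n) or update(n); may throw checked exceptions. */
    @FunctionalInterface
    interface BulkOp {
        void run(int numOfOps) throws Exception;
    }

    /** Outcome of one timed run. */
    static final class Result {
        final int ops;
        final long totalMs;

        Result(int ops, long totalMs) {
            this.ops = ops;
            this.totalMs = totalMs;
        }

        /** Average time per operation in milliseconds (the "Avg" columns). */
        double avgMs() {
            return (double) totalMs / ops;
        }
    }

    /** Calls op.run(ops) once and measures the wall-clock time of that call. */
    static Result time(int ops, BulkOp op) {
        long start = System.nanoTime();
        try {
            op.run(ops);
        } catch (Exception e) {
            throw new IllegalStateException("bulk operation failed", e);
        }
        long totalMs = (System.nanoTime() - start) / 1_000_000L;
        return new Result(ops, totalMs);
    }
}
```

With a service like the one shown earlier, a call such as OpTimer.time(1000, service::insert) would time 1000 inserts. A single wall-clock measurement like this is noisy, which is one more reason not to read too much into the table.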
In our previous posts we introduced you to MongoDB as well as its features. In this part we will be talking about the

Trade-offs

Finally, it's payback time. Of course, we should not expect to get a scalable, high-performance database at little to no cost. Current MongoDB drawbacks can be divided into two groups.

The first group is caused by the fundamental difference in approach, and the price we have to pay is consistency. This means that, once MongoDB is your weapon of choice, your application logic has to provide workarounds for concurrency issues and consistency checks. It's good to know that MongoDB has two options to tackle the latter problem. A write concern (WriteConcern) lets you get around the "save now, write later" issue, and it can be a powerful tool in skilled hands. The slaveOk setting controls whether reads may be served by secondaries; leaving it off forces "read from primary only" behavior, so you are no longer trapped by replication sync issues. And of course, versions and atomic updates help with concurrency.

I would call the second group "age-related problems." These are the drawbacks that we are facing now, but they are more or less likely to go away in the future. Some of the fixes are harder to implement and/or have a lower priority. To name a few:

As mentioned before, queries in MongoDB work perfectly when the corresponding index is there. But if your query filter includes fields that are not indexed, you can end up with a very long query time. There is a way to hint, or force, MongoDB to use an existing index that does not cover that field, but you have to understand the indexing mechanics and use this feature carefully. Another thing is that MongoDB keeps all indexes in memory, and even though servers usually have plenty of memory, it is still limited, so you have to keep that in mind and make sure you are not creating too many indexes.
Also, when querying, keep in mind that not all query types currently support or use indexes; with something like MapReduce, a $where clause or some regular expressions, you could end up with a long query time. Due to JavaScript engine limitations, MapReduce queries are processed in a single thread and can lock down the whole database instance. Finally, indexes can be very productive for read-heavy applications and counterproductive in applications with lots of writes.

MongoDB is supposed to be operated in a secured/trusted environment and has only basic authentication implemented. You have only two alternatives for user access rights: either a read-only user or an admin user with all read/write privileges. The user gets access database-wide, and it's not possible to set up different access per collection. Also, the basic auth mechanism currently doesn't work for sharded configurations.

The size of a single document is limited. This is usually not a big deal, but it is still a limitation. As of version 1.8 the limit is 16 MB, and it is expected to increase in later releases.

MongoDB returns a cursor as a query result, and cursors can time out. So, especially when operating in a distributed configuration (shards plus replica sets), you should consider paging queries in smaller chunks.

Last but not least, even though there are MongoDB drivers for all major platforms, they perform differently. Whether the difference comes from the platform or from the driver implementation, it's always a good idea to test driver performance in your environment before making a final decision.

Stay tuned for our last post, where we will be talking about MongoDB and Java.
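A short footnote on the "paging queries in smaller chunks" advice above. In the shell, paging is typically done with the cursor's skip() and limit() methods; the sketch below only illustrates the same slicing arithmetic on an in-memory list. The Pager class is a hypothetical helper, not part of any MongoDB driver.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that slices a result list into fixed-size pages. */
final class Pager {

    /** Returns the 0-based page of at most pageSize items, or an empty list past the end. */
    static <T> List<T> page(List<T> results, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        long skip = (long) page * pageSize;  // items before this page, like skip(n)
        if (skip >= results.size()) {
            return new ArrayList<>();
        }
        int from = (int) skip;
        int to = (int) Math.min(skip + pageSize, (long) results.size());  // like limit(n)
        return new ArrayList<>(results.subList(from, to));
    }
}
```

A caller would request page 0, 1, 2 and so on until an empty page comes back, so that no single request has to hold a long-lived cursor open.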
By Ivan, a Java developer on Murano Software's team

In our previous post we introduced you to MongoDB, which has been getting quite a lot of attention lately. We are continuing our series about Mongo, more precisely with...

MongoDB Features

The MongoDB distribution includes an interactive JavaScript shell that can be used to access, query, update, configure and administer databases on local and remote hosts. JavaScript is the language of choice here: all MongoDB shell operations are JS calls, and JSON is the common way to define objects.

MongoDB is fast, I mean really fast, when it comes to search and update operations, as long as you have the corresponding index. It uses B-tree-based indexes. Each collection can have more than one index, and an index can include multiple fields (compound indexes are supported).

MongoDB doesn't support transactions the way most ACID-compliant relational databases do, but it has its own way to tackle the concurrency problem: atomic update operations at the document level. This basically means you can update some of a document's fields without affecting the rest of them. There is a convention called "update if current" that makes sure you are not saving an outdated document, by using a version field or a filtering clause in the update operation. MongoDB also supports the findAndModify operation, which updates a document and immediately returns its previous or updated value. This can be useful in many cases, like queues or sequencers.

When it comes to aggregating queries, MongoDB has MapReduce support. This distributed computation technique is another gift from Google to humanity. It can look confusing to a common SQL-minded person at first glance, but a patient user should have no problem mastering it quickly. Some people like it so much that they have built systems like CouchDB, which is almost completely based on MapReduce and is definitely a worthy piece of technology.
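For SQL-minded readers, the idea behind MapReduce can be sketched in a few lines of plain Java. This is an in-memory illustration only, not how MongoDB executes it: the map phase emits a key and a value for each document, and the reduce phase folds all values emitted under the same key into one. The MiniMapReduce class and its sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** In-memory illustration of the map and reduce phases; not MongoDB's implementation. */
final class MiniMapReduce {

    /**
     * Map phase: each document emits one (key, value) pair via keyOf and valueOf.
     * Reduce phase: values emitted under the same key are folded into one by reduce.
     */
    static <D, K extends Comparable<K>, V> Map<K, V> run(
            List<D> docs,
            Function<D, K> keyOf,
            Function<D, V> valueOf,
            BinaryOperator<V> reduce) {
        Map<K, V> result = new TreeMap<>();
        for (D doc : docs) {
            result.merge(keyOf.apply(doc), valueOf.apply(doc), reduce);
        }
        return result;
    }

    /** Sample run: count people per city (emit the city with value 1, then sum). */
    static Map<String, Integer> demo() {
        List<String[]> people = Arrays.asList(  // made-up sample data: {name, city}
                new String[] {"John", "Moscow"},
                new String[] {"Anna", "Omsk"},
                new String[] {"Ivan", "Moscow"});
        return run(people, p -> p[1], p -> 1, Integer::sum);
    }
}
```

Calling MiniMapReduce.demo() yields a map with Moscow counted twice and Omsk once, which is the kind of result a GROUP BY with COUNT would give in SQL.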
MongoDB was born with scalability in mind, so it implements replication and sharding natively. Replicas are organized into so-called ReplicaSets that consist of one primary host (used for writes) and secondaries (currently up to 7). This is a good fit for failover and redundancy. If the primary host goes down, a new one is "elected" automatically from the working secondaries.

MongoDB provides horizontal scalability through sharding. Sharding is configured at the collection level and distributes chunks of collection data evenly among the available cluster shards using the shard key. The shard key can be one or more of the existing document fields, or sometimes a specifically generated field that ensures a more even load distribution among shards. Sharded and unsharded collections feel exactly the same to the client application. These two scaling techniques can and should be used together, with each shard of a sharded collection being a ReplicaSet.

Look forward to Part 3 of our Hands on MongoDB series!
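A footnote on the sharding description above: the idea of routing a document to a shard by its shard key can be sketched as a lookup in a sorted map of chunk ranges. This is a simplified illustration that assumes an integer shard key; ChunkRouter and the shard names are hypothetical, and the real routing logic lives inside MongoDB itself.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical router that maps a shard key to the shard owning its chunk. */
final class ChunkRouter {

    // Lower bound of each chunk (inclusive) -> name of the shard holding it.
    private final TreeMap<Integer, String> chunks = new TreeMap<>();

    /** Registers a chunk that starts at lowerBound and runs up to the next chunk's lower bound. */
    void addChunk(int lowerBound, String shard) {
        chunks.put(lowerBound, shard);
    }

    /** Finds the chunk whose range contains shardKey and returns its shard. */
    String shardFor(int shardKey) {
        Map.Entry<Integer, String> chunk = chunks.floorEntry(shardKey);
        if (chunk == null) {
            throw new IllegalArgumentException("no chunk covers key " + shardKey);
        }
        return chunk.getValue();
    }
}
```

Because the client only ever asks "which shard owns this key", it does not need to know how many chunks or shards exist, which is why sharded and unsharded collections can look the same from the application's side.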
By Ivan, a Java developer on Murano Software's team

«Why is it you're always too small or too tall?», The Mad Hatter, Alice in Wonderland

Prologue

Indeed, there is no such thing as an ultimate solution. Every time we face a problem, even if we are lucky enough to have a precooked solution at our fingertips, we still have to tweak, adjust and adapt it to the demands of a specific case. Sometimes the "adjustment" part can take a while, and therefore putting the right tool in the right place has a great impact on overall project development efficiency.

From the day the Web was born, it has never stopped growing in both size and quality. Whether you are developing an online shop, building a news portal or trying to reach your customers with a company site, databases are always going to be a major building block of your solution. In terms of maturity and functional completeness, nothing can compare to relational databases. Of course, there are plenty of RDBMSs, each with its own advantages and flaws, but they have proved that their strict rules and functionality can give a rock-solid basis for any web project.

However, in some cases, the relational approach can be too "tall." Schema limitations and weak scalability support can be the decisive factor that leads you to look for another solution. And document-oriented databases are among the alternatives here. So, without further delay, let's get our hands on MongoDB, a document-oriented database that has been gaining popularity recently.
Essentials

MongoDB is an open-source (GNU AGPL 3.0 license), schema-less, NoSQL, document-oriented database management system developed by 10gen. It has been in development since 2007, with the first public release in 2009. It's available for download from http://www.mongodb.org. MongoDB has a variety of drivers, both officially supported (Java, C#, CPP, Python, PHP, Ruby, Scala, Erlang, Perl and others) and community-developed. And, of course, MongoDB is available for Mac, Solaris, Windows and Linux users, with sources available so you can try building it yourself for any other platform.

So what is "schema-less" all about? The deal is that MongoDB does not have any schema or any other kind of user-defined restrictions on data storage. When using a relational database, you have plenty of work to do in order to prepare your domain entities for storage in tables and rows. You define each field type, add restrictions, and sometimes even put some logic in triggers and stored procedures. This is not the case with MongoDB. What once was a record in a relational table (usually in more than one table) now becomes a document. Each document can have any number of fields. A field is a pair of a key (a unique string identifier) and a value. A value can be of any supported type, including an object (document) or an array. You can use dot notation to access values at any nesting level.

Documents are stored in collections. But since you don't have any schema, every document in a single collection can have a different set of fields. Of course, it's good practice to put different kinds of objects in separate collections. Obviously, this kind of unrestrained storage is hardly compatible with SQL, and there is no such thing as a "join" operation in document-oriented databases. So, let's see what MongoDB has to offer instead.

BASE vs. ACID

While most, if not all, relational databases offer their prayers to ACID (Atomicity, Consistency, Isolation, Durability), this evil god demands sacrifices. According to Eric Brewer's CAP theorem, a shared-data system can have at most two of three properties: Consistency, Availability and Partition tolerance. And while RDBMSs strive hard for Consistency and Partition tolerance, Availability is left behind.

The logic of a document-oriented database instead declares that, in the cases where this is applicable, consistency can be given up in favor of high availability and scalability. This is sometimes referred to by the acronym BASE, which stands for Basically Available, Soft state, Eventually consistent. It means it's okay to go on without locks and transactions. It's okay to have data in a soft state (we can consider a write operation complete before the data is actually written to disk). The data will eventually become consistent, and in most cases there are plenty of other ways to work around concurrency.

Now we will take a small break. Have we piqued your interest? Look out for Part 2, where we will be introducing the Features of MongoDB.
|