Traditional relational databases ruled the roost while datasets were still reckoned in megabytes and gigabytes. As organizations around the world kept growing, however, the tidal wave called "Big Data" rendered the old technologies unfeasible.
When it came to data storage and retrieval, these technologies simply crumbled under the burden of such colossal amounts of data. Thanks to Hadoop, Hive and HBase, organizations now have the capability to handle large sets of raw, unstructured data both efficiently and economically.
Image Credit: http://www.compassitesinc.com
Another outcome of these problems was the parallel advent of "Not Only SQL", or NoSQL, databases. The primary advantage of NoSQL databases is a storage and retrieval mechanism built on a looser consistency model, with added benefits such as horizontal scaling, better availability and quicker access.
With deployments in over five hundred top-notch organizations across the globe, MongoDB has certainly emerged as the most popular NoSQL database of all. In the absence of a concrete survey, it is difficult to assess the exact percentage of adoption and penetration of MongoDB. However, metrics such as Google search volume and the number of employment opportunities for Hadoop and MongoDB professionals give a good idea of the popularity of these technologies.
Based on Google search volume, MongoDB ranked first: three times more popular than the next most prevalent NoSQL technology, and ten times more popular than the least prevalent one.
A survey of IT professionals' LinkedIn profiles revealed that almost 50% of those listing a NoSQL skill listed MongoDB. In terms of acceptance, MongoDB equals the next three NoSQL databases put together. Rackspace, one of the pioneers in adopting MongoDB for its cloud solutions, affirms: "MongoDB is the de facto choice for NoSQL applications".
The reasons MongoDB is being widely adopted by developers are as follows:
- 1. MongoDB enhances productivity; it is easy to get started with and easy to use.
- 2. With the schema barrier removed, developers can concentrate on developing applications rather than databases.
- 3. MongoDB offers extensive support for an array of languages such as C#, C, C++, Node.js, Scala, JavaScript and Objective-C, all of which are pertinent to the future of the web.
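The "schemaless" point above can be sketched in pure Python, with no server assumed. This is an illustrative in-memory stand-in for a collection, not the real driver; with a live deployment, `pymongo`'s `insert_one` and `find` behave along these lines over actual documents.

```python
# Illustrative only: an in-memory stand-in for a MongoDB collection.
# Documents are plain dicts, and no two need share the same schema.
collection = []

def insert_one(doc):
    collection.append(doc)

def find(query):
    """Return documents whose fields match every key/value in `query`."""
    return [d for d in collection if all(d.get(k) == v for k, v in query.items())]

insert_one({"name": "Ada", "languages": ["C", "Scala"]})
insert_one({"name": "Linus", "role": "maintainer"})  # different fields, no migration

print(find({"name": "Ada"}))
```

Because the second document adds a `role` field the first one lacks, no schema migration is needed before inserting it; that is the productivity gain the list describes.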
How MongoDB teams up with Hadoop and Big Data technologies
Of late, technologists at MongoDB have developed a MongoDB connector for Hadoop that enables tighter integration and eases the execution of tasks such as the following:
- The MongoDB-Hadoop connector brings the power of Hadoop's MapReduce to live application data in MongoDB, extracting value from Big Data speedily as well as efficiently.
- The connector presents MongoDB as a "Hadoop-compatible file system", so MapReduce jobs can read directly from MongoDB without the data first being copied to HDFS. This does away with the need to transfer terabytes of data across the network.
- The necessity of scanning entire collections is eliminated: MapReduce jobs can pass query filters and harness MongoDB's indexing abilities, such as text search, compound, array, geospatial and sparse indexes.
- Hadoop jobs can write their results back to MongoDB in order to support queries and real-time operational processes.
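The filter-pushdown point above can be sketched in plain Python over hypothetical documents. The "index" here is just a dict from field value to matching documents, built to show why a filtered, indexed lookup touches fewer records than a full collection scan; a real job would instead declare its filter in the connector's job configuration.

```python
# A toy "collection" and a toy index: field value -> matching documents.
docs = [
    {"_id": 1, "status": "open", "bytes": 100},
    {"_id": 2, "status": "closed", "bytes": 250},
    {"_id": 3, "status": "open", "bytes": 50},
]

# Building the index once lets every later query skip the full scan.
status_index = {}
for d in docs:
    status_index.setdefault(d["status"], []).append(d)

def find_with_index(status):
    """Index lookup: touches only the documents matching `status`."""
    return status_index.get(status, [])

open_docs = find_with_index("open")
print(len(open_docs))
```

A full scan would examine all three documents on every query; the index lookup returns the two matching ones directly, which is exactly the saving the connector achieves at terabyte scale.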
Scope of application - Hadoop and MongoDB
In the context of Big Data stacks, MongoDB and Hadoop have the following scopes of application:
- 1) MongoDB is used for the operational part, as a real-time data store.
- 2) Hadoop is used primarily for offline analysis and batch processing of data.
Scope of usage in Batch Aggregation
Image Credit: http://mobicon.tistory.com
When it comes to analyzing data, the built-in aggregation features of MongoDB hold good in the majority of situations. However, some cases require a higher degree of data aggregation; under such circumstances, Hadoop provides powerful support for complex analytics.
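Before reaching for Hadoop, it is worth seeing what the built-in features cover. The sketch below simulates a two-stage `$match` + `$group` pipeline in pure Python over hypothetical order documents; with a live server, the same stages would be passed to `collection.aggregate` instead.

```python
# Hypothetical order documents, as they might sit in a collection.
orders = [
    {"cust": "a", "status": "A", "amount": 50},
    {"cust": "b", "status": "A", "amount": 100},
    {"cust": "a", "status": "B", "amount": 25},
]

def aggregate(docs, match, group_key, sum_field):
    """Mimic a $match stage followed by a $group stage with a $sum."""
    totals = {}
    for d in docs:
        if all(d.get(k) == v for k, v in match.items()):
            totals[d[group_key]] = totals.get(d[group_key], 0) + d[sum_field]
    return [{"_id": k, "total": v} for k, v in totals.items()]

print(aggregate(orders, {"status": "A"}, "cust", "amount"))
```

Per-customer totals over a filtered subset is the kind of reporting the built-in pipeline handles well; the Hadoop path below is for analytics beyond this.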
- a) Hadoop, by means of one or more MapReduce jobs, processes the data extracted from MongoDB. These jobs can also pull data from other locations to formulate a multi-datasource solution.
- b) The results of the MapReduce jobs can be written back to MongoDB, where they are available for analysis and querying as and when required.
- c) MongoDB applications can thus make use of the batch-analytics data, either handing it over to the end user or powering other features down the line.
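The round trip in steps (a)-(c) can be sketched as a word-count-style MapReduce in plain Python. The event documents and collection names are hypothetical, and in production the connector moves these records between MongoDB and Hadoop for you; this only shows the shape of the flow.

```python
from collections import defaultdict

# (a) Hypothetical operational data, as extracted from MongoDB.
events = [{"page": "/home"}, {"page": "/docs"}, {"page": "/home"}]

# Map: emit one (key, 1) pair per event.
mapped = [(e["page"], 1) for e in events]

# Reduce: sum the counts for each key.
reduced = defaultdict(int)
for key, count in mapped:
    reduced[key] += count

# (b) "Write back": results land in a separate collection,
# (c) where the application can query them cheaply.
results_collection = [{"_id": page, "views": n} for page, n in reduced.items()]
print(results_collection)
```

The application never re-runs the heavy computation; it simply reads the small, pre-aggregated `results_collection`, which is why writing batch results back to MongoDB keeps real-time queries fast.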
Scope of usage in Data Warehousing
In a typical production environment, application data, each piece with its specific functionality and language, may live in more than one data store. In such complex situations, Hadoop is used both as an integrated source for data and as a data warehouse.
- a) MapReduce jobs transfer MongoDB data to Hadoop.
- b) Once the data from MongoDB and other sources is available in Hadoop, the combined datasets can be queried.
- c) At this stage, data analysts can use Pig or MapReduce to query large datasets that include data from MongoDB.
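A common hand-off format for step (a) is newline-delimited JSON, which text-based Hadoop input formats can split and parse line by line. A minimal sketch, with hypothetical documents serialized from a plain list (a real export would iterate a driver cursor and write to HDFS instead of an in-memory buffer):

```python
import io
import json

# Hypothetical documents, as they might come out of a MongoDB collection.
docs = [{"_id": 1, "user": "ada"}, {"_id": 2, "user": "grace"}]

# One JSON document per line: each line can be parsed independently,
# so a Hadoop job can split the file between workers at line boundaries.
buf = io.StringIO()
for doc in docs:
    buf.write(json.dumps(doc) + "\n")

print(buf.getvalue())
```

Once such a file sits alongside data from the other stores, the combined datasets in step (b) can be queried uniformly.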
Owing to the above, MongoDB has emerged as a preferred choice of developers, and engineers at MongoDB have successfully integrated it with Hadoop. The MongoDB-Hadoop combination is extremely effective in solving quite a few architectural problems pertaining to data warehousing, processing, retrieval and aggregation.
Source: http://www.dezyre.com/article/mongodb-and-hadoop/81#.VPFJWbPF8VE