
MongoDB and Hadoop

Traditional relational databases ruled the roost as long as datasets were measured in megabytes and gigabytes. However, as organizations around the world kept growing, a tsunami called “Big Data” rendered the old technologies unfeasible.
When it came to storing and retrieving data, these technologies simply crumbled under the burden of such colossal volumes. Newer technologies such as Hadoop, Hive and HBase can handle large sets of raw, unstructured data efficiently and economically.
Another consequence of these problems was the parallel rise of “Not Only SQL”, or NoSQL, databases. Their primary advantage is that they store and retrieve data under a looser consistency model, which brings added benefits such as horizontal scaling, better availability and quicker access.
Deployed in over five hundred top-notch organizations across the globe, MongoDB has emerged as the most popular NoSQL database of all. In the absence of a concrete survey, it is difficult to assess the exact adoption and penetration of MongoDB. However, metrics such as Google search volume and the number of job openings for Hadoop and MongoDB professionals give a good idea of the popularity of these technologies.
By Google search volume, MongoDB ranked first and was three times more popular than the next most popular technology. Compared with the least popular database, MongoDB fared 10 times better.
A survey of IT professionals’ profiles on LinkedIn revealed that professionals skilled in MongoDB made up almost 50% of all NoSQL-skilled professionals. In terms of acceptance, MongoDB equals the next three NoSQL databases put together. Rackspace, one of the pioneers in adopting MongoDB for its cloud solutions, affirms: “MongoDB is the de facto choice for NoSQL applications”.
Developers are adopting MongoDB widely for the following reasons:
  • 1. MongoDB enhances productivity, and it is easy to get started with and to use.
  • 2. With the schema barrier removed, developers can concentrate on developing applications rather than databases.
  • 3. MongoDB offers extensive support for an array of languages and platforms such as C#, C, C++, Node.js, Scala, JavaScript and Objective-C, which are pertinent to the future of the web.
Understanding how MongoDB teams up with Hadoop and Big Data technologies
Technologists at MongoDB have developed a MongoDB connector for Hadoop that enables tighter integration and makes the following tasks easier to carry out:
  • The MongoDB-Hadoop connector brings the power of Hadoop’s MapReduce to live application data in MongoDB, extracting value from Big Data quickly and efficiently.
  • The connector presents MongoDB as a Hadoop-compatible file system, so MapReduce jobs can read directly from MongoDB without first copying the data to HDFS. This removes the need to transfer terabytes of data across the network.
  • Scanning entire collections is no longer necessary, because MapReduce jobs can pass queries as filters and can harness MongoDB’s indexing capabilities, such as text search, compound, array, geospatial and sparse indexes.
  • Results from Hadoop jobs can be written back to MongoDB to support queries and real-time operational processes.
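As a rough illustration of the points above, the sketch below assembles the kind of job settings such a connector reads: where to read from, which filter to push down, and where to write results. The property names `mongo.input.uri`, `mongo.input.query` and `mongo.output.uri` are assumed from the connector's configuration naming and should be checked against the version you deploy. The helper `build_job_config`, the host, and the database and collection names are hypothetical.

```python
import json


def build_job_config(input_uri, output_uri, query=None):
    """Return job properties for a MongoDB-backed MapReduce job.

    The property names are assumptions based on the MongoDB connector
    for Hadoop; verify them against the version you use.
    """
    config = {
        # Read input directly from a MongoDB collection instead of HDFS.
        "mongo.input.uri": input_uri,
        # Write the job's results back into a MongoDB collection.
        "mongo.output.uri": output_uri,
    }
    if query is not None:
        # Pass the filter to MongoDB so that indexes can be used and
        # the job does not have to scan the entire collection.
        config["mongo.input.query"] = json.dumps(query, sort_keys=True)
    return config


# Hypothetical URIs and filter, for illustration only.
config = build_job_config(
    "mongodb://localhost:27017/shop.orders",
    "mongodb://localhost:27017/shop.order_totals",
    query={"status": "shipped"},
)
for key in sorted(config):
    print(f"{key}={config[key]}")
```

In a real deployment these properties would be handed to the Hadoop job configuration rather than printed.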
Scope of application - Hadoop and MongoDB
In the context of Big Data stacks, MongoDB and Hadoop play the following roles:
  • 1) MongoDB is used for the operational part, as a real-time data store.
  • 2) Hadoop is used primarily for offline analysis and batch data processing.
Scope of usage in Batch Aggregation

Image Credit: http://mobicon.tistory.com
When it comes to analyzing data, the built-in aggregation features of MongoDB hold good in the majority of situations. However, some cases require more complex data aggregation. In such circumstances, Hadoop provides powerful support for complex analytics.
  • a) Hadoop processes the data extracted from MongoDB by means of one or more MapReduce jobs. These jobs can also pull data from other locations in order to build a multi-source solution.
  • b) The results of the MapReduce jobs can be written back to MongoDB, where they can be queried and analyzed as and when required.
  • c) MongoDB applications can thus use the data from batch analytics, either to present it to the end user or to enable other features down the line.
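The flow in steps a) to c) can be sketched in plain Python, without Hadoop or MongoDB installed. This is only a toy model of the map, shuffle and reduce phases; the `orders` documents and the function names are hypothetical, and a real job would read from and write to MongoDB through the connector.

```python
from collections import defaultdict

# Hypothetical documents, standing in for data extracted from MongoDB.
orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 12},
    {"customer": "alice", "amount": 8},
]


def map_order(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["customer"], doc["amount"]


def reduce_amounts(key, values):
    """Reduce step: collapse all values for a key into one result document."""
    return {"_id": key, "total": sum(values)}


def run_job(docs):
    # Shuffle: group the mapped values by key, as the framework would.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_order(doc):
            grouped[key].append(value)
    # Reduce each group; these results are what would be written back.
    return [reduce_amounts(key, grouped[key]) for key in sorted(grouped)]


results = run_job(orders)
print(results)
```

Each result document carries the grouping key as `_id`, which is the shape an application would then query once the results are stored back in MongoDB.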
Scope of usage in Data Warehousing
In a typical production environment, application data may live in more than one data store, each with its own functionality and query language. In such complex situations, Hadoop is used as a central source for the data, as well as a data warehouse.
  • a) MapReduce jobs transfer MongoDB data to Hadoop.
  • b) As soon as the data from MongoDB and other sources is available in Hadoop, the combined datasets can be queried.
  • c) At this stage, data analysts can use Pig or MapReduce to query large datasets that include data from MongoDB.
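To make the warehousing idea concrete, here is a small plain-Python sketch of a query over data from two sources once both have landed in one place. The `mongo_users` and `clickstream` records and the function `page_views_by_user` are hypothetical; in practice this kind of join would be expressed in Pig or as a MapReduce job.

```python
# Hypothetical records from two stores, already loaded into the warehouse.
mongo_users = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob"},
]
clickstream = [  # for example, rows parsed from web server logs
    {"user_id": 1, "page": "/home"},
    {"user_id": 1, "page": "/cart"},
    {"user_id": 3, "page": "/home"},
]


def page_views_by_user(users, events):
    """Join events to users on the user id and count views per user name."""
    names = {user["_id"]: user["name"] for user in users}
    counts = {name: 0 for name in names.values()}
    for event in events:
        name = names.get(event["user_id"])
        if name is not None:  # skip events with no matching user
            counts[name] += 1
    return counts


report = page_views_by_user(mongo_users, clickstream)
print(report)
```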
Owing to the above, MongoDB has emerged as the preferred choice of developers, and engineers at MongoDB have successfully integrated it with Hadoop. The combination of MongoDB and Hadoop is effective in solving quite a few architectural problems pertaining to data warehousing, processing, retrieval and aggregation.
Source: http://www.dezyre.com/article/mongodb-and-hadoop/81#.VPFJWbPF8VE
