
MongoDB and Hadoop

Traditional relational databases ruled the roost as long as datasets were measured in megabytes and gigabytes. However, as organizations around the world kept growing, the tsunami called "Big Data" made the old technologies unfeasible.
When it came to data storage and retrieval, these technologies simply crumbled under the burden of such colossal amounts of data. Thanks to Hadoop, Hive and HBase, organizations can now handle large sets of raw, unstructured data efficiently and economically.
Another consequence of these problems was the parallel rise of "Not Only SQL" (NoSQL) databases. Their primary advantage is that they store and retrieve data under a looser consistency model, with added benefits such as horizontal scaling, better availability and faster access.
With deployments in over five hundred top organizations across the globe, MongoDB has emerged as the most popular NoSQL database. In the absence of a concrete survey, it is difficult to measure MongoDB's adoption and penetration precisely. However, metrics such as Google search volume and the number of job openings for Hadoop and MongoDB professionals give a good idea of how popular these technologies are.
By Google search volume, MongoDB ranked first and was three times more popular than the next most popular NoSQL technology. Compared with the least popular database, MongoDB fared 10 times better.
A survey of IT professionals' profiles on LinkedIn revealed that professionals skilled in MongoDB made up almost 50% of all NoSQL-skilled professionals. In terms of adoption, MongoDB equals the next three NoSQL databases combined. Rackspace, one of the first companies to adopt MongoDB for its cloud solutions, affirms: "MongoDB is the de facto choice for NoSQL applications."
Developers are adopting MongoDB widely for the following reasons:
  • 1. MongoDB enhances productivity; it is easy to get started with and easy to use.
  • 2. Because there is no rigid schema, developers can concentrate on building applications rather than on designing databases.
  • 3. MongoDB offers extensive support for a range of languages such as C#, C, C++, Node.js, Scala, JavaScript and Objective-C. These languages are pertinent to the future of the web.
How does MongoDB team up with Hadoop and Big Data technologies?
Recently, engineers at MongoDB developed a MongoDB Connector for Hadoop that provides tighter integration and makes the following tasks easier:
  • The MongoDB-Hadoop connector applies the power of Hadoop's MapReduce to live application data in MongoDB, extracting value from Big Data quickly and efficiently.
  • The connector presents MongoDB as a Hadoop-compatible file system, so MapReduce jobs can read data directly from MongoDB without first copying it to HDFS. This removes the need to transfer terabytes of data across the network.
  • MapReduce jobs no longer need to scan entire collections: they can pass queries as filters and take advantage of MongoDB's indexing capabilities, including text search, compound, array, geospatial and sparse indexes.
  • Results of Hadoop jobs can be written back to MongoDB to support queries and real-time operational processes.
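To make the query-filter idea concrete, here is a minimal, self-contained Python sketch. It does not use the real connector: the "collection" is an in-memory list of dicts, and `matches` and `run_job` are hypothetical helpers that understand only equality plus the `$gt`, `$gte`, `$lt` and `$lte` operators. What it illustrates is that the filter is applied before the map phase, so documents that fail it never reach the mapper. (In the actual mongo-hadoop connector, such a filter is, to our understanding, supplied through the `mongo.input.query` setting and evaluated by MongoDB itself, where it can use indexes.)

```python
from collections import defaultdict

# Hypothetical subset of MongoDB-style comparison operators.
OPERATORS = {
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
}


def matches(doc, query):
    """Return True if doc satisfies every field condition in query."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"amount": {"$gte": 30}}
            if not all(OPERATORS[op](value, operand)
                       for op, operand in condition.items()):
                return False
        elif value != condition:
            # Plain equality form, e.g. {"status": "paid"}
            return False
    return True


def run_job(collection, query, mapper, reducer):
    """Toy MapReduce job that filters documents before the map phase."""
    mapped = 0
    groups = defaultdict(list)
    for doc in collection:
        if not matches(doc, query):
            continue  # never reaches the mapper, like a pushed-down query
        mapped += 1
        for key, value in mapper(doc):
            groups[key].append(value)
    results = {key: reducer(key, values) for key, values in groups.items()}
    return results, mapped


# Hypothetical "orders" collection.
orders = [
    {"_id": 1, "region": "EU", "status": "paid", "amount": 40},
    {"_id": 2, "region": "US", "status": "paid", "amount": 25},
    {"_id": 3, "region": "EU", "status": "refunded", "amount": 15},
    {"_id": 4, "region": "EU", "status": "paid", "amount": 60},
]

results, mapped = run_job(
    orders,
    {"status": "paid", "amount": {"$gte": 30}},
    mapper=lambda doc: [(doc["region"], doc["amount"])],
    reducer=lambda key, values: sum(values),
)
print(results)  # {'EU': 100}
print(mapped)   # 2
```

Only two of the four documents are mapped; with a real index on `status` or `amount`, MongoDB could skip the others without reading them at all.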
Scope of application - Hadoop and MongoDB
In the context of Big Data stacks, MongoDB and Hadoop have the following scopes of application:
  • 1) MongoDB is used for the operational part, as a real-time data store.
  • 2) Hadoop is used primarily for offline analysis and batch data processing.
Scope of usage in Batch Aggregation

Image Credit: http://mobicon.tistory.com
When it comes to analyzing data, MongoDB's built-in aggregation features are sufficient in most situations. However, some cases call for more complex aggregation, and there Hadoop provides powerful support for complex analytics.
  • a) Hadoop processes data extracted from MongoDB using one or more MapReduce jobs. These jobs can also pull in data from other sources to build a multi-source solution.
  • b) The results of the MapReduce jobs can be written back to MongoDB and used for analysis and querying whenever needed.
  • c) MongoDB applications can then use the output of batch analytics to present it to end users or to power other downstream features.
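The batch-aggregation steps above can be sketched as a toy, single-process Python program. It is not a Hadoop job: `page_views` is a hypothetical operational collection, the `user_stats` dict keyed by `_id` stands in for a MongoDB output collection, and the `{_id, views}` result shape is an assumption chosen for the example.

```python
from collections import defaultdict

# Hypothetical operational collection: one document per page view.
page_views = [
    {"_id": 1, "user": "alice", "page": "/home"},
    {"_id": 2, "user": "bob", "page": "/cart"},
    {"_id": 3, "user": "alice", "page": "/cart"},
    {"_id": 4, "user": "alice", "page": "/checkout"},
]


def mapper(doc):
    """Emit (user, 1) for every page view."""
    yield doc["user"], 1


def reducer(user, counts):
    """Sum the per-view counts for one user."""
    return sum(counts)


def run_batch(source, output):
    """Map, group by key (shuffle), reduce, then upsert results into output."""
    groups = defaultdict(list)
    for doc in source:
        for key, value in mapper(doc):
            groups[key].append(value)
    for key, values in groups.items():
        # Upsert by _id: a re-run overwrites stale totals instead of duplicating them.
        output[key] = {"_id": key, "views": reducer(key, values)}
    return output


user_stats = {}  # stands in for an output collection such as "user_stats"
run_batch(page_views, user_stats)

# The application can now serve precomputed results with a simple key lookup.
print(user_stats["alice"])  # {'_id': 'alice', 'views': 3}
```

Upserting by `_id` is a design choice here: re-running the batch refreshes the totals rather than appending duplicates, so the output collection stays safe for the application to query at any time.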
Scope of usage in Data Warehousing
In a typical production environment, application data may live in more than one data store, each with its own functionality and query language. In such complex setups, Hadoop serves both as a unified data source and as a data warehouse.
  • a) MapReduce jobs transfer MongoDB data to Hadoop.
  • b) Once data from MongoDB and other sources is available in Hadoop, the combined datasets can be queried.
  • c) At this stage, data analysts can use Pig or MapReduce to query the large datasets, including the data that came from MongoDB.
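Combining MongoDB data with another source usually comes down to a join. The sketch below shows a reduce-side join in plain Python, roughly the kind of work a MapReduce job (or a Pig JOIN) performs at scale: both sources emit records keyed by customer id, the records are grouped by key, and the reducer merges each group. The customer documents, the CSV billing format and all function names are hypothetical.

```python
from collections import defaultdict

# Records exported from MongoDB (hypothetical "customers" collection).
mongo_customers = [
    {"_id": "c1", "name": "Acme", "country": "DE"},
    {"_id": "c2", "name": "Globex", "country": "US"},
]

# Records from another store: hypothetical CSV billing lines "customer_id,amount".
billing_csv = """c1,120
c2,80
c1,30"""


def map_mongo(doc):
    """Tag each customer document so the reducer can tell sources apart."""
    yield doc["_id"], ("customer", doc)


def map_billing(line):
    """Parse one CSV line into a tagged invoice amount."""
    customer_id, amount = line.split(",")
    yield customer_id, ("invoice", int(amount))


def reduce_join(key, tagged):
    """Merge one customer with all of its invoices."""
    customer, total = None, 0
    for tag, value in tagged:
        if tag == "customer":
            customer = value
        else:
            total += value
    if customer is None:
        return None  # invoices with no matching customer are dropped
    # Customers without invoices are kept with billed == 0.
    return {"name": customer["name"], "country": customer["country"], "billed": total}


def join(customers, csv_text):
    groups = defaultdict(list)
    for doc in customers:
        for key, value in map_mongo(doc):
            groups[key].append(value)
    for line in csv_text.splitlines():
        for key, value in map_billing(line):
            groups[key].append(value)
    rows = (reduce_join(key, groups[key]) for key in sorted(groups))
    return [row for row in rows if row is not None]


warehouse = join(mongo_customers, billing_csv)

# An analyst-style query over the combined dataset: total billed per country.
per_country = defaultdict(int)
for row in warehouse:
    per_country[row["country"]] += row["billed"]
print(dict(per_country))  # {'DE': 150, 'US': 80}
```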
For these reasons, MongoDB has become developers' preferred choice among NoSQL databases, and engineers at MongoDB have integrated it with Hadoop. The MongoDB-Hadoop combination is highly effective at solving several architectural problems in data warehousing, processing, retrieval and aggregation.
Source: http://www.dezyre.com/article/mongodb-and-hadoop/81#.VPFJWbPF8VE
