
Apache Hadoop

  • Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
  • All Hadoop modules are designed on the fundamental assumption that hardware failures (of individual machines or of whole racks) are commonplace and should therefore be handled automatically in software by the framework.
  • The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce.
  • To process the data, Hadoop MapReduce ships packaged code to the nodes, so that each node processes, in parallel with the others, the portion of the data assigned to it (ideally data it already stores locally).
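
The idea of sending code to the data, rather than data to the code, can be illustrated with a toy scheduler. This is only a sketch under stated assumptions, not Hadoop's actual scheduler: the function `assign_tasks`, the block ids, and the node names are all hypothetical. Each block gets one task, and the task goes to a node that already stores a replica of the block whenever such a node has a free slot.

```python
def assign_tasks(block_locations, slots_per_node):
    """Assign one task per block, preferring nodes that already hold the block.

    block_locations: dict mapping block id -> list of nodes storing a replica
    slots_per_node:  dict mapping node name -> number of tasks it can run
    Returns a dict mapping block id -> (node name, is_local).
    Hypothetical toy model; real Hadoop scheduling is far more involved.
    """
    free = dict(slots_per_node)  # copy so the caller's dict is not modified
    assignment = {}
    for block, replicas in sorted(block_locations.items()):
        # First choice: a node that stores a replica and has a free slot.
        local = [node for node in replicas if free.get(node, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fallback: any node with a free slot; the data must be sent to it.
            remote = [node for node in sorted(free) if free[node] > 0]
            if not remote:
                raise RuntimeError("no free slots left in the cluster")
            node = remote[0]
        free[node] -= 1
        assignment[block] = (node, node in replicas)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    slots = {"n1": 1, "n2": 1, "n3": 1}
    for block, (node, is_local) in assign_tasks(blocks, slots).items():
        print(block, node, "local" if is_local else "remote")
```

In this run, block `b2` ends up on a node that does not store it because its only replica holder is already busy, which is the case a locality-aware scheduler tries to keep rare.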

The base Apache Hadoop framework is composed of the following modules:
    • Hadoop Common – contains libraries and utilities needed by the other Hadoop modules;
    • Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
    • Hadoop YARN – a resource-management platform responsible for managing computing resources in clusters and using them to schedule users' applications;
    • Hadoop MapReduce – a programming model for large-scale data processing.
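
To make the MapReduce programming model concrete, here is a minimal single-process sketch in Python of a word count. It is an illustration only and does not use Hadoop; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are made up for this example. On a real cluster the map and reduce calls would run on many nodes, and the framework would perform the grouping step.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record (a line of text) into (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["big data", "Big clusters"]))
```

The user writes only the map and reduce functions; splitting the input, grouping by key, and recovering from failed nodes are the framework's job.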

    The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts. For end users, MapReduce code is commonly written in Java, but any programming language can be used with "Hadoop Streaming" to implement the "map" and "reduce" parts of the user's program. Related projects expose other, higher-level user interfaces.
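
As a rough sketch of what a Hadoop Streaming program can look like, the Python script below implements both the "map" and the "reduce" part of a word count. With Streaming, the mapper and reducer are ordinary executables that read lines from standard input and write lines to standard output, with key and value separated by a tab by default, and the reducer receives its input sorted by key. The function names and the `map`/`reduce` command-line argument are choices made for this example, not part of Hadoop.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts per word; assumes the input lines are sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Hypothetical convention: the first argument selects the role.
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role in ("map", "reduce"):
        step = mapper if role == "map" else reducer
        for output_line in step(sys.stdin):
            print(output_line)
```

Saved under a hypothetical name such as `wordcount.py`, the script can be tried without a cluster by piping a text file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`, which mimics the sort that the framework performs between the two phases.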
