Hadoop 101

less than 1 minute read

Hadoop uses MR, structure unstructured semistructured

Large collection of data grows quickly, cannot be stored,

80% Unsturctred fb 700 TB, Twitter

Hadoop OS projects

Hadoop is neither OLAP nor OLTP.

OLTP which is Online transaction Processing

OLAP which is Online Analytical Processing

The difference between both is that OLAP is the reporting engine while OLTP is purely a business process engine.

Rack is a 3D or 4D collection of nodes. Node is a simple computer. Hadoop cluster is a collection of Racks.

Hadoop Architecture: Two main components

  • Distributed File System
    • HDFS
    • IBM Spectrum Scale -Map Reduce Enginer
    • Framework for perfoming calculations on the data in the file system
    • Has a built in resource manager and scheduler

YARN: Yet Another Resource Negotiator

POSIX: The Portable Operating System Interface (POSIX)[1] is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines the application programming interface (API), along with command line shells and utility interfaces, for software compatibility with variants of Unix and other operating systems.[2][3]

Map Reduce: Distributed MergeSort Engine

Categories:

Updated: