Essential Guide

Managing Hadoop projects: What you need to know to succeed

A comprehensive collection of articles, videos and more, hand-picked by our editors

Hadoop Distributed File System (HDFS)

HDFS is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications.

The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications.

HDFS is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications.

Because HDFS typically is deployed on low-cost commodity hardware, server failures are common. The file system is designed to be highly fault-tolerant, however, by facilitating the rapid transfer of data between compute nodes and enabling Hadoop systems to continue running if a node fails. That decreases the risk of catastrophic failure, even in the event that numerous nodes fail.

When HDFS takes in data, it breaks the information down into separate pieces and distributes them to different nodes in a cluster, allowing for parallel processing. The file system also copies each piece of data multiple times and distributes the copies to individual nodes, placing at least one copy on a different server rack than the others. As a result, the data on nodes that crash can be found elsewhere within a cluster, which allows processing to continue while the failure is resolved.

HDFS is built to support applications with large data sets, including individual files that reach into the terabytes. It uses a master/slave architecture, with each cluster consisting of a single NameNode that manages file system operations and supporting DataNodes that manage data storage on individual compute nodes.

This was first published in July 2013

Continue Reading About Hadoop Distributed File System (HDFS)

Glossary

'Hadoop Distributed File System (HDFS)' is part of the:

View All Definitions

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Essential Guide

Using big data and Hadoop 2: New version enables new applications

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

File Extensions and File Formats

Powered by:

SearchDataManagement

SearchAWS

SearchContentManagement

SearchCRM

SearchOracle

SearchSAP

SearchSQLServer

Close