Big Data
The Certified Big Data Foundation Specialist (CBDFS) designation is a globally recognized certification for Big Data professionals. Holding the CBDFS certification showcases your experience in a cloud environment and demonstrates the relevant skills and knowledge. Organizations that employ CBDFS-certified professionals have experts on board who can help maximize the business opportunities that cloud technologies are creating. The certification is awarded to individuals who have successfully passed the CBDFS exam.
After completing our e-course, you will be equipped not only with fundamental Big Data knowledge but also with an introduction to the broader tools and techniques covered in the curriculum below. This practical knowledge can serve as a starting point for your organization's Big Data journey. The Big Data Foundation e-course consists of a study guide eBook and an online course, delivered via our eLearning portal, giving you the freedom to access it anytime, whether at home or in the office.
Curriculum
Understanding Big Data and Hadoop
- Introduction to Big Data & Big Data Challenges
- Limitations & Solutions of Big Data Architecture
- Hadoop & its Features
- Hadoop Ecosystem
- Hadoop 2.x Core Components
- Hadoop Storage: HDFS (Hadoop Distributed File System)
- Hadoop Processing: MapReduce Framework
- Different Hadoop Distributions
Hadoop Architecture and HDFS
- Hadoop 2.x Cluster Architecture
- Federation and High Availability Architecture
- Typical Production Hadoop Cluster
- Hadoop Cluster Modes
- Common Hadoop Shell Commands
- Hadoop 2.x Configuration Files
- Single Node Cluster & Multi-Node Cluster set up
- Basic Hadoop Administration
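For readers who want a concrete feel for working with HDFS beyond the shell commands listed above, here is a minimal sketch (not part of the official course material) that mirrors commands such as hdfs dfs -mkdir and hdfs dfs -put using the Java FileSystem API. The paths and cluster settings are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsShellSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS is configured in core-site.xml (e.g. hdfs://namenode:8020);
        // otherwise this falls back to the local file system.
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/demo/input");            // hypothetical path
        fs.mkdirs(dir);                                     // ~ hdfs dfs -mkdir -p
        fs.copyFromLocalFile(new Path("data.txt"), dir);    // ~ hdfs dfs -put

        for (FileStatus status : fs.listStatus(dir)) {      // ~ hdfs dfs -ls
            System.out.println(status.getPath() + "  " + status.getLen());
        }
        fs.close();
    }
}
```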
Hadoop MapReduce Framework
- Traditional way vs MapReduce way
- Why MapReduce
- YARN Components
- YARN Architecture
- YARN MapReduce Application Execution Flow
- YARN Workflow
- Anatomy of MapReduce Program
- Input Splits, Relation between Input Splits and HDFS Blocks
- MapReduce: Combiner & Partitioner
- Demo of Health Care Dataset
- Demo of Weather Dataset
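To make the anatomy of a MapReduce program concrete, here is the classic word count example as a minimal Java sketch; it is illustrative only, and also shows where a combiner (covered in this module) plugs into the job configuration.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in a line of input.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer (also usable as a combiner): sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner, as covered in this module
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```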
Advanced Hadoop MapReduce
- Counters
- Distributed Cache
- MRUnit
- Reduce Join
- Custom Input Format
- Sequence Input Format
- XML file Parsing using MapReduce
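As an illustration of the counters topic above, the following sketch shows a mapper that tracks valid and malformed records with a custom counter; the record format and field count are assumptions made purely for the example.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper that counts malformed records with a custom counter while
// passing valid lines through unchanged.
public class RecordAuditMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

  // Custom counter group; the totals appear in the job's counter report.
  public enum RecordQuality { VALID, MALFORMED }

  private static final int EXPECTED_FIELDS = 5; // assumed record width

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split(",");
    if (fields.length == EXPECTED_FIELDS) {
      context.getCounter(RecordQuality.VALID).increment(1);
      context.write(value, NullWritable.get());
    } else {
      context.getCounter(RecordQuality.MALFORMED).increment(1);
    }
  }
}
```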
Apache Pig
- Introduction to Apache Pig
- MapReduce vs Pig
- Pig Components & Pig Execution
- Pig Data Types & Data Models in Pig
- Pig Latin Programs
- Shell and Utility Commands
- Pig UDF & Pig Streaming
- Testing Pig scripts with PigUnit
- Aviation use case in Pig
- Pig Demo of Healthcare Dataset
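The Pig programs in this module are written in Pig Latin; purely as an illustration, the sketch below embeds a few Pig Latin statements in Java via the PigServer API and runs them in local mode. The patients.csv file and its schema are hypothetical.

```java
import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

// Runs a small Pig Latin script in local mode through the embedded PigServer API.
public class PigLatinSketch {
  public static void main(String[] args) throws Exception {
    PigServer pig = new PigServer(ExecType.LOCAL);

    // Hypothetical comma-separated patient file: (id, age, diagnosis)
    pig.registerQuery("records = LOAD 'patients.csv' USING PigStorage(',') "
        + "AS (id:int, age:int, diagnosis:chararray);");
    pig.registerQuery("adults = FILTER records BY age >= 18;");
    pig.registerQuery("by_diag = GROUP adults BY diagnosis;");
    pig.registerQuery("counts = FOREACH by_diag GENERATE group, COUNT(adults);");

    // Pull the results back to the client and print them.
    Iterator<Tuple> it = pig.openIterator("counts");
    while (it.hasNext()) {
      System.out.println(it.next());
    }
    pig.shutdown();
  }
}
```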
Apache Hive
- Introduction to Apache Hive
- Hive vs Pig
- Hive Architecture and Components
- Hive Metastore
- Limitations of Hive
- Comparison with Traditional Database
- Hive Data Types and Data Models
- Hive Partition
- Hive Bucketing
- Hive Tables (Managed Tables and External Tables)
- Importing Data
- Querying Data & Managing Outputs
- Hive Script & Hive UDF
- Retail use case in Hive
- Hive Demo on Healthcare Dataset
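As a rough illustration of querying Hive (not taken from the course demos), the sketch below connects to HiveServer2 over JDBC, creates a partitioned table, and runs an aggregate query. The endpoint, table name, and schema are assumptions, and the Hive JDBC driver jar is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Connects to HiveServer2 over JDBC and runs a query against a partitioned table.
public class HiveQuerySketch {
  public static void main(String[] args) throws Exception {
    // Assumed HiveServer2 endpoint; adjust host, port, and database for your cluster.
    String url = "jdbc:hive2://localhost:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "", "");
         Statement stmt = conn.createStatement()) {

      // DDL: a managed table partitioned by sale_date (illustrative schema).
      stmt.execute("CREATE TABLE IF NOT EXISTS sales ("
          + "item STRING, amount DOUBLE) "
          + "PARTITIONED BY (sale_date STRING)");

      // Aggregate query over the table.
      try (ResultSet rs = stmt.executeQuery(
          "SELECT item, SUM(amount) FROM sales GROUP BY item")) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
        }
      }
    }
  }
}
```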
Advanced Apache Hive and HBase
- Hive QL: Joining Tables, Dynamic Partitioning
- Custom MapReduce Scripts
- Hive Indexes and views
- Hive Query Optimizers
- Hive Thrift Server
- Hive UDF
- HBase vs RDBMS
- HBase Components
- HBase Architecture
- HBase Run Modes
- HBase Configuration
- HBase Cluster Deployment
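To give a flavour of the Hive UDF topic in this module, here is a minimal sketch of a user-defined function built on the classic UDF class; the function name and the registration commands shown in the comments are illustrative.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A simple Hive UDF that upper-cases a string column. After packaging it
// into a jar, it could be registered from the Hive shell with commands like:
//   ADD JAR to_upper_udf.jar;
//   CREATE TEMPORARY FUNCTION to_upper AS 'ToUpperUDF';
public class ToUpperUDF extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null;          // preserve NULLs, as Hive expects
    }
    return new Text(input.toString().toUpperCase());
  }
}
```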
Advanced Apache HBase
- HBase Data Model
- HBase Shell
- HBase Client API
- HBase Data Loading Techniques
- Apache Zookeeper Introduction
- ZooKeeper Data Model
- Zookeeper Service
- HBase Bulk Loading
- Getting and Inserting Data
- HBase Filters
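The following sketch illustrates the HBase client API items listed above (getting and inserting data); the table name, column family, and row key are made up for the example, and the table is assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Inserts one cell into an HBase table and reads it back with the client API.
public class HBasePutGetSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("patients"))) { // assumed table

      // Insert: row key "p001", column family "info", qualifier "name".
      Put put = new Put(Bytes.toBytes("p001"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
      table.put(put);

      // Read the same cell back.
      Get get = new Get(Bytes.toBytes("p001"));
      Result result = table.get(get);
      byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println("name = " + Bytes.toString(name));
    }
  }
}
```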
Processing Distributed Data with Apache Spark
- What is Spark
- Spark Ecosystem
- Spark Components
- What is Scala
- Why Scala
- SparkContext
- Spark RDD
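Although this module introduces Spark alongside Scala, the short sketch below uses Spark's Java API (to stay consistent with the other examples on this page) to show an RDD being created, transformed, and reduced in local mode.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Builds an RDD in local mode, filters it, and aggregates it.
public class SparkRddSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]");
    JavaSparkContext sc = new JavaSparkContext(conf);

    List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6);
    JavaRDD<Integer> numbers = sc.parallelize(data);

    // Transformations are lazy; nothing runs until an action is called.
    JavaRDD<Integer> evens = numbers.filter(n -> n % 2 == 0);
    int sum = evens.reduce((a, b) -> a + b);   // action: triggers execution

    System.out.println("sum of evens = " + sum);
    sc.stop();
  }
}
```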
Oozie and Hadoop Project
- Oozie
- Oozie Components
- Oozie Workflow
- Scheduling Jobs with Oozie Scheduler
- Demo of Oozie Workflow
- Oozie Coordinator
- Oozie Commands
- Oozie Web Console
- Oozie for MapReduce
- Combining flow of MapReduce Jobs
- Hive in Oozie
- Hadoop Project Demo
- Hadoop Talend Integration
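As a rough illustration of scheduling and submitting work with Oozie, the sketch below uses the Oozie Java client; the server URL, HDFS application path, and cluster addresses are assumptions, and the workflow.xml is assumed to already exist in HDFS.

```java
import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

// Submits a workflow (whose workflow.xml already sits in HDFS) through the
// Oozie Java client and checks its status.
public class OozieSubmitSketch {
  public static void main(String[] args) throws Exception {
    // Assumed Oozie server URL.
    OozieClient client = new OozieClient("http://localhost:11000/oozie");

    Properties conf = client.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/demo/wordcount-wf");
    conf.setProperty("nameNode", "hdfs://namenode:8020");       // assumed addresses
    conf.setProperty("jobTracker", "resourcemanager:8032");

    String jobId = client.run(conf);                            // submit and start
    System.out.println("Submitted workflow " + jobId);

    WorkflowJob job = client.getJobInfo(jobId);
    System.out.println("Status: " + job.getStatus());           // e.g. RUNNING, SUCCEEDED
  }
}
```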
Syllabus
- Foundation
- Execution and Implementation
- Management
- Big Data Solutions
- Analytics and Big Data
- Cloud Technologies
Target Audience
The course is best suited to Information Technology professionals who possess intermediate to advanced programming, systems administration, or relational database skills and are looking to move into the area of Big Data. These include:
- Software Engineers
- Application Developers
- IT Architects
- System Administrators
The course can also benefit other professionals, e.g. business analysts and market/data researchers, who possess strong Information Technology skills and have a deep interest in Big Data analytics and the benefits it can bring to an organization.