The Big Data Hadoop course is designed to impart in-depth knowledge of Big Data processing using Hadoop. The course is packed with real-life projects and case studies to be executed in the Cloud Lab. The program covers the Big Data analytics process involved in storing, processing and managing Big Data, both structured and unstructured, as well as the data analytics layer on top of Big Data systems, which applies more traditional predictive models by connecting an analytics tool such as R to Big Data systems.
- Linux overview and directory structure
- Installation of Ubuntu
- Linux commands
- Java and Python installation
- Big Data Overview
- Hadoop Architecture & Components
- Hadoop Configuration
- Hadoop Processing – MapReduce & HDFS
- Python
- MapReduce with Python
- Pig
- Pig’s Data Model
- Pig Functions
- Input and output formats for MapReduce programs
- Case Study
- Overview of R, R data types and objects, reading and writing data
- Control structures, functions
- Loop functions, Simulation
- Database Connectivity
- Introduction to Scala
- Creating a Scala Project
- Classes, Objects and Methods
- Scala GUI and Connectivity
- Spark Overview
- RDD (Resilient Distributed Dataset) Fundamentals
- Cluster Architectures for Spark
- Spark Job Execution
- Introduction to MongoDB
- MongoDB API
- Indexing and Data Modeling
- Connection with Python
- REST API
- Introduction to Cassandra
- Architecture of Cassandra and Configuration
- Cassandra Data Model
- CQL
- Connection with Python
- Kafka
- Kafka Streaming
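
To give a flavour of the MapReduce and Python topics listed above, here is a minimal sketch of a word count written in the map, shuffle and reduce style. It is an illustration only, not taken from the course material: it runs in memory on a plain list of lines rather than on a Hadoop cluster, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group values by key, as a framework would do
    between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three stages in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data", "Big Hadoop"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run as separate tasks over data stored in HDFS, and the framework would handle the shuffle; the sketch only shows how the three stages fit together.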