Emerging Technology/Certified Big Data Science Analyst

COURSE OBJECTIVES

Statista publishes a revenue forecast for the global big data industry covering 2011 to 2026. For 2017, it projects the global big data market to grow to just under 34 billion U.S. dollars in revenue (https://www.statista.com/statistics/254266/global-big-data-market-forecast/).

The creation and consumption of data continue to grow by leaps and bounds, and with them the investment in big data analytics hardware, software, and services, and in data scientists and their continuing education. The availability of very large data sets is one of the reasons Deep Learning, a subset of artificial intelligence (AI), has recently emerged as the hottest tech trend. Google, Facebook, Baidu, Amazon, IBM, Intel, and Microsoft, all with very deep pockets, are investing in acquiring talent and releasing open AI hardware and software.

This course conveys technical know-how about the concept of Business Analytics and its importance in today’s market. Participants will learn different data mining techniques and a data mining tool (RapidMiner). The course also introduces participants to the Big Data solution Hadoop and the components that work on top of it (HBase, Hive).

By the end of this course, participants will have a good understanding of how Big Data storage and processing work to meet today’s growing need to handle data of every variety and volume. As part of the course, participants will be given a case study that covers all the aspects of Business Analytics and Big Data taught in the course. They will be required to propose a solution to the problem using all the components taught.

JOB ROLES IN NICF / TARGETED AUDIENCE
• Data Analyst - Statistics and Mining
• Big Data Analyst
• Operations Research Analyst
• Data Scientist
• IHL students

The Certified Big Data Science Analyst program is a 5-day intensive training program with the following assessment components.
Component 1. Written Examination
Component 2. Project Work Component (PWC)
Both components are assessed individually. Participants need to score at least 70% in each component to qualify for the certification. A participant who fails one component does not pass the course and has to re-take that component only. A participant who fails both components has to re-take the entire assessment.

COURSE OUTLINE
Unit 1: Introduction to Business Analytics

  1. The concept of Business Analytics
  2. Data, Information, Knowledge and Wisdom
  3. Data as Unique Enterprise Asset
  4. Data, Information and Analytics Lifecycle
  5. Business Analytics – Current Context
  6. Types of Analytics
    1. Descriptive Analytics
    2. Predictive Analytics
    3. Prescriptive Analytics

Unit 2: Data/Information Architecture for Business Analytics

  1. Data/Information Architecture
  2. Concept of Data Warehouse/Enterprise Data Warehouse (EDW)
  3. ETL – Key Process
  4. Concept of Data Mart
  5. Business Intelligence
  6. Data Mining

Unit 3: Data Mining Tool

  1. Understand the open source DM tool RapidMiner
  2. Explore the various features of RapidMiner
  3. Walk through a RapidMiner demo with different scenarios

Unit 4: Data Mining Techniques

  1. Understand the various data mining techniques
  2. Understand how a correlation matrix works
  3. Understand how association rule mining works
  4. Understand the predictive analytics technique
  5. Understand the forecasting technique
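
As a flavour of the association rule mining topic in this unit, the short sketch below computes the two standard measures behind a rule such as "bread implies milk": support and confidence. It is only an illustration in plain Python, not course material and not how RapidMiner implements it; the function names and the tiny shopping-basket data set are assumptions made up for this example.

```python
from typing import Iterable, Sequence, Set


def support(transactions: Sequence[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: Sequence[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    ante = set(antecedent)
    both = ante | set(consequent)
    ante_support = support(transactions, ante)
    if ante_support == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    return support(transactions, both) / ante_support


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here two of the four baskets contain both bread and milk, and three contain bread, so the rule has a support of 0.5 and a confidence of two thirds.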

Unit 5: Introduction to Big Data

  1. What is Big Data? Why Big Data?
  2. 3V’s of Big Data
  3. The Rapid Growth of Unstructured Data
  4. Big Data Market Forecast
  5. Big Data Analytics
  6. Big Data in Business
  7. Big Data Types & Architecture

Unit 6: Introduction to Hadoop

  1. Big Data – Current Industry Trends
  2. Why Process Big Data?
  3. Challenges in Data Processing
  4. Why Hadoop?
  5. What does Hadoop offer?
  6. Hadoop Network Structure
  7. Hadoop Eco-System
  8. Hadoop Core Components
  9. Hadoop – Features
  10. Hadoop – Relevance
  11. Hadoop in Action
  12. Sqoop import and export

Unit 7: Hadoop HDFS & MapReduce

  1. Hadoop HDFS
  2. What does HDFS Facilitate?
  3. HDFS Architecture
  4. Hadoop Network and Server Infrastructure
  5. NameNode, Secondary NameNode and DataNode
  6. Ensuring Data Correctness
  7. Data Pipelining while Loading Data
  8. fs Operations
  9. Hadoop MapReduce
  10. MapReduce Conceptualization
  11. MapReduce – Overview
  12. MapReduce – Programming Model
  13. MapReduce – Execution Overview
  14. Hadoop – Application Examples
  15. Word Count – Example
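
To give an idea of the word count example listed above, here is a minimal single-process sketch of the MapReduce programming model in plain Python. It imitates the map, shuffle and reduce steps in memory and does not use Hadoop or any Hadoop API; the function names and sample lines are assumptions chosen for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the values collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Sample input, invented for illustration only.
counts = word_count(["big data big", "data hadoop"])
print(counts)
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; the sketch only shows how the three steps fit together.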

Unit 8: Apache HBase

  1. What is HBase?
  2. HBase Architecture
  3. ZooKeeper
  4. HBase Data model
  5. HBase Deployment
  6. HBase Cluster Architecture
  7. Indexes in HBase
  8. Scaling HBase
  9. Data Locality, Coherence and Concurrency, Fault Tolerance
  10. Hadoop Integration
  11. High-Level Architecture
  12. Replication of Data Across Data Centres
  13. HBase Applications
  14. Advantages and Disadvantages

Unit 9: Apache Hive

  1. What is Hive?
  2. Why Hive?
  3. Where to use Hive?
  4. Hive Architecture
  5. Hive: Benefits
  6. Hive: Tradeoffs
  7. Hive: Real-World Examples
