Certified Big Data Science Analyst
A Statista forecast of global big data industry revenue covering 2011 to 2026 projects that the global big data market will grow to just under 34 billion U.S. dollars in revenue in 2017 (https://www.statista.com/statistics/254266/global-big-data-market-forecast/).
The creation and consumption of data continue to grow by leaps and bounds, and with them the investment in big data analytics hardware, software, and services, and in data scientists and their continuing education. The availability of very large data sets is one of the reasons deep learning, a subset of artificial intelligence (AI), has recently emerged as one of the hottest tech trends, with Google, Facebook, Baidu, Amazon, IBM, Intel, and Microsoft investing heavily in acquiring talent and in releasing open-source AI hardware and software.
This course provides technical know-how about the concept of Business Analytics and its importance in today's market. Participants will learn different Data Mining techniques and a data mining tool (RapidMiner). The course also introduces participants to a Big Data solution (Hadoop) and the components that work on top of Hadoop (HBase, Hive).
By the end of this course, participants will have a good understanding of how Big Data storage and processing work to meet today's growing need to handle data of every variety and volume. As part of the course, participants will be given a case study covering all the aspects of Business Analytics and Big Data taught in the course, and will be required to solve the problem using all the components covered.
JOB ROLES IN NICF / TARGET AUDIENCE
• Data Analyst - Statistics and Mining
• Big Data Analyst
• Operations Research Analyst
• Data Scientist
• IHL students
The Certified Big Data Science Analyst program is a 5-day intensive training program with the following assessment components:
Component 1. Written Examination
Component 2. Project Work Component (PWC)
Both components are assessed individually. Participants must obtain at least 70% in each component to qualify for this certification. A participant who fails one component does not pass the course and must re-take that component; a participant who fails both components must re-take the entire assessment.
Unit 1: Introduction to Business Analytics
- The concept of Business Analytics
- Data, Information, Knowledge and Wisdom
- Data as Unique Enterprise Asset
- Data, Information and Analytics Lifecycle
- Business Analytics – Current Context
- Types of Analytics
- Descriptive Analytics
- Predictive Analytics
- Prescriptive Analytics
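The difference between descriptive and predictive analytics can be seen on a tiny example. The Python sketch below uses a made-up monthly sales series (illustrative numbers, not course data): the descriptive part summarizes what has already happened, and the predictive part fits an ordinary least-squares trend line and extrapolates it one month ahead, which is about the simplest forecasting model there is. Prescriptive analytics, which would go further and recommend an action based on such a forecast, is not shown.

```python
from statistics import mean, median, pstdev

# Hypothetical monthly sales figures (illustrative only).
sales = [120.0, 135.0, 128.0, 150.0, 162.0, 158.0]

def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }

def linear_trend(values):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, 2, ..."""
    t = list(range(len(values)))
    t_bar, y_bar = mean(t), mean(values)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, values)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    a = y_bar - b * t_bar
    return a, b

def forecast_next(values):
    """Predictive analytics (simplest form): extrapolate the trend one step ahead."""
    a, b = linear_trend(values)
    return a + b * len(values)

print(describe(sales))
print(round(forecast_next(sales), 1))
```

A straight-line trend ignores seasonality and uncertainty; it is used here only because every step of the calculation is visible.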
Unit 2: Data/Information Architecture for Business Analytics
- Data/Information Architecture
- Concept of Data Warehouse/Enterprise Data Warehouse (EDW)
- ETL – Key Process
- Concept of Data Mart
- Business Intelligence
- Data Mining
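The ETL process can be illustrated with a small, self-contained Python sketch. It extracts rows from a CSV export (an in-memory string standing in for an operational source system), transforms them (trims and normalizes region names, parses amounts, and rejects malformed rows), and loads the result into a SQLite table that plays the role of a tiny data mart. The table name sales_mart and its columns are hypothetical; a real EDW would normally use a dedicated ETL tool and warehouse database rather than SQLite.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system.
RAW_CSV = """order_id,region,amount
1001, north ,250.00
1002,South,120.50
1003,north,not_a_number
1004,EAST,99.99
"""

def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean and standardize; skip rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject malformed amounts instead of loading bad data
        region = row["region"].strip().title()  # " north " -> "North"
        clean.append((int(row["order_id"]), region, amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target (data mart) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_mart "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_mart VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# A typical BI-style query against the loaded mart.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales_mart GROUP BY region ORDER BY region"
):
    print(region, round(total, 2))
```

Keeping the three steps as separate functions mirrors how ETL jobs are usually organized: each stage can be tested and rerun on its own.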
Unit 3: Data Mining Tool
- Understand the open source data mining (DM) tool RapidMiner
- Explore the various features of RapidMiner
- Walk through a RapidMiner demo with different scenarios
Unit 4: Data Mining Techniques
- Understand the various data mining techniques
- Understand how a correlation matrix works
- Understand how association rule mining works
- Understand predictive analytics techniques
- Understand forecasting techniques
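A correlation matrix is the Pearson correlation coefficient computed for every pair of numeric columns, and association rules are commonly scored by support, confidence, and lift. The Python sketch below computes both on small made-up data sets (the column names and baskets are hypothetical). It only scores a rule you already have; discovering candidate rules from frequent itemsets, as algorithms such as Apriori or FP-Growth do, is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """All pairwise correlations for a {column name: values} mapping."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # baskets containing A
    n_c = sum(1 for t in transactions if c <= t)         # baskets containing C
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Made-up numeric columns for the correlation matrix.
columns = {
    "ad_spend": [10, 20, 30, 40],
    "sales": [15, 25, 35, 45],
    "returns": [4, 3, 2, 1],
}
matrix = correlation_matrix(columns)
for name in columns:
    print(name, [round(matrix[(name, other)], 2) for other in columns])

# Made-up market baskets for association rule metrics.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

For the rule bread -> milk, a lift below 1 means buying bread makes milk slightly less likely than its baseline rate in these baskets, even though confidence looks high on its own.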
Unit 5: Introduction to Big Data
- What is Big Data? Why Big Data?
- The 3 Vs of Big Data (Volume, Velocity, Variety)
- The Rapid Growth of Unstructured Data
- Big Data Market Forecast
- Big Data Analytics
- Big Data in Business
- Big Data Types & Architecture
Unit 6: Introduction to Hadoop
- Big Data – Current Industry Trends
- Why Process Big Data?
- Challenges in Data Processing
- Why Hadoop?
- What is Hadoop offering?
- Hadoop Network Structure
- Hadoop Eco-System
- Hadoop Core Components
- Hadoop – Features
- Hadoop – Relevance
- Hadoop in Action
- Sqoop import and export
Unit 7: Hadoop HDFS & MapReduce
- Hadoop HDFS
- What does HDFS Facilitate?
- HDFS Architecture
- Hadoop Network and Server Infrastructure
- NameNode, Secondary NameNode and DataNode
- Ensuring Data Correctness
- Data Pipelining while Loading Data
- HDFS Shell (fs) Operations
- Hadoop MapReduce
- MapReduce Conceptualization
- MapReduce – Overview
- MapReduce – Programming Model
- MapReduce – Execution Overview
- Hadoop – Application Examples
- Word Count – Example
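The map, shuffle, and reduce steps of the classic Word Count example can be traced in a single-process Python simulation. This is a teaching sketch, not Hadoop code: a real job runs many mapper and reducer tasks in parallel over HDFS blocks, is typically written against Hadoop's Java MapReduce API (or run as scripts through Hadoop Streaming), and the framework itself performs the shuffle and sort.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: sum the counts emitted for a single word."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(docs))
```

Because each mapper only sees its own line and each reducer only sees one key's values, both phases can be spread across machines; only the shuffle needs to move data between them.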
Unit 8: Apache HBase
- What is HBase?
- HBase Architecture
- HBase Data model
- HBase Deployment
- HBase Cluster Architecture
- Indexes in HBase
- Scaling HBase
- Data Locality, Coherence and Concurrency, Fault Tolerance
- Hadoop Integration
- High-Level Architecture
- Replication of Data Across Data Centres
- HBase Applications
- Advantages and Disadvantages
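HBase's logical data model can be summarized as: rows identified by a row key and kept in sorted order, column families fixed when the table is created, column qualifiers added freely at write time, and cells that keep timestamped versions. The Python class below is a hypothetical in-memory toy that mimics only that model; it is not the HBase client API, and names such as ToyHBaseTable, put, get, and scan are illustrative. Real HBase stores row keys as byte arrays and retains a configurable number of versions per column family, whereas this toy uses strings and keeps every version.

```python
import bisect
from collections import defaultdict

class ToyHBaseTable:
    """Hypothetical in-memory model of HBase's logical data model (not the HBase API)."""

    def __init__(self, column_families):
        # Column families are fixed at table creation; qualifiers are not.
        self.families = set(column_families)
        self.row_keys = []  # kept sorted, like HBase's ordering by row key
        self.rows = {}      # row -> {(family, qualifier): {timestamp: value}}

    def put(self, row, family, qualifier, value, ts):
        """Write one cell version; rewriting a cell adds a new timestamped version."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = defaultdict(dict)
        self.rows[row][(family, qualifier)][ts] = value

    def get(self, row, family, qualifier, max_versions=1):
        """Return up to max_versions values for one cell, newest first."""
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        newest_first = sorted(versions.items(), reverse=True)
        return [value for _, value in newest_first[:max_versions]]

    def scan(self, start_row, stop_row):
        """Return row keys in [start_row, stop_row), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start_row)
        hi = bisect.bisect_left(self.row_keys, stop_row)
        return self.row_keys[lo:hi]

table = ToyHBaseTable(["info", "stats"])
table.put("user#001", "info", "name", "Alice", ts=1)
table.put("user#001", "info", "name", "Alice Tan", ts=2)
table.put("user#002", "info", "name", "Bob", ts=1)
table.put("user#010", "stats", "logins", 7, ts=1)

print(table.get("user#001", "info", "name"))
print(table.get("user#001", "info", "name", max_versions=2))
print(table.scan("user#001", "user#010"))
```

Since the row key is the only built-in index, row-key design decides which lookups and range scans are cheap; this is why "Indexes in HBase" is a topic of its own.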
Unit 9: Apache Hive
- What is Hive?
- Why Hive?
- Where to use Hive?
- Hive Architecture
- Hive: Benefits
- Hive: Tradeoffs
- Hive: Real-World Examples
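One of Hive's defining tradeoffs is schema-on-read: data files sit in HDFS as they were written, and the table definition is applied only when a query reads them, so a field that does not match its declared type comes back as NULL instead of being rejected at load time. The Python sketch below imitates that behavior for delimited text (comma-separated here for readability). The schema, column names, and read_with_schema function are hypothetical and only approximate what Hive's default text SerDe does.

```python
# Hypothetical table definition: (column name, type converter) pairs,
# standing in for the column list of a Hive CREATE TABLE statement.
SCHEMA = [("user_id", int), ("page", str), ("duration_sec", float)]

# Raw lines as they might sit in a file; nothing validated them on write.
RAW_LINES = [
    "42,/home,3.5",
    "43,/cart,not_recorded",
    "44,/checkout",
    "45,/home,1.0,extra_field",
]

def read_with_schema(lines, schema, delimiter=","):
    """Apply the schema at read time; bad or missing fields become None (NULL)."""
    for line in lines:
        fields = line.split(delimiter)
        record = {}
        for i, (name, cast) in enumerate(schema):
            try:
                record[name] = cast(fields[i])
            except (IndexError, ValueError):
                record[name] = None  # missing or unparseable field -> NULL
        yield record  # trailing fields beyond the schema are ignored

for row in read_with_schema(RAW_LINES, SCHEMA):
    print(row)
```

Loading is fast because nothing is checked on write, but data-quality problems surface only at query time as NULLs, which is part of the benefit/tradeoff balance listed for Hive.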