Title Big Data Analytics with Java (Reprint Edition) (English Edition)
Category
Author Rajat Mehta (USA)
Publisher Southeast University Press
Introduction
Synopsis
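This book opens with basic statistical analysis of big data using Java, then moves on to other data analysis topics such as classification, regression, clustering, and ensembling. It also covers advanced topics such as recommendation engines, large-scale graph analytics, real-time analytics, and deep learning.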
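The book includes a range of case studies, such as sentiment analysis on a tweet dataset, recommendations on the MovieLens dataset, customer segmentation on an e-commerce dataset, and graph analysis on a real flight dataset. It is an end-to-end guide to implementing big data analytics in Java. Java is now the de facto language of mainstream big data environments, including Hadoop. The book teaches you how to perform analytics on big data using production-friendly Java. The content falls broadly into two parts. The first part is introductory material that familiarizes readers with the big data environment; the second part contains the core discussion of the concepts in big data analytics. It covers data analysis and data visualization, the core concepts and advantages of machine learning, real-world uses of regression and of classification with Naive Bayes, an in-depth discussion of clustering, and a review of implementing simple neural networks on big data using Deeplearning4j or plain Java Spark code. For Java developers who want to start learning big data analytics and apply it in the real world, this is an essential book.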
About the Author
Rajat Mehta is a VP (technical architect) in technology at JP Morgan Chase in New York. He is a Sun certified Java developer and has worked on Java-related technologies for more than 16 years. For the past few years, his role has heavily involved using a big data stack and running analytics on it. He is also a contributor to various open source projects that are available on his GitHub repository, and a frequent writer for dev magazines.
Table of Contents
Preface
Chapter 1: Big Data Analytics with Java
Why data analytics on big data?
Big data for analytics
Big data - a bigger pay package for Java developers
Basics of Hadoop - a Java sub-project
Distributed computing on Hadoop
HDFS concepts
Design and architecture of HDFS
Main components of HDFS
HDFS simple commands
Apache Spark
Concepts
Transformations
Actions
Spark Java API
Spark samples using Java 8
Loading data
Data operations - cleansing and munging
Analyzing data - count, projection, grouping, aggregation, and max/min
Actions on RDDs
Paired RDDs
Saving data
Collecting and printing results
Executing Spark programs on Hadoop
Apache Spark sub-projects
Spark machine learning modules
Mahout - a popular Java ML library
Deeplearning4j - a deep learning library
Summary
Chapter 2: First Steps in Data Analysis
Datasets
Data cleaning and munging
Basic analysis of data with Spark SQL
Building SparkConf and context
Dataframe and datasets
Load and parse data
Analyzing data - the Spark-SQL way
Spark SQL for data exploration and analytics
Market basket analysis - Apriori algorithm
Implementation of the Apriori algorithm in Apache Spark
Efficient market basket analysis using FP-Growth algorithm
Running FP-Growth on Apache Spark
Summary
Chapter 3: Data Visualization
Data visualization with Java JFreeChart
Using charts in big data analytics
Time Series chart
All India seasonal and annual average temperature series dataset
Simple single Time Series chart
Multiple Time Series on a single chart window
Bar charts
Histograms
When would you use a histogram?
How to make histograms using JFreeChart?
Line charts
Scatter plots
Box plots
Advanced visualization technique
Prefuse
IVTK Graph toolkit
Other libraries
Summary
Chapter 4: Basics of Machine Learning
What is machine learning?
Real-life examples of machine learning
Type of machine learning
A small sample case study of supervised and unsupervised learning
Steps for machine learning problems
Choosing the machine learning model
What are the feature types that can be extracted from the datasets?
How do you select the best features to train your models?
How do you run machine learning analytics on big data?
Getting and preparing data in Hadoop
Training and storing models on big data
Apache Spark machine learning API
Summary
Chapter 5: Regression on Big Data
Linear regression
What is simple linear regression?
Where is linear regression used?
Logistic regression
Which mathematical functions does logistic regression use?
Where is logistic regression used?
Predicting heart disease using logistic regression
Summary
Chapter 6: Naive Bayes and Sentiment Analysis
Conditional probability
Bayes theorem
Naive Bayes algorithm
Advantages of Naive Bayes
Disadvantages of Naive Bayes
Sentimental analysis
Concepts for sentimental analysis
Tokenization
Stop words removal
Stemming
N-grams
Term presence and Term Frequency
TF-IDF
Bag of words
Dataset
Data exploration of text data
Sentimental analysis on this dataset
SVM or Support Vector Machine
Summary
Chapter 7: Decision Trees
What is a decision tree?
Building a decision tree
Choosing the best features for splitting the datasets
Dataset
Data exploration
Cleaning and munging the data
Training and testing the model
Summary
Chapter 8: Ensembling on Big Data
Ensembling
Types of ensembling
Bagging
Boosting
Advantages and disadvantages of ensembling
Random forests
Gradient boosted trees (GBTs)
Classification problem and dataset used
Data exploration
Training and testing our random forest model
Training and testing our gradient boosted tree model
Summary
Chapter 9: Recommendation Systems
Recommendation systems and
Last updated: 2025/2/22 13:13:34