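| Editor's Recommendation     Data analysis is the whole process of inspecting, cleaning, transforming, and modeling data, with the aim of discovering useful information. Java is one of the most popular languages for implementing data analysis tasks. This book, Java Data Analysis (reprint edition, in English) by John R. Hubbard, gives a quick overview of data science and the steps of the related process. You will learn statistical data analysis techniques and implement them with popular Java APIs and libraries. You will also learn machine learning concepts such as classification and regression through practical examples.
 Along the way, you will become familiar with tools such as RapidMiner and Weka and see how these Java tools can be used more effectively for analysis. You will also learn how to work with relational, NoSQL, and time-series data. The book also shows how to use different Java libraries to create charts that are insightful and easy to understand.
 By the end of the book, you will have a solid grounding in a range of data analysis techniques and their Java implementations.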
 Contents
 Preface
 Chapter 1: Introduction to Data Analysis
 Origins of data analysis
 The scientific method
 Actuarial science
 Calculated by steam
 A spectacular example
 Herman Hollerith
 ENIAC
 VisiCalc
 Data, information, and knowledge
 Why Java?
 Java Integrated Development Environments
 Summary
 Chapter 2: Data Preprocessing
 Data types
 Variables
 Data points and datasets
 Null values
 Relational database tables
 Key fields
 Key-value pairs
 Hash tables
 File formats
 Microsoft Excel data
 XML and JSON data
 Generating test datasets
 Metadata
 Data cleaning
 Data scaling
 Data filtering
 Sorting
 Merging
 Hashing
 Summary
 Chapter 3: Data Visualization
 Tables and graphs
 Scatter plots
 Line graphs
 Bar charts
 Histograms
 Time series
 Java implementation
 Moving average
 Data ranking
 Frequency distributions
 The normal distribution
 A thought experiment
 The exponential distribution
 Java example
 Summary
 Chapter 4: Statistics
 Descriptive statistics
 Random sampling
 Random variables
 Probability distributions
 Cumulative distributions
 The binomial distribution
 Multivariate distributions
 Conditional probability
 The independence of probabilistic events
 Contingency tables
 Bayes' theorem
 Covariance and correlation
 The standard normal distribution
 The central limit theorem
 Confidence intervals
 Hypothesis testing
 Summary
 Chapter 5: Relational Databases
 The relation data model
 Relational databases
 Foreign keys
 Relational database design
 Creating a database
 SQL commands
 Inserting data into the database
 Database queries
 SQL data types
 JDBC
 Using a JDBC PreparedStatement
 Batch processing
 Database views
 Subqueries
 Table indexes
 Summary
 Chapter 6: Regression Analysis
 Linear regression
 Linear regression in Excel
 Computing the regression coefficients
 Variation statistics
 Java implementation of linear regression
 Anscombe's quartet
 Polynomial regression
 Multiple linear regression
 The Apache Commons implementation
 Curve fitting
 Summary
 Chapter 7: Classification Analysis
 Decision trees
 What does entropy have to do with it?
 The ID3 algorithm
 Java Implementation of the ID3 algorithm
 The Weka platform
 The ARFF filetype for data
 Java implementation with Weka
 Bayesian classifiers
 Java implementation with Weka
 Support vector machine algorithms
 Logistic regression
 K-Nearest Neighbors
 Fuzzy classification algorithms
 Summary
 Chapter 8: Cluster Analysis
 Measuring distances
 The curse of dimensionality
 Hierarchical clustering
 Weka implementation
 K-means clustering
 K-medoids clustering
 Affinity propagation clustering
 Summary
 Chapter 9: Recommender Systems
 Utility matrices
 Similarity measures
 Cosine similarity
 A simple recommender system
 Amazon's item-to-item collaborative filtering recommender
 Implementing user ratings
 Large sparse matrices
 Using random access files
 The Netflix prize
 Summary
 Chapter 10: NoSQL Databases
 The Map data structure
 SQL versus NoSQL
 The Mongo database system
 The Library database
 Java development with MongoDB
 The MongoDB extension for geospatial databases
 Indexing in MongoDB
 Why NoSQL and why MongoDB?
 Other NoSQL database systems
 Summary
 Chapter 11: Data Analysis with Java
 Scaling, data striping, and sharding
 Google's PageRank algorithm
 Google's MapReduce framework
 Some examples of MapReduce applications
 The WordCount example
 Scalability
 Matrix multiplication with MapReduce
 MapReduce in MongoDB
 Apache Hadoop
 Hadoop MapReduce
 Summary
 Appendix: Java Tools
 The command line
 Java
 NetBeans
 MySQL
 MySQL Workbench
 Accessing the MySQL database from NetBeans
 The Apache Commons Math Library
 The javax JSON Library
 The Weka libraries
 |