Description

If you want to build an enterprise-grade application that works with natural language text but aren't sure where to start or which tools to use, this practical guide can help. Alex Thomas, principal data scientist at Wisecube, shows software engineers and data scientists how to build scalable natural language processing (NLP) applications using deep learning and the Apache Spark NLP library.

The book covers everything from basic linguistics and writing systems to sentiment analysis and search engines. It does so through concrete examples, a mix of practical and theoretical explanation, and hands-on exercises that apply NLP on the Spark processing framework. You'll also explore concerns that need special attention when developing text-based applications, such as performance.

The book has four parts. You'll learn NLP fundamentals and building blocks first, then move on to building applications and systems:

Basics: Understand the fundamentals of natural language processing, NLP on Apache Spark, and deep learning.
Building Blocks: Learn techniques for building NLP applications, including tokenization, sentence segmentation, and named entity recognition, and see how and why they work.
Applications: Explore the design, development, and experimentation process involved in building your own NLP applications.
Building NLP Systems: Consider options for putting NLP models into production and deploying them, including which human languages to support.

About the Author

Alex Thomas is a principal data scientist at Wisecube. He has applied natural language processing and machine learning to clinical data, identity data, employer and job-seeker data, and now biochemical data. Alex has been using Apache Spark since version 0.9 and has worked with a variety of NLP libraries and frameworks, including UIMA and OpenNLP.

Table of Contents

Preface
Part I. Basics
1. Getting Started
    Introduction
    Other Tools
    Setting Up Your Environment
    Prerequisites
    Starting Apache Spark
    Checking Out the Code
    Getting Familiar with Apache Spark
    Starting Apache Spark with Spark NLP
    Loading and Viewing Data in Apache Spark
    Hello World with Spark NLP
2. Natural Language Basics
    What Is Natural Language?
    Origins of Language
    Spoken Language Versus Written Language
    Linguistics
    Phonetics and Phonology
    Morphology
    Syntax
    Semantics
    Sociolinguistics: Dialects, Registers, and Other Varieties
    Formality
    Context
    Pragmatics
    Roman Jakobson
    How To Use Pragmatics
    Writing Systems
    Origins
    Alphabets
    Abjads
    Abugidas
    Syllabaries
    Logographs
    Encodings
    ASCII
    Unicode
    UTF-8
    Exercises: Tokenizing
    Tokenize English
    Tokenize Greek
    Tokenize Ge'ez (Amharic)
    Resources
3. NLP on Apache Spark
    Parallelism, Concurrency, Distributing Computation
    Parallelization Before Apache Hadoop
    MapReduce and Apache Hadoop
    Apache Spark
    Architecture of Apache Spark
    Physical Architecture
    Logical Architecture
    Spark SQL and Spark MLlib
    Transformers
    Estimators and Models
    Evaluators
    NLP Libraries
    Functionality Libraries
    Annotation Libraries
    NLP in Other Libraries
    Spark NLP
    Annotation Library
    Stages
    Pretrained Pipelines
    Finisher
    Exercises: Build a Topic Model
    Resources
4. Deep Learning Basics
    Gradient Descent
    Backpropagation
    Convolutional Neural Networks
    Filters
    Pooling
    Recurrent Neural Networks
    Backpropagation Through Time
    Elman Nets
    LSTMs
    Exercise 1
    Exercise 2
    Resources
Part II. Building Blocks
5. Processing Words
6. Information Retrieval
7. Classification and Regression
8. Sequence Modeling with Keras
9. Information Extraction
10. Topic Modeling
11. Word Embeddings
Part III. Applications
12. Sentiment Analysis and Emotion Detection
13. Building Knowledge Bases
14. Search Engine
15. Chatbot
16. Object Character Recognition
Part IV. Building NLP Systems
17. Supporting Multiple Languages
18. Human Labeling
19. Productionizing NLP Applications
Glossary
Index