Pro Hadoop Data Analytics (eBook, PDF) - Koitzsch, Kerry
36.95 €
incl. VAT
Available for immediate download

  • Format: PDF



  • Devices: PC
  • No copy protection
  • Size: 22.68 MB
Product description


Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.

Pro Hadoop Data Analytics emphasizes best practices to ensure coherent, efficient development. A complete example system will be developed using standard third-party components that consist of the tool kits, libraries, visualization and reporting code, as well as support glue to provide a working and extensible end-to-end system.

The book also highlights the importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. You'll discover the importance of mix-and-match or hybrid systems, using different analytical components in one application. This hybrid approach will be prominent in the examples.

What You'll Learn

  • Build big data analytic systems with the Hadoop ecosystem
  • Use libraries, tool kits, and algorithms to make development easier and more effective
  • Apply metrics to measure performance and efficiency of components and systems
  • Connect to standard relational databases, NoSQL data sources, and more
  • Follow case studies with example components to create your own systems
Who This Book Is For
Software engineers, architects, and data scientists with an interest in the design and implementation of big data analytical systems using Hadoop, the Hadoop ecosystem, and other associated technologies.

For legal reasons, this download can only be delivered to customers with a billing address in A, B, BG, CY, CZ, D, DK, EW, E, FIN, F, GB, GR, HR, H, IRL, I, LT, L, LR, M, NL, PL, P, R, S, SLO, SK.

  • Product details
  • Publisher: Springer-Verlag GmbH
  • Publication date: 29 December 2016
  • Language: English
  • ISBN-13: 9781484219102
  • Item no.: 53059884
About the author
Kerry Koitzsch is a software engineer with an interest in the early history of science, particularly chemistry. He frequently publishes papers and attends conferences on scientific and historical topics, including early chemistry and alchemy, and the sociology of science. He has presented many lectures, talks, and demonstrations on a variety of subjects for the United States Army, the Society for Utopian Studies, the American Association for Artificial Intelligence (AAAI), the Association for Studies in Esotericism (ASE), and others. He has also published several papers and written two historical books.
Kerry was educated at Interlochen Arts Academy, MIT, and the San Francisco Conservatory of Music. He served in the United States Army and United States Army Reserve, and is the recipient of the United States Army Achievement Medal. He has been a software engineer specializing in computer vision, machine learning, and database technologies for 30 years, and currently lives and works in Sunnyvale, California.
Table of contents
[PART I: CONCEPTS]

Chapter 1: Overview: Building Data Analytic Systems with Hadoop
In this chapter we discuss what analytic systems using Hadoop are, why they are important, data sources which may be used, and applications which are, and are not, suitable for a distributed system approach using Hadoop.
Subtopics:
  1. Introduction: The Need for Distributed Analysis
  2. How the Hadoop Ecosystem Implements Big Data Analysis
  3. A Survey of the Hadoop Ecosystem
  4. Architectures for Building
  5. Summary

Chapter 2: Programming Languages: A Scala and Python Refresher
This chapter consists of a concise overview of the Scala and Python programming languages, and details why these languages are important ingredients of most modern Hadoop analytical systems. The chapter is primarily aimed at Java/C++ programmers who need a quick review of, or introduction to, the Scala and Python programming languages.
Subtopics:
  1. Motivation: Selecting the Right Language(s) Defines the Application
  2. Review of Scala
  3. Review of Python
  4. Programming Applications and Examples
  5. Summary

Chapter 3: Necessary Ingredients: Standard Toolkits for Hadoop and Analytics
In this chapter we describe an example system which we develop throughout the remainder of the book using standard toolkits from the Hadoop ecosystem, and other analytical toolkits in combination with development components such as Maven, OpenCV, Apache Mahout, and others to create a Hadoop-based system appropriate for a variety of applications.
Subtopics:
  1. Libraries, Components, and Toolkits: A Survey
  2. Numerical and Statistical Libraries: R, Weka, and Others
  3. Hadoop Toolkits for Analysis: Mahout and Friends
  4. Apache Spark Libraries and Components: H2O, Sparkling Water, and More
  5. Examples of Use and System Building
  6. Summary

Chapter 4: Relational, NoSQL, and Graph Databases
In this chapter we describe relational databases such as MySQL, NoSQL databases such as Cassandra, and graph databases such as Neo4j, how to integrate them with the Hadoop ecosystem, and how to create customized data sources and sinks using Apache Camel.
Subtopics:
  1. Introduction to Databases: Relational, NoSQL, and Graph
  2. Relational Data Sources
  3. NoSQL Data Sources: Cassandra
  4. Graph Databases: Neo4j
  5. Integrating Data with the Analytical Engine
  6. Summary

Chapter 5: Data Pipelines and How to Construct Them
In this chapter we describe how to construct basic data pipelines using data sources and the Hadoop ecosystem. We provide an end-to-end example of how data sources may be linked and processed using Hadoop and other analytical components, and how this is similar to a standard ETL process.
Subtopics:
  1. The Basic Data Pipeline
  2. Data Sources and Sinks
  3. Computation and Transformation
  4. Visualizing and Reporting the Results
  5. Summary

Chapter 6: Advanced Search Techniques with Hadoop, Lucene, and Solr
In this chapter we describe the structure and use of the Lucene and Solr third-party search engine components, how to use them with Hadoop, and how to develop advanced search capability customized for an analytical application.
Subtopics:
  1. Introduction to Customized Search Engines
  2. Distributed Search Techniques
  3. Basic Examples: A Custom Search Component
  4. Extended Examples: Scaling, Tuning, and Customizing the Search Component
  5. Summary

[PART II: ARCHITECTURES AND ALGORITHMS]

Chapter 7: An Overview of Analytical Techniques and Algorithms
In this chapter, we provide an overview of four categories of algorithm: statistical, Bayesian, ontology-driven, and hybrid algorithms which leverage the more basic algorithms found in standard libraries to perform more in-depth and accurate analyses using Hadoop.
Subtopics:
  1. Survey of Algorithm Types
  2. Statistical / Numerical Techniques
  3. Bayesian Techniques
  4. Ontology-Driven Algorithms
  5. Hybrid Algorithms: Combining Algorithm Types
  6. Code Examples
  7. Summary

Chapter 8: Rule Engines, System Control, and System Orchestration
In this chapter, we describe the D