€23.99
incl. VAT

Ready to ship in 6-10 days
  • Paperback

Product description
Data analysis and predictive analytics today are driven by large-scale distributed deployments of complex pipelines that guide data cleaning, model training and evaluation. In this work, we focus on the problem of modelling such a pipeline framework and providing algorithms that build on basic abstractions fundamental to stream processing. We design a streaming machine learning pipeline as a series of stages such as model building, concept drift detection and continuous evaluation. We build our prototype on Apache Flink, a distributed data processing system with streaming capabilities. As a proof of concept, the prototype includes an implementation of a variation of the Vertical Hoeffding Tree (VHT), a distributed decision tree classification algorithm. Furthermore, we compare our version of VHT with the current state-of-the-art implementations on distributed data processing systems. Our experimental results on real-world data sets show significant performance benefits of our pipeline while maintaining low classification error. We believe that this pipeline framework can offer a good baseline for a full-fledged implementation of streaming algorithms that can work in parallel.
About the author
Foteini Beligianni completed her BA in Computer Engineering at the University of Thessaly in Greece. She received a double MS degree in ICT Innovation, with a specialisation in Distributed Systems and Services, studying one year at TU Berlin and one year at KTH Royal Institute of Technology in an MS program provided by EIT Digital Academy.