Gabor Szabo, Gungor Polatkan, P. Oscar Boykin, Antonios Chalkiopoulos
Social Media Data Mining and Analytics (eBook, PDF)
- Format: PDF
- Devices: PC
Harness the power of social media to predict customer behavior and improve sales

Social media is the biggest source of Big Data. Because of this, 90% of Fortune 500 companies are investing in Big Data initiatives that will help them predict consumer behavior to produce better sales results. Social Media Data Mining and Analytics shows analysts how to use sophisticated techniques to mine social media data, obtaining the information they need to generate amazing results for their businesses. Social Media Data Mining and Analytics isn't just another book on the business case for social media. Rather, this book provides hands-on examples for applying state-of-the-art tools and technologies to mine social media; examples include Twitter, Wikipedia, Stack Exchange, LiveJournal, movie reviews, and other rich data sources. In it, you will learn:
- The four key characteristics of online services: users, social networks, actions, and content
- The full data discovery lifecycle: data extraction, storage, analysis, and visualization
- How to work with code and extract data to create solutions
- How to use Big Data to make accurate customer predictions
- How to personalize the social media experience using machine learning

Using the techniques the authors detail will provide organizations the competitive advantage they need to harness the rich data available from social media platforms.
Product details
- Publisher: John Wiley & Sons
- Pages: 352
- Publication date: 18 September 2018
- Language: English
- ISBN-13: 9781118824900
- Item no.: 54194000
GABOR SZABO, PHD, is a Senior Staff Software Engineer at Tesla and a former data scientist at Twitter, where he focused on predicting user behavior and content popularity in crowdsourced online services, and on modeling large-scale content dynamics. He also authored the PyCascading data processing library.

GUNGOR POLATKAN, PHD, is a Tech Lead/Engineering Manager designing and implementing end-to-end machine learning and artificial intelligence offline/online pipelines for the LinkedIn Learning relevance backend. He was previously a machine learning scientist at Twitter, where he worked on topics such as ad targeting and user modeling.

P. OSCAR BOYKIN, PHD, is a software engineer at Stripe where he works on machine learning infrastructure. He was previously a Senior Staff Engineer at Twitter, where he worked on data infrastructure problems. He is coauthor of the Scala big-data libraries Algebird, Scalding and Summingbird.

ANTONIOS CHALKIOPOULOS, MSC, is a Distributed Systems Specialist. A system engineer who has delivered fast/big data projects in media, betting, and finance, he is now leading the effort on the Lenses platform for data streaming as a co-founder and CEO at https://lenses.stream.
Introduction

Chapter 1 Users: The Who of Social Media
- Measuring Variations in User Behavior in Wikipedia
- The Diversity of User Activities
- The Origin of the User Activity Distribution
- The Consequences of the Power Law
- The Long Tail in Human Activities
- Long Tails Everywhere: The 80/20 Rule (p/q Rule)
- Online Behavior on Twitter
- Retrieving Tweets for Users
- Logarithmic Binning
- User Activities on Twitter
- Summary

Chapter 2 Networks: The How of Social Media
- Types and Properties of Social Networks
- When Users Create the Connections: Explicit Networks
- Directed Versus Undirected Graphs
- Node and Edge Properties
- Weighted Graphs
- Creating Graphs from Activities: Implicit Networks
- Visualizing Networks
- Degrees: The Winner Takes All
- Counting the Number of Connections
- The Long Tail in User Connections
- Beyond the Idealized Network Model
- Capturing Correlations: Triangles, Clustering, and Assortativity
- Local Triangles and Clustering
- Assortativity
- Summary

Chapter 3 Temporal Processes: The When of Social Media
- What Traditional Models Tell You About Events in Time
- When Events Happen Uniformly in Time
- Inter-Event Times
- Comparing to a Memoryless Process
- Autocorrelations
- Deviations from Memorylessness
- Periodicities in Time in User Activities
- Bursty Activities of Individuals
- Correlations and Bursts
- Reservoir Sampling
- Forecasting Metrics in Time
- Finding Trends
- Finding Seasonality
- Forecasting Time Series with ARIMA
- The Autoregressive Part ("AR")
- The Moving Average Part ("MA")
- The Full ARIMA(p, d, q) Model
- Summary

Chapter 4 Content: The What of Social Media
- Defining Content: Focus on Text and Unstructured Data
- Creating Features from Text: The Basics of Natural Language Processing
- The Basic Statistics of Term Occurrences in Text
- Using Content Features to Identify Topics
- The Popularity of Topics
- How Diverse Are Individual Users' Interests?
- Extracting Low-Dimensional Information from High-Dimensional Text
- Topic Modeling
- Unsupervised Topic Modeling
- Supervised Topic Modeling
- Relational Topic Modeling
- Summary

Chapter 5 Processing Large Datasets
- Map Reduce: Structuring Parallel and Sequential Operations
- Counting Words
- Skew: The Curse of the Last Reducer
- Multi-Stage MapReduce Flows
- Fan-Out
- Merging Data Streams
- Joining Two Data Sources
- Joining Against Small Datasets
- Models of Large-Scale MapReduce
- Patterns in MapReduce Programming
- Static MapReduce Jobs
- Iterative MapReduce Jobs
- PageRank for Ranking in Graphs
- K-means Clustering
- Incremental MapReduce Jobs
- Temporal MapReduce Jobs
- Rollups and Data Cubing
- Expanding Rollup Jobs
- Challenges with Processing Long-Tailed Social Media Data
- Sampling and Approximations: Getting Results with Less Computation
- HyperLogLog
- HyperLogLog Example
- HyperLogLog on the Stack Exchange Dataset
- Performance of HLL on Large Datasets
- Bloom Filters
- A Bloom Filter Example
- Bloom Filter as Pre-Computed Membership Knowledge
- Bloom Filters on Large Social Datasets
- Count-Min Sketch
- Count-Min Sketch: Heavy Hitters Example
- Count-Min Sketch: Top Percentage Example
- Aggregating Approximate Data Structures
- Summary of Approximations
- Executing on a Hadoop Cluster (Amazon EC2)
- Installing a CDH Cluster on Amazon EC2
- Providing IAM Access to Collaborators
- Adding On-Demand Cluster Capabilities
- Summary

Chapter 6 Learn, Map, and Recommend
- Social Media Services Online
- Search Engines
- Content Engagement
- Interactions with the Real World
- Interactions with People
- Problem Formulation
- Learning and Mapping
- Matrix Factorization
- Learning, Training
- Under- and Overfitting
- Regularizing in Matrix Factorization
- Non-Negative Matrix Factorization and Sparsity
- Demonstration on Movie Ratings
- Interpreting the Learned Stereotypes
- Exploratory Analysis
- Prediction and Recommendation
- Evaluation
- Overview of Methodologies
- Nearest Neighbor-Based Approaches
- Approaches Based on Supervised Learning
- Predicting Movie Ratings with Logistic Regression
- Common Issues with Features
- Domain-Specific Applications
- Summary

Chapter 7 Conclusions
- The Surprising Stability of Human Interaction Patterns
- Averages, Standard Deviations, and Sampling
- Removing Outliers

Index