Packt Publishing, 2013. — 339 p. — ISBN: 978-1-78328-099-5.
Transform, model, and visualize your data through hands-on projects, developed in open source tools.
Overview
1) Explore how to analyze your data in various innovative ways and turn it into insight.
2) Learn to use the D3js visualization tool for exploratory data analysis.
3) Understand how to work with graphs and social data analysis.
4) Discover how to perform advanced query techniques and run MapReduce on MongoDB.
In Detail
Plenty of small businesses face large amounts of data but lack the internal skills to support quantitative analysis. Understanding how to harness the power of data analysis using the latest open source technology can help them provide better customer service, visualize customer needs, or gain fresh insights into the performance of previous products.
Practical Data Analysis is ideal for home and small business users who want to slice and dice the data they have on hand with minimum hassle.
Practical Data Analysis is a hands-on guide to understanding the nature of your data and turning it into insight. It introduces you to machine learning techniques, social network analytics, and econometrics to help your clients gain insights from the pool of data they have at hand. It also covers data preparation and processing for several kinds of data, such as text, images, graphs, documents, and time series.
Practical Data Analysis presents a detailed exploration of current work in data analysis through self-contained projects. First, you will explore the basics of data preparation and transformation through OpenRefine. Then you will get started with exploratory data analysis using the D3js visualization framework. You will also be introduced to machine learning techniques such as classification, regression, and clustering through practical projects such as spam classification, predicting gold prices, and finding clusters in your Facebook friends' network. You will learn how to solve problems in text classification, simulation, time series forecasting, social media, and MapReduce through detailed projects. Finally, you will work with large amounts of Twitter data using MapReduce to perform a sentiment analysis implemented in Python and MongoDB.
Practical Data Analysis combines carefully selected algorithms with data scrubbing to help you turn your data into insight.
What you will learn from this book
1) Work with data to get meaningful results from your data analysis projects.
2) Visualize your data to find trends and correlations.
3) Build your own image similarity search engine.
4) Learn how to forecast numerical values from time series data.
5) Create an interactive visualization for your social media graph.
6) Explore the MapReduce framework in MongoDB.
7) Create interactive simulations with D3js.
Approach
Practical Data Analysis is a practical, step-by-step guide that empowers small businesses to manage and analyze their data and extract valuable information from it.
Who this book is written for
This book is for developers, small business users, and analysts who want to implement data analysis and visualization for their company in a practical way. You need no prior experience with data analysis or data processing; however, basic knowledge of programming, statistics, and linear algebra is assumed.
Getting Started.
Computer science, Artificial intelligence, Machine Learning, Statistics, Mathematics, Knowledge domain,
Data, information, and knowledge, The nature of data, The data analysis process,
Quantitative versus qualitative data analysis, Importance of data visualization, What about big data?
Working with Data.
Datasource, Data scrubbing, Data formats, Getting started with OpenRefine.
Data Visualization.
Data-Driven Documents, Getting started with D3js, Interaction and animation.
Text Classification.
Learning and classification, Bayesian classification, E-mail subject line tester, The algorithm, Classifier accuracy.
Similarity-based Image Retrieval.
Image similarity search, Dynamic time warping (DTW), Processing the image dataset, Implementing DTW, Analyzing the results.
Simulation of Stock Prices.
Financial time series, Random walk simulation, Monte Carlo methods, Generating random numbers, Implementation in D3js.
Predicting Gold Prices.
Working with the time series data, Smoothing the time series, The data – historical gold prices, Nonlinear regression.
Working with Support Vector Machines.
Understanding the multivariate dataset, Dimensionality reduction, Getting started with support vector machine.
Modeling Infectious Disease with Cellular Automata.
Introduction to epidemiology, The epidemic models, Modeling with cellular automata, Simulation of the SIRS model in CA with D3js.
Working with Social Graphs.
Structure of a graph, Social Networks Analysis, Acquiring my Facebook graph, Representing graphs with Gephi,
Statistical analysis, Degree distribution, Transforming GDF to JSON, Graph visualization with D3js.
Sentiment Analysis of Twitter Data.
The anatomy of Twitter data, Using OAuth to access Twitter API, Getting started with Twython,
Sentiment classification, Getting started with Natural Language Toolkit.
Data Processing and Aggregation with MongoDB.
Getting started with MongoDB, Data preparation, Group, The aggregation framework.
Working with MapReduce.
MapReduce overview, Programming model, Using MapReduce with MongoDB, Filtering the input collection,
Grouping and aggregation, Word cloud visualization of the most common positive words in tweets.
Online Data Analysis with IPython and Wakari.
Getting started with Wakari, Getting started with IPython Notebook, Getting started with Pandas,
Multiprocessing with IPython, Sharing your Notebook.
Appendix: Setting Up the Infrastructure.
Installing and running: Python 3, NumPy, SciPy, mlpy, OpenRefine, MongoDB, Umongo, Gephi.