a) an emphasis on the substantive understanding of data that address the broad question of “what is going on here?”;
b) an emphasis on graphic representations of data;
c) a focus on tentative model building and hypothesis generation in an iterative process of model specification, residual analysis, and model respecification;
d) use of robust measures, reexpression, and subset analysis; and
e) positions of skepticism, flexibility, and ecumenism regarding which methods to apply.
The goal of EDA is to discover patterns in data. It cannot be overemphasized that an appropriate technique for EDA is determined not by computation but rather by a procedure’s purpose and use.
1 Understand the Context
This view holds that, in quantitative data analysis, numbers map onto aspects of reality. Numbers themselves are meaningless unless the data analyst understands the mapping process and the nexus of theory and categorization in which the objects under study are conceptualized.
2 Use Graphic Representations of Data
Graphical analysis is central to EDA: “The greatest value of a picture is when it forces us to notice what we never expected to see.”
a. Stem-and-leaf plot: works well for small data sets. The stem-and-leaf plot shown in Figure 2 represents a type of frequency table organized graphically to resemble a histogram while retaining information about the exact value of each observation. When a large number of data points are examined, the stem-and-leaf plot may become cumbersome.
b. Dot plot: A dot plot can be an effective tool to examine a single distribution or compare a number of distributions.
c. Box plot: When seeking additional structure in univariate distributions, or when a number of distributions need to be compared, a box plot is often used.
The box plot offers a five-number summary in schematic form. The ends of the box mark the first and third quartiles, and the median is indicated with a line positioned within the box. The ranges of most or all of the data in the tails of the distribution are marked using lines extending away from the box, creating “whiskers” or “tails.”
d. Kernel density curve: Kernel density smoothers are graphic devices that provide estimates of the shape of a population distribution.
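The graphical summaries above can be sketched numerically in plain Python. This is a minimal sketch, not the source's procedure: it assumes non-negative integer data (stem = tens digit, leaf = units digit), computes quartiles with `statistics.quantiles(..., method="inclusive")`, draws whiskers to the most extreme points within 1.5 × IQR of the box (a common convention the notes do not specify), and uses a Gaussian kernel with a hand-chosen bandwidth. The function names and the sample values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative integers (stem = tens, leaf = units digit)."""
    stems = defaultdict(list)
    for v in values:
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem in range so that gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(stems.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}")
    return lines


def five_number_summary(values, k=1.5):
    """Five-number summary plus box-plot whisker ends and outliers.

    Whiskers reach the most extreme observations within k * IQR of the box
    (assumed convention; other whisker rules exist).
    """
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "min": data[0], "q1": q1, "median": med, "q3": q3, "max": data[-1],
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


def gaussian_kde(values, bandwidth):
    """Kernel density estimate f(x) = 1/(n*h) * sum phi((x - x_i)/h), phi = standard normal pdf."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in values)

    return density


sample = [12, 15, 21, 23, 23, 37, 41, 44, 48]  # hypothetical observations
print("\n".join(stem_and_leaf(sample)))
print(five_number_summary(sample))
density = gaussian_kde(sample, bandwidth=5.0)  # bandwidth chosen by hand
print([round(density(x), 4) for x in range(10, 51, 10)])
```

The stem-and-leaf lines keep every leaf digit, so exact values survive, while the box-plot summary and the density curve trade that detail for a compact view of shape.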
A major component of the detective work of EDA is the rough assessment of hunches.
3 Develop Models in an Iterative Process of Tentative Model Specification and Residual Assessment
data = fit + residual, or equivalently, data = smooth + rough. To create quantitative descriptions of data, the exploratory data analyst conducts an iterative process of suggesting a tentative model, examining residuals from the model to assess model adequacy, and modifying the model in view of the residual analysis.
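The fit/residual cycle can be illustrated with a toy example. This is a minimal sketch, assuming an ordinary least-squares straight line as the tentative model and hypothetical data y = x²; the names `fit_line` and `decompose` are illustrative only. The residuals of the first model show a systematic U-shape, which prompts a respecification (here, taking the square root of y) whose residuals carry no remaining structure.

```python
import math


def fit_line(x, y):
    """Least-squares line y ~ a + b*x, used here as one possible tentative model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def decompose(x, y):
    """Split the data into fit + residual under the straight-line model."""
    a, b = fit_line(x, y)
    fit = [a + b * xi for xi in x]
    residual = [yi - fi for yi, fi in zip(y, fit)]
    return fit, residual


# Hypothetical data with a curved relationship.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

# Step 1: tentative model (straight line) and its residuals.
fit1, resid1 = decompose(x, y)
print(resid1)  # positive at the ends, negative in the middle: curvature remains

# Step 2: respecify by reexpressing y (square root), then re-examine residuals.
fit2, resid2 = decompose(x, [math.sqrt(v) for v in y])
print(resid2)  # zero here: no structure left in the "rough"
```

The point is the loop, not the particular model: each pass splits the data into smooth and rough, and a patterned rough is the signal to respecify.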
4 Building a Two-Way Fit
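A two-way fit for a table of values is commonly computed with Tukey's median polish, which models each cell as overall effect + row effect + column effect + residual. The notes do not spell out the procedure, so the following is a minimal sketch under that assumption: it alternates row and column median sweeps for a fixed number of iterations (a production implementation would typically stop once the residuals stop changing). The function name and the example table are hypothetical.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Fit table[i][j] ~ overall + row_eff[i] + col_eff[j] + resid[i][j] by median polish."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):  # fixed iteration count (a tolerance test is also common)
        # Row sweep: move each row's median from the residuals into its row effect.
        for i in range(n_rows):
            m = median(resid[i])
            resid[i] = [r - m for r in resid[i]]
            row_eff[i] += m
        # Re-centre the column effects; their median goes into the overall effect.
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Column sweep: move each column's median into its column effect.
        for j in range(n_cols):
            m = median([row[j] for row in resid])
            for row in resid:
                row[j] -= m
            col_eff[j] += m
        # Re-centre the row effects the same way.
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
    return overall, row_eff, col_eff, resid


# Hypothetical, perfectly additive table: 15 + row effect + column effect.
overall, rows, cols, resid = median_polish([[11, 13, 14], [13, 15, 16], [16, 18, 19]])
print(overall, rows, cols)  # residuals are all zero for an additive table
```

Because every sweep only moves amounts between the residuals and the effects, the identity data = overall + row + column + residual holds at every step; the residual table is what the analyst then inspects for remaining structure.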
5 Data Analysis: A Picture is Worth a Thousand Words
Putting It All Together: A Reexamination of the Paap and Johansen Data. A first look: box plots, histograms, density plots, dot plots, and a scatter plot matrix.
6 A Better Description
A straightforward way to find an appropriate description for the curved function is to find a reexpression of the univariate distributions that leads them to a roughly Gaussian shape.
A choice of transformation is recommended by moving up or down the ladder in the direction of the bulk of the data on the scale. Positively skewed distributions with the bulk of the data lower on the scale can be normalized by moving down the ladder of reexpression; distributions with the bulk of the data high on the scale can be normalized by moving up the ladder of reexpression.
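The ladder rule can be checked numerically. This is a minimal sketch, assuming the power ladder -1, -1/2, log, 1/2, 1, 2, 3 (negative powers negated so that order is preserved) and using the moment coefficient of skewness as the yardstick for "roughly Gaussian"; the notes prescribe neither choice, and the helper names and data are hypothetical. The data must be positive for the log and the negative powers.

```python
import math

LADDER = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]  # one common version of the ladder of powers


def reexpress(values, p):
    """Power reexpression; p = 0 means log, negative powers are negated to preserve order."""
    if p == 0:
        return [math.log(v) for v in values]
    if p < 0:
        return [-(v ** p) for v in values]
    return [v ** p for v in values]


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def best_power(values, ladder=LADDER):
    """Pick the ladder power whose reexpression is closest to symmetric."""
    return min(ladder, key=lambda p: abs(skewness(reexpress(values, p))))


# Hypothetical positively skewed data: bulk low on the scale, long right tail.
data = [math.exp(k) for k in (-2, -1, 0, 1, 2)]
print(skewness(data) > 0)  # positively skewed
print(best_power(data))    # 0.0 (the log), a move down the ladder from p = 1
```

For this positively skewed sample the chosen power falls below 1, that is, a move down the ladder, in line with the rule above.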