<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>R | Little World</title>
    <link>/categories/r/</link>
      <atom:link href="/categories/r/index.xml" rel="self" type="application/rss+xml" />
    <description>R</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>©Yihong WANG 2020</copyright><lastBuildDate>Mon, 20 Jan 2020 00:00:00 +0000</lastBuildDate>
    <image>
      <url>/img/icon-192.png</url>
      <title>R</title>
      <link>/categories/r/</link>
    </image>
    
    <item>
      <title>Replacing Stata and SAS with R</title>
      <link>/post/2020-01-20-r-stata-workflow/</link>
      <pubDate>Mon, 20 Jan 2020 00:00:00 +0000</pubDate>
      <guid>/post/2020-01-20-r-stata-workflow/</guid>
      <description>
&lt;script src=&#34;../../rmarkdown-libs/jquery/jquery.min.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;../../rmarkdown-libs/elevate-section-attrs/elevate-section-attrs.js&#34;&gt;&lt;/script&gt;

&lt;div id=&#34;TOC&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#安装stata&#34;&gt;Installing Stata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#在r中调用stata&#34;&gt;Calling Stata from R&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#三种环境下数据互通&#34;&gt;Exchanging Data among the Three Environments&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

&lt;div id=&#34;安装stata&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Installing Stata&lt;/h2&gt;
&lt;p&gt;First install the two packages &lt;code&gt;ncurses5-compat-libs&lt;/code&gt; and &lt;code&gt;libpng12&lt;/code&gt;, then run the following:&lt;/p&gt;
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;% sudo -s

cd /tmp/

mkdir statafiles

cd statafiles

tar -zxf /home/you/Downloads/Stata14Linux64.tar.gz

cd /usr/local

mkdir stata14

cd stata14

/tmp/statafiles/install&lt;/code&gt;&lt;/pre&gt;
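&lt;p&gt;After the installation finishes, add the install directory to the &lt;code&gt;PATH&lt;/code&gt; environment variable. I chose to edit &lt;code&gt;/etc/profile&lt;/code&gt; and add:&lt;/p&gt;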
&lt;pre class=&#34;bash&#34;&gt;&lt;code&gt;export PATH=&amp;quot;$PATH:/usr/local/stata14&amp;quot;&lt;/code&gt;&lt;/pre&gt;
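&lt;p&gt;To apply the change without rebooting, run &lt;code&gt;source /etc/profile&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The license file can be copied directly into the install directory, or you can place &lt;code&gt;stata.lic.tar.gz&lt;/code&gt; in that directory.&lt;/p&gt;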
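&lt;p&gt;As a rough check, assuming Stata was installed to &lt;code&gt;/usr/local/stata14&lt;/code&gt; as above, you can ask R whether that directory is visible on its search path. This is only a sketch in base R; the helper name &lt;code&gt;on_path&lt;/code&gt; is made up for illustration, and an R session started from a desktop launcher may not read &lt;code&gt;/etc/profile&lt;/code&gt; at all.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical helper: TRUE if `dir` is one of the PATH entries seen by R
on_path &amp;lt;- function(dir) {
  entries &amp;lt;- strsplit(Sys.getenv(&amp;quot;PATH&amp;quot;), .Platform$path.sep, fixed = TRUE)[[1]]
  dir %in% entries
}

on_path(&amp;quot;/usr/local/stata14&amp;quot;)  # should be TRUE if the updated PATH reached this session
Sys.which(&amp;quot;stata&amp;quot;)             # full path to the stata binary, or &amp;quot;&amp;quot; if not found&lt;/code&gt;&lt;/pre&gt;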
&lt;/div&gt;
&lt;div id=&#34;在r中调用stata&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Calling Stata from R&lt;/h2&gt;
&lt;p&gt;This is done with the &lt;a href=&#34;https://github.com/lbraglia/RStata&#34;&gt;&lt;code&gt;RStata&lt;/code&gt;&lt;/a&gt; package:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#run Stata in R----
library(&amp;quot;RStata&amp;quot;)
options(&amp;quot;RStata.StataPath&amp;quot; = &amp;quot;D:\\Stata15\\StataSE-64&amp;quot;) #office
options(&amp;quot;RStata.StataPath&amp;quot; = &amp;quot;/usr/local/stata14/stata&amp;quot;) #linux #cannot use stata-se?
options(&amp;quot;RStata.StataVersion&amp;quot; = 14)&lt;/code&gt;&lt;/pre&gt;
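&lt;p&gt;With those options set, a call might look like the sketch below. It assumes a working Stata installation and license; &lt;code&gt;run_stata_demo&lt;/code&gt; is a made-up wrapper name, and &lt;code&gt;mtcars&lt;/code&gt; is just R&#39;s built-in example data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical wrapper so that nothing runs until Stata is actually available
run_stata_demo &amp;lt;- function() {
  # Send a data frame to Stata and echo the output of one command
  RStata::stata(&amp;quot;summarize mpg&amp;quot;, data.in = mtcars)
  # data.out = TRUE returns the data set left in Stata as a data frame
  RStata::stata(&amp;quot;generate mpg2 = mpg^2&amp;quot;, data.in = mtcars, data.out = TRUE)
}

# Uncomment once RStata.StataPath and RStata.StataVersion point at your installation:
# res &amp;lt;- run_stata_demo()
# head(res)&lt;/code&gt;&lt;/pre&gt;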
&lt;/div&gt;
&lt;div id=&#34;三种环境下数据互通&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Exchanging Data among the Three Environments&lt;/h2&gt;
&lt;p&gt;In R, this takes two packages:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(haven)   # read_dta() reads .dta files; read_sas() can also import sas7bdat
library(rio)     # rio::import() reads SAS data
library(dplyr)   # %&amp;gt;% and mutate_if()
library(stringr) # str_c()
# data_loc (the directory holding the data) must already be defined
f1 &amp;lt;- str_c(data_loc, &amp;quot;after2007.sas7bdat&amp;quot;, sep = &amp;quot;/&amp;quot;)
o1 &amp;lt;- str_c(data_loc, &amp;quot;after2007.dta&amp;quot;, sep = &amp;quot;/&amp;quot;)
after2007_raw &amp;lt;- import(f1)
after2007_raw %&amp;gt;%
  mutate_if(is.numeric, as.integer) %&amp;gt;%
  write_dta(o1, version = 12)
# Write version 12 because SAS only supports Stata 12 files (or earlier), while haven supports Stata versions 8-15.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If none of the methods above can read the sas7bdat file, use SAS as an intermediary:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;/* Import a Stata data file; only version 12 or earlier is supported */
PROC IMPORT OUT= WORK.S1 
            DATAFILE= &amp;quot;E:\after2007.dta&amp;quot; 
            DBMS=STATA REPLACE;
RUN;

proc export data=raw1 outfile= &amp;quot;D:\sample.dta&amp;quot; replace;
run;&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data Vis Chapter 8</title>
      <link>/post/data-vis-chapter-8/</link>
      <pubDate>Wed, 09 Oct 2019 00:00:00 +0000</pubDate>
      <guid>/post/data-vis-chapter-8/</guid>
      <description>

&lt;div id=&#34;TOC&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#use-color-palette&#34;&gt;Use Color Palette&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#layer-color-and-text-together&#34;&gt;Layer Color and Text Together&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#themes&#34;&gt;Themes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#use-theme-elements&#34;&gt;Use Theme Elements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#two-y-axes&#34;&gt;Two y-axes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(asasec)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                                Section         Sname Beginning Revenues
## 1      Aging and the Life Course (018)         Aging     12752    12104
## 2     Alcohol, Drugs and Tobacco (030) Alcohol/Drugs     11933     1144
## 3 Altruism and Social Solidarity (047)      Altruism      1139     1862
## 4            Animals and Society (042)       Animals       473      820
## 5             Asia/Asian America (024)          Asia      9056     2116
## 6            Body and Embodiment (048)          Body      3408     1618
##   Expenses Ending Journal Year Members
## 1    12007  12849      No 2005     598
## 2      400  12677      No 2005     301
## 3     1875   1126      No 2005      NA
## 4     1116    177      No 2005     209
## 5     1710   9462      No 2005     365
## 6     1920   3106      No 2005      NA&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(
    data = subset(asasec, Year == 2014),
    mapping = aes(x = Members,
                  y = Revenues, label = Sname)
  )

p + geom_point() + geom_smooth()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-2-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(
    data = subset(asasec, Year == 2014),
    mapping = aes(x = Members,
                  y = Revenues, label = Sname)
  )

p + geom_point(mapping = aes(color = Journal)) + geom_smooth(method = &amp;quot;lm&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p0 &amp;lt;-
  ggplot(
    data = subset(asasec, Year == 2014),
    mapping = aes(x = Members,
                  y = Revenues, label = Sname)
  )

p1 &amp;lt;-
  p0 + geom_smooth(method = &amp;quot;lm&amp;quot;, se = FALSE, color = &amp;quot;gray80&amp;quot;) +
  geom_point(mapping = aes(color = Journal))
library(ggrepel)
p2 &amp;lt;- p1 + geom_text_repel(data = subset(asasec, Year == 2014 &amp;amp;
                                           Revenues &amp;gt; 7000),
                           size = 2)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p3 &amp;lt;- p2 + labs(
  x = &amp;quot;Membership&amp;quot;,
  y = &amp;quot;Revenues&amp;quot;,
  color = &amp;quot;Section has own Journal&amp;quot;,
  title = &amp;quot;ASA Sections&amp;quot;,
  subtitle = &amp;quot;2014 Calendar year.&amp;quot;,
  caption = &amp;quot;Source: ASA annual report.&amp;quot;
)
p4 &amp;lt;- p3 + scale_y_continuous(labels = scales::dollar) +
  theme(legend.position = &amp;quot;bottom&amp;quot;)
p4&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;use-color-palette&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Use Color Palette&lt;/h2&gt;
&lt;p&gt;Use the &lt;code&gt;RColorBrewer&lt;/code&gt; package. Access the colors by specifying the &lt;code&gt;scale_color_brewer()&lt;/code&gt; or &lt;code&gt;scale_fill_brewer()&lt;/code&gt; functions, depending on the aesthetic you are mapping.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata,
            mapping = aes(x = roads, y = donors,
                          color = world))
p + geom_point(size = 2) + scale_color_brewer(palette = &amp;quot;Set2&amp;quot;) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p + geom_point(size = 2) + scale_color_brewer(palette = &amp;quot;Pastel2&amp;quot;) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-6-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p + geom_point(size = 2) + scale_color_brewer(palette = &amp;quot;Dark2&amp;quot;) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-6-3.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Specify colors manually via &lt;code&gt;scale_color_manual()&lt;/code&gt; or &lt;code&gt;scale_fill_manual()&lt;/code&gt;. Try &lt;code&gt;demo(&#39;colors&#39;)&lt;/code&gt; to see the color names in R.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cb_palette &amp;lt;-
  c(
    &amp;quot;#999999&amp;quot;,
    &amp;quot;#E69F00&amp;quot;,
    &amp;quot;#56B4E9&amp;quot;,
    &amp;quot;#009E73&amp;quot;,
    &amp;quot;#F0E442&amp;quot;,
    &amp;quot;#0072B2&amp;quot;,
    &amp;quot;#D55E00&amp;quot;,
    &amp;quot;#CC79A7&amp;quot;
  )

p4 + scale_color_manual(values = cb_palette)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
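&lt;p&gt;The built-in color names that &lt;code&gt;scale_color_manual()&lt;/code&gt; accepts can also be listed from code. A small sketch using base R (&lt;code&gt;grDevices&lt;/code&gt;); the search term is only an example.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# All color names known to R
all_names &amp;lt;- colors()
length(all_names)

# Names containing &amp;quot;tomato&amp;quot;, and the RGB values behind one of them
grep(&amp;quot;tomato&amp;quot;, all_names, value = TRUE)
col2rgb(&amp;quot;tomato&amp;quot;)&lt;/code&gt;&lt;/pre&gt;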
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dichromat)
library(RColorBrewer)

Default &amp;lt;- brewer.pal(5, &amp;quot;Set2&amp;quot;)

types &amp;lt;- c(&amp;quot;deutan&amp;quot;, &amp;quot;protan&amp;quot;, &amp;quot;tritan&amp;quot;)
names(types) &amp;lt;- c(&amp;quot;Deuteranopia&amp;quot;, &amp;quot;Protanopia&amp;quot;, &amp;quot;Tritanopia&amp;quot;)

color_table &amp;lt;- types %&amp;gt;% purrr::map(~ dichromat(Default, .x)) %&amp;gt;%
  as_tibble() %&amp;gt;% add_column(Default, .before = TRUE)

color_table&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 5 x 4
##   Default Deuteranopia Protanopia Tritanopia
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;        &amp;lt;chr&amp;gt;      &amp;lt;chr&amp;gt;     
## 1 #66C2A5 #AEAEA7      #BABAA5    #82BDBD   
## 2 #FC8D62 #B6B661      #9E9E63    #F29494   
## 3 #8DA0CB #9C9CCB      #9E9ECB    #92ABAB   
## 4 #E78AC3 #ACACC1      #9898C3    #DA9C9C   
## 5 #A6D854 #CACA5E      #D3D355    #B6C8C8&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;layer-color-and-text-together&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Layer Color and Text Together&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Democrat Blue and Republican Red
party_colors &amp;lt;- c(&amp;quot;#2E74C0&amp;quot;, &amp;quot;#CB454A&amp;quot;)
p0 &amp;lt;- ggplot(
  data = subset(county_data, flipped == &amp;quot;No&amp;quot;),
  mapping = aes(x = pop, y = black / 100)
)
p1 &amp;lt;-
  p0 + geom_point(alpha = 0.15, color = &amp;quot;gray50&amp;quot;) + scale_x_log10(labels =
                                                                    scales::comma)
p1&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;party_colors &amp;lt;- c(&amp;quot;#2E74C0&amp;quot;, &amp;quot;#CB454A&amp;quot;)
p2 &amp;lt;- p1 + geom_point(
  data = subset(county_data, flipped == &amp;quot;Yes&amp;quot;),
  mapping = aes(x = pop, y = black / 100, color = partywinner16)
) +
  scale_color_manual(values = party_colors) 
p2&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p3 &amp;lt;-
  p2 + scale_y_continuous(labels = scales::percent) + labs(
    color = &amp;quot;County flipped to ... &amp;quot;,
    x = &amp;quot;County Population (log scale)&amp;quot;,
    y = &amp;quot;Percent Black Population&amp;quot;,
    title = &amp;quot;Flipped counties, 2016&amp;quot;,
    caption = &amp;quot;Counties in gray did not flip.&amp;quot;
  )
p3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p4 &amp;lt;-
  p3 + geom_text_repel(
    data = subset(county_data, flipped == &amp;quot;Yes&amp;quot; &amp;amp; black &amp;gt; 25),
    mapping = aes(x = pop, y = black / 100, label = state),
    size = 2
  )
p4 + theme_minimal() + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-12-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;themes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Themes&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;theme_set(theme_bw()) 
p4 + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-13-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;theme_set(theme_dark()) 
p4 + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-13-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p4 + theme_gray()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggthemes)
theme_set(theme_economist())
p4 + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;theme_set(theme_wsj())
p4 + theme(
  plot.title = element_text(size = rel(0.6)),
  legend.title = element_text(size = rel(0.35)),
  plot.caption = element_text(size = rel(0.35)),
  legend.position = &amp;quot;top&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-15-2.png&#34; width=&#34;672&#34; /&gt;
Claus O. Wilke’s &lt;a href=&#34;https://wilkelab.org/cowplot/articles/introduction.html&#34;&gt;&lt;code&gt;cowplot&lt;/code&gt; package&lt;/a&gt; contains a well-developed theme suitable for figures whose final destination is a journal article. Bob Rudis’s &lt;a href=&#34;https://github.com/hrbrmstr/hrbrthemes&#34;&gt;&lt;code&gt;hrbrthemes&lt;/code&gt; package&lt;/a&gt; has a distinctive and compact look and feel that takes advantage of some freely available typefaces.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(hrbrthemes)
theme_set(theme_ipsum())
p4 + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p4 + theme(
  legend.position = &amp;quot;top&amp;quot;,
  plot.title = element_text(
    size = rel(2),
    lineheight = .5,
    family = &amp;quot;Times&amp;quot;,
    face = &amp;quot;bold.italic&amp;quot;,
    colour = &amp;quot;orange&amp;quot;
  ),
  axis.text.x = element_text(
    size = rel(1.1),
    family = &amp;quot;Courier&amp;quot;,
    face = &amp;quot;bold&amp;quot;,
    color = &amp;quot;purple&amp;quot;
  )
)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-16-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;use-theme-elements&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Use Theme Elements&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;yrs &amp;lt;- c(seq(1972, 1988, 4), 1993, seq(1996, 2016, 4))
mean_age &amp;lt;-
  gss_lon %&amp;gt;% filter(age %nin% NA &amp;amp;
                       year %in% yrs) %&amp;gt;% group_by(year) %&amp;gt;% summarize(xbar = round(mean(age, na.rm = TRUE), 0))
mean_age$y &amp;lt;- 0.3
yr_labs &amp;lt;- data.frame(x = 85, y = 0.8, year = yrs)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(data = subset(gss_lon, year %in% yrs),
         mapping = aes(x = age))
p1 &amp;lt;-
  p + geom_density(
    fill = &amp;quot;gray20&amp;quot;,
    color = NA,
    alpha = 0.9,
    mapping = aes(y = ..scaled..)
  ) +
  geom_vline(
    data = subset(mean_age, year %in% yrs),
    aes(xintercept = xbar),
    color = &amp;quot;white&amp;quot;,
    size = 0.5
  ) +
  geom_text(
    data = subset(mean_age, year %in% yrs),
    aes(x = xbar, y = y, label = xbar),
    nudge_x = 7.5,
    color = &amp;quot;white&amp;quot;,
    size = 3.5,
    hjust = 1
  ) +
  geom_text(data = subset(yr_labs, year %in% yrs), aes(x = x, y = y, label = year)) +
  facet_grid(year ~ ., switch = &amp;quot;y&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p1 + 
  theme(
    plot.title = element_text(size = 16),
    axis.text.x = element_text(size = 12),
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    strip.background = element_blank(),
    strip.text.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  labs(x = &amp;quot;Age&amp;quot;, y = NULL, title = &amp;quot;Age Distribution of\nGSS Respondents&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggridges)
p &amp;lt;-
  ggplot(data = gss_lon, mapping = aes(x = age, y = factor(
    year, levels = rev(unique(year)), ordered = TRUE
  )))
p + geom_density_ridges(alpha = 0.6,
                        fill = &amp;quot;lightblue&amp;quot;,
                        scale = 1.5) + scale_x_continuous(breaks = c(25, 50, 75)) + scale_y_discrete(expand = c(0.01, 0)) + labs(x = &amp;quot;Age&amp;quot;, y = NULL, title = &amp;quot;Age Distribution of\nGSS Respondents&amp;quot;) +
  theme_ridges() + theme(title = element_text(size = 16, face = &amp;quot;bold&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-20-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;two-y-axes&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Two y-axes&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;head(fredts)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         date  sp500 monbase  sp500_i monbase_i
## 1 2009-03-11 696.68 1542228 100.0000  100.0000
## 2 2009-03-18 766.73 1693133 110.0548  109.7849
## 3 2009-03-25 799.10 1693133 114.7012  109.7849
## 4 2009-04-01 809.06 1733017 116.1308  112.3710
## 5 2009-04-08 830.61 1733017 119.2240  112.3710
## 6 2009-04-15 852.21 1789878 122.3245  116.0579&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fredts_m &amp;lt;-
  fredts %&amp;gt;% select(date, sp500_i, monbase_i) %&amp;gt;% gather(key = series, value = score, sp500_i:monbase_i)
head(fredts_m)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##         date  series    score
## 1 2009-03-11 sp500_i 100.0000
## 2 2009-03-18 sp500_i 110.0548
## 3 2009-03-25 sp500_i 114.7012
## 4 2009-04-01 sp500_i 116.1308
## 5 2009-04-08 sp500_i 119.2240
## 6 2009-04-15 sp500_i 122.3245&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(data = fredts_m,
         mapping = aes(
           x = date,
           y = score,
           group = series,
           color = series
         ))
p1 &amp;lt;-
  p + geom_line() + theme(legend.position = &amp;quot;top&amp;quot;) + labs(x = &amp;quot;Date&amp;quot;, y = &amp;quot;Index&amp;quot;, color = &amp;quot;Series&amp;quot;)
p &amp;lt;-
  ggplot(data = fredts,
         mapping = aes(x = date, y = sp500_i - monbase_i))
p2 &amp;lt;- p + geom_line() + labs(x = &amp;quot;Date&amp;quot;, y = &amp;quot;Difference&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;cowplot::plot_grid(p1, p2, nrow = 2, rel_heights = c(0.75, 0.25), align = &amp;quot;v&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-24-1.png&#34; width=&#34;672&#34; /&gt;
Using two y-axes gives you an extra degree of freedom to mess about with the data that, in most cases, you really should not take advantage of.&lt;/p&gt;
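&lt;p&gt;One alternative to a second y-axis is to put both series on a common index, as the &lt;code&gt;sp500_i&lt;/code&gt; and &lt;code&gt;monbase_i&lt;/code&gt; columns do. The printed values are consistent with dividing each observation by the first one and multiplying by 100; that formula is inferred from the output, not stated in the data. A minimal sketch with the first three rows (&lt;code&gt;index_100&lt;/code&gt; is a made-up helper name):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Values copied from the head(fredts) output
sp500 &amp;lt;- c(696.68, 766.73, 799.10)
monbase &amp;lt;- c(1542228, 1693133, 1693133)

# Hypothetical helper: rescale a series so that its first value equals 100
index_100 &amp;lt;- function(x) x / x[1] * 100

round(index_100(sp500), 4)
round(index_100(monbase), 4)&lt;/code&gt;&lt;/pre&gt;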
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = yahoo, mapping = aes(x = Employees, y = Revenue))
p + geom_path(color = &amp;quot;gray80&amp;quot;) + geom_text(aes(color = Mayer, label = Year),
                                            size = 3,
                                            fontface = &amp;quot;bold&amp;quot;) +
  theme(legend.position = &amp;quot;bottom&amp;quot;) + labs(
    color = &amp;quot;Mayer is CEO&amp;quot;,
    x = &amp;quot;Employees&amp;quot;,
    y = &amp;quot;Revenue (Millions)&amp;quot;,
    title = &amp;quot;Yahoo Employees vs Revenues, 2004-2014&amp;quot;
  ) + scale_y_continuous(labels = scales::dollar) + scale_x_continuous(labels = scales::comma)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-25-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(data = yahoo,
         mapping = aes(x = Year, y = Revenue / Employees))
p + geom_vline(xintercept = 2012) + geom_line(color = &amp;quot;gray60&amp;quot;, size = 2) + annotate(
  &amp;quot;text&amp;quot;,
  x = 2013,
  y = 0.44,
  label = &amp;quot; Mayer becomes CEO&amp;quot;,
  size = 2.5
) +
  labs(x = &amp;quot;Year\n&amp;quot;, y = &amp;quot;Revenue/Employees&amp;quot;, title = &amp;quot;Yahoo Revenue to Employee Ratio, 2004-2014&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-26-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Saying no to pie&lt;/strong&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p_xlab &amp;lt;-
  &amp;quot;Amount Owed, in thousands of Dollars&amp;quot; 
p_title &amp;lt;- &amp;quot;Outstanding Student Loans&amp;quot; 
p_subtitle &amp;lt;- &amp;quot;44 million borrowers owe a total of $1.3 trillion&amp;quot; 
p_caption &amp;lt;- &amp;quot;Source: FRB NY&amp;quot;
f_labs &amp;lt;-
  c(`Borrowers` = &amp;quot;Percent of\nall Borrowers&amp;quot;, `Balances` = &amp;quot;Percent of\nall Balances&amp;quot;)
p &amp;lt;-
  ggplot(data = studebt,
         mapping = aes(x = Debt, y = pct / 100, fill = type))
p + geom_bar(stat = &amp;quot;identity&amp;quot;) + scale_fill_brewer(type = &amp;quot;qual&amp;quot;, palette = &amp;quot;Dark2&amp;quot;) + scale_y_continuous(labels = scales::percent) + guides(fill = FALSE) + theme(strip.text.x = element_text(face = &amp;quot;bold&amp;quot;)) + labs(
  y = NULL,
  x = p_xlab,
  caption = p_caption,
  title = p_title,
  subtitle = p_subtitle
) + facet_grid( ~ type, labeller = as_labeller(f_labs)) + coord_flip()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-27-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(viridis)
p &amp;lt;-
  ggplot(studebt, aes(y = pct / 100, x = type, fill = Debtrc)) 
p + geom_bar(stat = &amp;quot;identity&amp;quot;, color = &amp;quot;gray80&amp;quot;) + scale_x_discrete(labels = as_labeller(f_labs)) + scale_y_continuous(labels = scales::percent) + scale_fill_viridis(discrete = TRUE) + guides(
    fill = guide_legend(
      reverse = TRUE,
      title.position = &amp;quot;top&amp;quot;,
      label.position = &amp;quot;bottom&amp;quot;,
      keywidth = 3,
      nrow = 1
    )
  ) +
  labs(
    x = NULL,
    y = NULL,
    fill = &amp;quot;Amount Owed, in thousands of dollars&amp;quot;,
    caption = p_caption,
    title = p_title,
    subtitle = p_subtitle
  ) +
  theme(
    legend.position = &amp;quot;top&amp;quot;,
    axis.text.y = element_text(face = &amp;quot;bold&amp;quot;, hjust = 1, size = 12),
    axis.ticks.length = unit(0, &amp;quot;cm&amp;quot;),
    panel.grid.major.y = element_blank()
  ) +
  coord_flip()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-10-09-data-vis-chapter-8_files/figure-html/unnamed-chunk-28-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;See &lt;a href=&#34;http://r-graph-gallery.com/&#34; class=&#34;uri&#34;&gt;http://r-graph-gallery.com/&lt;/a&gt; for more examples.&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data Vis Chapter 6</title>
      <link>/post/data-vis-chapter-6/</link>
      <pubDate>Thu, 26 Sep 2019 00:00:00 +0000</pubDate>
      <guid>/post/data-vis-chapter-6/</guid>
      <description>

&lt;div id=&#34;TOC&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#show-several-fits-at-once-with-a-legend&#34;&gt;Show Several Fits at Once, with a Legend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#model-based-graphics&#34;&gt;Model-based Graphics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#tidy-model-objects-with-broom&#34;&gt;Tidy Model Objects with Broom&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#get-component-level-statistics-with-tidy&#34;&gt;Get component-level statistics with tidy()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#get-observation-level-statistics-with-augment&#34;&gt;Get observation-level statistics with augment()&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#grouped-analysis&#34;&gt;Grouped Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#plots-for-surveys&#34;&gt;Plots for Surveys&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-  ggplot(data = gapminder,
             mapping = aes(x = log(gdpPercap), y = lifeExp))

p + geom_point(alpha = 0.1) +
  geom_smooth(color = &amp;quot;tomato&amp;quot;,
              fill = &amp;quot;tomato&amp;quot;,
              method = MASS::rlm) + #robust regression line
  geom_smooth(color = &amp;quot;steelblue&amp;quot;,
              fill = &amp;quot;steelblue&amp;quot;,
              method = &amp;quot;lm&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-1-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p + geom_point(alpha = 0.1) +
  geom_smooth(
    color = &amp;quot;tomato&amp;quot;,
    method = &amp;quot;lm&amp;quot;,
    size = 1.2,
    formula = y ~ splines::bs(x, 3),
    se = FALSE
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-1-2.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p + geom_point(alpha = 0.1) +
  geom_quantile( # specialized version of geom_smooth() that can fit quantile regressions
    color = &amp;quot;tomato&amp;quot;,
    size = 1.2,
    method = &amp;quot;rqss&amp;quot;,
    lambda = 1,
    quantiles = c(0.20, 0.5, 0.85)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Smoothing formula not specified. Using: y ~ qss(x, lambda = 1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-1-3.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;show-several-fits-at-once-with-a-legend&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Show Several Fits at Once, with a Legend&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_colors &amp;lt;- RColorBrewer::brewer.pal(3, &amp;quot;Set1&amp;quot;)
model_colors&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] &amp;quot;#E41A1C&amp;quot; &amp;quot;#377EB8&amp;quot; &amp;quot;#4DAF4A&amp;quot;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p0 &amp;lt;- ggplot(data = gapminder,
             mapping = aes(x = log(gdpPercap), y = lifeExp))

p1 &amp;lt;- p0 + geom_point(alpha = 0.2) +
  geom_smooth(method = &amp;quot;lm&amp;quot;, aes(color = &amp;quot;OLS&amp;quot;, fill = &amp;quot;OLS&amp;quot;)) +
  geom_smooth(
    method = &amp;quot;lm&amp;quot;,
    formula = y ~ splines::bs(x, df = 3),
    aes(color = &amp;quot;Cubic Spline&amp;quot;, fill = &amp;quot;Cubic Spline&amp;quot;)
  ) +
  geom_smooth(method = &amp;quot;loess&amp;quot;,
              aes(color = &amp;quot;LOESS&amp;quot;, fill = &amp;quot;LOESS&amp;quot;))

p1 + scale_color_manual(name = &amp;quot;Models&amp;quot;, values = model_colors) +
  scale_fill_manual(name = &amp;quot;Models&amp;quot;, values = model_colors) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-2-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;model-based-graphics&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Model-based Graphics&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;min_gdp &amp;lt;- min(gapminder$gdpPercap)
max_gdp &amp;lt;- max(gapminder$gdpPercap)
med_pop &amp;lt;- median(gapminder$pop)

pred_df &amp;lt;- expand.grid(gdpPercap = (seq(from = min_gdp, to = max_gdp,
length.out = 100)), pop = med_pop, continent = c(&amp;quot;Africa&amp;quot;,
&amp;quot;Americas&amp;quot;, &amp;quot;Asia&amp;quot;, &amp;quot;Europe&amp;quot;, &amp;quot;Oceania&amp;quot;))

dim(pred_df)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 500   3&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;out &amp;lt;- lm(formula = lifeExp ~ gdpPercap + pop + continent, data = gapminder)

pred_out &amp;lt;- predict(object = out, newdata = pred_df, interval = &amp;quot;predict&amp;quot;)
pred_df &amp;lt;- cbind(pred_df, pred_out)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(
    data = subset(pred_df, continent %in% c(&amp;quot;Europe&amp;quot;, &amp;quot;Africa&amp;quot;)),
    aes(
      x = gdpPercap,
      y = fit,
      ymin = lwr,
      ymax = upr,
      color = continent,
      fill = continent,
      group = continent
    )
  )

p + geom_point(
  data = subset(gapminder,
                continent %in% c(&amp;quot;Europe&amp;quot;, &amp;quot;Africa&amp;quot;)),
  aes(x = gdpPercap, y = lifeExp,
      color = continent),
  alpha = 0.5,
  inherit.aes = FALSE
) +
  geom_line() +
  geom_ribbon(alpha = 0.2, color = NA) +
  scale_x_log10(labels = scales::dollar)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;tidy-model-objects-with-broom&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Tidy Model Objects with Broom&lt;/h2&gt;
&lt;div id=&#34;get-component-level-statistics-with-tidy&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Get component-level statistics with tidy()&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(broom)
out_comp &amp;lt;- tidy(out)
out_comp %&amp;gt;% round_df()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 7 x 5
##   term              estimate std.error statistic p.value
##   &amp;lt;chr&amp;gt;                &amp;lt;dbl&amp;gt;     &amp;lt;dbl&amp;gt;     &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
## 1 (Intercept)          47.8      0.34     141.         0
## 2 gdpPercap             0        0         19.2        0
## 3 pop                   0        0          3.33       0
## 4 continentAmericas    13.5      0.6       22.5        0
## 5 continentAsia         8.19     0.570     14.3        0
## 6 continentEurope      17.5      0.62      28.0        0
## 7 continentOceania     18.1      1.78      10.2        0&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The “not in” operator &lt;code&gt;%nin%&lt;/code&gt; is available via &lt;code&gt;socviz&lt;/code&gt;, and &lt;code&gt;prefix_strip()&lt;/code&gt;, also from &lt;code&gt;socviz&lt;/code&gt;, drops prefixes such as &lt;code&gt;continent&lt;/code&gt; from term names.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#confidence interval
out_conf &amp;lt;- tidy(out, conf.int = TRUE)
out_conf &amp;lt;- subset(out_conf, term %nin% &amp;quot;(Intercept)&amp;quot;)
out_conf$nicelabs &amp;lt;- prefix_strip(out_conf$term, &amp;quot;continent&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
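&lt;p&gt;To make the two helpers less opaque, here is a minimal base-R sketch of the same ideas. The names &lt;code&gt;%not_in%&lt;/code&gt; and &lt;code&gt;strip_prefix()&lt;/code&gt; are hypothetical stand-ins written for illustration, not the &lt;code&gt;socviz&lt;/code&gt; implementations, whose output may differ in details such as capitalization.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical stand-in for socviz&amp;#39;s %nin%: the negation of %in%
`%not_in%` &amp;lt;- function(x, table) !(x %in% table)

# Hypothetical stand-in for prefix_strip(): drop a leading prefix
strip_prefix &amp;lt;- function(x, prefix) sub(paste0(&amp;quot;^&amp;quot;, prefix), &amp;quot;&amp;quot;, x)

# Made-up term names, similar to those returned by tidy()
terms &amp;lt;- c(&amp;quot;(Intercept)&amp;quot;, &amp;quot;gdpPercap&amp;quot;, &amp;quot;continentAsia&amp;quot;, &amp;quot;continentEurope&amp;quot;)

kept &amp;lt;- terms[terms %not_in% &amp;quot;(Intercept)&amp;quot;]   # drop the intercept row
labels &amp;lt;- strip_prefix(kept, &amp;quot;continent&amp;quot;)     # tidy up the labels
labels&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The intercept is filtered out first, and only then are the remaining terms relabelled, which mirrors the order of the two steps above.&lt;/p&gt;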
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(out_conf,
            mapping = aes(
              x = reorder(nicelabs, estimate),
              y = estimate,
              ymin = conf.low,
              ymax = conf.high
            ))
p + geom_pointrange() + coord_flip() + labs(x = &amp;quot;&amp;quot;, y = &amp;quot;OLS Estimate&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;get-observation-level-statistics-with-augment&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Get observation-level statistics with augment()&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;out_aug &amp;lt;- augment(out)
p &amp;lt;- ggplot(data = out_aug, mapping = aes(x = .fitted, y = .resid))
p + geom_point()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;get-model-level-statistics-with-glance&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Get model-level statistics with glance()&lt;/h3&gt;
&lt;p&gt;Broom is able to &lt;code&gt;tidy&lt;/code&gt; (and &lt;code&gt;augment&lt;/code&gt;, and &lt;code&gt;glance&lt;/code&gt; at) a wide range of model types.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(survival)

out_cph &amp;lt;- coxph(Surv(time, status) ~ age + sex, data = lung)
out_surv &amp;lt;- survfit(out_cph)
out_tidy &amp;lt;- tidy(out_surv)
p &amp;lt;- ggplot(data = out_tidy, mapping = aes(time, estimate))
p + geom_line() + geom_ribbon(mapping = aes(ymin = conf.low,
                                            ymax = conf.high),
                              alpha = 0.2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;grouped-analysis&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Grouped Analysis&lt;/h2&gt;
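&lt;p&gt;Use &lt;code&gt;nest()&lt;/code&gt; to collapse the rows of each group into a list-column, fit one model per group, and then use &lt;code&gt;unnest()&lt;/code&gt; to flatten the tidied results back into a regular data frame.&lt;/p&gt;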
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;out_le &amp;lt;- gapminder %&amp;gt;%
  group_by(continent, year) %&amp;gt;%
  nest()&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;fit_ols &amp;lt;- function(df) {
  lm(lifeExp ~ log(gdpPercap), data = df)
}

out_le &amp;lt;- gapminder %&amp;gt;%
  group_by(continent, year) %&amp;gt;%
  nest() %&amp;gt;%
  mutate(model = map(data, fit_ols))



out_tidy &amp;lt;- gapminder %&amp;gt;%
  group_by(continent, year) %&amp;gt;%
  nest() %&amp;gt;%
  mutate(model = map(data, fit_ols),
         tidied = map(model, tidy)) %&amp;gt;%
  unnest(tidied, .drop = TRUE) %&amp;gt;%
  filter(term %nin% &amp;quot;(Intercept)&amp;quot; &amp;amp;
           continent %nin% &amp;quot;Oceania&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: The `.drop` argument of `unnest()` is deprecated as of tidyr 1.0.0.
## All list-columns are now preserved.
## This warning is displayed once per session.
## Call `lifecycle::last_warnings()` to see where this warning was generated.&lt;/code&gt;&lt;/pre&gt;
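&lt;p&gt;The warning above comes from tidyr 1.0.0 and later, where &lt;code&gt;unnest()&lt;/code&gt; keeps the other list-columns instead of dropping them. One way to get a flat result without &lt;code&gt;.drop&lt;/code&gt; is to remove the list-columns explicitly before unnesting. The following is a minimal sketch under stated assumptions: it needs tidyr 1.0.0 or later, it uses the built-in &lt;code&gt;mtcars&lt;/code&gt; data in place of &lt;code&gt;gapminder&lt;/code&gt;, and the names &lt;code&gt;fit_mpg&lt;/code&gt; and &lt;code&gt;out_flat&lt;/code&gt; are made up for illustration.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Illustrative model on built-in data: one fit per cylinder group
fit_mpg &amp;lt;- function(df) {
  lm(mpg ~ wt, data = df)
}

out_flat &amp;lt;- mtcars %&amp;gt;%
  group_by(cyl) %&amp;gt;%
  nest() %&amp;gt;%
  mutate(model = map(data, fit_mpg),
         tidied = map(model, tidy)) %&amp;gt;%
  select(-data, -model) %&amp;gt;%   # drop the list-columns that are no longer needed
  unnest(cols = tidied) %&amp;gt;%   # one row per group and model term
  ungroup()

dim(out_flat)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Naming the column with &lt;code&gt;cols =&lt;/code&gt; makes it explicit which list-column is being flattened, and dropping &lt;code&gt;data&lt;/code&gt; and &lt;code&gt;model&lt;/code&gt; beforehand keeps the result free of leftover list-columns.&lt;/p&gt;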
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(
  data = out_tidy,
  mapping = aes(
    x = year,
    y = estimate,
    ymin = estimate - 2 * std.error,
    ymax = estimate + 2 * std.error,
    group = continent,
    color = continent
  )
)

p + geom_pointrange(position = position_dodge(width = 1)) +
  scale_x_continuous(breaks = unique(gapminder$year)) +
  theme(legend.position = &amp;quot;top&amp;quot;) +
  labs(x = &amp;quot;Year&amp;quot;, y = &amp;quot;Estimate&amp;quot;, color = &amp;quot;Continent&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
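&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;plot-marginal-effects&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Plot Marginal Effects&lt;/h2&gt;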
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(margins)
gss_sm$polviews_m &amp;lt;- relevel(gss_sm$polviews, ref = &amp;quot;Moderate&amp;quot;)
out_bo &amp;lt;- glm(obama ~ polviews_m + sex * race,
              family = &amp;quot;binomial&amp;quot;,
              data = gss_sm)
summary(out_bo)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Call:
## glm(formula = obama ~ polviews_m + sex * race, family = &amp;quot;binomial&amp;quot;, 
##     data = gss_sm)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9045  -0.5541   0.1772   0.5418   2.2437  
## 
## Coefficients:
##                                   Estimate Std. Error z value Pr(&amp;gt;|z|)    
## (Intercept)                       0.296493   0.134091   2.211  0.02703 *  
## polviews_mExtremely Liberal       2.372950   0.525045   4.520 6.20e-06 ***
## polviews_mLiberal                 2.600031   0.356666   7.290 3.10e-13 ***
## polviews_mSlightly Liberal        1.293172   0.248435   5.205 1.94e-07 ***
## polviews_mSlightly Conservative  -1.355277   0.181291  -7.476 7.68e-14 ***
## polviews_mConservative           -2.347463   0.200384 -11.715  &amp;lt; 2e-16 ***
## polviews_mExtremely Conservative -2.727384   0.387210  -7.044 1.87e-12 ***
## sexFemale                         0.254866   0.145370   1.753  0.07956 .  
## raceBlack                         3.849526   0.501319   7.679 1.61e-14 ***
## raceOther                        -0.002143   0.435763  -0.005  0.99608    
## sexFemale:raceBlack              -0.197506   0.660066  -0.299  0.76477    
## sexFemale:raceOther               1.574829   0.587657   2.680  0.00737 ** 
## ---
## Signif. codes:  0 &amp;#39;***&amp;#39; 0.001 &amp;#39;**&amp;#39; 0.01 &amp;#39;*&amp;#39; 0.05 &amp;#39;.&amp;#39; 0.1 &amp;#39; &amp;#39; 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2247.9  on 1697  degrees of freedom
## Residual deviance: 1345.9  on 1686  degrees of freedom
##   (1169 observations deleted due to missingness)
## AIC: 1369.9
## 
## Number of Fisher Scoring iterations: 6&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;bo_m &amp;lt;- margins(out_bo)
summary(bo_m)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##                            factor     AME     SE        z      p   lower
##            polviews_mConservative -0.4119 0.0283 -14.5394 0.0000 -0.4674
##  polviews_mExtremely Conservative -0.4538 0.0420 -10.7971 0.0000 -0.5361
##       polviews_mExtremely Liberal  0.2681 0.0295   9.0996 0.0000  0.2103
##                 polviews_mLiberal  0.2768 0.0229  12.0736 0.0000  0.2319
##   polviews_mSlightly Conservative -0.2658 0.0330  -8.0596 0.0000 -0.3304
##        polviews_mSlightly Liberal  0.1933 0.0303   6.3896 0.0000  0.1340
##                         raceBlack  0.4032 0.0173  23.3568 0.0000  0.3694
##                         raceOther  0.1247 0.0386   3.2297 0.0012  0.0490
##                         sexFemale  0.0443 0.0177   2.5073 0.0122  0.0097
##    upper
##  -0.3564
##  -0.3714
##   0.3258
##   0.3218
##  -0.2011
##   0.2526
##   0.4371
##   0.2005
##   0.0789&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The margins library comes with several plot methods of its own. If you wish, at this point you can just try &lt;code&gt;plot(bo_m)&lt;/code&gt; to see a plot of the average marginal effects, produced with the general look of a Stata graphic. Other plot methods in the margins
library include &lt;code&gt;cplot()&lt;/code&gt;, which visualizes marginal effects conditional on a second variable, and &lt;code&gt;image()&lt;/code&gt;, which shows predictions or marginal effects as a filled heatmap or contour plot.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;bo_gg &amp;lt;- as_tibble(summary(bo_m))
prefixes &amp;lt;- c(&amp;quot;polviews_m&amp;quot;, &amp;quot;sex&amp;quot;)
bo_gg$factor &amp;lt;- prefix_strip(bo_gg$factor, prefixes)
bo_gg$factor &amp;lt;- prefix_replace(bo_gg$factor, &amp;quot;race&amp;quot;, &amp;quot;Race: &amp;quot;)

bo_gg %&amp;gt;% select(factor, AME, lower, upper)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 9 x 4
##   factor                     AME    lower   upper
##   &amp;lt;chr&amp;gt;                    &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
## 1 Conservative           -0.412  -0.467   -0.356 
## 2 Extremely Conservative -0.454  -0.536   -0.371 
## 3 Extremely Liberal       0.268   0.210    0.326 
## 4 Liberal                 0.277   0.232    0.322 
## 5 Slightly Conservative  -0.266  -0.330   -0.201 
## 6 Slightly Liberal        0.193   0.134    0.253 
## 7 Race: Black             0.403   0.369    0.437 
## 8 Race: Other             0.125   0.0490   0.200 
## 9 Female                  0.0443  0.00967  0.0789&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = bo_gg, aes(
  x = reorder(factor, AME),
  y = AME,
  ymin = lower,
  ymax = upper
))

p + geom_hline(yintercept = 0, color = &amp;quot;gray80&amp;quot;) +
  geom_pointrange() + coord_flip() +
  labs(x = NULL, y = &amp;quot;Average Marginal Effect&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-13-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pv_cp &amp;lt;- cplot(out_bo, x = &amp;quot;sex&amp;quot;, draw = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##    xvals     yvals     upper     lower
## 1   Male 0.5735849 0.6378653 0.5093045
## 2 Female 0.6344507 0.6887845 0.5801169&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = pv_cp, aes(
  x = reorder(xvals, yvals),
  y = yvals,
  ymin = lower,
  ymax = upper
))

p + geom_hline(yintercept = 0, color = &amp;quot;gray80&amp;quot;) +
  geom_pointrange() + coord_flip() +
  labs(x = NULL, y = &amp;quot;Conditional Effect&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;plots-for-surveys&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Plots for Surveys&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(survey)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(srvyr)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;options(survey.lonely.psu = &amp;quot;adjust&amp;quot;)
options(na.action = &amp;quot;na.pass&amp;quot;)

gss_wt &amp;lt;- subset(gss_lon, year &amp;gt; 1974) %&amp;gt;%
  mutate(stratvar = interaction(year, vstrat)) %&amp;gt;%
  as_survey_design(
    ids = vpsu,
    strata = stratvar,
    weights = wtssall,
    nest = TRUE
  )

out_grp &amp;lt;- gss_wt %&amp;gt;%
  filter(year %in% seq(1976, 2016, by = 4)) %&amp;gt;%
  group_by(year, race, degree) %&amp;gt;%
  summarize(prop = survey_mean(na.rm = TRUE)) # calculate properly weighted survey means&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Factor `degree` contains implicit NA, consider using
## `forcats::fct_explicit_na`&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;out_mrg &amp;lt;- gss_wt %&amp;gt;%
  filter(year %in% seq(1976, 2016, by = 4)) %&amp;gt;%
  mutate(racedeg = interaction(race, degree)) %&amp;gt;%
  group_by(year, racedeg) %&amp;gt;%
  summarize(prop = survey_mean(na.rm = TRUE))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Factor `racedeg` contains implicit NA, consider using
## `forcats::fct_explicit_na`&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;out_mrg &amp;lt;- gss_wt %&amp;gt;%
  filter(year %in% seq(1976, 2016, by = 4)) %&amp;gt;%
  mutate(racedeg = interaction(race, degree)) %&amp;gt;%
  group_by(year, racedeg) %&amp;gt;%
  summarize(prop = survey_mean(na.rm = TRUE)) %&amp;gt;%
  separate(racedeg, sep = &amp;quot;\\.&amp;quot;, into = c(&amp;quot;race&amp;quot;, &amp;quot;degree&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Factor `racedeg` contains implicit NA, consider using
## `forcats::fct_explicit_na`&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(
  data = subset(out_grp, race %nin% &amp;quot;Other&amp;quot;),
  mapping = aes(
    x = degree,
    y = prop,
    ymin = prop - 2 * prop_se,
    ymax = prop + 2 * prop_se,
    fill = race,
    color = race,
    group = race
  )
)

dodge &amp;lt;- position_dodge(width = 0.9)

p + geom_col(position = dodge, alpha = 0.2) +
  geom_errorbar(position = dodge, width = 0.2) +
  scale_x_discrete(labels = scales::wrap_format(10)) +
  scale_y_continuous(labels = scales::percent) +
  scale_color_brewer(type = &amp;quot;qual&amp;quot;, palette = &amp;quot;Dark2&amp;quot;) +
  scale_fill_brewer(type = &amp;quot;qual&amp;quot;, palette = &amp;quot;Dark2&amp;quot;) +
  labs(
    title = &amp;quot;Educational Attainment by Race&amp;quot;,
    subtitle = &amp;quot;GSS 1976-2016&amp;quot;,
    fill = &amp;quot;Race&amp;quot;,
    color = &amp;quot;Race&amp;quot;,
    x = NULL,
    y = &amp;quot;Percent&amp;quot;
  ) +
  facet_wrap( ~ year, ncol = 2) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 13 rows containing missing values (geom_col).&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 13 rows containing missing values (geom_errorbar).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(
  data = subset(out_grp, race %nin% &amp;quot;Other&amp;quot;),
  mapping = aes(
    x = year,
    y = prop,
    ymin = prop - 2 * prop_se,
    ymax = prop + 2 * prop_se,
    fill = race,
    color = race,
    group = race
  )
)

p + geom_ribbon(alpha = 0.3, aes(color = NULL)) + #Use ribbon to show the error range
  geom_line() + #Use line to show a time trend
  facet_wrap( ~ degree, ncol = 1) +
  scale_y_continuous(labels = scales::percent) +
  scale_color_brewer(type = &amp;quot;qual&amp;quot;, palette = &amp;quot;Dark2&amp;quot;) +
  scale_fill_brewer(type = &amp;quot;qual&amp;quot;, palette = &amp;quot;Dark2&amp;quot;) +
  labs(
    title = &amp;quot;Educational Attainment by Race&amp;quot;,
    subtitle = &amp;quot;GSS 1976-2016&amp;quot;,
    fill = &amp;quot;Race&amp;quot;,
    color = &amp;quot;Race&amp;quot;,
    x = NULL,
    y = &amp;quot;Percent&amp;quot;
  ) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 13 rows containing missing values (geom_path).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-vis-chapter-6_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;672&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Other useful packages: &lt;code&gt;infer&lt;/code&gt;, &lt;code&gt;GGally&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data Visualization Chapter 2-4</title>
      <link>/post/test/</link>
      <pubDate>Thu, 26 Sep 2019 00:00:00 +0000</pubDate>
      <guid>/post/test/</guid>
      <description>

&lt;div id=&#34;TOC&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#chapter-2&#34;&gt;Chapter 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#chapter-3&#34;&gt;Chapter 3&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#wrong-way-to-set-color&#34;&gt;Wrong way to set color&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#aesthetics-can-be-mapped-per-geom&#34;&gt;Aesthetics Can Be Mapped per Geom&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#save-plots&#34;&gt;Save plots&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#chapter-4&#34;&gt;Chapter 4&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#group-data-and-the-group-aesthetic&#34;&gt;Group data and the “Group” Aesthetic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#facet-to-make-small-multiples&#34;&gt;Facet to make small multiples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#geoms-can-transform-data&#34;&gt;Geoms can transform data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#histgrams-and-density-plots&#34;&gt;Histograms and Density Plots&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#avoid-transformations-when-necessary&#34;&gt;Avoid Transformations When Necessary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

&lt;div id=&#34;chapter-2&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Chapter 2&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;geom_point&lt;/code&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/cars-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;chapter-3&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Chapter 3&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;geom_smooth&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## `geom_smooth()` using method = &amp;#39;gam&amp;#39; and formula &amp;#39;y ~ s(x, bs = &amp;quot;cs&amp;quot;)&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/pressure-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## `geom_smooth()` using method = &amp;#39;gam&amp;#39; and formula &amp;#39;y ~ s(x, bs = &amp;quot;cs&amp;quot;)&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-1-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;scale_x_log10&lt;/code&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method = &amp;quot;gam&amp;quot;) + scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-2-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;scales::dollar&lt;/code&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() +
geom_smooth(method = &amp;quot;gam&amp;quot;) +
scale_x_log10(labels = scales::dollar)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;wrong-way-to-set-color&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Wrong way to set color&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp,
color = &amp;quot;purple&amp;quot;))
p + geom_point() + geom_smooth(method = &amp;quot;loess&amp;quot;) + scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
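&lt;p&gt;The &lt;code&gt;aes()&lt;/code&gt; function is for mappings only. Do not use it to set a property to a particular value. Inside &lt;code&gt;aes()&lt;/code&gt;, the string “purple” is treated as a one-value variable, so ggplot maps it to its default first color instead of drawing purple points. To set a property, do it in the &lt;code&gt;geom_&lt;/code&gt; function we are using, outside the &lt;code&gt;mapping = aes(...)&lt;/code&gt; step.&lt;/p&gt;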
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(color = &amp;quot;purple&amp;quot;) + geom_smooth(method = &amp;quot;loess&amp;quot;) + scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The various &lt;code&gt;geom_&lt;/code&gt; functions can take many other arguments that affect how the plot looks but do not involve mapping variables to aesthetic elements.
For example, “alpha” is an aesthetic property that points (and some other plot elements) have, and to which variables can be mapped. It controls how transparent the object appears when drawn and is measured on a scale from zero (fully transparent) to one (fully opaque).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha = 0.3) + geom_smooth(color = &amp;quot;orange&amp;quot;, se = FALSE,
                                          size = 8, method = &amp;quot;lm&amp;quot;) + scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y=lifeExp))
p + geom_point(alpha = 0.3) +
  geom_smooth(method = &amp;quot;gam&amp;quot;) +
  scale_x_log10(labels = scales::dollar) +
  labs(x = &amp;quot;GDP Per Capita&amp;quot;, y = &amp;quot;Life Expectancy in Years&amp;quot;,
       title = &amp;quot;Economic Growth and Life Expectancy&amp;quot;,
       subtitle = &amp;quot;Data points are country-years&amp;quot;,
       caption = &amp;quot;Source: Gapminder.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp,
                                            color = continent))
p + geom_point() + geom_smooth(method = &amp;quot;loess&amp;quot;) + scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The color of the standard error ribbon is controlled by the &lt;code&gt;fill&lt;/code&gt; aesthetic.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp,
                                            color = continent, fill = continent))
p + geom_point() + geom_smooth(method = &amp;quot;loess&amp;quot;) + scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;aesthetics-can-be-mapped-per-geom&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Aesthetics Can Be Mapped per Geom&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(mapping = aes(color = factor(year))) + 
  geom_smooth(method = &amp;quot;loess&amp;quot;) +
  scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The order in which geoms are added does not change what is computed. It only determines which layer is drawn on top.
Besides &lt;code&gt;scale_x_log10()&lt;/code&gt;, you can also try &lt;code&gt;scale_x_sqrt()&lt;/code&gt; and &lt;code&gt;scale_x_reverse()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = pop, y = lifeExp))
p + geom_smooth(method = &amp;quot;loess&amp;quot;) + 
  geom_point(mapping = aes(color = continent)) + 
  scale_x_reverse(labels = scales::number)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(mapping = aes(color = log(pop))) + scale_x_log10()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-12-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;save-plots&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Save plots&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p_out &amp;lt;-  p + geom_point() + geom_smooth(method = &amp;quot;loess&amp;quot;) + scale_x_log10()
ggsave(&amp;quot;my_figure.pdf&amp;quot;, plot = p_out)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;chapter-4&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Chapter 4&lt;/h2&gt;
&lt;div id=&#34;group-data-and-the-group-aesthetic&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Group data and the “Group” Aesthetic&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-14-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The plot above looks wrong because ggplot does not know that the yearly observations are grouped by country. Use the &lt;code&gt;group&lt;/code&gt; aesthetic to tell ggplot explicitly about this country-level structure.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(group = country))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-15-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;facet-to-make-small-multiples&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Facet to make small multiples&lt;/h3&gt;
&lt;p&gt;Use &lt;code&gt;facet_wrap()&lt;/code&gt; to split the plot by &lt;code&gt;continent&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(aes(group = country)) + facet_wrap(~continent)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Add a few more enhancements:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(color=&amp;quot;gray70&amp;quot;, aes(group = country)) + 
  geom_smooth(size= 1.1, method = &amp;quot;loess&amp;quot;, se = FALSE) +
  scale_y_log10(labels=scales::dollar) +
  facet_wrap(~continent , ncol = 5) +
  labs(x = &amp;quot;Year&amp;quot;,
       y = &amp;quot;GDP per capita on Five Continents&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;facet_grid()&lt;/code&gt; to lay out the panels by two or more categorical variables at once.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) +
  geom_smooth() + 
  facet_grid(sex ~ race)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## `geom_smooth()` using method = &amp;#39;gam&amp;#39; and formula &amp;#39;y ~ s(x, bs = &amp;quot;cs&amp;quot;)&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 18 rows containing non-finite values (stat_smooth).&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 18 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-18-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = age, y = childs))
p + geom_point(alpha = 0.2) +
  geom_smooth() + 
  facet_grid(sex ~ race + degree)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## `geom_smooth()` using method = &amp;#39;loess&amp;#39; and formula &amp;#39;y ~ x&amp;#39;&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 18 rows containing non-finite values (stat_smooth).&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 62.87&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2.13&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 582.26&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small.
## fewer data values than degrees of freedom.&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 62.87&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 2.13&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 582.26&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 18 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;geoms-can-transform-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Geoms can transform data&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-20-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;geom_bar()&lt;/code&gt; calls the default &lt;code&gt;stat_&lt;/code&gt; function associated with it, &lt;code&gt;stat_count()&lt;/code&gt;, which counts the observations in each category.&lt;/p&gt;
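&lt;p&gt;A minimal sketch of what that default stat computes, using a small made-up data frame (&lt;code&gt;toy&lt;/code&gt; and its &lt;code&gt;region&lt;/code&gt; column are hypothetical, not part of &lt;code&gt;gss_sm&lt;/code&gt;). &lt;code&gt;layer_data()&lt;/code&gt; returns the values computed for a layer, so the counts it reports should match a plain &lt;code&gt;table()&lt;/code&gt; of the same column.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggplot2)

# Hypothetical toy data, for illustration only
toy &amp;lt;- data.frame(
  region = c(&amp;quot;North&amp;quot;, &amp;quot;North&amp;quot;, &amp;quot;South&amp;quot;, &amp;quot;West&amp;quot;, &amp;quot;West&amp;quot;, &amp;quot;West&amp;quot;)
)

# Build the bar chart, then extract the data stat_count() computed for it
p_toy &amp;lt;- ggplot(data = toy, mapping = aes(x = region)) + geom_bar()
computed &amp;lt;- layer_data(p_toy)

# One count per bar, in the order of the x axis
computed$count

# The same tallies, computed directly
table(toy$region)&lt;/code&gt;&lt;/pre&gt;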
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop..))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-21-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop.., group = 1))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-22-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;table(gss_sm$religion)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
## Protestant   Catholic     Jewish       None      Other 
##       1371        649         51        619        159&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
p + geom_bar()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-24-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-24-2.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p + geom_bar()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-24-3.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-25-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = &amp;quot;fill&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-26-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;If you want separate bars, use &lt;code&gt;position = &amp;quot;dodge&amp;quot;&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = &amp;quot;dodge&amp;quot;, mapping = aes(y = ..prop..,
                                               group = religion))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-27-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;However, with &lt;code&gt;group = religion&lt;/code&gt; the proportions do not sum to one within each region. They sum to one across regions for each religion.&lt;/p&gt;
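&lt;p&gt;One way to see the difference, sketched in base R on a made-up table (the &lt;code&gt;toy&lt;/code&gt; data frame is hypothetical). &lt;code&gt;prop.table()&lt;/code&gt; with &lt;code&gt;margin = 1&lt;/code&gt; normalizes within each religion, which mirrors grouping by religion, while &lt;code&gt;margin = 2&lt;/code&gt; normalizes within each region.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Hypothetical toy data, for illustration only
toy &amp;lt;- data.frame(
  region   = c(&amp;quot;East&amp;quot;, &amp;quot;East&amp;quot;, &amp;quot;East&amp;quot;, &amp;quot;West&amp;quot;),
  religion = c(&amp;quot;A&amp;quot;, &amp;quot;B&amp;quot;, &amp;quot;B&amp;quot;, &amp;quot;A&amp;quot;)
)

# Rows are religions, columns are regions
counts &amp;lt;- table(toy$religion, toy$region)

# Each religion (row) sums to one across regions
by_religion &amp;lt;- prop.table(counts, margin = 1)
rowSums(by_religion)

# Each region (column) sums to one across religions
by_region &amp;lt;- prop.table(counts, margin = 2)
colSums(by_region)&lt;/code&gt;&lt;/pre&gt;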
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = gss_sm, mapping = aes(x = religion))
p + geom_bar(position = &amp;quot;dodge&amp;quot;, mapping = aes(y = ..prop..,
                                               group = bigregion)) +
  facet_wrap(~bigregion, ncol=1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-28-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;histgrams-and-density-plots&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Histograms and Density Plots&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = midwest, mapping = aes( x = area))
p + geom_histogram()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-29-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = midwest, mapping = aes( x = area))
p + geom_histogram(bins = 10)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-30-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;oh_wi &amp;lt;- c(&amp;quot;OH&amp;quot;, &amp;quot;WI&amp;quot;)
p &amp;lt;- ggplot(data = subset(midwest, subset = state %in% oh_wi),
            mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-31-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = midwest, mapping = aes( x = area))
p + geom_density()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-32-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = midwest, mapping = aes( x = area, fill = state,
                                           color = state))
p + geom_density(alpha = 0.3)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-33-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;avoid-transformations-when-necessary&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Avoid Transformations When Necessary&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = titanic, mapping = aes(x = fate, y = percent,
                                          fill = sex))
p + geom_bar(position = &amp;quot;dodge&amp;quot;, stat = &amp;quot;identity&amp;quot;) + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-34-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = oecd_sum,
            mapping = aes(x = year, y = diff, fill = hi_lo))
p + geom_col() + guides(fill = FALSE) + 
  labs(x = NULL, y = &amp;quot;Difference in Years&amp;quot;,
       title = &amp;quot;The US Life Expectancy Gap&amp;quot;,
       subtitle = &amp;quot;Difference between US and OECD 
       average life expectancies, 1960-2015&amp;quot;,
       caption = &amp;quot;Data: OECD. After a chart by Christopher Ingraham,
       Washington Post, December 27th 2017.&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 1 rows containing missing values (position_stack).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-Data-Visualization-Chapter-2-4_files/figure-html/unnamed-chunk-35-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Data Visualization Chapter 5</title>
      <link>/post/data-visualization-chapter-5/</link>
      <pubDate>Thu, 26 Sep 2019 00:00:00 +0000</pubDate>
      <guid>/post/data-visualization-chapter-5/</guid>
      <description>

&lt;div id=&#34;TOC&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#chapter-5&#34;&gt;Chapter 5&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#use-pipes-to-summerize-data&#34;&gt;Use Pipes to Summarize Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#continuous-variables-by-group-or-category&#34;&gt;Continuous Variables by Group or Category&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#write-and-draw-in-the-plot-area&#34;&gt;Write and Draw in the Plot Area&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#scales-guides-and-themes&#34;&gt;Scales, Guides, and Themes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

&lt;div id=&#34;chapter-5&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Chapter 5&lt;/h2&gt;
&lt;div id=&#34;use-pipes-to-summerize-data&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Use Pipes to Summarize Data&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rel_by_region &amp;lt;- gss_sm %&amp;gt;%
  group_by(bigregion, religion) %&amp;gt;%
  summarize(N = n()) %&amp;gt;%
  mutate(freq = N / sum(N),
         pct = round((freq*100), 0))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Factor `religion` contains implicit NA, consider using
## `forcats::fct_explicit_na`&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rel_by_region&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 24 x 5
## # Groups:   bigregion [4]
##    bigregion religion       N    freq   pct
##    &amp;lt;fct&amp;gt;     &amp;lt;fct&amp;gt;      &amp;lt;int&amp;gt;   &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
##  1 Northeast Protestant   158 0.324      32
##  2 Northeast Catholic     162 0.332      33
##  3 Northeast Jewish        27 0.0553      6
##  4 Northeast None         112 0.230      23
##  5 Northeast Other         28 0.0574      6
##  6 Northeast &amp;lt;NA&amp;gt;           1 0.00205     0
##  7 Midwest   Protestant   325 0.468      47
##  8 Midwest   Catholic     172 0.247      25
##  9 Midwest   Jewish         3 0.00432     0
## 10 Midwest   None         157 0.226      23
## # … with 14 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;rel_by_region %&amp;gt;% group_by(bigregion) %&amp;gt;% summarize(total = sum(pct))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 4 x 2
##   bigregion total
##   &amp;lt;fct&amp;gt;     &amp;lt;dbl&amp;gt;
## 1 Northeast   100
## 2 Midwest     101
## 3 South       100
## 4 West        101&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))
p + geom_col(position = &amp;quot;dodge2&amp;quot;) +
  labs(x = &amp;quot;Region&amp;quot;,y = &amp;quot;Percent&amp;quot;, fill = &amp;quot;Religion&amp;quot;) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-3-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;coord_flip()&lt;/code&gt; to swap the x and y axes so that the bars run horizontally.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill = religion))
p + geom_col(position = &amp;quot;dodge2&amp;quot;) +
  labs(x = &amp;quot;Region&amp;quot;,y = &amp;quot;Percent&amp;quot;, fill = &amp;quot;Religion&amp;quot;) +
  guides(fill = FALSE) + 
  coord_flip() + 
  facet_grid(~ bigregion)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-4-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(rel_by_region, aes(x = religion, y = pct, fill = religion))
p + geom_col(position = &amp;quot;dodge2&amp;quot;) +
  labs(x = NULL,y = &amp;quot;Percent&amp;quot;, fill = &amp;quot;Religion&amp;quot;) +
  guides(fill = FALSE) + 
  coord_flip() + 
  facet_grid(~ bigregion)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-5-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;continuous-variables-by-group-or-category&#34; class=&#34;section level2&#34;&gt;
&lt;h2&gt;Continuous Variables by Group or Category&lt;/h2&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country)) + facet_wrap(~country)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_path).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-6-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata, mapping = aes(x = country, y = donors))
p + geom_boxplot() + coord_flip()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing non-finite values (stat_boxplot).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-7-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata, mapping = aes(x = reorder(country,
                                                        donors, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x = NULL) + coord_flip()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing non-finite values (stat_boxplot).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-8-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), 
                                            y = donors, fill = world))
p + geom_boxplot() + labs(x = NULL) + 
  coord_flip() + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing non-finite values (stat_boxplot).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-9-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata, mapping = aes(x = reorder(country, donors, na.rm = TRUE), 
                                            y = donors, color = world))
p + geom_point() + labs(x = NULL) + 
  coord_flip() + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-10-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(data = organdata,
         mapping = aes(
           x = reorder(country, donors, na.rm = TRUE),
           y = donors,
           color = world
         ))
p + geom_jitter() + labs(x = NULL) +
  coord_flip() + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-11-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;-
  ggplot(data = organdata,
         mapping = aes(
           x = reorder(country, donors, na.rm = TRUE),
           y = donors,
           color = world
         ))
p + geom_jitter(position = position_jitter(width = 0.15)) + labs(x = NULL) +
  coord_flip() + theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-12-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;by_country &amp;lt;-
  organdata %&amp;gt;% group_by(consent_law, country) %&amp;gt;% summarize(
    donors_mean = mean(donors, na.rm = TRUE),
    donors_sd = sd(donors, na.rm = TRUE),
    gdp_mean = mean(gdp, na.rm = TRUE),
    health_mean = mean(health, na.rm = TRUE),
    roads_mean = mean(roads, na.rm = TRUE),
    cerebvas_mean = mean(cerebvas, na.rm = TRUE)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;by_country&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 17 x 8
## # Groups:   consent_law [2]
##    consent_law country donors_mean donors_sd gdp_mean health_mean
##    &amp;lt;chr&amp;gt;       &amp;lt;chr&amp;gt;         &amp;lt;dbl&amp;gt;     &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;       &amp;lt;dbl&amp;gt;
##  1 Informed    Austra…        10.6     1.14    22179.       1958.
##  2 Informed    Canada         14.0     0.751   23711.       2272.
##  3 Informed    Denmark        13.1     1.47    23722.       2054.
##  4 Informed    Germany        13.0     0.611   22163.       2349.
##  5 Informed    Ireland        19.8     2.48    20824.       1480.
##  6 Informed    Nether…        13.7     1.55    23013.       1993.
##  7 Informed    United…        13.5     0.775   21359.       1561.
##  8 Informed    United…        20.0     1.33    29212.       3988.
##  9 Presumed    Austria        23.5     2.42    23876.       1875.
## 10 Presumed    Belgium        21.9     1.94    22500.       1958.
## 11 Presumed    Finland        18.4     1.53    21019.       1615.
## 12 Presumed    France         16.8     1.60    22603.       2160.
## 13 Presumed    Italy          11.1     4.28    21554.       1757 
## 14 Presumed    Norway         15.4     1.11    26448.       2217.
## 15 Presumed    Spain          28.1     4.96    16933        1289.
## 16 Presumed    Sweden         13.1     1.75    22415.       1951.
## 17 Presumed    Switze…        14.2     1.71    27233        2776.
## # … with 2 more variables: roads_mean &amp;lt;dbl&amp;gt;, cerebvas_mean &amp;lt;dbl&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;by_country &amp;lt;- organdata %&amp;gt;% group_by(consent_law, country) %&amp;gt;%
  summarize_if(is.numeric, lst(mean, sd), na.rm = TRUE) %&amp;gt;%
  ungroup()
by_country&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 17 x 28
##    consent_law country donors_mean pop_mean pop_dens_mean gdp_mean
##    &amp;lt;chr&amp;gt;       &amp;lt;chr&amp;gt;         &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;         &amp;lt;dbl&amp;gt;    &amp;lt;dbl&amp;gt;
##  1 Informed    Austra…        10.6   18318.         0.237   22179.
##  2 Informed    Canada         14.0   29608.         0.297   23711.
##  3 Informed    Denmark        13.1    5257.        12.2     23722.
##  4 Informed    Germany        13.0   80255.        22.5     22163.
##  5 Informed    Ireland        19.8    3674.         5.23    20824.
##  6 Informed    Nether…        13.7   15548.        37.4     23013.
##  7 Informed    United…        13.5   58187.        24.0     21359.
##  8 Informed    United…        20.0  269330.         2.80    29212.
##  9 Presumed    Austria        23.5    7927.         9.45    23876.
## 10 Presumed    Belgium        21.9   10153.        30.7     22500.
## 11 Presumed    Finland        18.4    5112.         1.51    21019.
## 12 Presumed    France         16.8   58056.        10.5     22603.
## 13 Presumed    Italy          11.1   57360.        19.0     21554.
## 14 Presumed    Norway         15.4    4386.         1.35    26448.
## 15 Presumed    Spain          28.1   39666.         7.84    16933 
## 16 Presumed    Sweden         13.1    8789.         1.95    22415.
## 17 Presumed    Switze…        14.2    7037.        17.0     27233 
## # … with 22 more variables: gdp_lag_mean &amp;lt;dbl&amp;gt;, health_mean &amp;lt;dbl&amp;gt;,
## #   health_lag_mean &amp;lt;dbl&amp;gt;, pubhealth_mean &amp;lt;dbl&amp;gt;, roads_mean &amp;lt;dbl&amp;gt;,
## #   cerebvas_mean &amp;lt;dbl&amp;gt;, assault_mean &amp;lt;dbl&amp;gt;, external_mean &amp;lt;dbl&amp;gt;,
## #   txp_pop_mean &amp;lt;dbl&amp;gt;, donors_sd &amp;lt;dbl&amp;gt;, pop_sd &amp;lt;dbl&amp;gt;, pop_dens_sd &amp;lt;dbl&amp;gt;,
## #   gdp_sd &amp;lt;dbl&amp;gt;, gdp_lag_sd &amp;lt;dbl&amp;gt;, health_sd &amp;lt;dbl&amp;gt;, health_lag_sd &amp;lt;dbl&amp;gt;,
## #   pubhealth_sd &amp;lt;dbl&amp;gt;, roads_sd &amp;lt;dbl&amp;gt;, cerebvas_sd &amp;lt;dbl&amp;gt;,
## #   assault_sd &amp;lt;dbl&amp;gt;, external_sd &amp;lt;dbl&amp;gt;, txp_pop_sd &amp;lt;dbl&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = by_country,
            mapping = aes(
              x = donors_mean,
              y = reorder(country, donors_mean),
              color = consent_law
            ))
p + geom_point(size = 3) +
  labs(x = &amp;quot;Donor Procurement Rate&amp;quot;,
       y = &amp;quot;&amp;quot;, color = &amp;quot;Consent Law&amp;quot;) +
  theme(legend.position = &amp;quot;top&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-16-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = by_country,
            mapping = aes(x = donors_mean,
                          y = reorder(country, donors_mean)))

p + geom_point(size = 3) +
  facet_wrap( ~ consent_law, scales = &amp;quot;free_y&amp;quot;, ncol = 1) +
  labs(x = &amp;quot;Donor Procurement Rate&amp;quot;,
       y = &amp;quot;&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-17-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = by_country,
            mapping = aes(x = reorder(country,
                                      donors_mean), y = donors_mean))

p + geom_pointrange(mapping = aes(ymin = donors_mean - donors_sd,
                                  ymax = donors_mean + donors_sd)) +
  labs(x = &amp;quot;&amp;quot;, y = &amp;quot;Donor Procurement Rate&amp;quot;) + coord_flip()&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-18-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;h3&gt;Plot Text Directly&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = by_country,
            mapping = aes(x = roads_mean,
                          y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-19-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = by_country,
            mapping = aes(x = roads_mean,
                          y = donors_mean))
p + geom_point() + geom_text(mapping = aes(label = country), hjust = 0)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-20-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;ggrepel&lt;/code&gt; package handles overlapping labels better than &lt;code&gt;geom_text()&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;library(ggrepel)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p_title &amp;lt;-
  &amp;quot;Presidential Elections: Popular &amp;amp; Electoral College Margins&amp;quot;
p_subtitle &amp;lt;- &amp;quot;1824-2016&amp;quot;
p_caption &amp;lt;- &amp;quot;Data for 2016 are provisional.&amp;quot;
x_label &amp;lt;- &amp;quot;Winner&amp;#39;s share of Popular Vote&amp;quot;
y_label &amp;lt;- &amp;quot;Winner&amp;#39;s share of Electoral College Votes&amp;quot;

p &amp;lt;- ggplot(elections_historic,
            aes(x = popular_pct, y = ec_pct,
                label = winner_label))

p + geom_hline(yintercept = 0.5,
               size = 1.4,
               color = &amp;quot;gray80&amp;quot;) +
  geom_vline(xintercept = 0.5,
             size = 1.4,
             color = &amp;quot;gray80&amp;quot;) +
  geom_point() +
  geom_text_repel() +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = x_label,
    y = y_label,
    title = p_title,
    subtitle = p_subtitle,
    caption = p_caption
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-22-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;h3&gt;Label Outliers&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = by_country,
            mapping = aes(x = gdp_mean, y = health_mean))

p + geom_point() +
  geom_text_repel(data = subset(by_country, gdp_mean &amp;gt; 25000),
                  mapping = aes(label = country))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-23-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = by_country,
            mapping = aes(x = gdp_mean, y = health_mean))

p + geom_point() +
  geom_text_repel(
    data = subset(
      by_country,
      gdp_mean &amp;gt; 25000 | health_mean &amp;lt; 1500 |
        country %in% &amp;quot;Belgium&amp;quot;
    ),
    mapping = aes(label = country)
  )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-23-2.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;organdata$ind &amp;lt;- organdata$ccode %in% c(&amp;quot;Ita&amp;quot;, &amp;quot;Spa&amp;quot;) &amp;amp;
  organdata$year &amp;gt; 1998

p &amp;lt;- ggplot(data = organdata,
            mapping = aes(x = roads,
                          y = donors, color = ind))
p + geom_point() +
  geom_text_repel(data = subset(organdata, ind),
                  mapping = aes(label = ccode)) +
  guides(label = FALSE, color = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-24-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;div id=&#34;write-and-draw-in-the-plot-area&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Write and Draw in the Plot Area&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata, mapping = aes(x = roads, y = donors))
p + geom_point() + annotate(
  geom = &amp;quot;text&amp;quot;,
  x = 91,
  y = 33,
  label = &amp;quot;A surprisingly high \n recovery rate.&amp;quot;,
  hjust = 0
)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-25-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata,
            mapping = aes(x = roads, y = donors))
p + geom_point() +
  annotate(
    geom = &amp;quot;rect&amp;quot;,
    xmin = 125,
    xmax = 155,
    ymin = 30,
    ymax = 35,
    fill = &amp;quot;red&amp;quot;,
    alpha = 0.2
  ) +
  annotate(
    geom = &amp;quot;text&amp;quot;,
    x = 157,
    y = 33,
    label = &amp;quot;A surprisingly high \n recovery rate.&amp;quot;,
    hjust = 0
  )&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-26-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div id=&#34;scales-guides-and-themes&#34; class=&#34;section level3&#34;&gt;
&lt;h3&gt;Scales, Guides, and Themes&lt;/h3&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata,
            mapping = aes(x = roads,
                          y = donors,
                          color = world))
p + geom_point()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-27-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata,
            mapping = aes(x = roads,
                          y = donors,
                          color = world))
p + geom_point() + scale_x_log10() + scale_y_continuous(breaks = c(5,
                                                                   15, 25),
                                                        labels = c(&amp;quot;Five&amp;quot;, &amp;quot;Fifteen&amp;quot;, &amp;quot;Twenty Five&amp;quot;))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-28-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata,
            mapping = aes(x = roads, y = donors,
                          color = world))
p + geom_point() + scale_color_discrete(labels = c(&amp;quot;Corporatist&amp;quot;,
                                                   &amp;quot;Liberal&amp;quot;, &amp;quot;Social Democratic&amp;quot;, &amp;quot;Unclassified&amp;quot;)) + 
  labs(x = &amp;quot;Road Deaths&amp;quot;,
       y = &amp;quot;Donor Procurement&amp;quot;, color = &amp;quot;Welfare State&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-29-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;p &amp;lt;- ggplot(data = organdata,
            mapping = aes(x = roads, y = donors,
                          color = world))
p + geom_point() + labs(x = &amp;quot;Road Deaths&amp;quot;, y = &amp;quot;Donor Procurement&amp;quot;) +
  guides(color = FALSE)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: Removed 34 rows containing missing values (geom_point).&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;../../post/2019-09-26-data-visualization-chapter-5_files/figure-html/unnamed-chunk-30-1.png&#34; width=&#34;768&#34; /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Panel data in R vs in Stata</title>
      <link>/post/panel-data-in-r-vs-in-stata/</link>
      <pubDate>Tue, 27 Aug 2019 00:00:00 +0000</pubDate>
      <guid>/post/panel-data-in-r-vs-in-stata/</guid>
      <description>&lt;h2 id=&#34;panel-data-with-one-way-fixed-effect&#34;&gt;Panel data with one way fixed effect&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;mm1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; invforward &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt; TOBINQ &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; inv &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; top3 &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; size &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; lev &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; cash &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; loss &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; lnage &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; cfo &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; sd &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; ic &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;factor&lt;/span&gt;(year)
zzz &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;plm&lt;/span&gt;(mm1,data&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;sample,model&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;within&amp;#34;&lt;/span&gt;,index&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;stkcd&amp;#34;&lt;/span&gt;))
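&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is the same as Stata&#39;s &lt;code&gt;xtreg i.year, fe&lt;/code&gt; without a robust &lt;code&gt;vcetype&lt;/code&gt;.
The $R^2$ computed this way matches the within $R^2$ reported by Stata.&lt;/p&gt;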
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;m1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; invforward &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt; TOBINQ &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; inv &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; top3 &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; size &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; lev &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; cash &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; loss &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; lnage &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; cfo &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; sd &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; ic
zz &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;plm&lt;/span&gt;(m1,data&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;sample,model&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;within&amp;#34;&lt;/span&gt;,index&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;stkcd&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;year&amp;#34;&lt;/span&gt;),effect &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;twoways&amp;#34;&lt;/span&gt;)
&lt;span style=&#34;color:#a6e22e&#34;&gt;summary&lt;/span&gt;(zz)
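&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is also the same as Stata&#39;s &lt;code&gt;xtreg i.year, fe&lt;/code&gt; without a robust &lt;code&gt;vcetype&lt;/code&gt;, but the reported $R^2$ is smaller than the within $R^2$ reported by Stata.&lt;/p&gt;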
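&lt;p&gt;A minimal sketch of the two specifications side by side. The data used in this post is not available here, so the sketch assumes the &lt;code&gt;Grunfeld&lt;/code&gt; panel that ships with &lt;code&gt;plm&lt;/code&gt; as a stand-in; its variable names are not the ones used above. The slope estimates should agree, while the reported R-squared values need not.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;library(plm)

# Stand-in data (an assumption): the Grunfeld panel shipped with plm
data(&amp;#34;Grunfeld&amp;#34;, package = &amp;#34;plm&amp;#34;)

# One-way (firm) fixed effects with explicit year dummies
fe_dummies &amp;lt;- plm(inv ~ value + capital + factor(year), data = Grunfeld,
                  model = &amp;#34;within&amp;#34;, index = c(&amp;#34;firm&amp;#34;, &amp;#34;year&amp;#34;))

# Two-way (firm and year) fixed effects
fe_twoways &amp;lt;- plm(inv ~ value + capital, data = Grunfeld,
                  model = &amp;#34;within&amp;#34;, effect = &amp;#34;twoways&amp;#34;,
                  index = c(&amp;#34;firm&amp;#34;, &amp;#34;year&amp;#34;))

# Slope estimates from the two specifications
coef(fe_dummies)[c(&amp;#34;value&amp;#34;, &amp;#34;capital&amp;#34;)]
coef(fe_twoways)

# R-squared as reported by each summary
summary(fe_dummies)[[&amp;#34;r.squared&amp;#34;]]
summary(fe_twoways)[[&amp;#34;r.squared&amp;#34;]]
&lt;/code&gt;&lt;/pre&gt;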
&lt;h2 id=&#34;vcetype-robust&#34;&gt;vcetype robust&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;zz_r &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;coeftest&lt;/span&gt;(zz, vcov.&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(x) &lt;span style=&#34;color:#a6e22e&#34;&gt;vcovHC&lt;/span&gt;(x, type&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;sss&amp;#34;&lt;/span&gt;)) &lt;span style=&#34;color:#75715e&#34;&gt;# same as stata xtreg i.year, fe r&lt;/span&gt;
&lt;span style=&#34;color:#75715e&#34;&gt;# OR&lt;/span&gt;
zzz_r &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;coeftest&lt;/span&gt;(zzz, vcov.&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(x) &lt;span style=&#34;color:#a6e22e&#34;&gt;vcovHC&lt;/span&gt;(x, type&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;sss&amp;#34;&lt;/span&gt;))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;组间系数比较&#34;&gt;Comparing coefficients across groups&lt;/h2&gt;
&lt;p&gt;With OLS, the following works:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4&#34;&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;sur_diff &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;  MVBV &lt;span style=&#34;color:#f92672&#34;&gt;~&lt;/span&gt; (Dm &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; Dh &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; EBV &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; DmEBV &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;DhEBV)&lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt;g_layer
h2t &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; h2 &lt;span style=&#34;color:#f92672&#34;&gt;%&amp;gt;%&lt;/span&gt;
  &lt;span style=&#34;color:#a6e22e&#34;&gt;filter&lt;/span&gt;(g_layer &lt;span style=&#34;color:#f92672&#34;&gt;!=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;%&amp;gt;%&lt;/span&gt;
  &lt;span style=&#34;color:#a6e22e&#34;&gt;mutate&lt;/span&gt;(g_layer &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;ifelse&lt;/span&gt;(g_layer &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;))
mm &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;lm&lt;/span&gt;(sur_diff,data&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;h2t)
ttt &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;-&lt;/span&gt;  &lt;span style=&#34;color:#a6e22e&#34;&gt;coeftest&lt;/span&gt;(mm, vcov.&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;function&lt;/span&gt;(x) &lt;span style=&#34;color:#a6e22e&#34;&gt;vcovHC&lt;/span&gt;(x, cluster&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;group&amp;#34;&lt;/span&gt;, type&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;HC1&amp;#34;&lt;/span&gt;))

&lt;span style=&#34;color:#a6e22e&#34;&gt;stargazer&lt;/span&gt;(fpm,models_growth_layer,type &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;text&amp;#34;&lt;/span&gt;, column.labels &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; table4_label)
&lt;span style=&#34;color:#a6e22e&#34;&gt;stargazer&lt;/span&gt;(fpm_r,robusts_growth_layer,type &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;text&amp;#34;&lt;/span&gt;, column.labels &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; table4_label,
          add.lines&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#a6e22e&#34;&gt;c&lt;/span&gt;(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;DhEBV(4)-(2)&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#a6e22e&#34;&gt;str_c&lt;/span&gt;(&lt;span style=&#34;color:#a6e22e&#34;&gt;round&lt;/span&gt;(ttt[12,&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;],&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;),&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;**(p=&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#a6e22e&#34;&gt;round&lt;/span&gt;(ttt[12,&lt;span style=&#34;color:#ae81ff&#34;&gt;4&lt;/span&gt;],&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;),&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;)&amp;#34;&lt;/span&gt;)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This does not work for panel data, with either one-way or two-way fixed effects.
For panel models, add the interaction terms to the model directly.&lt;/p&gt;
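&lt;p&gt;A small sketch of the interaction idea for the OLS case, assuming the built-in &lt;code&gt;mtcars&lt;/code&gt; data with &lt;code&gt;am&lt;/code&gt; as a stand-in 0/1 group dummy (not the data used above). In a fully interacted model, the interaction coefficient equals the difference between the slopes from separate group regressions.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;# Pooled OLS with a full set of group interactions (am is a 0/1 group dummy)
pooled &amp;lt;- lm(mpg ~ wt * am, data = mtcars)

# Separate regressions for each group
g0 &amp;lt;- lm(mpg ~ wt, data = subset(mtcars, am == 0))
g1 &amp;lt;- lm(mpg ~ wt, data = subset(mtcars, am == 1))

# The interaction coefficient is the between-group difference in the wt slope
slope_diff &amp;lt;- unname(coef(g1)[&amp;#34;wt&amp;#34;] - coef(g0)[&amp;#34;wt&amp;#34;])
interaction_coef &amp;lt;- unname(coef(pooled)[&amp;#34;wt:am&amp;#34;])
all.equal(slope_diff, interaction_coef)

# The row for wt:am tests the null of no difference between the groups
coef(summary(pooled))[&amp;#34;wt:am&amp;#34;, ]
&lt;/code&gt;&lt;/pre&gt;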
</description>
    </item>
    
    <item>
      <title>Difference in Difference</title>
      <link>/post/difference-in-difference/</link>
      <pubDate>Wed, 10 Jul 2019 00:00:00 +0000</pubDate>
      <guid>/post/difference-in-difference/</guid>
      <description>&lt;h2 id=&#34;效應評估模型&#34;&gt;Treatment effect model&lt;/h2&gt;

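&lt;p&gt;“Does raising the minimum wage reduce employment?”&lt;/p&gt;

&lt;p&gt;“Does a higher minimum wage reduce the number of full-time employees at restaurants?”&lt;/p&gt;

&lt;p&gt;Let $MinWage$ be a dummy variable indicating that the minimum wage was raised, and let $FEmp$ be the number of full-time employees at a restaurant.&lt;/p&gt;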

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
FEmp_i=FEmp_{0,i}+\beta^*MinWage_i
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
FEmp_i=\beta_0+\beta_1 MinWage_i+\epsilon_i
\]&lt;/span&gt;&lt;/p&gt;

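&lt;p&gt;OLS is a consistent estimator when $FEmp_{0,i}$, the number of employees in the absence of a minimum wage increase, is unrelated to whether the restaurant is affected by the increase.&lt;/p&gt;

&lt;p&gt;Let $s$ denote the state a restaurant belongs to. The effect model can then be written as: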
&lt;span  class=&#34;math&#34;&gt;\(
\begin{eqnarray}
FEmp_{is}=FEmp_{0,is}+\beta^*MinWage_{s}
\tag{7.1}
\end{eqnarray}
\)&lt;/span&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Pre&lt;/th&gt;
&lt;th&gt;Post&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Control&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;$MinWage=0$: PA&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Treatment&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;$MinWage=1$: NJ&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&#34;複迴歸模型&#34;&gt;Multiple regression model&lt;/h2&gt;

&lt;p&gt;The type of restaurant (large chain, coffee shop, snack bar, and so on) affects how many employees it hires.
&lt;span  class=&#34;math&#34;&gt;\(
\begin{eqnarray}
FEmp_{is} =FEmp_{0,-type,is}+\beta^*MinWage_s+\gamma&#39;type_{is}
\tag{7.2}
\end{eqnarray}
\)&lt;/span&gt;
where
&lt;span  class=&#34;math&#34;&gt;\(
FEmp_{0,-type,is}=FEmp_{0,is}-\mathbb{E}(FEmp_{0,is}|type_{is})
\)&lt;/span&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When thinking about omitted variable bias, every potential confounder must be considered at the aggregate level (aggregated by treatment group and control group).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&#34;固定效果&#34;&gt;Fixed effects&lt;/h2&gt;

&lt;h3 id=&#34;組固定效果&#34;&gt;Group fixed effects&lt;/h3&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
FEmp_{is}=FEmp_{0,is}+\beta^*MinWage_{s}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;In most cases the treatment and control groups already differ in their characteristics before the policy takes effect, that is,
&lt;span  class=&#34;math&#34;&gt;\(
FEmp_{0,is}=FEmp_{0,-\alpha_s,is}+\alpha_s
\)&lt;/span&gt;
where $\alpha_s$ is the group-specific confounder effect.&lt;/p&gt;

&lt;p&gt;If there are no other confounders, we can estimate the following regression model:
&lt;span  class=&#34;math&#34;&gt;\(
FEmp_{ist}=\alpha_s+\beta^* MinWage_{st}+\epsilon_{ist}
\)&lt;/span&gt;&lt;/p&gt;

&lt;h3 id=&#34;時間固定效果&#34;&gt;Time fixed effects&lt;/h3&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
FEmp_{ist}=FEmp_{0,-(\alpha_s,\delta_t),ist}+\alpha_s+\delta_t+\beta^*MinWage_{st}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The corresponding regression model is:
&lt;span  class=&#34;math&#34;&gt;\(
FEmp_{ist}=\alpha_s+\delta_t+\beta^* MinWage_{st}+\epsilon_{ist}
\)&lt;/span&gt;&lt;/p&gt;

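&lt;h3 id=&#34;資料追踪不追踪&#34;&gt;Tracking or not tracking the same units&lt;/h3&gt;

&lt;p&gt;Although $FEmp_{ist}$ is measured for individual restaurants (subscript $i$), the fixed effects exist only at the group level (subscript $s$). For estimation we therefore do not need to follow the same restaurants over time: the restaurants sampled in each period can differ.&lt;/p&gt;

&lt;h2 id=&#34;did-估计法&#34;&gt;DiD estimation&lt;/h2&gt;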

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{eqnarray}
FEmp_{ist}=\alpha_s+\delta_t+\beta^*MinWage_{st}+\epsilon_{ist}
\tag{7.3}
\end{eqnarray}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
FEmp_{ist}=\beta_0+\alpha_1D1_s+\delta_1B1_t+\beta_1MinWage_{st}+\epsilon_{ist}
\]&lt;/span&gt;&lt;/p&gt;

&lt;ul&gt;
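&lt;li&gt;$D1$ is a dummy variable equal to 1 for observations from the first state (NJ).&lt;/li&gt;
&lt;li&gt;$B1$ is a dummy variable equal to 1 for periods after the policy takes effect.&lt;/li&gt;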
&lt;li&gt;$MinWage_{st}=D1_s\times B1_t$&lt;/li&gt;
&lt;/ul&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;t=0&lt;/th&gt;
&lt;th&gt;t=1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NJ&lt;/td&gt;
&lt;td&gt;D1=1,B1=0&lt;/td&gt;
&lt;td&gt;D1=1,B1=1&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;PA&lt;/td&gt;
&lt;td&gt;D1=0,B1=0&lt;/td&gt;
&lt;td&gt;D1=0,B1=1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
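
&lt;p&gt;A minimal sketch of the DiD regression on simulated data. The sample size, the coefficients and the assumed true effect of 2 are all made up for illustration; only the structure of the regression follows the model above. In this saturated model the coefficient on the interaction equals the difference in differences of the four cell means.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;set.seed(1)
n &amp;lt;- 400
D1 &amp;lt;- rbinom(n, 1, 0.5)   # 1 = NJ (treated state), 0 = PA
B1 &amp;lt;- rbinom(n, 1, 0.5)   # 1 = after the policy, 0 = before

# Simulated outcome; the effect of 2 on D1 * B1 is an assumption
FEmp &amp;lt;- 20 + 3 * D1 - 1 * B1 + 2 * D1 * B1 + rnorm(n)

# D1 * B1 expands to D1 + B1 + D1:B1, and D1:B1 plays the role of MinWage
fit &amp;lt;- lm(FEmp ~ D1 * B1)

# Difference in differences computed from the four cell means
cell &amp;lt;- tapply(FEmp, list(D1, B1), mean)
did_means &amp;lt;- (cell[&amp;#34;1&amp;#34;, &amp;#34;1&amp;#34;] - cell[&amp;#34;1&amp;#34;, &amp;#34;0&amp;#34;]) -
  (cell[&amp;#34;0&amp;#34;, &amp;#34;1&amp;#34;] - cell[&amp;#34;0&amp;#34;, &amp;#34;0&amp;#34;])

# The two numbers coincide
coef(fit)[&amp;#34;D1:B1&amp;#34;]
did_means
&lt;/code&gt;&lt;/pre&gt;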

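&lt;h2 id=&#34;cluster-standard-error&#34;&gt;Clustered standard errors&lt;/h2&gt;

&lt;p&gt;There are four groups of error terms, G1 to G4, whose variances and cross-group covariances need attention. When the error terms may be clustered, the standard errors of the estimator must be adjusted accordingly.&lt;/p&gt;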
</description>
    </item>
    
    <item>
      <title>Panel Data</title>
      <link>/post/panel-data/</link>
      <pubDate>Wed, 10 Jul 2019 00:00:00 +0000</pubDate>
      <guid>/post/panel-data/</guid>
      <description>&lt;h2 id=&#34;效應評估模型&#34;&gt;Treatment effect model&lt;/h2&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall=mrall_{-BeerTax}+\beta^*BeerTax
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Does raising the beer tax (BeerTax) help reduce the traffic fatality rate (mrall)?&lt;/p&gt;

&lt;h2 id=&#34;固定效應模型&#34;&gt;Fixed effects model&lt;/h2&gt;

&lt;p&gt;Let $W$ denote how much a state likes to drink.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$W$ is related to $mrall_{-BeerTax}$&lt;/li&gt;
&lt;li&gt;$W$ is related to $BeerTax$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall=(mrall_{-BT}-\mathbb{E}(mrall_{-BT}|W))+\mathbb{E}(mrall_{-BT}|W) + \beta^*BeerTax
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall_{-BT,-W}\equiv mrall_{-BT}-\mathbb{E}(mrall_{-BT}|W)
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall=mrall_{-BT,-W}+\mathbb{E}(mrall_{-BT}|W)+\beta^*BeerTax
\]&lt;/span&gt;&lt;/p&gt;

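&lt;p&gt;$mrall_{-BT,-W}$ is the part of traffic fatalities not caused by the beer tax, with the influence of $W$ removed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is unrelated to $W$.&lt;/li&gt;
&lt;li&gt;If two observations share the same drinking culture, that is, the same $W$, their $\mathbb{E}(mrall_{-BT}|W)$
is the same.&lt;/li&gt;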
&lt;/ul&gt;

&lt;blockquote&gt;
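&lt;p&gt;Assume that a place&#39;s drinking culture does not change over time, that is, the same state has the same $W$ at every point in time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let &lt;span  class=&#34;math&#34;&gt;\(\mathbb{E}(mrall_{-BT,it}|W_i)=\alpha_i\)&lt;/span&gt;, so that the effect model can be written as: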
&lt;span  class=&#34;math&#34;&gt;\(
mrall_{it}=mrall_{-BT,-W,it}+\alpha_i+\beta^*BeerTax_{it}
\)&lt;/span&gt;
where $\alpha_i$ is the fixed effect of state $i$:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$BeerTax$ is unrelated to $mrall_{-BT,-W}$&lt;/li&gt;
&lt;li&gt;$BeerTax$ is related to $\alpha$&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&#34;組內差異最小平方法&#34;&gt;Within-group least squares&lt;/h2&gt;

&lt;p&gt;OLS on differenced data gets around the problem that $\alpha_i$ is unobserved.&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall_{i1}-mrall_{i0}=\beta^* (BeerTax_{i1}-BeerTax_{i0})+(mrall_{-BT,-W,i1}-mrall_{-BT,-W,i0})
\]&lt;/span&gt;&lt;/p&gt;
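
&lt;p&gt;A minimal sketch of the two-period case on simulated data. The generic names &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; stand in for BeerTax and mrall, and the slope of 0.5 is an assumption. The first-difference regression removes the unit effect and gives the same slope as a regression with one dummy per unit.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;set.seed(2)
n &amp;lt;- 50
alpha &amp;lt;- rnorm(n)              # unobserved unit effects
x0 &amp;lt;- alpha + rnorm(n)         # regressor in period 0, correlated with alpha
x1 &amp;lt;- alpha + rnorm(n)         # regressor in period 1
y0 &amp;lt;- alpha + 0.5 * x0 + rnorm(n)
y1 &amp;lt;- alpha + 0.5 * x1 + rnorm(n)

# First-difference regression without intercept: alpha drops out
fd &amp;lt;- lm(I(y1 - y0) ~ 0 + I(x1 - x0))

# Dummy-variable regression on the stacked data, one dummy per unit
long &amp;lt;- data.frame(id = factor(rep(seq_len(n), 2)),
                   x = c(x0, x1), y = c(y0, y1))
lsdv &amp;lt;- lm(y ~ x + id, data = long)

# With two periods the two slope estimates coincide
coef(fd)[1]
coef(lsdv)[&amp;#34;x&amp;#34;]
&lt;/code&gt;&lt;/pre&gt;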

&lt;p&gt;If there are more than two periods, use the within-group mean as the reference point for differencing.&lt;/p&gt;

&lt;p&gt;That is, &lt;span  class=&#34;math&#34;&gt;\(x_1-\bar{x},x_2-\bar{x},...,x_n-\bar{x}, \bar{x}=\sum_{i=1}^n x_i/n\)&lt;/span&gt;
&lt;span  class=&#34;math&#34;&gt;\(
\bar{mrall}_i=\sum_{t=1}^T mrall_{it}/T \\
\bar{BeerTax}_i=\sum_{t=1}^T BeerTax_{it}/T\\
\bar{mrall}_{-BT,-W,i}=\sum_{t=1}^T mrall_{-BT,-W,it}/T
\)&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall_{it}-\bar{mrall}_i=\beta^*\left( BeerTax_{it}-\bar{BeerTax}_i\right)+(mrall_{-BT,-W,it}-\bar{mrall}_{-BT,-W,i})
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Under the fixed effects model, we can estimate the following regression by least squares:
&lt;span  class=&#34;math&#34;&gt;\(
mrall_{it}-\bar{mrall}_i=\beta_0+\beta_1\left( BeerTax_{it}-\bar{BeerTax}_i\right)+\epsilon_{it}
\)&lt;/span&gt;
where $\hat{\beta}_1$ is a consistent estimate of $\beta^*$.&lt;/p&gt;
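
&lt;p&gt;A minimal sketch of the within transformation on a simulated panel. The panel dimensions and the slope of 0.5 are assumptions, and &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt; stand in for BeerTax and mrall. Regressing the demeaned outcome on the demeaned regressor gives the same slope as including one dummy per unit.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;set.seed(3)
n_units &amp;lt;- 30
n_periods &amp;lt;- 5
id &amp;lt;- rep(seq_len(n_units), each = n_periods)

alpha &amp;lt;- rnorm(n_units)[id]            # unit effect, repeated over periods
x &amp;lt;- alpha + rnorm(length(id))         # regressor correlated with alpha
y &amp;lt;- alpha + 0.5 * x + rnorm(length(id))

# ave() returns the group mean for every row, so this demeans within units
y_dm &amp;lt;- y - ave(y, id)
x_dm &amp;lt;- x - ave(x, id)
within_fit &amp;lt;- lm(y_dm ~ x_dm)

# Dummy-variable regression for comparison
lsdv_fit &amp;lt;- lm(y ~ x + factor(id))

# Both slopes coincide
coef(within_fit)[&amp;#34;x_dm&amp;#34;]
coef(lsdv_fit)[&amp;#34;x&amp;#34;]
&lt;/code&gt;&lt;/pre&gt;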

&lt;h2 id=&#34;常見的固定效果模型&#34;&gt;Common fixed effects models&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Identity fixed effect: $\alpha_i$&lt;/li&gt;
&lt;li&gt;Time fixed effect: $\delta_t$&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall_{-BT,it}=mrall_{-BT,-W_i,-Z_t}+\alpha_i+\delta_t
\]&lt;/span&gt;&lt;/p&gt;

&lt;ul&gt;
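&lt;li&gt;$W_i$ is a variable that biases the estimate of the effect coefficient and is fixed along the $i$ dimension.&lt;/li&gt;
&lt;li&gt;$Z_t$ is a variable that biases the estimate of the effect coefficient and is fixed along the $t$ dimension.&lt;/li&gt;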
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;For example, $Z_t$ could be the state of the US economy as a whole.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The corresponding regression model is:
&lt;span  class=&#34;math&#34;&gt;\(
mrall_{it}=\alpha_i+\delta_t+\beta_1 BeerTax_{it}+\epsilon_{it}
\)&lt;/span&gt;&lt;/p&gt;
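
&lt;p&gt;A minimal sketch of the two-way model on a simulated balanced panel, with made-up dimensions and an assumed slope of 0.5. It compares a regression with unit and period dummies against the two-way demeaning shortcut, which relies on the panel being balanced.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;set.seed(4)
n_units &amp;lt;- 20
n_periods &amp;lt;- 6
id &amp;lt;- rep(seq_len(n_units), each = n_periods)
period &amp;lt;- rep(seq_len(n_periods), times = n_units)

alpha &amp;lt;- rnorm(n_units)[id]          # unit effects
delta &amp;lt;- rnorm(n_periods)[period]    # period effects
x &amp;lt;- alpha + delta + rnorm(length(id))
y &amp;lt;- alpha + delta + 0.5 * x + rnorm(length(id))

# Regression with one dummy per unit and one per period
lsdv &amp;lt;- lm(y ~ x + factor(id) + factor(period))

# Two-way demeaning: remove unit means and period means, add back the grand mean
dm2 &amp;lt;- function(v) v - ave(v, id) - ave(v, period) + mean(v)
twoway &amp;lt;- lm(dm2(y) ~ 0 + dm2(x))

# Both slopes coincide on a balanced panel
coef(lsdv)[&amp;#34;x&amp;#34;]
coef(twoway)[1]
&lt;/code&gt;&lt;/pre&gt;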

&lt;h2 id=&#34;廣義的固定效果模型&#34;&gt;Generalized fixed effects model&lt;/h2&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall=mrall_{-BeerTax}+\beta^*BeerTax
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;But
&lt;span  class=&#34;math&#34;&gt;\(
\begin{equation}
  mrall_{-BT,it}\not\perp BeerTax_{it}
  \tag{5.1}
\end{equation}
\)&lt;/span&gt;&lt;/p&gt;

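&lt;h3 id=&#34;複迴歸控制&#34;&gt;Multiple regression controls&lt;/h3&gt;

&lt;p&gt;First consider which variables give rise to (5.1); in statistics these are called confounders. Confounders for which data are available (call them $Z$) can be used to extend the model to: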
&lt;span  class=&#34;math&#34;&gt;\(
mrall_{it}=mrall_{-BT,-Z,it}+\beta^*BeerTax_{it}+\gamma&#39;Z_{it}
\)&lt;/span&gt;
where:
&lt;span  class=&#34;math&#34;&gt;\(
mrall_{-BT,-Z}=mrall_{-BT}-\mathbb{E}(mrall_{-BT}|Z)
\)&lt;/span&gt;&lt;/p&gt;

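&lt;h3 id=&#34;固定效果模型&#34;&gt;Fixed effects model&lt;/h3&gt;

&lt;p&gt;Suppose the confounders that have no data but are fixed along some dimension fall into two classes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$W_i$: fixed within the same identity.&lt;/li&gt;
&lt;li&gt;$V_t$: fixed within the same time period.&lt;/li&gt;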
&lt;/ul&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{eqnarray}
mrall_{it}=mrall_{-BT,-(Z,W,V),it}+\beta^*BeerTax_{it}+\\
\alpha_i+\delta_t+\gamma&#39;Z_{it}
\tag{5.2}
\end{eqnarray}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;(5.2) is a fairly general fixed effects model: it has fixed effects along two dimensions plus control variables.&lt;/p&gt;

&lt;h2 id=&#34;隨機效果模型&#34;&gt;Random effects model&lt;/h2&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
mrall_{it}=mrall_{-BT,-Z,it}+\beta^*BeerTax_{it}+\gamma&#39;Z_{it}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The random effects model is set up as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the regression model:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{eqnarray}
  mrall_{it}=\beta_0+\beta_{1}BeerTax_{it}+\gamma&#39;Z_{it}+\nu_{it}
  \tag{5.3}
\end{eqnarray}
\]&lt;/span&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assume that $\nu_{it}$ has a particular structure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The assumptions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$\nu_{it}\perp BeerTax_{it}$&lt;/li&gt;
&lt;li&gt;&lt;span  class=&#34;math&#34;&gt;\(var(\alpha_i|X)=\sigma_{\alpha}^2\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;$var(\epsilon_{it}|X)=\sigma^2$&lt;/li&gt;
&lt;li&gt;$cov(\epsilon_{it},\epsilon_{is}|X)=0$&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
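&lt;p&gt;The random effects model relies on strong assumptions about the error term, so it is not recommended.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&#34;hausman檢定&#34;&gt;Hausman test&lt;/h2&gt;

&lt;h3 id=&#34;固定效果模型fe&#34;&gt;Fixed effects model (FE)&lt;/h3&gt;

&lt;p&gt;FE means estimating &lt;span  class=&#34;math&#34;&gt;\(\beta_1\)&lt;/span&gt; in the following regression model by within-group least squares: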
&lt;span  class=&#34;math&#34;&gt;\(
mrall_{it}=\beta_0+\beta_{1}BeerTax_{it}+\gamma&#39;Z_{it}+\alpha_i+\epsilon_{it}
\)&lt;/span&gt;&lt;/p&gt;

&lt;h3 id=&#34;隨機效果模型re&#34;&gt;Random effects model (RE)&lt;/h3&gt;

&lt;p&gt;RE means estimating &lt;span  class=&#34;math&#34;&gt;\(\beta_1\)&lt;/span&gt; in the following regression model by GLS:
&lt;span  class=&#34;math&#34;&gt;\(
mrall_{it}=\beta_0+\beta_{1}BeerTax_{it}+\gamma&#39;Z_{it}+\nu_{it}
\)&lt;/span&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;span  class=&#34;math&#34;&gt;\(\nu_{it}=\alpha_i+\epsilon_{it}\)&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Assume that&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;all of the variance and covariance assumptions of RE hold, and&lt;/li&gt;
&lt;li&gt;&lt;span  class=&#34;math&#34;&gt;\(\epsilon_{it} \perp BeerTax_{it} | \alpha_i,Z_{it}\)&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;H0:&lt;/strong&gt; &lt;span  class=&#34;math&#34;&gt;\(\alpha_i \perp BeerTax_{it} |Z_{it}\)&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Under H0, use RE; if H0 is rejected, use FE.&lt;/p&gt;
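
&lt;p&gt;A minimal sketch of running the test with &lt;code&gt;phtest()&lt;/code&gt; from &lt;code&gt;plm&lt;/code&gt;. It assumes the &lt;code&gt;Grunfeld&lt;/code&gt; panel shipped with the package as stand-in data, not the beer tax data discussed above.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;library(plm)

# Stand-in data (an assumption): the Grunfeld panel shipped with plm
data(&amp;#34;Grunfeld&amp;#34;, package = &amp;#34;plm&amp;#34;)

# Fixed effects (within) and random effects (GLS) fits of the same model
fe &amp;lt;- plm(inv ~ value + capital, data = Grunfeld,
          model = &amp;#34;within&amp;#34;, index = c(&amp;#34;firm&amp;#34;, &amp;#34;year&amp;#34;))
re &amp;lt;- plm(inv ~ value + capital, data = Grunfeld,
          model = &amp;#34;random&amp;#34;, index = c(&amp;#34;firm&amp;#34;, &amp;#34;year&amp;#34;))

# Hausman test: a small p-value speaks against RE and in favour of FE
ht &amp;lt;- phtest(fe, re)
ht
&lt;/code&gt;&lt;/pre&gt;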
</description>
    </item>
    
    <item>
      <title>Linear Regression</title>
      <link>/post/linear-regression/</link>
      <pubDate>Thu, 04 Jul 2019 00:00:00 +0000</pubDate>
      <guid>/post/linear-regression/</guid>
      <description>&lt;h2 id=&#34;ols-estimator&#34;&gt;OLS estimator&lt;/h2&gt;

&lt;p&gt;The method to compute (or &lt;em&gt;estimate&lt;/em&gt;) $b_0$ and $b_1$ we illustrated above is called &lt;em&gt;Ordinary Least Squares&lt;/em&gt;, or OLS. $b_0$ and $b_1$ are therefore also often called the &lt;em&gt;OLS coefficients&lt;/em&gt;. By solving the problem&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
e_i &amp; = y_i - \hat{y}_i = y_i - \underbrace{\left(b_0 + b_1 x_i\right)}_\text{prediction}\\
e_1^2 + \dots + e_N^2 &amp;= \sum_{i=1}^N e_i^2 \equiv \text{SSR}(b_0,b_1) \\
(b_0,b_1) &amp;= \arg \min_{\text{int},\text{slope}} \sum_{i=1}^N \left[y_i - \left(\text{int} + \text{slope } x_i\right)\right]^2 
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;one can derive an explicit formula for them:&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\(
\begin{equation}
b_1 = \frac{cov(x,y)}{var(x)}
\end{equation}
\)&lt;/span&gt;
i.e. the estimate of the slope coefficient is the covariance between $x$ and $y$ divided by the variance of $x$, both computed from our sample of data. With $b_1$ in hand, we can get the estimate for the intercept as&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[\begin{equation}
b_0 = \bar{y} - b_1 \bar{x}
\end{equation}\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;where $\bar{z}$ denotes the sample mean of variable $z$. The interpretation of the OLS slope coefficient $b_1$ is as follows. Given a line as in $y = b_0 + b_1 x$,&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$b_1 = \frac{d y}{d x}$ measures the change in $y$ resulting from a one unit change in $x$&lt;/li&gt;
&lt;li&gt;For example, if $y$ is wage and $x$ is years of education, $b_1$ would measure the effect of an additional year of education on wages.&lt;/li&gt;
&lt;/ul&gt;
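
&lt;p&gt;The closed-form expressions can be checked numerically. A minimal sketch, assuming R&#39;s built-in &lt;code&gt;cars&lt;/code&gt; data as an arbitrary example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;x &amp;lt;- cars$speed
y &amp;lt;- cars$dist

# Slope and intercept from the explicit formulas
b1 &amp;lt;- cov(x, y) / var(x)
b0 &amp;lt;- mean(y) - b1 * mean(x)

# The same coefficients from lm()
fit &amp;lt;- lm(y ~ x)

c(b0, b1)
coef(fit)
&lt;/code&gt;&lt;/pre&gt;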

&lt;p&gt;There is an alternative representation for the OLS slope coefficient which relates to the &lt;em&gt;correlation coefficient&lt;/em&gt; $r$. Remember that $r = \frac{cov(x,y)}{s_x s_y}$, where $s_z$ is the standard deviation of variable $z$. With this in hand, we can derive the OLS slope coefficient as&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
b_1 &amp;= \frac{cov(x,y)}{var(x)}\\
&amp;= \frac{cov(x,y)}{s_x s_x} \\
&amp;= r\frac{s_y}{s_x}
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;In other words, the slope coefficient is equal to the correlation coefficient $r$ times the ratio of standard deviations of $y$ and $x$.&lt;/p&gt;

&lt;h3 id=&#34;linear-regression-without-regressor&#34;&gt;Linear Regression without Regressor&lt;/h3&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{equation}
y = b_0
\end{equation}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;This means that our minimization problem becomes very simple: We only have to choose $b_0$! We have&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\(
b_0 = \arg\min_{\text{int}} \sum_{i=1}^N \left[y_i - \text{int}\right]^2,
\)&lt;/span&gt;
which is a quadratic function of the intercept with a unique minimum such that
&lt;span  class=&#34;math&#34;&gt;\(
b_0 = \frac{1}{N} \sum_{i=1}^N y_i = \overline{y}.
\)&lt;/span&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Least Squares &lt;strong&gt;without regressor&lt;/strong&gt; $x$ estimates the sample mean of the outcome variable $y$, i.e. it produces $\overline{y}$.&lt;/p&gt;
&lt;/blockquote&gt;
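
&lt;p&gt;A quick check of this statement, again assuming the built-in &lt;code&gt;cars&lt;/code&gt; data as an example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;y &amp;lt;- cars$dist

# Regression on a constant only
fit_const &amp;lt;- lm(y ~ 1)

# The single coefficient is the sample mean of y
coef(fit_const)
mean(y)
&lt;/code&gt;&lt;/pre&gt;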

&lt;h3 id=&#34;regression-without-an-intercept&#34;&gt;Regression without an Intercept&lt;/h3&gt;

&lt;p&gt;We follow the same logic here, except that we now drop the intercept from our initial equation, so the minimisation problem becomes:
&lt;span  class=&#34;math&#34;&gt;\(
\begin{align}
b_1 &amp;= \arg\min_{\text{slope}} \sum_{i=1}^N \left[y_i - \text{slope } x_i \right]^2\\
\mapsto b_1 &amp;= \frac{\frac{1}{N}\sum_{i=1}^N x_i y_i}{\frac{1}{N}\sum_{i=1}^N x_i^2} = \frac{\overline{x y}}{\overline{x^2}} 
\end{align}
\)&lt;/span&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Least Squares &lt;strong&gt;without intercept&lt;/strong&gt; (i.e. with $b_0=0$) is a line that passes through the origin.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this case we only get to choose the slope $b_1$ of this anchored line.&lt;sup class=&#34;footnote-ref&#34; id=&#34;fnref:fn1&#34;&gt;&lt;a class=&#34;footnote&#34; href=&#34;#fn:fn1&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
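
&lt;p&gt;A minimal sketch on the built-in &lt;code&gt;cars&lt;/code&gt; data (an arbitrary example): the slope of the regression through the origin is the sum of the products of x and y divided by the sum of squared x.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;x &amp;lt;- cars$speed
y &amp;lt;- cars$dist

# Closed-form slope of the line through the origin
b1_origin &amp;lt;- sum(x * y) / sum(x^2)

# The same regression with lm(); the 0 in the formula removes the intercept
fit_origin &amp;lt;- lm(y ~ 0 + x)

b1_origin
coef(fit_origin)
&lt;/code&gt;&lt;/pre&gt;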

&lt;h3 id=&#34;centering-a-regression&#34;&gt;Centering A Regression&lt;/h3&gt;

&lt;p&gt;By &lt;em&gt;centering&lt;/em&gt; or &lt;em&gt;demeaning&lt;/em&gt; a regression, we mean subtracting from both $y$ and $x$ their respective averages to obtain $\tilde{y}_i = y_i - \bar{y}$ and $\tilde{x}_i = x_i - \bar{x}$. We then run a regression &lt;em&gt;without intercept&lt;/em&gt; as above. That is, we use $\tilde{x}_i,\tilde{y}_i$ instead of $x_i,y_i$ in&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
b_1 &amp;= \arg\min_{\text{slope}} \sum_{i=1}^N \left[y_i - \text{slope } x_i \right]^2\\
\mapsto b_1 &amp;= \frac{\frac{1}{N}\sum_{i=1}^N x_i y_i}{\frac{1}{N}\sum_{i=1}^N x_i^2} = \frac{\overline{x y}}{\overline{x^2}} 
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;to obtain our slope estimate &lt;span  class=&#34;math&#34;&gt;\(b_1\)&lt;/span&gt;:&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
b_1 &amp;= \frac{\frac{1}{N}\sum_{i=1}^N \tilde{x}_i \tilde{y}_i}{\frac{1}{N}\sum_{i=1}^N \tilde{x}_i^2}\\
&amp;= \frac{\frac{1}{N}\sum_{i=1}^N (x_i - \bar{x}) (y_i - \bar{y})}{\frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})^2} \\
&amp;= \frac{cov(x,y)}{var(x)}
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;This last expression is &lt;em&gt;identical&lt;/em&gt; to the standard OLS estimate of the slope coefficient derived earlier. We note the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Adding a constant to a regression produces the same slope estimate as centering all variables and estimating without intercept. So, unless all variables are centered, &lt;strong&gt;always&lt;/strong&gt; include an intercept in the regression.&lt;/p&gt;
&lt;/blockquote&gt;
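
&lt;p&gt;A minimal sketch of this equivalence, assuming the built-in &lt;code&gt;cars&lt;/code&gt; data as an example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;x &amp;lt;- cars$speed
y &amp;lt;- cars$dist

# Center both variables
x_c &amp;lt;- x - mean(x)
y_c &amp;lt;- y - mean(y)

# Centered regression without intercept vs. raw regression with intercept
fit_centered &amp;lt;- lm(y_c ~ 0 + x_c)
fit_intercept &amp;lt;- lm(y ~ x)

# The two slope estimates coincide
coef(fit_centered)[&amp;#34;x_c&amp;#34;]
coef(fit_intercept)[&amp;#34;x&amp;#34;]
&lt;/code&gt;&lt;/pre&gt;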

&lt;h3 id=&#34;reg-standard&#34;&gt;Standardizing A Regression&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Standardizing&lt;/em&gt; a variable $z$ means demeaning it as above and then dividing the demeaned value by the variable&#39;s own standard deviation. As we did for &lt;em&gt;centering&lt;/em&gt;, we define transformed variables $\breve{y}_i = \frac{y_i-\bar{y}}{\sigma_y}$ and $\breve{x}_i = \frac{x_i-\bar{x}}{\sigma_x}$, where $\sigma_z$ is the standard deviation of variable $z$. The next step follows the same pattern as before: we use $\breve{x}_i,\breve{y}_i$ instead of $x_i,y_i$ in the regression without intercept:&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
b_1 &amp;= \frac{\frac{1}{N}\sum_{i=1}^N \breve{x}_i \breve{y}_i}{\frac{1}{N}\sum_{i=1}^N \breve{x}_i^2}\\
&amp;= \frac{\frac{1}{N}\sum_{i=1}^N \frac{x_i - \bar{x}}{\sigma_x} \frac{y_i - \bar{y}}{\sigma_y}}{\frac{1}{N}\sum_{i=1}^N \left(\frac{x_i - \bar{x}}{\sigma_x}\right)^2} \\
&amp;= \frac{Cov(x,y)}{\sigma_x \sigma_y} \\
&amp;= Corr(x,y)
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After we standardize both $y$ and $x$, the slope coefficient $b_1$ in the regression without intercept is equal to the &lt;strong&gt;correlation coefficient&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
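
&lt;p&gt;A minimal sketch of this result on the built-in &lt;code&gt;cars&lt;/code&gt; data (an arbitrary example). The helper &lt;code&gt;standardize()&lt;/code&gt; is defined here for illustration only.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;x &amp;lt;- cars$speed
y &amp;lt;- cars$dist

# Illustrative helper: demean and divide by the standard deviation
standardize &amp;lt;- function(v) (v - mean(v)) / sd(v)

# Regression of standardized y on standardized x, without intercept
fit_std &amp;lt;- lm(standardize(y) ~ 0 + standardize(x))

# The slope equals the correlation coefficient
coef(fit_std)[1]
cor(x, y)
&lt;/code&gt;&lt;/pre&gt;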

&lt;h2 id=&#34;pred-resids&#34;&gt;Predictions and Residuals&lt;/h2&gt;

&lt;p&gt;Now we want to ask how our residuals $e_i$ relate to the prediction $\hat{y_i}$. Let us first think about the average of all predictions &lt;span  class=&#34;math&#34;&gt;\(\hat{y_i}\)&lt;/span&gt;, i.e. the number &lt;span  class=&#34;math&#34;&gt;\(\frac{1}{N} \sum_{i=1}^N \hat{y_i}\)&lt;/span&gt;. Let&#39;s just take&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{equation}
\hat{y}_i = b_0 + b_1 x_i 
\end{equation}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;and plug this into this average, so that we get&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
\frac{1}{N} \sum_{i=1}^N \hat{y_i} &amp;= \frac{1}{N} \sum_{i=1}^N \left(b_0 + b_1 x_i\right) \\
&amp;= b_0 + b_1  \frac{1}{N} \sum_{i=1}^N x_i \\
&amp;= b_0 + b_1  \bar{x}
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;But by the formula for the OLS intercept, $b_0 = \bar{y} - b_1 \bar{x}$, that last line is just equal to $\bar{y}$! That means of course that&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\(
\frac{1}{N} \sum_{i=1}^N \hat{y_i}  = b_0 + b_1  \bar{x} = \bar{y}
\)&lt;/span&gt;
in other words:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The average of our predictions $\hat{y_i}$ is identically equal to the mean of the outcome $y$. This implies that the average of the residuals is equal to zero.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Related to this result, we can show that the prediction $\hat{y}$ and the residuals are &lt;em&gt;uncorrelated&lt;/em&gt;, something that is often called &lt;strong&gt;orthogonality&lt;/strong&gt; between $\hat{y}_i$ and $e_i$. We would write this as&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
Cov(\hat{y},e) &amp;=\frac{1}{N} \sum_{i=1}^N (\hat{y}_i-\bar{y})(e_i-\bar{e}) =   \frac{1}{N} \sum_{i=1}^N (\hat{y}_i-\bar{y})e_i \\
&amp;=  \frac{1}{N} \sum_{i=1}^N \hat{y}_i e_i-\bar{y} \frac{1}{N} \sum_{i=1}^N e_i = 0
\end{align}
\]&lt;/span&gt;&lt;/p&gt;
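
&lt;p&gt;Both properties can be checked numerically. A minimal sketch, assuming the built-in &lt;code&gt;cars&lt;/code&gt; data as an example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-R&#34; data-lang=&#34;R&#34;&gt;fit &amp;lt;- lm(dist ~ speed, data = cars)
y_hat &amp;lt;- fitted(fit)   # predictions
e &amp;lt;- resid(fit)        # residuals

mean(y_hat) - mean(cars$dist)   # zero up to floating-point error
mean(e)                         # zero up to floating-point error
cov(y_hat, e)                   # zero up to floating-point error
&lt;/code&gt;&lt;/pre&gt;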

&lt;h2 id=&#34;correlation-covariance-and-linearity&#34;&gt;Correlation, Covariance and Linearity&lt;/h2&gt;

&lt;p&gt;It is important to keep in mind that Correlation and Covariance relate to a &lt;em&gt;linear&lt;/em&gt; relationship between &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;. Given how the regression line is estimated by OLS (see just above), you can see that the regression line inherits this property from the Covariance.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Always &lt;strong&gt;visually inspect&lt;/strong&gt; your data, and don&#39;t rely exclusively on summary statistics like &lt;em&gt;mean, variance, correlation and regression line&lt;/em&gt;. Correlation and the regression line in particular only capture a &lt;strong&gt;linear&lt;/strong&gt; relationship between the variables in your data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&#34;analysing-vary&#34;&gt;Analysing $Var(y)$&lt;/h2&gt;

&lt;p&gt;Analysis of Variance (ANOVA) refers to a method to decompose variation in one variable as a function of several others. We can use this idea on our outcome $y$. Suppose we wanted to know the variance of $y$, keeping in mind that, by definition, $y_i = \hat{y}_i + e_i$. We would write&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}Var(y) &amp;= Var(\hat{y} + e)\\ &amp;= Var(\hat{y}) + Var(e) + 2 Cov(\hat{y},e)\\ &amp;= Var(\hat{y}) + Var(e) \end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;We have seen that the covariance between the prediction $\hat{y}$ and the residual $e$ is zero, which is why the term $2 Cov(\hat{y},e)$ drops out. In words, we can decompose the variance of the observed outcome $y$ into a part that is &lt;em&gt;explained by the model&lt;/em&gt; and a part that comes from unexplained variation. Finally, we know the definition of &lt;em&gt;variance&lt;/em&gt;, and can thus write down the respective formulae for each part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[Var(y) = \frac{1}{N}\sum_{i=1}^N (y_i - \bar{y})^2\]&lt;/span&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\(Var(\hat{y}) = \frac{1}{N}\sum_{i=1}^N (\hat{y}_i - \bar{y})^2\)&lt;/span&gt;, because the mean of $\hat{y}$ is $\bar{y}$ as we know.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Finally, &lt;span  class=&#34;math&#34;&gt;\(Var(e) = \frac{1}{N}\sum_{i=1}^N e_i^2\)&lt;/span&gt;, because the mean of $e$ is zero.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can thus formulate how the total variation in outcome $y$ is apportioned between model and unexplained variation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The total variation in outcome $y$ (often called SST, or &lt;em&gt;total sum of squares&lt;/em&gt;) is equal to the explained sum of squares (SSE) plus the sum of squared residuals (SSR). We have thus &lt;strong&gt;SST = SSE + SSR&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
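
&lt;p&gt;As a small illustration with made-up numbers (not taken from any dataset used here), consider three points $(x_i, y_i)$: $(1,1)$, $(2,3)$ and $(3,2)$. OLS gives $\hat{y}_i = 1 + 0.5 x_i$, so the predictions are $1.5, 2, 2.5$ and the residuals are $-0.5, 1, -0.5$. With $\bar{y} = 2$ the three sums of squares are&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
SST &amp;= \sum_{i=1}^N (y_i - \bar{y})^2 = (-1)^2 + 1^2 + 0^2 = 2 \\
SSE &amp;= \sum_{i=1}^N (\hat{y}_i - \bar{y})^2 = (-0.5)^2 + 0^2 + 0.5^2 = 0.5 \\
SSR &amp;= \sum_{i=1}^N e_i^2 = (-0.5)^2 + 1^2 + (-0.5)^2 = 1.5
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;and indeed $0.5 + 1.5 = 2$. Dividing each sum by $N$ gives back the corresponding variances.&lt;/p&gt;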

&lt;h2 id=&#34;assessing-the-goodness-of-fit&#34;&gt;Assessing the &lt;em&gt;Goodness of Fit&lt;/em&gt;&lt;/h2&gt;

&lt;p&gt;In our setup, there exists a convenient measure of how well a particular statistical model fits the data. It is called $R^2$ (&lt;em&gt;R squared&lt;/em&gt;), also known as the &lt;em&gt;coefficient of determination&lt;/em&gt;. We make use of the decomposition of variance just introduced, and write the formula as&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{equation}R^2 = \frac{\text{variance explained}}{\text{total variance}} = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}\in[0,1]  \end{equation}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;It is easy to see that a &lt;em&gt;good fit&lt;/em&gt; is one where the sum of &lt;em&gt;explained&lt;/em&gt; squares (SSE) is large relative to the total variation (SST). In such a case, we observe an $R^2$ close to one. In the opposite case, we will see an $R^2$ close to zero. Notice that a small $R^2$ does not imply that the model is useless, just that it explains a small fraction of the observed variation.&lt;/p&gt;
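
&lt;p&gt;A quick numerical check, using hypothetical data chosen only for illustration: for the three points $(1,1)$, $(2,3)$ and $(3,2)$, the OLS line is $\hat{y}_i = 1 + 0.5 x_i$, which yields $SST = 2$, $SSE = 0.5$ and $SSR = 1.5$. Both forms of the formula then agree, and in a simple regression with an intercept the result also equals the squared sample correlation $r_{xy}$ between $x$ and $y$:&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
R^2 &amp;= \frac{SSE}{SST} = \frac{0.5}{2} = 0.25 = 1 - \frac{1.5}{2} \\
r_{xy} &amp;= \frac{\sum_{i=1}^N (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^N (x_i-\bar{x})^2 \sum_{i=1}^N (y_i-\bar{y})^2}} = \frac{1}{\sqrt{2 \cdot 2}} = 0.5, \qquad r_{xy}^2 = 0.25
\end{align}
\]&lt;/span&gt;&lt;/p&gt;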
&lt;div class=&#34;footnotes&#34;&gt;

&lt;hr&gt;

&lt;ol&gt;
&lt;li id=&#34;fn:fn1&#34;&gt;This slope is related to the angle between the vectors $\mathbf{a} =(\overline{x},\overline{y})$ and $\mathbf{b} = (\overline{x},0)$. Hence, it&#39;s related to the &lt;a href=&#34;https://en.wikipedia.org/wiki/Scalar_projection&#34;&gt;scalar projection&lt;/a&gt; of $\mathbf{a}$ on $\mathbf{b}$.
 &lt;a class=&#34;footnote-return&#34; href=&#34;#fnref:fn1&#34;&gt;&lt;sup&gt;^&lt;/sup&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Instrumental Variables</title>
      <link>/post/iv/</link>
      <pubDate>Thu, 04 Jul 2019 00:00:00 +0000</pubDate>
      <guid>/post/iv/</guid>
      <description>&lt;h2 id=&#34;效應評估模型&#34;&gt;Effect Evaluation Model&lt;/h2&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[Y_{i}=Y_{-P,i}+\beta_i P_{i}\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
Y_i=Y_{-P,i}+\beta^* P_i
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{equation}
Y_i=\beta_0+\beta_1P_i+w_i&#39;\gamma+\varepsilon_i
\tag{3.2}
\end{equation}
\]&lt;/span&gt;&lt;/p&gt;

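&lt;p&gt;Conditional on $w_{i}$, the cigarette price $P_{i}$ must be independent of the non-price component of cigarette sales $Y_{-P}$, that is, &lt;span  class=&#34;math&#34;&gt;\(P_i\perp Y_{-P,i} | w_i\)&lt;/span&gt;. Put differently, the cigarette price $P_{i}$ must be independent of the non-price component of cigarette sales once $w_{i}$ has been controlled for.&lt;/p&gt;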

&lt;p&gt;Decompose $Y_{-P}$ conditional on $rincome$:&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{equation}
Y_{i}=Y_{-P,i}-\mathbb{E}(Y_{-P,i}|rincome_{i})+\beta^{*}P_{i}+\mathbb{E}(Y_{-P,i}|rincome_{i})
\tag{3.3}
\end{equation}
\]&lt;/span&gt;&lt;/p&gt;

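&lt;p&gt;Split the data into groups according to the values of the conditioning variables $w_{i}$, and look at the slope between the cigarette price $P_{i}$ and cigarette sales $Y_{i}$ within each group. If $w_{i}$ is well chosen, the association between $P_{i}$ and $Y_{i}$ within a group reflects the true effect slope. $Y_{-P,i}$ may still add noise to $Y_{i}$ and blur the slope we observe, but because $Y_{-P,i}$ is no longer related to $P_{i}$, this noise cancels out in large samples and the true effect slope is recovered.&lt;/p&gt;

&lt;p&gt;If no choice of $w_{i}$ manages to control the interference of $Y_{-P,i}$ in the association with $Y_{i}$, we have to &lt;strong&gt;transform the data&lt;/strong&gt; and &lt;strong&gt;remove this interference&lt;/strong&gt; from the raw data directly. The two most common ways of doing so are the instrumental variable method and the fixed effects model for panel data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instrumental variable method: use an instrument to &lt;strong&gt;keep&lt;/strong&gt; the part of $P_{i}$ that is &lt;strong&gt;not&lt;/strong&gt; related to $Y_{-P,i}$.&lt;/li&gt;
&lt;li&gt;Panel data: use a variable transformation to &lt;strong&gt;remove&lt;/strong&gt; the part of $P_{i}$ that &lt;strong&gt;is&lt;/strong&gt; related to $Y_{-P,i}$.&lt;/li&gt;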
&lt;/ul&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
Y_i=Y_{-P,i}+\beta\mathbb{E}(P_i|z_i)+\beta (P_i-\mathbb{E}(P_i|z_i))
\]&lt;/span&gt;&lt;/p&gt;

&lt;h3 id=&#34;relevance-condition&#34;&gt;Relevance condition&lt;/h3&gt;

&lt;p&gt;$\mathbb{E}(P|z)$ is not constant in $z$, that is, $z$ has explanatory power for $P$.&lt;/p&gt;

&lt;h3 id=&#34;exclusion-condition&#34;&gt;Exclusion condition&lt;/h3&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\(Y_{-P,i}+\beta(P_i-\mathbb{E}(P_i|z_i))\)&lt;/span&gt; is unrelated to &lt;span  class=&#34;math&#34;&gt;\(z_{i}\)&lt;/span&gt;.&lt;/p&gt;
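
&lt;p&gt;The two conditions can be seen at work in a short derivation. This is only a sketch for the simplest case, under assumptions made here for illustration: a single instrument $z_i$, no control variables, a common effect $\beta$, and the exclusion condition used in its weaker covariance form $Cov(Y_{-P},z)=0$. Starting from $Y_i = Y_{-P,i} + \beta P_i$:&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{align}
Cov(Y,z) &amp;= Cov(Y_{-P},z) + \beta\, Cov(P,z) = \beta\, Cov(P,z) \\
\beta &amp;= \frac{Cov(Y,z)}{Cov(P,z)}
\end{align}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The first line uses exclusion to drop $Cov(Y_{-P},z)$. The ratio in the second line is only defined when $Cov(P,z)\neq 0$, which is where relevance is needed.&lt;/p&gt;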

&lt;h2 id=&#34;三个假设&#34;&gt;Three Assumptions&lt;/h2&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{equation}
Y_i=\beta_0+\beta_1 P_i + \gamma_1 rincome_i + \epsilon_i
\tag{3.5}
\end{equation}
\]&lt;/span&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Q1: Does my instrument satisfy the exclusion condition (also called the exogeneity condition)?&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Is the cigarette tax unrelated to the non-price component of sales, conditional on the controls?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
Y =\underset{(\times k)}{X}\beta+\underset{(\times p)}{W}\gamma +\epsilon
\]&lt;/span&gt;&lt;/p&gt;

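&lt;p&gt;Here $X$ is the set of variables whose effect we want to evaluate and $W$ is the set of control variables, so $\epsilon$ is the value of $Y$ net of the effect of $X$, conditional on $W$. In addition, we have found a set of instruments $\underset{(\times m)}{Z}$, and we want to test:&lt;/p&gt;

&lt;p&gt;$H_{0}$: the instruments $Z$ are unrelated to the regression error term $\epsilon$.&lt;/p&gt;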

&lt;ol&gt;
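&lt;li&gt;Run TSLS and obtain &lt;span  class=&#34;math&#34;&gt;\( \hat{\epsilon}_{_{TSLS}}=Y-\hat{Y}_{TSLS} \)&lt;/span&gt;.&lt;/li&gt;
&lt;li&gt;Regress &lt;span  class=&#34;math&#34;&gt;\( \hat{\epsilon}_{_{TSLS}} \)&lt;/span&gt; on the full instrument set (that is, $Z$ and $W$), jointly test that the coefficients on the instruments $Z$ are all zero, and compute the statistic $J=mF\sim\chi^{2}(m-k)$, where $F$ is the F statistic of that joint test.&lt;/li&gt;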
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;This test has $m-k$ degrees of freedom, so $m$ must be &lt;strong&gt;greater than&lt;/strong&gt; $k$. When $m$ equals $k$, the test cannot be carried out.&lt;/p&gt;
&lt;/blockquote&gt;
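
&lt;p&gt;As a worked illustration with purely hypothetical numbers (not results from any data): suppose there are $m=2$ instruments and $k=1$ variable in $X$, and the joint test in step 2 gives $F=2.5$. Then&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
J = mF = 2\times 2.5 = 5, \qquad df = m-k = 1
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The 5% critical value of $\chi^{2}(1)$ is about 3.84. Since $J$ exceeds it, $H_0$ would be rejected at the 5% level in this made-up case, which would cast doubt on the exogeneity of at least one instrument.&lt;/p&gt;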

&lt;ul&gt;
&lt;li&gt;Q2: Is my instrument relevant enough?&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Is the cigarette tax really strongly related to the price?&lt;/p&gt;
&lt;/blockquote&gt;

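&lt;p&gt;The instruments $Z$ must be &#34;strongly enough&#34; related to the effect variables $X$; otherwise the large-sample asymptotic distribution of &lt;span  class=&#34;math&#34;&gt;\(\hat{\beta}_{_{TSLS}}\)&lt;/span&gt; will not be normal.&lt;/p&gt;

&lt;p&gt;Consider the first-stage regression of TSLS: $X=Z\alpha_z+W\alpha_w+u$. We want the coefficients $\alpha_z$ to be jointly significant.&lt;/p&gt;

&lt;p&gt;Testing procedure&lt;/p&gt;

&lt;p&gt;$H_0$: the instruments $Z$ are only weakly relevant.&lt;/p&gt;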

&lt;ol&gt;
&lt;li&gt;Regress $X$ on the full instrument set ($Z$, $W$) and carry out the joint F test of $\alpha_z=0$.&lt;/li&gt;
&lt;li&gt;Reject $H_0$ if $F&amp;gt;10$.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Q3: Is my worry about omitted variable bias (OVB) unnecessary?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Perhaps an instrument is not needed at all: under regression model &lt;a href=&#34;https://bookdown.org/tpemartin/econometric_analysis/iv.html#eq:ch3-test&#34;&gt;(3.5)&lt;/a&gt;, $P$ is already unrelated to $\epsilon$ (that is, the non-price component of sales conditional on the controls), so &lt;a href=&#34;https://bookdown.org/tpemartin/econometric_analysis/iv.html#eq:ch3-test&#34;&gt;(3.5)&lt;/a&gt; can be estimated directly by ordinary least squares.&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
\begin{equation}
Y   =X\beta+W\gamma +\epsilon
\tag{3.6}
\end{equation}
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;$H_0$: the estimate of the $\beta$ coefficients in regression model &lt;a href=&#34;https://bookdown.org/tpemartin/econometric_analysis/iv.html#eq:ch3-model71&#34;&gt;(3.6)&lt;/a&gt; does &lt;strong&gt;not&lt;/strong&gt; suffer from OVB, so either OLS or TSLS can be used: in large samples, &lt;span  class=&#34;math&#34;&gt;\(\hat{\beta}_{OLS}\approx\hat{\beta}_{TSLS}\)&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;$H_1$: the estimate of the $\beta$ coefficients in regression model &lt;a href=&#34;https://bookdown.org/tpemartin/econometric_analysis/iv.html#eq:ch3-model71&#34;&gt;(3.6)&lt;/a&gt; &lt;strong&gt;does&lt;/strong&gt; suffer from OVB, so only TSLS can be used: in large samples, &lt;span  class=&#34;math&#34;&gt;\(\hat{\beta}_{OLS}\neq \hat{\beta}_{TSLS}\)&lt;/span&gt;.&lt;/p&gt;

&lt;p&gt;Hausman test statistic:&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
H\equiv\left(\hat{\beta}_{IV}-\hat{\beta}_{OLS}\right)^{&#39;}\left[V(\hat{\beta}_{IV}-\hat{\beta}_{OLS})\right]^{-1}\left(\hat{\beta}_{IV}-\hat{\beta}_{OLS}\right)\sim\chi_{(df)}^{2}.
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Here $\hat{\beta}_{IV}$ denotes the TSLS estimate, and $df$ is the number of $\beta$ coefficients.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reject $H_0$ only when $H&amp;gt;\chi_{(df)}^{2}(\alpha)$.&lt;/li&gt;
&lt;/ul&gt;
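
&lt;p&gt;To see how the statistic is used, here is a sketch for the scalar case with purely hypothetical numbers (none of them come from real estimates). It also assumes that OLS is efficient under $H_0$, for example with homoskedastic errors, in which case $V(\hat{\beta}_{IV}-\hat{\beta}_{OLS})=V(\hat{\beta}_{IV})-V(\hat{\beta}_{OLS})$. Suppose $\hat{\beta}_{IV}=-1.2$, $\hat{\beta}_{OLS}=-0.9$, $V(\hat{\beta}_{IV})=0.05$ and $V(\hat{\beta}_{OLS})=0.01$. Then&lt;/p&gt;

&lt;p&gt;&lt;span  class=&#34;math&#34;&gt;\[
H = \frac{(\hat{\beta}_{IV}-\hat{\beta}_{OLS})^2}{V(\hat{\beta}_{IV})-V(\hat{\beta}_{OLS})} = \frac{(-1.2+0.9)^2}{0.05-0.01} = \frac{0.09}{0.04} = 2.25
\]&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;With $df=1$ the 5% critical value of $\chi^{2}$ is about 3.84. Because $H$ is below it, $H_0$ would not be rejected in this made-up case, and the OLS estimate could be kept.&lt;/p&gt;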
</description>
    </item>
    
  </channel>
</rss>
