econometrics-with-r | Little World

Difference in Difference

Wed, 10 Jul 2019 00:00:00 +0000

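Effect evaluation model

"Does raising the minimum wage reduce employment?"

"Does raising the minimum wage reduce the number of full-time employees at restaurants?"

Let $MinWage$ be a dummy variable for "the minimum wage was raised" and let $FEmp$ be the number of full-time restaurant employees.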

\[ FEmp_i=FEmp_{0,i}+\beta^*MinWage_i \]

\[ FEmp_i=\beta_0+\beta_1 MinWage_i+\epsilon_i \]

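OLS is a consistent estimator when $FEmp_{0,i}$, the number of employees in the absence of a minimum wage increase, is unrelated to whether the restaurant was affected by the increase.

Let $s$ denote the state a restaurant belongs to. The effect model can then be written as: $ \begin{eqnarray} FEmp_{is}=FEmp_{0,is}+\beta^*MinWage_{s} \tag{7.1} \end{eqnarray} $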

	Pre	Post
Control		$MinWage=0$:PA
Treatment		$MinWage=1$:NJ

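Multiple regression model

The type of restaurant (large chain, coffee shop, snack bar, and so on) affects how many employees it hires. $ \begin{eqnarray} FEmp_{is} =FEmp_{0,-type,is}+\beta^*MinWage_s+\gamma'type_{is} \tag{7.2} \end{eqnarray} $ where $ FEmp_{0,-type,is}=FEmp_{0,is}-\mathbb{E}(FEmp_{0,is}|type_{is}) $

When thinking about omitted variable bias, every potential confounder must be considered at the aggregate level (split by treatment/control group).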

Fixed effects

Group fixed effects

\[ FEmp_{is}=FEmp_{0,is}+\beta^*MinWage_{s} \]

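Most of the time, the treatment and control groups already differ in their characteristics before the policy is implemented, that is, $ FEmp_{0,is}=FEmp_{0,-\alpha_s,is}+\alpha_s $ where $\alpha_s$ is the group-specific confounder effect.

If there are no other confounders, we can estimate the following regression model: $ FEmp_{ist}=\alpha_s+\beta^* MinWage_{st}+\epsilon_{ist} $

Time fixed effects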

\[ FEmp_{ist}=FEmp_{0,-(\alpha_s,\delta_t),ist}+\alpha_s+\delta_t+\beta^*MinWage_{st} \]

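The corresponding regression model is: $ FEmp_{ist}=\alpha_s+\delta_t+\beta^* MinWage_{st}+\epsilon_{ist} $

Tracked vs. untracked data

Although $FEmp_{ist}$ is measured at the individual restaurant level (it carries the subscript $i$), the fixed effects only go down to the group level (subscript $s$). Estimation therefore does not require tracking the same restaurants: the restaurants sampled in each period can differ.

DiD estimation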

\[ \begin{eqnarray} FEmp_{ist}=\alpha_s+\delta_t+\beta^*MinWage_{st}+\epsilon_{ist} \tag{7.3} \end{eqnarray} \]

\[ FEmp_{ist}=\beta_0+\alpha_1D1_s+\delta_1B1_t+\beta_1MinWage_{st}+\epsilon_{ist} \]

Let $D1$ be a dummy variable equal to 1 for the first state (NJ).
Let $B1$ be a dummy variable equal to 1 after the policy takes effect.
$MinWage_{st}=D1_s\times B1_t$

State	t=0	t=1
NJ	D1=1,B1=0	D1=1,B1=1
PA	D1=0,B1=0	D1=0,B1=1
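The dummy-variable regression above can be checked numerically. The following is a minimal sketch on simulated data, not the original study's data: the sample size, the parameter values (state gap 3, time trend -1, true effect 2) and all variable names are assumptions made for illustration. It shows that the OLS coefficient on $D1\times B1$ coincides with the difference of differences of the four cell means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 design: n restaurants in each state-by-period cell.
n = 200
D1 = np.repeat([1, 1, 0, 0], n)   # 1 = NJ (treated state), 0 = PA
B1 = np.repeat([0, 1, 0, 1], n)   # 1 = after the policy
min_wage = D1 * B1                # MinWage_st = D1_s * B1_t

# Made-up data-generating process: true effect of the policy is 2.
femp = 20 + 3 * D1 - 1 * B1 + 2 * min_wage + rng.normal(0, 1, 4 * n)

# OLS of FEmp on [1, D1, B1, D1*B1].
X = np.column_stack([np.ones(4 * n), D1, B1, min_wage])
coef, *_ = np.linalg.lstsq(X, femp, rcond=None)
beta1_ols = coef[3]

def cell_mean(d, b):
    """Average FEmp in the cell with D1 == d and B1 == b."""
    return femp[(D1 == d) & (B1 == b)].mean()

# (NJ after - NJ before) - (PA after - PA before)
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(beta1_ols, did)
```

Because the model has one parameter per cell, the two numbers agree up to floating-point error, and both should land near the assumed effect of 2.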

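Cluster standard errors

We have four groups of error terms, G1 to G4, whose variances and cross-group covariances need attention. When the error terms may be clustered, the estimator's standard errors must be adjusted accordingly.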

Panel Data

Wed, 10 Jul 2019 00:00:00 +0000

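Effect evaluation model

\[ mrall=mrall_{-BeerTax}+\beta^*BeerTax \]

Does raising the beer tax (BeerTax) help reduce the traffic fatality rate (mrall)?

Fixed effects model

Let $W$ denote "how much a state likes to drink".

$W$ is related to $mrall_{-BeerTax}$
$W$ is related to $BeerTax$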

\[ mrall=(mrall_{-BT}-\mathbb{E}(mrall_{-BT}|W))+\mathbb{E}(mrall_{-BT}|W) + \beta^*BeerTax \]

\[ mrall_{-BT,-W}\equiv mrall_{-BT}-\mathbb{E}(mrall_{-BT}|W) \]

\[ mrall=mrall_{-BT,-W}+\mathbb{E}(mrall_{-BT}|W)+\beta^*BeerTax \]

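$mrall_{-BT,-W}$ is the "non-beer-tax determinants of the traffic fatality rate" with the influence of $W$ removed:

It is unrelated to $W$.
If two observations share the same drinking culture, i.e. the same $W$, they have the same $\mathbb{E}(mrall_{-BT}|W)$.

Assume that a place's drinking culture does not change over time, i.e. the same state has the same $W$ at every point in time.

Let $\mathbb{E}(mrall_{-BT,it}|W_i)=\alpha_i$, so the effect model can be written as: $ mrall_{it}=mrall_{-BT,-W,it}+\alpha_i+\beta^*BeerTax_{it} $ where $\alpha_i$ is the fixed effect of state $i$:

$BeerTax$ is unrelated to $mrall_{-BT,-W}$
$BeerTax$ is related to $\alpha$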

Within-group least squares

Differencing before OLS gets around the obstacle that $\alpha_i$ is unobservable

\[ mrall_{i1}-mrall_{i0}=\beta^* (BeerTax_{i1}-BeerTax_{i0})+(mrall_{-BT,-W,i1}-mrall_{-BT,-W,i0}) \]

If there are more than two periods, use the within-group mean as the reference point for differencing.

That is, $x_1-\bar{x},x_2-\bar{x},...,x_n-\bar{x}, \bar{x}=\sum_{i=1}^n x_i/n$ $ \bar{mrall}_i=\sum_{t=1}^T mrall_{it}/T \\ \bar{BeerTax}_i=\sum_{t=1}^T BeerTax_{it}/T\\ \bar{mrall}_{-BT,-W,i}=\sum_{t=1}^T mrall_{-BT,-W,it}/T $

\[ mrall_{it}-\bar{mrall}_i=\beta^*\left( BeerTax_{it}-\bar{BeerTax}_i\right)+(mrall_{-BT,-W,it}-\bar{mrall}_{-BT,-W,i}) \]

Under the fixed effects model, we can estimate the following regression by least squares: $ mrall_{it}-\bar{mrall}_i=\beta_0+\beta_1\left( BeerTax_{it}-\bar{BeerTax}_i\right)+\epsilon_{it} $ where $\hat{\beta}_1$ is a consistent estimator of $\beta^*$.
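The within transformation can be illustrated on a simulated panel. This is a sketch under stated assumptions: the panel dimensions, the true coefficient of -0.5, the strength of the link between $\alpha_i$ and the tax, and every variable name are invented for the example. It compares pooled OLS, the demeaned regression, and a regression with one dummy per state (LSDV), which yields the same slope as demeaning.

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_years = 48, 7
state = np.repeat(np.arange(n_states), n_years)

# Hypothetical data: alpha_i is correlated with the tax; true beta* = -0.5.
alpha = rng.normal(0, 1, n_states)
beer_tax = 0.8 * alpha[state] + rng.normal(0, 1, n_states * n_years)
mrall = alpha[state] - 0.5 * beer_tax + rng.normal(0, 0.1, n_states * n_years)

def group_mean(v):
    """State-level time average of v, repeated for every observation."""
    return (np.bincount(state, weights=v) / np.bincount(state))[state]

# Within transformation: subtract each state's own time average.
x_w = beer_tax - group_mean(beer_tax)
y_w = mrall - group_mean(mrall)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Pooled OLS slope, which ignores alpha_i and is biased here.
beta_pooled = np.cov(beer_tax, mrall, bias=True)[0, 1] / np.var(beer_tax)

# LSDV: regress mrall on BeerTax plus one dummy per state.
dummies = (state[:, None] == np.arange(n_states)).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([beer_tax, dummies]), mrall, rcond=None)
beta_lsdv = coef[0]

print(beta_pooled, beta_within, beta_lsdv)
```

In this setup the pooled slope is pulled toward zero by the omitted $\alpha_i$, while the within and LSDV slopes agree with each other and sit close to the assumed value.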

Common fixed effects models

Identity fixed effect: $\alpha_i$
Time fixed effect: $\delta_t$

\[ mrall_{-BT,it}=mrall_{-BT,-W_i,-Z_t}+\alpha_i+\delta_t \]

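$W_i$ is a variable that biases the estimate of the effect coefficient; it is constant within each $i$.
$Z_t$ is a variable that biases the estimate of the effect coefficient; it is constant within each $t$.

For example, $Z_t$ could be the state of the overall U.S. economy.

The corresponding regression model: $ mrall_{it}=\alpha_i+\delta_t+\beta_1 BeerTax_{it}+\epsilon_{it} $

Generalized fixed effects model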

\[ mrall=mrall_{-BeerTax}+\beta^*BeerTax \]

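But $ \begin{equation} mrall_{-BT,it}\not\perp BeerTax_{it} \tag{5.1} \end{equation} $

Multiple regression control

First think about which variables cause (5.1); in statistics these variables are called confounders. Confounders for which data are available (call them $Z$) can be used to extend the model to: $ mrall_{it}=mrall_{-BT,-Z,it}+\beta^*BeerTax_{it}+\gamma'Z_{it} $ where: $ mrall_{-BT,-Z}=mrall_{-BT}-\mathbb{E}(mrall_{-BT}|Z) $

Fixed effects model

Confounders without data that are fixed along some dimension are assumed to fall into two classes:

$W_i$: fixed within the same identity.
$V_t$: fixed within the same time period.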

\[ \begin{eqnarray} mrall_{it}=mrall_{-BT,-(Z,W,V),it}+\beta^*BeerTax_{it}+\\ \alpha_i+\delta_t+\gamma'Z_{it} \tag{5.2} \end{eqnarray} \]

(5.2) is a fairly general fixed effects model: it has fixed effects along two dimensions plus control variables.

Random effects model

\[ mrall_{it}=mrall_{-BT,-Z,it}+\beta^*BeerTax_{it}+\gamma'Z_{it} \]

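Setup of the random effects (RE) model:

Use the regression model:

\[ \begin{eqnarray} mrall_{it}=\beta_0+\beta_{1}BeerTax_{it}+\gamma'Z_{it}+\nu_{it} \tag{5.3} \end{eqnarray} \]

Assume that $\nu_{it}$ has a particular structure, namely $\nu_{it}=\alpha_i+\epsilon_{it}$.

where we assume: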

$\nu_{it}\perp BeerTax_{it}$
$var(\alpha_i|X)=\sigma_{\alpha}^2$
$var(\epsilon_{it}|X)=\sigma^2$
$cov(\epsilon_{it},\epsilon_{is}|X)=0$

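The random effects model relies on strong assumptions about the error term, so it is not recommended.

Hausman test

Fixed effects model (FE)

This means using within-group least squares to estimate $\beta_1$ in the following regression model: $ mrall_{it}=\beta_0+\beta_{1}BeerTax_{it}+\gamma'Z_{it}+\alpha_i+\epsilon_{it} $

Random effects model (RE)

This means using GLS to estimate $\beta_1$ in the following regression model: $ mrall_{it}=\beta_0+\beta_{1}BeerTax_{it}+\gamma'Z_{it}+\nu_{it} $

$\nu_{it}=\alpha_i+\epsilon_{it}$

Assumptions

All of the variance and covariance assumptions under RE hold.
$\epsilon_{it} \perp BeerTax_{it} | \alpha_i,Z_{it}$

H0: $\alpha_i \perp BeerTax_{it} |Z_{it}$

Under H0, use RE; if H0 is rejected, use FE.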

Linear Regression

Thu, 04 Jul 2019 00:00:00 +0000

OLS estimator

The method to compute (or estimate) $b_0$ and $b_1$ we illustrated above is called Ordinary Least Squares, or OLS. $b_0$ and $b_1$ are therefore also often called the OLS coefficients. By solving the problem

\[ \begin{align} e_i & = y_i - \hat{y}_i = y_i - \underbrace{\left(b_0 + b_1 x_i\right)}_\text{prediction}\\ e_1^2 + \dots + e_N^2 &= \sum_{i=1}^N e_i^2 \equiv \text{SSR}(b_0,b_1) \\ (b_0,b_1) &= \arg \min_{\text{int},\text{slope}} \sum_{i=1}^N \left[y_i - \left(\text{int} + \text{slope } x_i\right)\right]^2 \end{align} \]

one can derive an explicit formula for them:

$ \begin{equation} b_1 = \frac{cov(x,y)}{var(x)} \end{equation} $ i.e. the estimate of the slope coefficient is the covariance between $x$ and $y$ divided by the variance of $x$, both computed from our sample of data. With $b_1$ in hand, we can get the estimate for the intercept as

\[\begin{equation} b_0 = \bar{y} - b_1 \bar{x} \end{equation}\]

where $\bar{z}$ denotes the sample mean of variable $z$. The interpretation of the OLS slope coefficient $b_1$ is as follows. Given a line as in $y = b_0 + b_1 x$,

$b_1 = \frac{d y}{d x}$ measures the change in $y$ resulting from a one unit change in $x$
For example, if $y$ is wage and $x$ is years of education, $b_1$ would measure the effect of an additional year of education on wages.

There is an alternative representation for the OLS slope coefficient which relates to the correlation coefficient $r$. Remember that $r = \frac{cov(x,y)}{s_x s_y}$, where $s_z$ is the standard deviation of variable $z$. With this in hand, we can derive the OLS slope coefficient as

\[ \begin{align} b_1 &= \frac{cov(x,y)}{var(x)}\\ &= \frac{cov(x,y)}{s_x s_x} \\ &= r\frac{s_y}{s_x} \end{align} \]

In other words, the slope coefficient is equal to the correlation coefficient $r$ times the ratio of standard deviations of $y$ and $x$.
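As a quick numerical check of these formulas, here is a small sketch on made-up data (the sample, the coefficients 1.5 and 0.7, and the variable names are assumptions for illustration). It computes the slope as covariance over variance and as $r\,s_y/s_x$, and compares both with NumPy's own least-squares line fit.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sample: y depends linearly on x plus noise.
x = rng.normal(10, 2, 100)
y = 1.5 + 0.7 * x + rng.normal(0, 1, 100)

# Slope as sample covariance over sample variance (both with 1/N).
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
b1 = cov_xy / np.var(x)

# Intercept from the sample means.
b0 = y.mean() - b1 * x.mean()

# Alternative representation: correlation times ratio of standard deviations.
r = np.corrcoef(x, y)[0, 1]
b1_from_r = r * np.std(y) / np.std(x)

# Reference: degree-1 polynomial fit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

print(b1, b1_from_r, slope)
print(b0, intercept)
```

All three slope values agree up to floating-point error, as do the two intercepts.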

Linear Regression without Regressor

\[ \begin{equation} y = b_0 \end{equation} \]

This means that our minimization problem becomes very simple: We only have to choose $b_0$! We have

$ b_0 = \arg\min_{\text{int}} \sum_{i=1}^N \left[y_i - \text{int}\right]^2, $ which is a quadratic function of $\text{int}$ with a unique minimum such that $ b_0 = \frac{1}{N} \sum_{i=1}^N y_i = \overline{y}. $

Least Squares without regressor $x$ estimates the sample mean of the outcome variable $y$, i.e. it produces $\overline{y}$.
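As a worked step for this result, here is a short derivation from the first-order condition. The symbol $S$ for the objective function is introduced here for convenience and does not appear in the original.

\[ \begin{align} S(\text{int}) &= \sum_{i=1}^N \left[y_i - \text{int}\right]^2 \\ \frac{dS}{d\,\text{int}} &= -2\sum_{i=1}^N \left(y_i - \text{int}\right) = 0 \;\Rightarrow\; N\,\text{int} = \sum_{i=1}^N y_i \;\Rightarrow\; b_0 = \overline{y} \\ \frac{d^2S}{d\,\text{int}^2} &= 2N > 0 \end{align} \]

The positive second derivative confirms that the stationary point is the unique minimum.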

Regression without an Intercept

We follow the same logic here, except that we now drop the intercept from our initial equation, and the minimisation problem becomes: $ \begin{align} b_1 &= \arg\min_{\text{slope}} \sum_{i=1}^N \left[y_i - \text{slope } x_i \right]^2\\ \mapsto b_1 &= \frac{\frac{1}{N}\sum_{i=1}^N x_i y_i}{\frac{1}{N}\sum_{i=1}^N x_i^2} = \frac{\overline{x y}}{\overline{x^2}} \end{align} $

Least Squares without intercept (i.e. with $b_0=0$) is a line that passes through the origin.

In this case we only get to choose the slope $b_1$ of this anchored line.¹

Centering A Regression

By centering or demeaning a regression, we mean to subtract from both $y$ and $x$ their respective averages to obtain $\tilde{y}_i = y_i - \bar{y}$ and $\tilde{x}_i = x_i - \bar{x}$. We then run a regression without intercept as above. That is, we use $\tilde{x}_i,\tilde{y}_i$ instead of $x_i,y_i$ in

\[ \begin{align} b_1 &= \arg\min_{\text{slope}} \sum_{i=1}^N \left[y_i - \text{slope } x_i \right]^2\\ \mapsto b_1 &= \frac{\frac{1}{N}\sum_{i=1}^N x_i y_i}{\frac{1}{N}\sum_{i=1}^N x_i^2} = \frac{\overline{x y}}{\overline{x^2}} \end{align} \]

to obtain our slope estimate $b_1$:

\[ \begin{align} b_1 &= \frac{\frac{1}{N}\sum_{i=1}^N \tilde{x}_i \tilde{y}_i}{\frac{1}{N}\sum_{i=1}^N \tilde{x}_i^2}\\ &= \frac{\frac{1}{N}\sum_{i=1}^N (x_i - \bar{x}) (y_i - \bar{y})}{\frac{1}{N}\sum_{i=1}^N (x_i - \bar{x})^2} \\ &= \frac{cov(x,y)}{var(x)} \end{align} \]

This last expression is identical to the OLS slope formula $b_1 = \frac{cov(x,y)}{var(x)}$ from the OLS estimator section! It is the standard OLS estimate for the slope coefficient. We note the following:

Adding a constant to a regression produces the same slope as centering all variables and estimating without an intercept. So, unless all variables are centered, always include an intercept in the regression.
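The equivalence can be verified numerically. The sketch below uses invented data (true intercept 4.0, true slope 1.2; the helper name `slope_no_intercept` is hypothetical) and compares three slopes: no intercept on raw data, no intercept on centered data, and a standard fit with an intercept.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data with a non-zero intercept and non-zero mean of x.
x = rng.normal(5, 2, 200)
y = 4.0 + 1.2 * x + rng.normal(0, 1, 200)

def slope_no_intercept(x, y):
    """Least-squares slope of a line forced through the origin."""
    return np.sum(x * y) / np.sum(x ** 2)

# Through the origin on the raw data: absorbs the omitted intercept.
b1_raw = slope_no_intercept(x, y)

# Through the origin on the centered data.
b1_centered = slope_no_intercept(x - x.mean(), y - y.mean())

# Standard regression with an intercept.
b1_with_intercept = np.polyfit(x, y, 1)[0]

print(b1_raw, b1_centered, b1_with_intercept)
```

The centered slope matches the slope from the regression with an intercept, while the raw no-intercept slope is noticeably different because the line is forced through the origin.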

Standardizing A Regression

Standardizing a variable $z$ means to demean as above, but in addition to divide the demeaned value by its own standard deviation. Similarly to what we did above for centering, we define transformed variables $\breve{y}_i = \frac{y_i-\bar{y}}{\sigma_y}$ and $\breve{x}_i = \frac{x_i-\bar{x}}{\sigma_x}$ where $\sigma_z$ is the standard deviation of variable $z$. By now you should know what comes next! As above, we use $\breve{x}_i,\breve{y}_i$ instead of $x_i,y_i$:

\[ \begin{align} b_1 &= \frac{\frac{1}{N}\sum_{i=1}^N \breve{x}_i \breve{y}_i}{\frac{1}{N}\sum_{i=1}^N \breve{x}_i^2}\\ &= \frac{\frac{1}{N}\sum_{i=1}^N \frac{x_i - \bar{x}}{\sigma_x} \frac{y_i - \bar{y}}{\sigma_y}}{\frac{1}{N}\sum_{i=1}^N \left(\frac{x_i - \bar{x}}{\sigma_x}\right)^2} \\ &= \frac{Cov(x,y)}{\sigma_x \sigma_y} \\ &= Corr(x,y) \end{align} \]

After we standardize both $y$ and $x$, the slope coefficient $b_1$ in the regression without intercept is equal to the correlation coefficient.
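A short sketch of this result on simulated data (the data-generating values and names are assumptions for illustration): standardize both variables, fit a line through the origin, and compare the slope with the sample correlation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data with a negative relationship.
x = rng.normal(0, 3, 150)
y = 2.0 - 0.5 * x + rng.normal(0, 2, 150)

# Standardize: subtract the mean, divide by the standard deviation.
x_std = (x - x.mean()) / x.std()
y_std = (y - y.mean()) / y.std()

# Slope of the regression through the origin on standardized data.
b1_std = np.sum(x_std * y_std) / np.sum(x_std ** 2)

# Sample correlation coefficient for comparison.
r = np.corrcoef(x, y)[0, 1]

print(b1_std, r)
```

The two printed numbers coincide, and the slope necessarily lies between -1 and 1.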

Predictions and Residuals

Now we want to ask how our residuals $e_i$ relate to the prediction $\hat{y_i}$. Let us first think about the average of all predictions $\hat{y_i}$, i.e. the number $\frac{1}{N} \sum_{i=1}^N \hat{y_i}$. Let's just take

\[ \begin{equation} \hat{y}_i = b_0 + b_1 x_i \end{equation} \]

and plug this into this average, so that we get

\[ \begin{align} \frac{1}{N} \sum_{i=1}^N \hat{y_i} &= \frac{1}{N} \sum_{i=1}^N \left(b_0 + b_1 x_i\right) \\ &= b_0 + b_1 \frac{1}{N} \sum_{i=1}^N x_i \\ &= b_0 + b_1 \bar{x} \end{align} \]

But by the formula for the OLS intercept, $b_0 = \bar{y} - b_1 \bar{x}$, that last line is just equal to $\bar{y}$! That means of course that

$ \frac{1}{N} \sum_{i=1}^N \hat{y_i} = b_0 + b_1 \bar{x} = \bar{y} $ in other words:

The average of our predictions $\hat{y_i}$ is identically equal to the mean of the outcome $y$. This implies that the average of the residuals is equal to zero.

Related to this result, we can show that the prediction $\hat{y}$ and the residuals are uncorrelated, something that is often called orthogonality between $\hat{y}_i$ and $e_i$. We would write this as

\[ \begin{align} Cov(\hat{y},e) &=\frac{1}{N} \sum_{i=1}^N (\hat{y}_i-\bar{y})(e_i-\bar{e}) = \frac{1}{N} \sum_{i=1}^N (\hat{y}_i-\bar{y})e_i \\ &= \frac{1}{N} \sum_{i=1}^N \hat{y}_i e_i-\bar{y} \frac{1}{N} \sum_{i=1}^N e_i = 0 \end{align} \]
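These properties of the fitted values and residuals are easy to confirm numerically. The following sketch uses made-up data (coefficients 3.0 and 0.4 and all names are assumptions) and checks that the residuals average to zero, that the predictions average to $\bar{y}$, and that predictions and residuals have zero sample covariance.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical sample.
x = rng.uniform(0, 10, 120)
y = 3.0 + 0.4 * x + rng.normal(0, 1, 120)

# OLS fit with intercept; polyfit returns [slope, intercept].
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x      # predictions
e = y - y_hat            # residuals

mean_resid = e.mean()                     # should be ~0
mean_pred_gap = y_hat.mean() - y.mean()   # should be ~0
cov_pred_resid = np.mean((y_hat - y_hat.mean()) * (e - e.mean()))  # ~0

print(mean_resid, mean_pred_gap, cov_pred_resid)
```

All three quantities are zero up to floating-point error, whatever the noise in the simulated data.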

Correlation, Covariance and Linearity

It is important to keep in mind that Correlation and Covariance relate to a linear relationship between x and y. Given how the regression line is estimated by OLS (see just above), you can see that the regression line inherits this property from the Covariance.

Always visually inspect your data, and don't rely exclusively on summary statistics like mean, variance, correlation and regression line. All of those assume a linear relationship between the variables in your data.

Analysing $Var(y)$

Analysis of Variance (ANOVA) refers to a method to decompose variation in one variable as a function of several others. We can use this idea on our outcome $y$. Suppose we wanted to know the variance of $y$, keeping in mind that, by definition, $y_i = \hat{y}_i + e_i$. We would write

\[ \begin{align}Var(y) &= Var(\hat{y} + e)\\ &= Var(\hat{y}) + Var(e) + 2 Cov(\hat{y},e)\\ &= Var(\hat{y}) + Var(e) \end{align} \]

We have seen that the covariance between the prediction $\hat{y}$ and the residual $e$ is zero, which is why the term $2Cov(\hat{y},e)$ drops out. What this tells us in words is that we can decompose the variance in the observed outcome $y$ into a part that relates to variance as explained by the model and a part that comes from unexplained variation. Finally, we know the definition of variance, and can thus write down the respective formulae for each part:

\[Var(y) = \frac{1}{N}\sum_{i=1}^N (y_i - \bar{y})^2\]
$Var(\hat{y}) = \frac{1}{N}\sum_{i=1}^N (\hat{y_i} - \bar{y})^2$, because the mean of $\hat{y}$ is $\bar{y}$ as we know.
Finally, $Var(e) = \frac{1}{N}\sum_{i=1}^N e_i^2$, because the mean of $e$ is zero. We can thus formulate how the total variation in outcome $y$ is apportioned between model and unexplained variation:

The total variation in outcome $y$ (often called SST, or total sum of squares) is equal to the explained sum of squares (SSE) plus the sum of squared residuals (SSR). We have thus SST = SSE + SSR.

Assessing the Goodness of Fit

In our setup, there exists a convenient measure for how good a particular statistical model fits the data. It is called $R^2$ (R squared), also called the coefficient of determination. We make use of the just introduced decomposition of variance, and write the formula as

\[ \begin{equation}R^2 = \frac{\text{variance explained}}{\text{total variance}} = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}\in[0,1] \end{equation} \]

It is easy to see that a good fit is one where the sum of explained squares (SSE) is large relative to the total variation (SST). In such a case, we observe an $R^2$ close to one. In the opposite case, we will see an $R^2$ close to zero. Notice that a small $R^2$ does not imply that the model is useless, just that it explains a small fraction of the observed variation.
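The decomposition and the two equivalent $R^2$ formulas can be illustrated with a few lines of code. This is a sketch on invented data (the coefficients and names are assumptions); it also shows that in a regression with a single regressor, $R^2$ equals the squared correlation between $x$ and $y$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical sample.
x = rng.normal(0, 1, 300)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, 300)

# OLS fit with intercept.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
e = y - y_hat

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
ssr = np.sum(e ** 2)                    # sum of squared residuals

r2 = sse / sst
r2_alt = 1 - ssr / sst
r2_from_corr = np.corrcoef(x, y)[0, 1] ** 2

print(sst, sse + ssr)
print(r2, r2_alt, r2_from_corr)
```

The first line prints two equal numbers (SST and SSE + SSR), and the second line prints the same $R^2$ three times.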

¹ This slope is related to the angle between the vectors $\mathbf{a} =(\overline{x},\overline{y})$ and $\mathbf{b} = (\overline{x},0)$. Hence, it is related to the scalar projection of $\mathbf{a}$ on $\mathbf{b}$.

Instrumental Variables

Thu, 04 Jul 2019 00:00:00 +0000

Effect evaluation model

\[Y_{i}={Y}_{-p,i}+\beta_i P_{i}\]

\[ Y_i=Y_{-P,i}+\beta^* P_i \]

\[ \begin{equation} Y_i=\beta_0+\beta_1P_i+w_i'\gamma+\varepsilon \tag{3.2} \end{equation} \]

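Conditional on $w_{i}$, the cigarette price $P_{i}$ must be independent of the non-price component of cigarette sales $Y_{-P}$, that is: $P_i\perp Y_{-P,i} | w_i$. An equivalent statement: the cigarette price $P_{i}$ must be independent of the non-price component of cigarette sales after controlling for $w_{i}$.

Decompose $Y_{-P}$ conditional on $rincome$: $ \begin{equation} Y_{i}=Y_{-P,i}-\mathbb{E}(Y_{-P,i}|rincome_{i})+\beta^{*}P_{i}+\mathbb{E}(Y_{-P,i}|rincome_{i}) \tag{3.3} \end{equation} $

Split the data into groups by the value of the conditioning variable $w_{i}$ and look at the slope between the cigarette price $P_{i}$ and cigarette sales $Y_{i}$ within each group. If $w_{i}$ is well chosen, the relationship between $P_{i}$ and $Y_{i}$ within a group reflects the true effect slope. $Y_{-P,i}$ may still add noise to $Y_{i}$ and blur the slope we observe, but because $Y_{-P,i}$ is no longer related to $P_{i}$, this noise cancels out in large samples and the true effect slope is recovered.

If no choice of $w_{i}$ can control the interference of $Y_{-P,i}$ in the relationship with $Y_{i}$, we have to transform the data to remove this interference from the raw data directly. The two most common approaches are the instrumental variable method and the panel data fixed effects model.

Instrumental variable method: use an instrument to keep the part of $P_{i}$ that is not correlated with $Y_{-P,i}$.
Panel data: use a variable transformation to remove the part of $P_{i}$ that is correlated with $Y_{-P,i}$.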

\[ Y_i=Y_{-p,i}+\beta\mathbb{E}(P_i|z_i)+\beta (P_i-\mathbb{E}(P_i|z_i)) \]

Relevance condition

$\mathbb{E}(P|z)\neq \text{constant}$, i.e. $z$ has explanatory power for $P$

Exclusion condition

$Y_{-p,i}+\beta(P_i-\mathbb{E}(P_i|z_i))$ is unrelated to $z_{i}$
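To make the two conditions concrete, here is a minimal two-stage least squares sketch on simulated data. Everything in it is an assumption for illustration: the variable `u` stands in for an unobserved part of $Y_{-P}$ that also moves the price, the true effect is set to -1, and $\mathbb{E}(P|z)$ is approximated by a linear first stage. The instrument `z` affects `p` (relevance) and enters `y` only through `p` (exclusion).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Hypothetical data-generating process.
z = rng.normal(0, 1, n)                              # instrument (e.g. a tax)
u = rng.normal(0, 1, n)                              # unobserved confounder
p = 1.0 * z + 0.8 * u + rng.normal(0, 1, n)          # price
y = 10.0 - 1.0 * p + 2.0 * u + rng.normal(0, 1, n)   # sales, true effect -1

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)

# Naive OLS of y on p: biased because p is correlated with u.
beta_ols = ols(np.column_stack([const, p]), y)[1]

# Stage 1: project p on the instrument to get the part explained by z.
Z = np.column_stack([const, z])
p_hat = Z @ ols(Z, p)

# Stage 2: regress y on the fitted price.
beta_tsls = ols(np.column_stack([const, p_hat]), y)[1]

# With a single instrument, TSLS equals cov(z, y) / cov(z, p).
beta_wald = np.cov(z, y)[0, 1] / np.cov(z, p)[0, 1]

print(beta_ols, beta_tsls, beta_wald)
```

In this simulation the naive OLS slope is pulled well away from -1, while the two-stage estimate is close to it and matches the covariance-ratio formula. Note that standard errors from a hand-rolled second stage are not valid and are deliberately left out here.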

Three assumptions

\[ \begin{equation} Y_i=\beta_0+\beta_1 P_i + \gamma_1 rincome_i + \epsilon_i \tag{3.5} \end{equation} \]

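Q1: Does my instrument satisfy the exclusion condition (or exogeneity condition)?

Is the cigarette tax unrelated to the non-price component of sales, conditional on the controls?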

\[ Y =\underset{(\times k)}{X}\beta+\underset{(\times p)}{W}\gamma +\epsilon \]

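where $X$ is the set of variables whose effects are being evaluated and $W$ is the set of control variables, so $\epsilon$ is "the value of $Y$ with the effect of $X$ removed, conditional on $W$". In addition, we have found instrumental variables $\underset{(\times m)}{Z}$ and want to test:

$H_{0}$: the instruments $Z$ are unrelated to the regression error term $\epsilon$

Run TSLS and obtain $ \hat{\epsilon}_{_{TSLS}}=Y-\hat{Y}_{TSLS} $.
Regress $ \hat{\epsilon}_{_{TSLS}} $ on the full set of instruments (i.e. $Z$ and $W$) and jointly test that all coefficients are 0. Compute the test statistic $J=mF\sim\chi^{2}(m-k)$, where $F$ is the F statistic of the joint coefficient test.

The test has $m-k$ degrees of freedom, so $m$ must be greater than $k$. When they are equal, the test cannot be carried out.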

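Q2: Is my instrument relevant enough?

Is the cigarette tax really strongly related to the price?

The instruments $Z$ must be related "strongly enough" to the effect variables $X$; otherwise the large-sample asymptotic distribution of $\hat{\beta}_{_{TSLS}}$ will not be normal.

Consider the first-stage regression of TSLS: $X=Z\alpha_z+W\alpha_w+u$. We want $\alpha_z$ to be jointly significant enough.

Testing rule

$H_0$: the instruments $Z$ are only weakly relevant.

Regress $X$ on the full set of instruments ($Z$, $W$) and run the joint F test of $\alpha_z=0$.
Reject $H_0$ if $F>10$.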
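The first-stage check can be sketched as follows. The data are simulated and all numbers and names are assumptions (two instruments, one control, made-up coefficients). The F statistic is the classical homoskedastic one, computed from the restricted and unrestricted sums of squared residuals; a heteroskedasticity-robust version would differ.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000

# Hypothetical first stage: X depends on two instruments and one control.
w = rng.normal(0, 1, n)        # control variable W
Z = rng.normal(0, 1, (n, 2))   # m = 2 instruments
x = 0.5 * Z[:, 0] + 0.3 * Z[:, 1] + 0.4 * w + rng.normal(0, 1, n)

def ssr(X, y):
    """Sum of squared residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

const = np.ones(n)
X_unres = np.column_stack([const, w, Z])   # unrestricted: Z and W
X_res = np.column_stack([const, w])        # restricted: alpha_z = 0

m = Z.shape[1]                       # number of restrictions
df_resid = n - X_unres.shape[1]      # residual degrees of freedom

ssr_u = ssr(X_unres, x)
ssr_r = ssr(X_res, x)
F = ((ssr_r - ssr_u) / m) / (ssr_u / df_resid)

strong = F > 10   # rule of thumb from the text
print(F, strong)
```

With instruments this strongly related to `x`, the F statistic comes out far above 10, so weak relevance is rejected in this simulated example.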

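Q3: Is my worry about omitted variable bias (OVB) unfounded?

Perhaps there is no need for instrumental variables at all: in regression model (3.5), $P$ is already unrelated to $\epsilon$ (the non-price component of sales, conditional on the controls), so (3.5) can be estimated directly by least squares. $ \begin{equation} Y =X\beta+W\gamma +\epsilon \tag{3.6} \end{equation} $

$H_0$: the estimate of the $\beta$ coefficients in regression model (3.6) does not suffer from OVB, so either OLS or TSLS can be used; in large samples, $\hat{\beta}_{OLS}\approx\hat{\beta}_{TSLS}$.

$H_1$: the estimate of the $\beta$ coefficients in regression model (3.6) does suffer from OVB, so only TSLS can be used; in large samples, $\hat{\beta}_{OLS}\neq \hat{\beta}_{TSLS}$.

Hausman test statistic: $ H\equiv\left(\hat{\beta}_{IV}-\hat{\beta}_{OLS}\right)^{'}\left[V(\hat{\beta}_{IV}-\hat{\beta}_{OLS})\right]^{-1}\left(\hat{\beta}_{IV}-\hat{\beta}_{OLS}\right)\sim\chi_{(df)}^{2}, $ where df is the number of $\beta$ coefficients.

Reject $H_0$ only when $H>\chi_{(df)}^{2}(\alpha)$.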