
Categorical and Limited Dependent Variable Models

Patrick T. Brandt
University of Texas, Dallas

2025-04-29

Theory

Maximum Likelihood Principle

Today we will work with the basics of the likelihood principle for estimation.
  • Estimation methods for ML
  • Properties of ML methods
  • Generalized linear models as a mix of OLS and ML methods.

ML Estimation: Likelihood function

Sir Ronald A. Fisher (for whom the F distribution is named) recast the problem of finding the parameters from the data by asking:

"What values of the unknown density / distribution parameters would be most likely to produce the observed sample?"

To answer this question, we first need to define the likelihood function.

Let $y_{1}, y_{2}, \ldots, y_{n}$ be a random sample from the density function $f(y \mid \theta)$. If $L(\cdot)$, the joint probability density of the $y_{i}$'s, is viewed as a function of $\theta$, the likelihood function is defined as

$$L(\theta \mid y)=\prod_{i=1}^{n} f\left(y_{i} \mid \theta\right)$$

Note we can get the above from Bayes Theorem!

More comments

Notice that this is not the same thing as saying that the joint probability density of the $y_{i}$ is the likelihood function.

The joint probability density of the $y_{i}$ is only proportional to the likelihood function, or

$$P\left(y_{1}, y_{2}, \ldots, y_{n} \mid \theta\right) \propto L(\theta \mid y)$$

With a likelihood function, parameters are a function of the observed data. The rationale of the likelihood function is to recover the parameters that make the function as "large as possible."

MLE definition

Maximum likelihood estimation: Assume the same sample and specifications as in the earlier definition of a likelihood function. Then the value $\hat{\theta}$ that maximizes $L(\theta \mid y)$, such that

$$L(\hat{\theta} \mid y) \geq L(\theta \mid y) \quad \forall \theta \neq \hat{\theta},$$

is said to be the maximum likelihood estimator (MLE) for $\theta$.

Often we will maximize $\log(L(\theta))$ rather than $L(\theta)$ directly. This is because the natural logarithm is a monotonic transformation of the function $L$, so the same value will maximize both $\ln L(\theta)$ and $L(\theta)$.
Constructing ML estimators

Steps in Maximum Likelihood Estimation

  1. Select a conditional probability distribution for the data, $f(y \mid \theta)$.
  2. Construct the (log-) likelihood function, $\log(f(y \mid \theta))$.
  3. Find the parameters that maximize the likelihood function. (A short R sketch of these steps follows.)
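
As a minimal sketch of these three steps in R (assuming a simulated exponential sample; the names y and loglik are illustrative, not from the slides):

set.seed(42)
y <- rexp(500, rate = 2)                       # simulated data

# Steps 1-2: log-likelihood under f(y | theta) = theta * exp(-theta * y)
loglik <- function(theta, y) sum(dexp(y, rate = theta, log = TRUE))

# Step 3: maximize (optimize() minimizes unless maximum = TRUE)
fit <- optimize(loglik, interval = c(0.01, 10), y = y, maximum = TRUE)
fit$maximum     # numerical MLE
1 / mean(y)     # analytic MLE, theta-hat = 1 / ybar, for comparison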

Numerical optimization

  • Goal is to compute the $\theta$ that maximizes $L(\theta \mid y)=\prod_{i=1}^{n} f(y_{i} \mid \theta)$.
  • The MLE is the $\hat{\theta}$ that maximizes this function. It is then the value where
$$L(\hat{\theta} \mid y) \geq L(\theta \mid y) \quad \forall \theta \in \Theta$$
where $\Theta$ is an admissible set.
  • Define $\ell(\theta \mid y)=\log(L(\theta \mid y))$ as the log-likelihood function.
  • The score function, or first derivative of $\ell$, is
$$\dot{\ell}=\frac{\partial}{\partial \theta} \ell(\theta \mid y)$$
  • Setting this to zero will produce the optimum of the MLE.

How is this done in the computer?

  • Computers do not do calculus well.
  • Instead, the computer tries candidate values $\theta^{j}$:
$$\theta^{j+1}=\theta^{j}-\left(\frac{\partial^{2}}{\partial \theta \partial \theta^{\prime}} \ell\left(\theta^{j} \mid y\right)\right)^{-1} \frac{\partial}{\partial \theta} \ell\left(\theta^{j} \mid y\right)$$
  • Continue these iterations until either $\theta^{j+1}-\theta^{j}<\epsilon$ or $\ell\left(\theta^{j+1} \mid y\right)-\ell\left(\theta^{j} \mid y\right)<\epsilon$.
  • This, and its variants, are versions of Newton-Raphson optimization. (A sketch in R follows below.)
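
A minimal Newton-Raphson sketch in R, continuing the exponential example above (score and hessian are illustrative names; for this model $\ell(\theta)=n \log \theta-\theta \sum y_{i}$):

score   <- function(theta, y)  length(y) / theta - sum(y)   # first derivative of l
hessian <- function(theta, y) -length(y) / theta^2          # second derivative of l

theta <- 1                                   # starting value theta^0
for (j in 1:50) {
  step  <- score(theta, y) / hessian(theta, y)
  theta <- theta - step                      # theta^{j+1} = theta^j - H^{-1} * score
  if (abs(step) < 1e-8) break                # stop when the update is tiny
}
theta                                        # converges to 1 / mean(y)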

MLE Properties

The following properties hold for $L(\theta \mid Y)$ (under suitable "regularity" conditions, and as $n \rightarrow \infty$):
  • Normality: Estimated parameters $(\hat{\theta})$ are normal: $\hat{\theta} \sim N(\theta, V[\theta])$.
  • Efficiency / Minimum Variance: The estimated parameters have minimum variance among admissible estimators, attaining the Cramer-Rao lower bound for the variance.
  • Consistency: As the number of observations tends to infinity, the MLE $\hat{\theta}$ converges in probability to the population values. That is, $\operatorname{plim} \hat{\theta} \rightarrow \theta$.
  • Sufficiency: The MLE completely characterizes the observed data. That is, there is no other parameter $\gamma$ that should be included in the model to characterize the distribution of the data.
  • Asymptotic Unbiasedness: The probability that the expected values of the estimated parameters equal the population values is one as the sample size goes to infinity. That is, $\operatorname{plim} E[\hat{\theta}] \rightarrow \theta$.

THESE PROPERTIES ONLY HOLD IF THE MODEL IS SPECIFIED CORRECTLY!

More properties

  • For the ML estimators, these properties are all asymptotic. This means that they only hold for large samples. In fact, in most cases the MLE estimators are biased and inefficient in small samples.
  • These properties hold for "correctly specified models." In many cases, if your model is incorrect, some or all of these properties will be invalidated. See Gourieroux, Monfort and Trognon (1984a, 1984b) and White (1994): when the model is misspecified, the estimator can often be referred to as a quasi- or pseudo-MLE, which may retain some desirable properties.

An important non-asymptotic result

MLE has one other useful (and non-statistical) property. This is referred to as invariance:

If $\hat{\theta}$ is the ML estimate of $\theta$, and if $g(\theta)$ is a continuous function of $\theta$, then the MLE of $g(\theta)$ is $g(\hat{\theta})$. For example, since $\hat{\sigma}^{2}$ is the MLE of the normal variance, $\sqrt{\hat{\sigma}^{2}}$ is the MLE of $\sigma$.

The real benefit of this final property is that we can reparameterize the likelihood (such as taking logarithms) and simplify the estimation.

Computing coefficients and their variances

Here we present a generic ML model and then derive its estimator and associated statistics.

Suppose that we have observations $x$ from the following density that is characterized by the parameter $\theta$:

$$\operatorname{Pr}\left(x_{i}=X\right)=f\left(x_{i} ; \theta\right) \quad \text{where} \quad \int_{X} f\left(x_{i} ; \theta\right)=1$$

The likelihood function is

$$L(x ; \theta)=\prod_{i}^{n} f\left(x_{i} ; \theta\right)$$

Log Likelihood defined

$$\ln L(x ; \theta)=\sum_{i=1}^{n} \ln f\left(x_{i} ; \theta\right)$$

The log-likelihood is the sum of the natural logarithm of the probability of each observation under the specified density.

Log likelihood optimization

The ML estimates are computed by optimizing the log-likelihood function.

The first and second derivatives are:

$$\begin{aligned} \frac{\partial \ln L(x ; \theta)}{\partial \theta} & =\frac{\partial \sum_{i=1}^{n} \ln f\left(x_{i} ; \theta\right)}{\partial \theta} \\ \frac{\partial^{2} \ln L(x ; \theta)}{\partial \theta \partial \theta^{\prime}} & =\frac{\partial^{2} \sum_{i=1}^{n} \ln f\left(x_{i} ; \theta\right)}{\partial \theta \partial \theta^{\prime}}=H(\theta) \end{aligned}$$

Note that if $\theta$ is a vector with $k$ elements, then the first derivatives are a vector of $k$ elements and the second derivatives are a $k \times k$ matrix of second partial derivatives.

Information matrix

The matrix of second derivatives is also referred to as the Hessian matrix. The negative of the expected value of this matrix, $I(\theta)$, is known as the information matrix:

$$\text{Information Matrix:} \quad I(\theta)=-E[H(\theta)]$$

The inverse of the information matrix gives the variance-covariance matrix for the ML parameters:

$$\text{Variance:} \quad \operatorname{Var}(\hat{\theta})=-E[H(\theta)]^{-1}$$

This matrix is often approximated in a variety of ways (for example, by the observed Hessian at the MLE or by the outer product of the scores). These approximations are all asymptotically equivalent to the information matrix and are often much easier to compute. (A numerical sketch in R follows below.)
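
As a minimal numerical sketch in R, continuing the exponential example (optim() can return the Hessian of the log-likelihood at the optimum; names are illustrative):

# Maximize the log-likelihood; fnscale = -1 turns optim() into a maximizer
fit <- optim(par = 1, fn = loglik, y = y, method = "BFGS",
             control = list(fnscale = -1), hessian = TRUE)

# Var(theta-hat) = inverse of the negative (observed) Hessian
vcov.hat <- solve(-fit$hessian)
sqrt(diag(vcov.hat))        # standard error of the MLE
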
Generalized Linear Models (GLM)

GLMs

  • For many maximum likelihood problems, the solution that maximizes the likelihood can be found by a repeated application of generalized least squares.
  • This is a generalized linear model and works in many cases where the dependent variable follows exponential family distributions (other than a normal).
  • This set of results was established by Nelder and Wedderburn (1972), with important generalizations later by Jorgensen (1987).

Basic Notation

Let $f(z \mid \zeta)$ be the conditional pdf (pmf) of interest. If it has the form

$$f(z \mid \zeta)=\exp [t(z) u(\zeta)] r(z) s(\zeta)$$

where $r$ and $t$ are real-valued functions of only $z$, and $u$ and $s$ are real-valued functions of only $\zeta$, with $r(z)>0, s(\zeta)>0, \forall z, \zeta$, then it is in the exponential family.

Canonical form

  • Canonical form is a 1-1 transformation that reduces the complexity of the model.
  • If $t(z)=z$, the model is in canonical form for $z$. If not, redefine $y=t(z)$ to do so.
  • If $u(\zeta)=\zeta$, the model is in canonical form for $\zeta$. If not, force the canonical form by defining $\theta=u(\zeta)$.
  • The final canonical form will be $f(y \mid \theta)=\exp (y \theta-b(\theta)+c(y))$. (A worked Poisson example follows below.)
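
As a worked example (a standard derivation, not from the slides), the Poisson pmf fits this canonical form:

$$f(y \mid \mu)=\frac{e^{-\mu} \mu^{y}}{y!}=\exp \left(y \log \mu-\mu-\log y!\right)$$

so with $\theta=\log \mu$ (the canonical link), $b(\theta)=e^{\theta}=\mu$, and $c(y)=-\log y!$, this matches $\exp (y \theta-b(\theta)+c(y))$.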

Why do this?

  • Defines a family of models that can be estimated by the same methods.
  • Simplifies a class of ML models based on pdfs and pmfs in the exponential family.
  • This is a common group of models: normal, Poisson, negative binomial, logit, probit, gamma, etc.
  • Score functions for the MLE of GLMs have a simple form:
$$\sum_{i=1}^{n} t\left(x_{i}\right)=n \frac{\partial}{\partial \theta} E[x]$$
  • Standard ML properties apply to the estimates.
  • Start with a standard linear model
$$V=X \beta+\epsilon, \quad E(V)=\theta=X \beta$$
$V$ is just a linear representation of the right hand side variables, not the dependent variable.
  • What we want is some mapping from $V$ to the mean of the outcome of interest, $E[y]=\mu$:
$$g(\mu)=\theta=X \beta$$
  • The link function $g(\cdot)$ connects the structure of the explanatory variables and their coefficients to the linear predictor $g(\mu)=\theta$.
  • Gill (2007: 542): "The link function connects the stochastic component that describes some response variable from a wide variety of forms to all the standard normal theory supporting the systematic component through the mean function."
  • Huh?
$$\begin{aligned} g(\mu) & =\theta=X \beta \\ g^{-1}(g(\mu)) & =g^{-1}(\theta)=g^{-1}(X \beta)=\mu=E[y] \end{aligned}$$
  • Ah! So it ties together the non-normal response to a linear regression model.

Components of GLMs

  1. Stochastic component: $y$ is the random or stochastic component that is iid with mean $\mu$.
  2. Systematic component: $\theta=X \beta$ with Gauss-Markov basis.
  3. Link function: the stochastic and systematic parts of the model are linked by a function of $\theta$, which is the canonical link that lets us "trick" a linear model.
  4. Residuals: can construct deviance or weighted residuals to evaluate the model.

| Distribution | Link $\theta=g(\mu)$ | Inverse $\mu=g^{-1}(\theta)$ |
| :--- | :--- | :--- |
| Poisson | $\log(\mu)$ | $\exp(\theta)$ |
| Normal | $\mu$ | $\theta$ |
| Gamma | $-\frac{1}{\mu}$ | $-\frac{1}{\theta}$ |
| Negative binomial | $\log(1-\mu)$ | $1-\exp(\theta)$ |
| Logit | $\log\left(\frac{\mu}{1-\mu}\right)$ | $\frac{\exp(\theta)}{1+\exp(\theta)}$ |
| Probit | $\Phi^{-1}(\mu)$ | $\Phi(\theta)$ |

Estimation, IWLS, and Properties of Estimation of GLMs

  • Estimation of GLMs is by iteratively weighted least squares (IWLS); a hand-rolled sketch follows below.
  • Uses a repeated weighted least squares computation to adjust the values of $X \beta=g(\mu)$ until $X \beta^{j}-X \beta^{j+1}<\epsilon$.
  • See Gill 2007: 549-551 for details.
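
A minimal hand-rolled IWLS sketch in R for a logit model (an illustration under the standard GLM formulas, not production code; glm() does this internally):

set.seed(1)
n <- 1000
X <- cbind(1, rnorm(n))                        # design matrix with an intercept
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1)))    # simulated binary outcome

beta <- rep(0, ncol(X))                        # starting values
repeat {
  eta <- X %*% beta                            # linear predictor theta = X beta
  mu  <- plogis(eta)                           # inverse link mu = exp(eta)/(1+exp(eta))
  W   <- as.vector(mu * (1 - mu))              # IWLS weights for the logit
  z   <- eta + (y - mu) / W                    # working response
  beta.new <- solve(t(X) %*% (W * X), t(X) %*% (W * z))   # weighted LS step
  if (max(abs(beta.new - beta)) < 1e-8) break
  beta <- beta.new
}
cbind(IWLS = beta, glm = coef(glm(y ~ X[, 2], family = binomial)))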

Properties of GLMs

  • For cases where the canonical link functions are used, GLMs are MLEs.
  • So in these cases, we can say that the results / properties of MLE hold.
  • In other cases, namely where the full parametric assumptions of GLM or MLE are not met, the estimates are quasi-maximum likelihood estimates.
  • Quasi-MLEs are consistent and asymptotically normal, but not efficient (see McCullagh 1983, McCullagh and Nelder 1989, Gourieroux, Monfort and Trognon 1984a,b).

Quantities of interest and tests

Interpreting GLM and MLE Coefficients

We want to be able to report several different quantities and results from statistical models. These involve conditional inferences:
  • Quantities of interest: predicted means, probabilities, odds, etc.
  • Hypothesis tests: Wald, likelihood ratio, and Lagrangean multiplier (score) tests.
  • Want to be able to compute these and report them in a consistent manner across different models.

The "Intuitive Version" of tests

  • Since inference in the ML models is based on asymptotic properties, we need to derive new test statistics for these models based on these properties and assumptions.
  • The nice thing about ML estimators, though, is that we only have to derive the testing framework once.
  • Since all the MLEs share the same properties, hypothesis testing is done the same way in each of the models.
  • Tests are based on measuring scaled distances around the likelihood surface (function).

Three tests

  • There are three general hypothesis tests used with ML models.
  • These are the Wald (W), likelihood ratio (LR), and Lagrangean multiplier (LM) tests.
  • Each test has a $\chi^{2}$ distribution. The three tests are asymptotically equivalent for the same null hypothesis.

Test Construction

  • Assume that we have ML estimates such that
$$\hat{\theta} \sim N(\theta, \mathbf{V}(\theta))$$
  • Suppose that our hypothesis conjectures that $\theta=\theta_{0}$, the $\theta_{0}$ values being the ones under the null hypothesis.
  • What does this mean in terms of a log likelihood function? That is, what do the likelihoods $L(\theta \mid Y)$ and $L\left(\theta_{0} \mid Y\right)$ look like?

Likelihood tests

Distance Measured By Wald Test

What can we compare?

What will these tests depend on?
  1. the value of the likelihood for $\theta$ and $\theta_{0}$
  2. the curvature of the likelihood
  3. so we need a way to combine these two pieces of information

Formal derivations

So how might we measure the change in the likelihood from $\theta$ to $\theta_{0}$? Three ways come to mind:
  1. Look at the heights of the likelihood functions evaluated at $\theta$ and $\theta_{0}$, weighted by the likelihood function's curvature. This is the likelihood ratio.
  2. Look at the distance between $\theta$ and $\theta_{0}$, weighted by the likelihood function's curvature. This is a Wald test.
  3. Look at the slope of the likelihood function at $\theta_{0}$, denoted $S\left(\theta_{0}\right)$, and see how far it is from zero, again weighted by the curvature of the likelihood function at this point. This is a score or Lagrangean test.

How do we measure "curvature"?

  • The degree of curvature of a function is determined by the function's second derivatives.
  • For a multivariate problem, this is the same as the Hessian - the matrix of second derivatives with respect to all the parameters for the likelihood function.

Recall

  • The negative of the expected value of the Hessian matrix is known as the information matrix:
$$\text{Information Matrix:} \quad I(\theta)=-E[H(\theta)]$$
  • The inverse of the information matrix gives the variance-covariance matrix for the ML parameters:
$$\text{Variance:} \quad \operatorname{Var}(\hat{\theta})=-E[H(\theta)]^{-1}$$
  • Thus, we will be able to use the inverse of the variance-covariance of the maximum likelihood parameters to "weight" the distances we are measuring in these tests.

Notation

  • Log-likelihood function:
$$\ell(\theta \mid Y)=\log (L(Y \mid \theta))=\log (p(Y \mid \theta))$$
  • First derivative of $\ell$, a $k \times 1$ vector, the scores:
$$\frac{\partial}{\partial \theta} \ell(\theta \mid Y)=S(\theta)$$
  • Second derivative of $\ell$, a $k \times k$ matrix, the Hessian:
$$\frac{\partial^{2}}{\partial \theta \partial \theta^{\prime}} \ell(\theta \mid y)=H(\theta)$$
  • Information matrix equality, evaluated at $\hat{\theta}$:
$$I(\theta)=-E[H(\theta)], \quad \operatorname{Var}(\hat{\theta})=-E[H(\theta)]^{-1}=I(\theta)^{-1}$$

Likelihood ratio test

Compares the heights of two likelihoods at $\theta$ (unrestricted) and $\theta_{0}$ (restricted) for a test with $m$ restrictions:

$$\begin{aligned} LR & =-2 \log \left(\frac{L\left(\theta_{0} \mid Y\right)}{L(\theta \mid Y)}\right) \\ & =-2\left(\log L\left(\theta_{0} \mid Y\right)-\log L(\theta \mid Y)\right) \\ & =-2\left(\ell\left(\theta_{0} \mid Y\right)-\ell(\theta \mid Y)\right) \\ & =2\left(\ell(\theta \mid Y)-\ell\left(\theta_{0} \mid Y\right)\right) \sim \chi_{m}^{2} \end{aligned}$$

Note that the likelihood value of the restricted model $L\left(\theta_{0} \mid Y\right)$ can be no larger than that of the unrestricted model, so this will always be a non-negative number (why?)

Wald test

Let $C(\theta)=C\left(\theta_{0}\right)$ be some matrix function that defines the restrictions on the MLE $\theta$. The Wald test has the form

$$W=\left(C(\theta)-C\left(\theta_{0}\right)\right)^{\prime} V(\theta)^{-1}\left(C(\theta)-C\left(\theta_{0}\right)\right) \sim \chi_{m}^{2}$$

for $m$ restrictions.

Note that this allows for non-linear tests, and requires estimating the unconstrained model.

Lagrangean multiplier or score test

Let $S\left(\theta_{0}\right)$ be the score function evaluated at $\theta_{0}$ under the null hypothesis. Then the LM test has the form:

$$LM=S\left(\theta_{0}\right)^{\prime} V\left(\theta_{0}\right)^{-1} S\left(\theta_{0}\right) \sim \chi_{m}^{2}$$

In this version of the test one only needs to estimate the restricted model. (An R sketch of these test computations follows below.)
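
A minimal R sketch of the LR and Wald computations for a single restriction ($\beta_{x2}=0$) in a logit model (simulated data; all names illustrative):

set.seed(2)
d   <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.5 * d$x1))     # x2 is irrelevant by construction

m1 <- glm(y ~ x1 + x2, data = d, family = binomial)   # unrestricted
m0 <- glm(y ~ x1,      data = d, family = binomial)   # restricted: beta_x2 = 0

LR <- as.numeric(2 * (logLik(m1) - logLik(m0)))       # 2 * (l(theta) - l(theta_0))
W  <- coef(m1)["x2"]^2 / vcov(m1)["x2", "x2"]         # Wald, unrestricted fit only
c(LR = LR, Wald = unname(W), p.LR = pchisq(LR, df = 1, lower.tail = FALSE))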

Comparison of tests

  • In small samples it is generally the case that the three tests can give different results. Typically this means that the test statistics are ordered: $W \geq LR \geq LM$.
  • For a further, very intuitive discussion of likelihood based tests, see A. Buse. 1982. "Likelihood Ratio, Wald, and Lagrange Multiplier Tests: An Expository Note." American Statistician 36(3): 153-157.

Equivalences and choice of the tests

| Test | Compute Unrestricted Model? | Compute Restricted Model? |
| :--- | :--- | :--- |
| Wald | Yes | No |
| Likelihood Ratio | Yes | Yes |
| Lagrange Multiplier | No | Yes |

Which computation you will want to use depends on the nature of your estimator. If the ML estimator is very hard to compute, computations using Wald tests are easiest. For cheap estimators, LR and LM tests can be more easily programmed in many cases.

These complexities may also depend on the linearity of the estimator and constraints. For OLS, all three tests are generally easy, although that depends on your regression outputs.

Things you can and cannot do

  • LR, LM, and $W$ tests must be computed for the same data.
  • Tests must be nested: linear or non-linear parametric restrictions of some larger parameter space.
  • Results are asymptotic! Tests are equal only as samples are large.

Computing quantities of interest

  • What do we want to compute / report?
  • What might these be?
  1. Hypothesis tests.
  2. Comparative statics about how changes in the data or model parameters affect the dependent variable or its probability.
  3. Measures of our uncertainty about the parameters of the model and associated comparative statics.

More specifically

  • Predicted values: means, variances, probabilities, odds.
  • Conditional predictions: based on specific values of the independent variables that are "representative" observations.
  • Marginal effects, or the changes in the dependent variable that are expected for changes in the independent variable.
  • First differences: effect of a unit change in an independent variable on the dependent variable.

The first two of these are generated by using the ML estimates to produce predictions of the dependent variable in the ML model. The third is computed by finding the derivative of the probability model quantity of interest with respect to a change in an independent variable. The last is the discrete change version of the derivative.

A running example

  • To illustrate, use a logit model. In the logit model, we analyze the probability of observing a value of 1 on the dependent variable:
$$\operatorname{Pr}(Y=1 \mid X)=F(X \beta)=\frac{\exp (X \beta)}{1+\exp (X \beta)}$$
where $X$ is a row vector of independent variables and $\beta$ is the column vector of coefficients.
  • Suppose we have available the MLE, $\widehat{\beta}$. The next sections demonstrate the different methods that can be used to interpret how changes in $X$ are then related to changes in $\operatorname{Pr}(Y=1 \mid X)$.
  • Note that this generalizes using the link functions $g(\cdot)$ for any GLM.

Expected values, predicted outcomes

As a first method, we can consider predicting the probability, $\operatorname{Pr}(Y=1 \mid X, \widehat{\beta})$:

$$\operatorname{Pr}(Y=1 \mid \tilde{X}, \widehat{\beta})=\frac{\exp (\tilde{X} \widehat{\beta})}{1+\exp (\tilde{X} \widehat{\beta})}$$

where $\tilde{X}$ is a vector of $X$'s at which we want to evaluate the prediction.

Choosing $\tilde{X}$

How do we choose the vector $\tilde{X}$? The following are possibilities.
  • Use the mean values of the X's.
  • Use the range of the X's.
  • Use the extrema of the X's.
  • Select X's based on in-sample cases of interest (as generated by hypotheses).
  • Choose relevant percentile points of the X's: 25%, 50%, 75%-iles, etc.

Whatever points are chosen should be data admissible. (A short R sketch follows below.)
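
A minimal R sketch of predicted probabilities at chosen $\tilde{X}$ values, reusing the simulated logit fit m1 from the tests example above (names illustrative):

# Pr(Y = 1) at the 25th/50th/75th percentiles of x1, holding x2 at its mean
X.tilde <- data.frame(x1 = quantile(d$x1, c(.25, .5, .75)), x2 = mean(d$x2))
cbind(X.tilde, p.hat = predict(m1, newdata = X.tilde, type = "response"))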

Partial changes or First derivatives

  • A second method is based on the instantaneous change in the probability for a change in $X$, which is the first derivative of the quantity of interest.
  • For the logit model this would be
$$\frac{\partial \operatorname{Pr}(Y=1 \mid X)}{\partial X_{k}}=\frac{\exp (X \beta)}{[1+\exp (X \beta)]^{2}} \beta_{k}$$
which is the first derivative of the logit probability function with respect to the $k$th variable.
  • Notice that we still have to choose a vector of X's at which to evaluate this derivative.
  • The reason why is that $E[f(x)] \neq f(E[x])$. We typically are going to want the former, not the latter. So just plugging in the average X's for the derivative is generally the wrong thing to do. (See the sketch after this list.)
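
A minimal sketch in R of this point for the logit fit m1 above, comparing the average marginal effect ($E[f(x)]$, the derivative averaged over the sample) with the effect at the average $X$ ($f(E[x])$); names illustrative:

p  <- predict(m1, type = "response")         # fitted Pr(Y = 1 | X_i), each observation
b1 <- coef(m1)["x1"]

AME <- mean(p * (1 - p)) * b1                # E[f(x)]: note exp(Xb)/(1+exp(Xb))^2 = p(1-p)
p.bar <- predict(m1, type = "response",
                 newdata = data.frame(x1 = mean(d$x1), x2 = mean(d$x2)))
MEM <- p.bar * (1 - p.bar) * b1              # f(E[x]): derivative at the average X
c(AME = unname(AME), MEM = unname(MEM))      # these generally differ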

Relative Marginal Effects

  • Notice that this leads to a more general computation that allows you to assess the relative marginal effect of two variables:
$$\frac{\partial \operatorname{Pr}(Y=1 \mid X) / \partial x_{k}}{\partial \operatorname{Pr}(Y=1 \mid X) / \partial x_{\ell}}=\frac{\beta_{k}}{\beta_{\ell}}$$
  • This quantity allows you to assess the relative impact of each variable on the change in probability. If the number is greater (less) than 1, the $k$th ($\ell$th) variable has a bigger impact on the probability $\operatorname{Pr}(Y=1)$ than the $\ell$th ($k$th) variable.

First Differences

For this method one computes the change in the dependent variable for different vectors of the predictors, given the MLE.
  • This then measures the effect of changing a particular value of $X$, say $x_{k}$, from $x_{k}$ to $x_{k}+\delta$.
  • This computation can be done for the logit model as follows:
$$\begin{aligned} \frac{\Delta \operatorname{Pr}(Y=1 \mid X)}{\Delta x_{k}} & =\operatorname{Pr}\left(Y=1 \mid X, x_{k}+\delta\right)-\operatorname{Pr}\left(Y=1 \mid X, x_{k}\right) \\ & =\frac{\exp \left(\beta_{0}+\beta_{1} x_{1}+\cdots+\beta_{k}\left(x_{k}+\delta\right)\right)}{1+\exp \left(\beta_{0}+\beta_{1} x_{1}+\cdots+\beta_{k}\left(x_{k}+\delta\right)\right)}-\frac{\exp \left(\beta_{0}+\beta_{1} x_{1}+\cdots+\beta_{k} x_{k}\right)}{1+\exp \left(\beta_{0}+\beta_{1} x_{1}+\cdots+\beta_{k} x_{k}\right)} \end{aligned}$$
  • Notice this computation depends on the starting values for $x_{k}$.

Choosing the change, $\delta$

The value of $\delta$ is open. Possibilities include:
  • A unit change: $\delta=1$.
  • A centered discrete change: $\delta=\bar{x}_{k} \pm 0.5$.
  • A change from zero to one (for dummy variables).
  • A one standard deviation change: $\delta=\sigma_{x_{k}}$ (for continuous variables).

Odds Ratios

  • A final interpretation method that can be used with any likelihood model where the mean or location parameter is of the form $\lambda=\exp (X \beta)$ is the odds ratio.
  • This means this method can be used with Poisson and other event count regressions and most of the event history models.
  • Recall that the odds are defined from a set of probabilities, so for an event with probability $p$, the odds are $\frac{p}{1-p}$.

Logit Odds

  • For the logit example, the odds of the event given $X$ are defined as
$$\Omega(X)=\frac{\operatorname{Pr}(Y=1 \mid X)}{\operatorname{Pr}(Y=0 \mid X)}=\frac{\operatorname{Pr}(Y=1 \mid X)}{1-\operatorname{Pr}(Y=1 \mid X)}$$
  • This implies that $\log (\Omega(X))=X \beta$. So the change in the log odds for a change in $x_{k}$ can be defined as
$$\frac{\partial \log \Omega(X)}{\partial x_{k}}=\beta_{k}$$

The change in the odds ratio for a discrete change in $x_{k}$ can be computed as:

$$\frac{\Omega\left(X, x_{k}+\delta\right)}{\Omega\left(X, x_{k}\right)}=\exp \left(\beta_{k} \delta\right)$$

For this discrete change, we say that for a change of $\delta$ in $x_{k}$, we expect the odds to change by a factor of $\exp \left(\beta_{k} \delta\right)$, holding all other variables constant. Note that the scale of this change does not depend on the level of $x_{k}$ or any other $X$.

Issues with (log) odds

Notice the difficulty with this measure: it is on a scale different from the quantity of interest - it is written in terms of the relative ratio of the log probabilities.
  • Further, the change in the odds depends on the baseline point at which this change is measured.
  • Since we abstract away from this with the measure, we need to have a clear idea of the scale of the data and the baseline point of comparison.

Mayhew Example: Poisson regression

  • The basic model is as follows:
$$\begin{aligned} \operatorname{Pr}\left(y_{i}=Y\right) & =\frac{e^{-\mu} \mu^{y_{i}}}{y_{i}!} \\ \mu & =\exp \left(X_{i} \beta\right) \end{aligned}$$
where $Y$ is the dependent variable, a count with $y=0,1,2, \ldots$; $X_{i}$ is a $1 \times k$ row vector of covariates (including an intercept); and $\beta$ is a $k \times 1$ column vector of coefficients to be estimated.
  • The log-likelihood function for this model is
$$\ln L(\beta)=\ln \left(\prod_{i=1}^{n} \frac{\exp \left(-\exp \left(X_{i} \beta\right)\right) \exp \left(X_{i} \beta\right)^{y_{i}}}{y_{i}!}\right)$$

Mayhew data

The data are the number of pieces of major legislation passed by each Congress from 1947-1989. The regression model for the data is specified as follows:

$$E(Y \mid X)=\mu=\exp \left(\beta_{0}+\beta_{1} \text{Unified}+\beta_{2} \text{Terms}+\beta_{3} \text{Mood}+\beta_{4} \text{Budget}\right),$$

where Unified is a dummy variable for unified government, Terms is a dummy variable measuring whether it is the first Congress of a president's term, Mood is a dummy variable for all Congresses from 1961-1977, measuring the impact of the Great Society congresses, and Budget is a measure of the size of the federal deficit relative to GDP (in billions of 1988 dollars). The main point of Mayhew's analysis (done using OLS) was to determine whether or not the passage of major legislation was affected by divided government.

Stata output

. poisson laws unif terms mood budget

Iteration 0:  log likelihood = -51.298175
Iteration 1:  log likelihood = -51.298162
Iteration 2:  log likelihood = -51.298162

Poisson regression                          Number of obs =        22
                                            LR chi2(4)    =     33.93
                                            Prob > chi2   =    0.0000
Log likelihood = -51.298162                 Pseudo R2     =    0.2485

        laws |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        unif |  -.0344413   .1265033   -0.27   0.785    -.2823832    .2135006
       terms |   .2773157   .1240407    2.24   0.025     .0342004    .5204309
        mood |    .659386   .1236362    5.33   0.000     .4170634    .9017086
      budget |   .0032319   .0071526    0.45   0.651    -.0107868    .0172507
       _cons |   2.091943   .1336811   15.65   0.000     1.829932    2.353953

Marginal effects

We can compute the impacts of changes in the regressors on the expected number of pieces of major legislation with the command in Stata:

. mfx compute;

Marginal effects after poisson
  y = predicted number of events (predict)
    = 11.408954

variable |     dy/dx   Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+-----------------------------------------------------------------
   unif* | -.3917302    1.43445  -0.27   0.785   -3.2032  2.41974   .409091
  terms* |   3.17403    1.41736   2.24   0.025   .396057    5.952        .5
   mood* |  8.380602    1.68761   4.97   0.000   5.07294  11.6883   .363636
  budget |  .0368731      .0816   0.45   0.651  -.123067  .196813  -6.77273

(*) dy/dx is for discrete change of dummy variable from 0 to 1

For the discrete or dummy variables on the right hand side of the Poisson regression, the derivative is not computed. Here we see that the presence of unified government decreases the number of pieces of major legislation by 0.39.

Details

Just to show what is going on, let's use the discrete change formula for the Poisson and compute the answer by hand:

$$\begin{aligned} E(Y \mid X, \beta, \text{Unified}=1)-E(Y \mid X, \beta, \text{Unified}=0) ={} & \exp \left(\beta_{0}+\beta_{1} \cdot 1+\beta_{2} \text{Terms}+\beta_{3} \text{Mood}+\beta_{4} \text{Budget}\right) \\ & -\exp \left(\beta_{0}+\beta_{1} \cdot 0+\beta_{2} \text{Terms}+\beta_{3} \text{Mood}+\beta_{4} \text{Budget}\right) \end{aligned}$$
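
A quick by-hand version of this in R, plugging in the Stata coefficients reported above and holding the other covariates at the sample means shown in the X column of the mfx output:

b  <- c(cons = 2.091943, unif = -.0344413, terms = .2773157,
        mood = .659386, budget = .0032319)
xb <- function(unif) b["cons"] + b["unif"] * unif +
        b["terms"] * .5 + b["mood"] * .363636 + b["budget"] * -6.77273
exp(xb(1)) - exp(xb(0))    # about -0.39, matching the mfx result for unif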

Deficit effects

Consider computing the impacts of the budget deficit variable over its range. This makes historical sense, since the largest surplus (the 21) is during the Truman Administration (at the start of the sample), and the largest deficits are in the Reagan Administration.

Results

The following shows how this is done:

. mfx compute, at(mean, budget=-24);

Marginal effects after poisson
  y = predicted number of events (predict)
    = 10.791092

variable |     dy/dx   Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+-----------------------------------------------------------------
   unif* | -.3705157    1.35041  -0.27   0.784  -3.01727  2.27624   .409091
  terms* |  3.002137    1.41022   2.13   0.033   .238157  5.76612        .5
   mood* |  7.926742    1.88359   4.21   0.000   4.23498  11.6185   .363636
  budget |  .0348762     .07289   0.48   0.632  -.107984  .177736       -24

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, at(mean, budget=21);

Marginal effects after poisson
  y = predicted number of events (predict)
    = 12.480387

variable |     dy/dx   Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+-----------------------------------------------------------------
   unif* | -.4285182    1.58464  -0.27   0.787  -3.53436  2.67732   .409091
  terms* |  3.472108    1.66186   2.09   0.037   .214925  6.72929        .5
   mood* |  9.167637    2.57398   3.56   0.000   4.12273  14.2125   .363636
  budget |  .0403359     .09728   0.41   0.678  -.150323  .230995        21

(*) dy/dx is for discrete change of dummy variable from 0 to 1

More

  • We see that at the time of the greatest surplus, Congress passes on average 12.48 laws, all else constant. At the times of greatest deficit, they pass about 10.79 major laws, all else constant. This difference of nearly 2 laws is a better statement of the difference between the two different regimes.
  • Note however that it may not make sense to leave the other variables at their means when doing this computation.
  • Since all other variables are discrete, it makes more sense to put them at specific values. We can do this with more modifications to the at option of the mfx command.
  • Consider the difference in legislation between the unified government periods of the Great Society and the rest of the era when there is a Republican president. The first condition is when Unified=1 and Mood=1. In the second case both are zero.

. /* Compute the Great Society / Reagan era comparison */
. mfx compute, at(mean, unif=1 mood=1);

Marginal effects after poisson
  y = predicted number of events (predict)
    = 17.007529

variable |     dy/dx   Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+-----------------------------------------------------------------
   unif* | -.5959654     2.1886  -0.27   0.785  -4.88554  3.69361         1
  terms* |  4.731582    2.12985   2.22   0.026   .557156  8.90601        .5
   mood* |  8.211766    1.68637   4.87   0.000   4.90654   11.517         1
  budget |  .0549674     .12105   0.45   0.650  -.182296   .29223  -6.77273

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, at(mean, unif=0 mood=0);

Marginal effects after poisson
  y = predicted number of events (predict)
    = 9.1039774

variable |     dy/dx   Std. Err.     z    P>|z|  [    95% C.I.   ]       X
---------+-----------------------------------------------------------------
   unif* | -.3082147      1.127  -0.27   0.784   -2.5171  1.90067         0
  terms* |  2.532773    1.15508   2.19   0.028   .268854  4.79669        .5
   mood* |  8.499517    1.82243   4.66   0.000   4.92763  12.0714         0
  budget |  .0294235      .0654   0.45   0.653  -.098767  .157614  -6.77273

Conclusion

This explains the main results: During the Great Society, there is a large increase in the amount of major legislation. During this period, the expectation is that there are 17 major laws per Congress. In contrast, during the periods of divided government, there are about 9.10 pieces of major legislation, all else being equal.

Reporting uncertainty about Quantities

Note that there are three sources of uncertainty about the parameters in any ML model:
  • Uncertainty about the ML estimates: The estimates are generated conditional on the data under an assumed probability model. Because the sample is random, the estimates themselves are uncertain; this estimation uncertainty is summarized by the likelihood function (its curvature at the maximum gives the standard errors).
  • Uncertainty about the Data Generating Process: There is a fundamental uncertainty in any data generation process. In ML models, we assume a particular distribution for the data, which generates a measure of the uncertainty of the estimates based on the parameters and the variance of the data under the DGP.
  • Uncertainty about the Data Itself: The data used to compute the likelihood may not be fixed, inducing more uncertainty.

Computing tips: How to do this right

  • margins in Stata: Clearly, this is the workhorse command if you are working with ML and linear models in Stata. It is worth spending some time reading about how this command works.
  • marginaleffects in R: There is a whole package for doing this (right).
  • Brute Force Methods: The computations for the effects interpretations are not so hard to do by hand if you have the fitted models and the summary statistics on the regressors.
  • Clarify: King et al. have exploited a well-known property of ML models to develop software that can be used to do statistical simulations of quantities of interest for ML models. The basic idea is as follows: suppose we have an ML model that produces estimates $\hat{\theta}$. Since these are asymptotically normally distributed, we can simulate draws of them and use those draws to compute whatever quantity of interest we want (a minimal sketch follows this list).
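To make the simulation idea concrete, here is a minimal sketch in R (this is not the Clarify package itself; the function name sim.qi and the covariate profile are ours, and a Poisson model with a log link is assumed):

library(MASS)  # for mvrnorm()

# Simulate a quantity of interest from a fitted glm.
# `fit` is a fitted glm object; `x.profile` is a covariate profile in
# coefficient order, starting with 1 for the intercept.
sim.qi <- function(fit, x.profile, n.sims = 1000) {
  # Draw parameter vectors from N(theta.hat, Vcov(theta.hat))
  beta.sims <- mvrnorm(n.sims, mu = coef(fit), Sigma = vcov(fit))
  # Expected count for each simulated parameter vector (log link assumed)
  qi <- exp(beta.sims %*% x.profile)
  c(mean = mean(qi), quantile(qi, c(0.025, 0.975)))
}

The point estimate is the average over the draws, and the 2.5 and 97.5 percentiles give a simulation-based confidence interval for the quantity of interest.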
Examples

Examples of LDV models

There are three examples here
  • Poisson and count regressions (2)
  • Logit model for binary outcomes
The idea is that you see the code and how these results are generated from it.

Poisson model: POTUS Vetoes

Brandt and Williams (2001) build on earlier work by Lewis and Strine (1996). The event counts are annual presidential vetoes from 1890-1994. The covariates are:
  • Number of public laws passed
  • Seat percentage in Congress (measure of partisan conflict)
  • Annual unemployment
  • Presidential approval
  • Succession in office
  • War
  • Year in term
  • Term in office (first or second)

Basic Count Model Estimation

data <- read.csv("veto.csv")
attach(data)

# Fit a Poisson regression ('seccession' is the variable's spelling
# in the dataset)
poisson.fit <- glm(vetoes ~ public.laws + partisans + unemployment +
    mandate + seccession + war + year + term,
    family = poisson())

# Fit a negative binomial regression
library(MASS)
neg.binomial.fit <- glm.nb(vetoes ~ public.laws + partisans + unemployment +
    mandate + seccession + war + year + term)
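The output on the next two slides is the standard summary of each fit; assuming nothing beyond base R, it would be produced by:

summary(poisson.fit)        # Poisson output below
summary(neg.binomial.fit)   # negative binomial output below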

Basic Poisson count model output

Call:
glm(formula = vetoes ~ public.laws + partisans + unemployment +
    mandate + seccession + war + year + term, family = poisson())

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.0223578  0.3053456   3.348 0.000813 ***
public.laws   0.0035449  0.0002718  13.040  < 2e-16 ***
partisans    -0.0027905  0.0035529  -0.785 0.432218
unemployment  0.0777468  0.0063765  12.193  < 2e-16 ***
mandate      -0.0227489  0.0061718  -3.686 0.000228 ***
seccession    0.4114497  0.1101600   3.735 0.000188 ***
war           0.0884952  0.0913051   0.969 0.332433
year          0.1435743  0.0322690   4.449 8.62e-06 ***
term          0.5920547  0.0718544   8.240  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 733.52 on 104 degrees of freedom

Basic negative binomial count model output

Call:
glm.nb(formula = vetoes ~ public.laws + partisans + unemployment +
    mandate + seccession + war + year + term, init.theta = 3.674164316,
    link = log)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.4319375  0.6141402   2.332   0.0197 *
public.laws   0.0037564  0.0005232   7.180 6.97e-13 ***
partisans    -0.0134525  0.0071328  -1.886   0.0593 .
unemployment  0.0803472  0.0133749   6.007 1.89e-09 ***
mandate      -0.0197230  0.0117566  -1.678   0.0934 .
seccession    0.3960945  0.2013471   1.967   0.0492 *
war           0.0732675  0.1614093   0.454   0.6499
year          0.1117194  0.0611136   1.828   0.0675 .
term          0.6186779  0.1428174   4.332 1.48e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(3.6742) family taken to be 1)
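The Poisson is nested in the negative binomial (it is the limit as the dispersion parameter theta goes to infinity), so one quick way to judge whether the extra parameter is needed is a likelihood-ratio comparison of the two fits above. A sketch (our addition, not from the original slides):

# LR test of Poisson vs. negative binomial. Because theta sits on the
# boundary of the parameter space under the null, the chi-squared(1)
# p-value below is conservative.
lr.stat <- 2 * (as.numeric(logLik(neg.binomial.fit)) -
                as.numeric(logLik(poisson.fit)))
pchisq(lr.stat, df = 1, lower.tail = FALSE)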

Logit Model: Survey Vote Choice

Data from the 1990-1992 American National Election Study Panel. This survey series was administered to a panel of respondents in 1990 and 1992. It covered the standard set of questions about vote choice in the 1990 congressional election and the 1992 congressional and presidential elections.
The subset of the data we will work with has 1113 respondents. (This was culled from the full panel series, and then missing cases were removed - not exactly best practice, but it works for an example.)

Variables

  • PID1: Three-point self-reported partisan identification scale in 1990; -1 = GOP, 0 = Independent, 1 = Democrat.
  • PVOTE88: Presidential vote in 1988. 0 = Bush, 1 = Dukakis, 2 = No vote.
  • IDEO71: Seven-point ideology scale, self-reported in 1990. 0 = Liberal, 6 = Conservative.
  • INCOME: Income in 1990, self-reported. 0 = lowest category, 23 = highest.
  • SEX: 0 = Male, 1 = Female.
  • RACE: 0 = White, 1 = Non-white.
  • PB1: Percentage of the respondent's congressional district that is black in 1990.
  • SSOUTH: 1 = Lives in the Solid South, 0 = otherwise.

Setup Code

library(foreign)
dd <- read.dta("anes9092data.dta")
# Do the recoding
library("dplyr")
# Deal with the NV cases -> missing
dd$vote1 <- recode_factor(dd$vote1, "NV" = NA_character_)
dd$pvote88 <- recode_factor(dd$pvote88, "NV" = NA_character_)
# Squared % black
dd$pb12 <- dd$pb1^2
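The plots below reference a fitted object M1, which the setup code above never creates; reconstructing it from the call printed on the next slide (the object name M1 matches the later termplot and cplot calls):

# Logit fit used in the plots below (call taken from the printed summary)
M1 <- glm(vote1 ~ pid1 + pvote88 + ideo71 + income1 + sex +
            race + pb1 + pb12 + ssouth,
          family = binomial(), data = dd)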

Logit Estimation

Call:
glm(formula = vote1 ~ pid1 + pvote88 + ideo71 + income1 + sex +
    race + pb1 + pb12 + ssouth, family = binomial(), data = dd)

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)    -0.4994235  0.4582133  -1.090 0.275741
pid1           -0.7918194  0.1568803  -5.047 4.48e-07 ***
pvote88Dukakis -1.5703604  0.3370641  -4.659 3.18e-06 ***
ideo71          0.1643770  0.0675846   2.432 0.015009 *
income1         0.0409860  0.0220682   1.857 0.063276 .
sexfemale      -0.3981275  0.2821279  -1.411 0.158197
racenon-white  -1.2998431  0.5686850  -2.286 0.022272 *
pb1            -0.0806471  0.0302272  -2.668 0.007630 **
pb12            0.0013853  0.0005966   2.322 0.020234 *
ssouthSouth     1.4672061  0.4379697   3.350 0.000808 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)
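Since logit coefficients are on the log-odds scale, one quick interpretation step (our addition, not on the original slide) is to exponentiate them into odds ratios with Wald confidence intervals:

# Odds ratios and Wald 95% confidence intervals for the logit fit
exp(cbind(OR = coef(M1), confint.default(M1)))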

Plot 1 (basic single term plot)

# Plot the results
termplot(M1, rug=TRUE, se=TRUE)

Plot 2 (Fancier termplot)

par(mfrow=c(5,2), mar=c(4,2,1,1))
termplot(M1, rug=TRUE, se=TRUE, ask=FALSE)

Margins plot 1

Margins plot 2

library(margins)   # cplot() comes from the margins package
cplot(M1, x="pb1")
cplot(M1, x="pb12", draw="add")

Margins plot 3

Margins all at once

Mayhew Important Legislation Count Time Series

The data are the number of pieces of major legislation passed by each Congress from 1947-1989. The regression model for the data is specified as follows:

$$E(Y \mid X) = \mu = \exp(\beta_0 + \beta_1 \text{Unified} + \beta_2 \text{Terms} + \beta_3 \text{Mood} + \beta_4 \text{Budget})$$

where Unified is a dummy variable for unified government, Terms is a dummy variable measuring whether it is the first Congress of a president's term, Mood is a dummy variable for all Congresses from 1961-1977, measuring the impact of the Great Society congresses, and Budget is a measure of the size of the federal deficit relative to GDP (in billions of 1988 dollars). The main point of Mayhew's analysis (done using OLS) was to determine whether or not the passage of major legislation was affected by divided government.

Data Setup and Model

mh <- read.dta("mayhew1.dta")
mh1 <- glm(laws ~ unif + terms + mood + budget,
           data = mh, family = poisson())
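The next slide also prints an OLS fit for comparison; reconstructed from the lm() call shown there (the object name mh2 is our choice, not from the original slides):

# OLS comparison model from the next slide; `mh2` is our name for it
mh2 <- lm(laws ~ unif + terms + mood + budget, data = mh)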

DG Models

Call:
glm(formula = laws ~ unif + terms + mood + budget, family = poisson(),
    data = mh)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.091943   0.133681  15.649  < 2e-16 ***
unif        -0.034441   0.126503  -0.272   0.7854
terms        0.277316   0.124041   2.236   0.0254 *
mood         0.659386   0.123636   5.333 9.65e-08 ***
budget       0.003232   0.007153   0.452   0.6514
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 42.623 on 21 degrees of freedom
Residual deviance:  8.691 on 17 degrees of freedom
  (2 observations deleted due to missingness)
AIC: 112.6
Call:
lm(formula = laws ~ unif + terms + mood + budget, data = mh)

Residuals:
   Min     1Q Median     3Q
-4.042 -1.627 -0.106  1.759

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.90377    1.00641   7.853 4.69e-07 ***
terms        3.47026    1.07263   3.235  0.00486 **
mood         8.52088    1.11910   7.614 7.11e-07 ***
budget       0.05334    0.05552   0.961  0.35014
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.499 on 17 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.8022,  Adjusted R-squared: 0.7557

Some comparative statics

mx <- apply(mh[,3:6], 2, mean, na.rm=TRUE)

# Unified v. divided
predict(mh1, newdata = data.frame(unif=0, terms=mx[2], mood=mx[3],
    budget=mx[4]), type = "response")
   terms
11.34193

predict(mh1, newdata = data.frame(unif=1, terms=mx[2], mood=mx[3],
    budget=mx[4]), type = "response")
   terms
10.95795

# Budget range
predict(mh1, newdata = data.frame(unif=mx[1], terms=mx[2], mood=mx[3],
    budget=-24), type="response")
    unif
10.59004

predict(mh1, newdata = data.frame(unif=mx[1], terms=mx[2], mood=mx[3],
    budget=21), type="response")
    unif
12.24786

# Great Society
predict(mh1, newdata = data.frame(unif=1, terms=mx[2], mood=1,
    budget=-24), type="response")
   terms
16.08647

predict(mh1, newdata = data.frame(unif=0, terms=mx[2], mood=0,
    budget=21), type="response")
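To turn these predictions into the discrete-change quantities reported by mfx earlier, simply difference the paired predictions; a sketch:

# Discrete change for unified vs. divided government, other covariates at means
p0 <- predict(mh1, newdata = data.frame(unif=0, terms=mx[2], mood=mx[3],
    budget=mx[4]), type = "response")
p1 <- predict(mh1, newdata = data.frame(unif=1, terms=mx[2], mood=mx[3],
    budget=mx[4]), type = "response")
p1 - p0  # about -0.38, comparable to the dy/dx for unif from mfx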

Some comparisons with marginaleffects

Now we will do this with a different package: marginaleffects

See docs here:
https://vincentarelbundock.github.io/marginaleffects/articles/comparisons.html
library(marginaleffects)
avg_comparisons(mh1)

   Term Contrast Estimate Std. Error      z Pr(>|z|)    S  2.5 % 97.5 %
 budget       +1   0.0393     0.0871  0.451   0.6520  0.6 -0.131   0.21
   mood    1 - 0   8.4620     1.7021  4.972   <0.001 20.5  5.126  11.80
  terms    1 - 0   3.3438     1.4908  2.243   0.0249  5.3  0.422   6.27
   unif    1 - 0  -0.4171     1.5287 -0.273   0.7850  0.3 -3.413   2.58

Type: response
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
See help("comparisons") for more details.

Marginal effects plots

These are different from the plots produced by the other packages.

plot_slopes(mh1, variables = "budget",
            condition = list("budget", "unif"))

Another one

plot_slopes(mh1, variables = "budget",
            condition = list("budget", "unif", "mood"))