Bias of an estimator

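In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased; otherwise the estimator is said to be biased. In statistics, "bias" is an objective property of an estimator.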

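Bias can also be measured with respect to the median rather than the mean (expected value). In that case one speaks of median-unbiasedness, to distinguish it from the usual mean-unbiasedness. Bias is related to, but distinct from, consistency. A consistent estimator converges in probability to the true value of the parameter, yet the individual estimators in a consistent sequence may be biased, for example when their bias shrinks to zero as the sample size grows; see bias versus consistency.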

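All else being equal, an unbiased estimator is preferable to a biased one. In practice, however, all else is seldom equal, so biased estimators are frequently used, generally ones with small bias. When a biased estimator is used, its bias is also estimated.

A biased estimator may be chosen for several reasons:
- An unbiased estimator may not exist without further assumptions about the population, or may be hard to compute (as in unbiased estimation of the standard deviation).
- An estimator may be median-unbiased but not mean-unbiased, or the reverse.
- A biased estimator can achieve a lower value of some loss function, particularly mean squared error, than an unbiased one (notably shrinkage estimators).
- In some situations unbiasedness is too strong a condition and is not needed.

Moreover, mean-unbiasedness is not preserved under non-linear transformations, whereas median-unbiasedness is (see effect of transformations). For example, the corrected sample variance is an unbiased estimator of the population variance, but its square root, the sample standard deviation, is a biased estimator of the population standard deviation. These points are illustrated below.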
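The Python sketch below illustrates the last point under a stated assumption. It uses a hypothetical population that is uniform on {0, 1, 2}, so σ² = 2/3; this toy distribution was chosen only for illustration.

The code enumerates every sample of size n = 3. It computes the exact expectation of the corrected sample variance $s^2$ with rational arithmetic, and the expectation of its square root $s$ in floating point.

```python
import math
from fractions import Fraction
from itertools import product

# Hypothetical toy population: X uniform on {0, 1, 2}, so mu = 1, sigma^2 = 2/3.
SUPPORT = (0, 1, 2)
SIGMA2 = Fraction(2, 3)


def corrected_variance(sample):
    """Sample variance with divisor n - 1, computed exactly as a Fraction."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def expected_var_and_sd(support=SUPPORT, n=3):
    """Exact E[s^2] and floating-point E[s] over all equally likely samples."""
    samples = list(product(support, repeat=n))
    weight = Fraction(1, len(samples))  # each ordered sample is equally likely
    e_var = sum(weight * corrected_variance(s) for s in samples)
    e_sd = sum(float(weight) * math.sqrt(corrected_variance(s)) for s in samples)
    return e_var, e_sd


if __name__ == "__main__":
    e_var, e_sd = expected_var_and_sd()
    print("E[s^2] =", e_var, "  sigma^2 =", SIGMA2)
    print(f"E[s]   = {e_sd:.4f}  sigma   = {math.sqrt(SIGMA2):.4f}")
```

Under these assumptions E[s²] equals σ² exactly. By contrast, E[s] ≈ 0.735 falls below σ ≈ 0.816, as Jensen's inequality predicts for the concave square root.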

Definition

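Suppose we have a statistical model, parameterized by a real number θ, that gives rise to a probability distribution for the observed data, $P_{\theta }(x)=P(x\mid \theta )$. Let the statistic ${\hat {\theta }}$ serve as an estimator of θ based on any observed data $x$.

That is, we assume the data follow some unknown distribution $P(x\mid \theta )$, where θ is a fixed but unknown constant that is part of this distribution. We then construct an estimator ${\hat {\theta }}$ that maps the observed data to values we hope are close to θ. The bias of ${\hat {\theta }}$ relative to θ is defined as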

\operatorname {Bias} _{\theta }[\,{\hat {\theta }}\,]=\operatorname {E} _{\theta }[\,{\hat {\theta }}\,]-\theta =\operatorname {E} _{\theta }[\,{\hat {\theta }}-\theta \,],

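where $\operatorname {E} _{\theta }$ denotes the expected value over the distribution $P_{\theta }(x)=P(x\mid \theta )$, i.e. the average over all possible observations $x$. The second equality holds because θ is a fixed constant under $P_{\theta }$, so $\operatorname {E} _{\theta }[\theta ]=\theta$ and the expectation is linear.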

An estimator whose bias is zero for every value of the parameter θ is called an unbiased estimator.
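The definition can be checked by brute force in a small discrete model. The Python sketch below uses exact rational arithmetic and assumes n i.i.d. Bernoulli(p) observations, so θ = p. It enumerates all 2ⁿ outcomes to compute $\operatorname {E} _{\theta }$ exactly.

The sample proportion p̂ comes out unbiased for every p. The plug-in estimator p̂² of p², however, has bias p(1 − p)/n, which depends on the unknown parameter. The model and function names are illustrative choices, not taken from the text above.

```python
from fractions import Fraction
from itertools import product


def expectation(estimator, p, n):
    """E_p[estimator(X)] for X = (X_1, ..., X_n) i.i.d. Bernoulli(p),
    computed by summing over all 2**n possible outcomes."""
    total = Fraction(0)
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        prob = p ** k * (1 - p) ** (n - k)  # probability of this exact outcome
        total += prob * estimator(xs)
    return total


def p_hat(xs):
    """Sample proportion, used as an estimator of p."""
    return Fraction(sum(xs), len(xs))


def p_hat_squared(xs):
    """Plug-in estimator of p**2 (a non-linear function of p_hat)."""
    return p_hat(xs) ** 2


if __name__ == "__main__":
    n = 5
    for p in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
        bias_p = expectation(p_hat, p, n) - p
        bias_p2 = expectation(p_hat_squared, p, n) - p ** 2
        print(f"p={p}: bias of p_hat = {bias_p}, bias of p_hat^2 = {bias_p2}")
```

For n = 5 and p = 1/2, for instance, the bias of p̂² is 1/20.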

In a simulation experiment that studies the properties of an estimator, the estimator's bias can be assessed with the mean signed difference.
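Below is a minimal Monte Carlo sketch of this idea in Python. It simulates many data sets at a known θ, applies the estimator to each, and averages the signed differences $\hat {\theta }-\theta$.

The model is an assumption made for illustration: i.i.d. Uniform(0, θ) data, with the sample maximum as the estimator. The maximum's exact bias is −θ/(n + 1), so the simulated value can be compared with the truth. The helper names are hypothetical.

```python
import random


def mean_signed_difference(estimator, sampler, theta, n, reps, seed=0):
    """Monte Carlo estimate of Bias_theta[estimator]: the average of
    (estimate - theta) over `reps` simulated data sets of size n."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(reps):
        data = sampler(rng, theta, n)
        total += estimator(data) - theta
    return total / reps


def uniform_sampler(rng, theta, n):
    """Assumed model: n i.i.d. draws from Uniform(0, theta)."""
    return [rng.uniform(0.0, theta) for _ in range(n)]


def rescaled_max(data):
    """(n + 1) / n * max(data), which removes the bias of the maximum."""
    n = len(data)
    return (n + 1) / n * max(data)


if __name__ == "__main__":
    theta, n, reps = 1.0, 5, 20_000
    for name, est in (("max", max), ("rescaled max", rescaled_max)):
        bias = mean_signed_difference(est, uniform_sampler, theta, n, reps)
        print(f"{name:>12}: simulated bias {bias:+.4f}")
    print(f"exact bias of max: {-theta / (n + 1):+.4f}")
```

Averaging signed rather than absolute differences is deliberate. Positive and negative errors cancel, so only the systematic offset, i.e. the bias, remains.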

Examples

Sample variance

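The sample variance of a random variable demonstrates two aspects of estimator bias:
- The naive estimator is biased, which can be corrected by a scale factor.
- The unbiased estimator is not optimal in terms of mean squared error (MSE). MSE can be minimized by a different scale factor, which yields a biased estimator with lower MSE than the unbiased one.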

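Concretely, the naive estimator sums the squared deviations and divides by n, and it is biased. Dividing instead by n − 1 yields an unbiased estimator. Conversely, MSE can be minimized by dividing by a different number, which depends on the distribution, but this gives a biased estimator. That number is always larger than n − 1, so the result is known as a shrinkage estimator: it "shrinks" the unbiased estimator towards zero. For the normal distribution the optimal value is n + 1.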
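Under the normality assumption these claims can be checked in closed form. If the $X_{i}$ are i.i.d. normal, the sum of squared deviations SS satisfies SS/σ² ~ χ² with n − 1 degrees of freedom, so E[SS] = (n − 1)σ² and Var[SS] = 2(n − 1)σ⁴.

The Python sketch below uses these two facts to compute the exact bias and MSE of SS/c for any divisor c. For non-normal data the variance term would involve the fourth moment, so the function as written applies only to the normal case.

```python
from fractions import Fraction


def bias_and_mse(n, c, sigma2=Fraction(1)):
    """Exact bias and MSE of SS / c as an estimator of sigma^2, assuming
    i.i.d. normal data: E[SS] = (n-1) sigma^2, Var[SS] = 2 (n-1) sigma^4."""
    mean = Fraction(n - 1, c) * sigma2
    variance = Fraction(2 * (n - 1), c ** 2) * sigma2 ** 2
    bias = mean - sigma2
    return bias, variance + bias ** 2  # MSE = variance + bias^2


def best_divisor(n, candidates):
    """Divisor among `candidates` with the smallest MSE."""
    return min(candidates, key=lambda c: bias_and_mse(n, c)[1])


if __name__ == "__main__":
    n = 10
    for c in (n - 1, n, n + 1):
        bias, mse = bias_and_mse(n, c)
        print(f"divisor {c:2d}: bias {float(bias):+.4f}, MSE {float(mse):.4f}")
    print("best integer divisor:", best_divisor(n, range(1, 3 * n)))
```

With n = 10 and σ² = 1 the MSEs are:
- 2/9 ≈ 0.222 for divisor n − 1
- 0.19 for divisor n
- 2/11 ≈ 0.182 for divisor n + 1

In general the MSE at divisor n + 1 is 2σ⁴/(n + 1), against 2σ⁴/(n − 1) for the unbiased estimator.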

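Suppose $X_{1},\ldots ,X_{n}$ are independent and identically distributed (i.i.d.) random variables with expectation μ and variance σ². Define the sample mean and the uncorrected sample variance as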

{\overline {X}}={\frac {1}{n}}\sum _{i=1}^{n}X_{i},\qquad S^{2}={\frac {1}{n}}\sum _{i=1}^{n}\left(X_{i}-{\overline {X}}\,\right)^{2},

Then S² is a biased estimator of σ², because

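{\begin{aligned}\operatorname {E} [S^{2}]&=\operatorname {E} \left[{\frac {1}{n}}\sum _{i=1}^{n}{\big (}X_{i}-{\overline {X}}{\big )}^{2}\right]=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}{\bigg (}(X_{i}-\mu )-({\overline {X}}-\mu ){\bigg )}^{2}{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}{\bigg (}(X_{i}-\mu )^{2}-2({\overline {X}}-\mu )(X_{i}-\mu )+({\overline {X}}-\mu )^{2}{\bigg )}{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+{\frac {1}{n}}({\overline {X}}-\mu )^{2}\sum _{i=1}^{n}1{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+{\frac {1}{n}}({\overline {X}}-\mu )^{2}\cdot n{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+({\overline {X}}-\mu )^{2}{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\cdot n({\overline {X}}-\mu )+({\overline {X}}-\mu )^{2}{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-({\overline {X}}-\mu )^{2}{\bigg ]}\\[8pt]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}{\bigg ]}-\operatorname {E} {\big [}({\overline {X}}-\mu )^{2}{\big ]}=\sigma ^{2}-{\frac {1}{n}}\sigma ^{2}=\left(1-{\frac {1}{n}}\right)\sigma ^{2}<\sigma ^{2}.\end{aligned}}

The step that introduces $n({\overline {X}}-\mu )$ uses $\sum _{i=1}^{n}(X_{i}-\mu )=n({\overline {X}}-\mu )$, which follows from the definition of ${\overline {X}}$. The last line uses two facts: $\operatorname {E} [(X_{i}-\mu )^{2}]=\sigma ^{2}$, and $\operatorname {E} [({\overline {X}}-\mu )^{2}]=\sigma ^{2}/n$, which is the variance of the mean of $n$ i.i.d. variables.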

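In other words, the expected value of the uncorrected sample variance does not equal the population variance σ² unless it is multiplied by the normalization factor n/(n − 1). The sample mean, on the other hand, is an unbiased^[1] estimator of the population mean μ.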
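The derivation uses only the i.i.d. assumption and a finite variance, not normality. As a sanity check, the Python sketch below takes a small, deliberately skewed discrete population (a hypothetical choice made for illustration). It enumerates every sample of size n and computes expectations exactly with rational arithmetic.

It confirms four identities:
- $\operatorname {E} [{\overline {X}}]=\mu$
- $\operatorname {E} [({\overline {X}}-\mu )^{2}]=\sigma ^{2}/n$
- $\operatorname {E} [S^{2}]=(1-1/n)\sigma ^{2}$
- $\operatorname {E} [s^{2}]=\sigma ^{2}$ for the estimator $s^{2}$ with divisor n − 1

```python
from fractions import Fraction
from itertools import product

# Hypothetical skewed population (not normal): value -> probability.
PMF = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}


def moments(pmf):
    """Population mean mu and variance sigma^2 of a finite distribution."""
    mu = sum(x * p for x, p in pmf.items())
    sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, sigma2


def exact_expectations(pmf, n):
    """Exact expectations, over all i.i.d. samples of size n, of
    Xbar, (Xbar - mu)^2, S^2 (divisor n) and s^2 (divisor n - 1)."""
    mu, _ = moments(pmf)
    e_mean = e_sq_err = e_S2 = e_s2 = Fraction(0)
    for sample in product(pmf, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= pmf[x]  # independence: multiply marginal probabilities
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)
        e_mean += prob * xbar
        e_sq_err += prob * (xbar - mu) ** 2
        e_S2 += prob * ss / n
        e_s2 += prob * ss / (n - 1)
    return e_mean, e_sq_err, e_S2, e_s2


if __name__ == "__main__":
    n = 4
    mu, sigma2 = moments(PMF)
    e_mean, e_sq_err, e_S2, e_s2 = exact_expectations(PMF, n)
    print("mu =", mu, " E[Xbar] =", e_mean)
    print("sigma^2/n =", sigma2 / n, " E[(Xbar-mu)^2] =", e_sq_err)
    print("(1-1/n) sigma^2 =", (1 - Fraction(1, n)) * sigma2, " E[S^2] =", e_S2)
    print("sigma^2 =", sigma2, " E[s^2] =", e_s2)
```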

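S² is biased because the sample mean is the ordinary least squares (OLS) estimator of μ. ${\overline {X}}$ is the number $c$ that makes $\sum _{i=1}^{n}(X_{i}-c)^{2}$ as small as possible, so substituting any other number into this sum can only increase it. In particular, choosing $\mu \neq {\overline {X}}$ gives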

{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}<{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2},

and therefore

{\begin{aligned}\operatorname {E} [S^{2}]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}{\bigg ]}<\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}{\bigg ]}=\sigma ^{2}.\end{aligned}}
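The least-squares argument rests on an exact identity that holds for any number c:

$\sum _{i=1}^{n}(X_{i}-c)^{2}=\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}+n({\overline {X}}-c)^{2}$

The extra term is positive whenever $c\neq {\overline {X}}$. Taking $c=\mu$ and expectations, with $\operatorname {E} [({\overline {X}}-\mu )^{2}]=\sigma ^{2}/n$, gives $n\sigma ^{2}=n\operatorname {E} [S^{2}]+\sigma ^{2}$. The short Python sketch below checks the identity with exact fractions on a hypothetical data set.

```python
from fractions import Fraction


def sum_of_squares(data, c):
    """Sum of squared deviations of the data from an arbitrary centre c."""
    return sum((x - c) ** 2 for x in data)


def decomposition_holds(data, c):
    """Check sum (x - c)^2 == sum (x - xbar)^2 + n (xbar - c)^2 exactly."""
    data = [Fraction(x) for x in data]
    n = len(data)
    xbar = sum(data) / n
    lhs = sum_of_squares(data, c)
    rhs = sum_of_squares(data, xbar) + n * (xbar - c) ** 2
    return lhs == rhs


if __name__ == "__main__":
    sample = [3, 7, 8, 10, 12]  # hypothetical data; its mean is 8
    for c in (0, 6, 8, 9, Fraction(17, 2)):
        print(c, sum_of_squares(sample, c), decomposition_holds(sample, c))
```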

Note that the usual definition of the sample variance is

s^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}(X_{i}-{\overline {X}}\,)^{2},

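and this is an unbiased estimator of the population variance. This follows from

\operatorname {E} {\big [}({\overline {X}}-\mu )^{2}{\big ]}={\frac {1}{n}}\sigma ^{2}.

Combined with the derivation above, this gives $\operatorname {E} [S^{2}]={\tfrac {n-1}{n}}\sigma ^{2}$. Since $s^{2}={\tfrac {n}{n-1}}S^{2}$, it follows that $\operatorname {E} [s^{2}]={\tfrac {n}{n-1}}\operatorname {E} [S^{2}]=\sigma ^{2}$.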

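Multiplying the uncorrected (biased) variance estimate S² by n/(n − 1) turns it into the unbiased estimate s². This correction is known as Bessel's correction.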
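In Python, the standard-library `statistics` module provides both estimators:
- `statistics.pvariance` divides the sum of squared deviations by n, which gives the uncorrected $S^{2}$ when applied to a sample.
- `statistics.variance` divides by n − 1, which gives the corrected $s^{2}$.

The sketch below uses a small made-up sample to check that multiplying the first by n/(n − 1) reproduces the second. `bessel_corrected` is a hypothetical helper, not part of the module.

```python
import statistics


def bessel_corrected(uncorrected_variance, n):
    """Hypothetical helper: turn a divide-by-n variance into the divide-by-(n-1) one."""
    return uncorrected_variance * n / (n - 1)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data: mean 5, squared deviations sum to 32
n = len(sample)
S2 = statistics.pvariance(sample)  # divisor n      -> 32 / 8 = 4
s2 = statistics.variance(sample)   # divisor n - 1  -> 32 / 7

if __name__ == "__main__":
    print("uncorrected S^2:", S2)
    print("corrected s^2:  ", s2)
    print("S^2 * n/(n-1):  ", bessel_corrected(S2, n))
```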

References

Brown, George W. "On Small-Sample Estimation." The Annals of Mathematical Statistics, vol. 18, no. 4 (Dec., 1947), pp. 582–585. JSTOR 2236236.
Lehmann, E. L. "A General Concept of Unbiasedness." The Annals of Mathematical Statistics, vol. 22, no. 4 (Dec., 1951), pp. 587–592. JSTOR 2236928.
Birnbaum, Allan. "A Unified Theory of Estimation, I." The Annals of Mathematical Statistics, vol. 32, no. 1 (Mar., 1961), pp. 112–135.
Van der Vaart, H. R. "Some Extensions of the Idea of Bias." The Annals of Mathematical Statistics, vol. 32, no. 2 (June 1961), pp. 436–447.
Pfanzagl, Johann. Parametric Statistical Theory. Walter de Gruyter. 1994.
Stuart, Alan; Ord, Keith; Arnold, Steven [F.]. Classical Inference and the Linear Model. Kendall's Advanced Theory of Statistics 2A. Wiley. 2010. ISBN 0-4706-8924-2.
Voinov, Vassily [G.]; Nikulin, Mikhail [S.]. Unbiased estimators and their applications. 1: Univariate case. Dordrecht: Kluwer Academic Publishers. 1993. ISBN 0-7923-2382-3.
Voinov, Vassily [G.]; Nikulin, Mikhail [S.]. Unbiased estimators and their applications. 2: Multivariate case. Dordrecht: Kluwer Academic Publishers. 1996. ISBN 0-7923-3939-8.
Klebanov, Lev [B.]; Rachev, Svetlozar [T.]; Fabozzi, Frank [J.]. Robust and Non-Robust Models in Statistics. New York: Nova Scientific Publishers. 2009. ISBN 978-1-60741-768-2.

External links

Hazewinkel, Michiel (ed.), "Unbiased estimator", Encyclopedia of Mathematics, Springer, 2001, ISBN 978-1-55608-010-4

^ Richard Arnold Johnson; Dean W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 2007 [10 August 2012]. ISBN 978-0-13-187715-3. (Archived from the original on 2016-05-29.)
