Standard score: Difference between revisions - Wikipedia


(25 intermediate revisions by 20 users not shown)

Line 2:

{{Use American English|date = January 2019}}

{{redirect|Standardize|industrial and technical standards|Standardization}}

{{redirect|Z-score|Fisher z-transformation in statistics|Fisher transformation|Z-values in ecology|Z-value|z-transformation to complex number domain|Z-transform|Z-factor in high-throughput screening|Z-factor|Z-score financial analysis tool|Altman Z-score}}

[[File:The Normal Distribution.svg|thumb|upright=1.5|Comparison of the various grading methods in a [[normal distribution]], including: [[standard deviation]]s, cumulative percentages, [[percentile]] equivalents, z-scores, [[#T-score|T-scores]]]]

In [[statistics]], the '''standard score''' is the number of [[standard deviation]]s by which the value of a [[raw score]] (i.e., an observed value or data point) is above or below the [[mean]] value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.

It is calculated by subtracting the [[population mean]] from an individual raw score and then dividing the difference by the [[Statistical population|population]] standard deviation. This process of converting a raw score into a standard score is called '''standardizing''' or '''normalizing''' (however, "normalizing" can refer to many types of ratios; see ''[[Normalization (statistics)|Normalization]]'' for more).

Standard scores are most commonly called '''''z''-scores'''; the two terms may be used interchangeably, as they are in this article. Other equivalent terms in use include '''z-value''', '''z-statistic''', '''normal score''', '''standardized variable''' and '''pull''' in [[high energy physics]].<ref>{{Cite book|url=https://cds.cern.ch/record/2005324?ln=en|title=2015 European School of High-Energy Physics: Bansko, Bulgaria 02 - 15 Sep 2015|date=2017|publisher=CERN|isbn=978-92-9083-472-4|editor-last=Mulders|editor-first=Martijn|series=CERN Yellow Reports: School Proceedings|location=Geneva|editor-last2=Zanderighi|editor-first2=Giulia}}</ref><ref>{{Cite journal |last=Gross |first=Eilam |date=2017-11-06 |title=Practical Statistics for High Energy Physics |url=https://e-publishing.cern.ch/index.php/CYRSP/article/download/303/405/2022 |journal=CERN Yellow Reports: School Proceedings |language=en |volume=4/2017 |pages=165–186 |doi=10.23730/CYRSP-2017-004.165}}</ref>

Computing a z-score requires knowledge of the mean and standard deviation of the complete population to which a data point belongs; if one only has a [[Sample (statistics)|sample]] of observations from the population, then the analogous computation using the sample mean and sample standard deviation yields the [[t-statistic|''t''-statistic]].
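The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name <code>z_score</code> and the five-value population are assumptions chosen for the example, and the population standard deviation is taken with <code>statistics.pstdev</code>, which divides by the population size.

```python
from statistics import fmean, pstdev


def z_score(x, mu, sigma):
    """Number of population standard deviations by which x lies
    above (positive) or below (negative) the population mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical complete population of five raw scores (illustrative only).
population = [2.0, 4.0, 4.0, 5.0, 10.0]
mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation (divides by N)

z_values = [z_score(x, mu, sigma) for x in population]
```

By construction, the standardized values have mean 0 and population standard deviation 1, and a raw score equal to the mean has a z-score of 0.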

== Calculation ==

If the population mean and population standard deviation are known, a raw score

Line 88:

! ACT

|-

! Mean

| 1500

| 21

|-

! Standard deviation

| 300

| 5

Line 104:

=== Percentage of observations below a z-score ===

Continuing the example of ACT and SAT scores, if it can be further assumed that both ACT and SAT scores are [[Normal distribution|normally distributed]] (which is approximately correct), then the z-scores may be used to calculate the percentage of test-takers who received lower scores than students A and B.
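Under the normality assumption, the percentage of test-takers below a given z-score is the standard normal cumulative distribution function evaluated at that z-score. The sketch below uses the means and standard deviations from the table (SAT: 1500 and 300; ACT: 21 and 5). The two raw scores are hypothetical values chosen for illustration, and <code>fraction_below</code> is an assumed helper name.

```python
from math import erf, sqrt


def fraction_below(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Means and standard deviations taken from the table (SAT, ACT).
SAT_MEAN, SAT_SD = 1500, 300
ACT_MEAN, ACT_SD = 21, 5

# Hypothetical raw scores for the two students (assumed for illustration).
sat_score_a = 1800
act_score_b = 24

z_a = (sat_score_a - SAT_MEAN) / SAT_SD  # 1.0
z_b = (act_score_b - ACT_MEAN) / ACT_SD  # 0.6

pct_below_a = 100 * fraction_below(z_a)
pct_below_b = 100 * fraction_below(z_b)
```

With these assumed scores, roughly 84% of SAT test-takers fall below student A and roughly 73% of ACT test-takers fall below student B.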

=== Cluster analysis and multidimensional scaling ===

"For some multivariate techniques such as multidimensional scaling and cluster analysis, the concept of distance between the units in the data is often of considerable interest and importance… When the variables in a multivariate data set are on different scales, it makes more sense to calculate the distances after some form of standardization."<ref name="EverittHothorn2011 ">{{Citation |last1= Everitt |first1= Brian |last2= Hothorn |first2= Torsten J |title= An Introduction to Applied Multivariate Analysis with R |year=2011|publisher= Springer

|isbn= 978-1441996497 }} </ref>
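The effect of scale on distances can be illustrated with a small sketch. It assumes z-score standardization of each variable (the quotation speaks only of "some form of standardization"), and the three units with income and age values are hypothetical.

```python
from math import dist
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Rescale every column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in rows
    ]


# Hypothetical units: (income in dollars, age in years).
units = [(30_000.0, 25.0), (32_000.0, 60.0), (60_000.0, 27.0)]

# Euclidean distances on the raw scales: income dominates.
raw_01 = dist(units[0], units[1])
raw_02 = dist(units[0], units[2])

# Euclidean distances after standardizing each variable.
z_units = standardize_columns(units)
std_01 = dist(z_units[0], z_units[1])
std_02 = dist(z_units[0], z_units[2])
```

On the raw scales the 35-year age gap between the first two units barely registers next to the income differences, whereas after standardization the two distances are of comparable size.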

===Principal components analysis===

In principal components analysis, "Variables measured on different scales or on a common scale with widely differing ranges are often standardized."<ref name="JohnsonWichern2007">{{Citation |last1= Johnson |first1= Richard |last2= Wichern |first2= Dean |title= Applied Multivariate Statistical Analysis |year=2007|publisher= Pearson / Prentice Hall}}</ref>

=== Relative importance of variables in multiple regression: standardized regression coefficients ===

Standardization of variables prior to [[multiple regression analysis]] is sometimes used as an aid to interpretation.<ref name="AfifiMayClark2012">{{Citation |last1= Afifi |first1= Abdelmonem |last2= May |first2= Susanne K. |last3= Clark |first3= Virginia A. |title= Practical Multivariate Analysis

|edition= Fifth |year=2012 |publisher= Chapman & Hall/CRC |isbn= 978-1439816806}}</ref>

Afifi, May and Clark (page 95) state the following.

"The standardized regression slope is the slope in the regression equation if X and Y are standardized … Standardization of X and Y is done by subtracting the respective means from each set of observations and dividing by the respective standard deviations … In multiple regression, where several X variables are used, the standardized regression coefficients quantify the relative contribution of each X variable."

However, Kutner et al.<ref name="KutnerNachtsheim2004">{{Citation |last1= Kutner |first1= Michael |last2= Nachtsheim |first2= Christopher |last3= Neter |first3= John |title= Applied Linear Regression Models |edition= Fourth |year=2004 |publisher= McGraw Hill|isbn= 978-0073014661 }}</ref> (p. 278) give the following caveat: "… one must be cautious about interpreting any regression coefficients, whether standardized or not. The reason is that when the predictor variables are correlated among themselves, … the regression coefficients are affected by the other predictor variables in the model … The magnitudes of the standardized regression coefficients are affected not only by the presence of correlations among the predictor variables, but also by the spacings of the observations on each of these variables. Sometimes these spacings may be quite arbitrary. Hence, it is ordinarily not wise to interpret the magnitudes of standardized regression coefficients as reflecting the comparative importance of the predictor variables."
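The definition of the standardized slope quoted above can be sketched for the simplest case of a single predictor. The data and function names below are hypothetical, and the example does not address the multiple-regression caveat about correlated predictors. It only shows that regressing standardized Y on standardized X gives the raw slope multiplied by the ratio of the standard deviations, and that this value does not depend on the units of X.

```python
from statistics import fmean, pstdev


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical observations (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

raw_slope = ols_slope(x, y)
std_slope = ols_slope(standardize(x), standardize(y))

# The same quantity obtained by rescaling the raw slope.
rescaled_slope = raw_slope * pstdev(x) / pstdev(y)

# Changing the units of x changes the raw slope but not the standardized one.
x_other_units = [1000.0 * v for v in x]
std_slope_other_units = ols_slope(standardize(x_other_units), standardize(y))
```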

==Standardizing in mathematical statistics==

Line 134:

then the standardized version is

:<math>Z = \frac{\bar{X}-\operatorname{E}[\bar{X}]}{\sigma(X)/\sqrt{n}}.</math>

:

:Here the denominator uses the variance of the sample mean, which for independent observations with common variance <math>\sigma^2</math> is calculated as follows:

:

:<math> \begin{array}{l}

\operatorname{Var}\left(\sum x_{i}\right) =\sum \operatorname{Var}(x_{i}) =n\operatorname{Var}(x_{i}) =n\sigma ^{2}\\

\operatorname{Var}(\overline{X}) =\operatorname{Var}\left(\frac{\sum x_{i}}{n}\right) =\frac{1}{n^{2}} \operatorname{Var}\left(\sum x_{i}\right) =\frac{n\sigma ^{2}}{n^{2}} =\frac{\sigma ^{2}}{n}

\end{array}</math>
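The identity <math>\operatorname{Var}(\overline{X}) = \sigma^2/n</math> derived above can be checked exactly on a small case. The sketch below assumes a hypothetical population (the faces of a fair six-sided die) and independent draws, enumerates every equally likely sample of size ''n'', and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Population variance of equally likely values, as an exact fraction."""
    count = len(values)
    mean = sum(values, Fraction(0)) / count
    return sum((v - mean) ** 2 for v in values) / count


# Hypothetical population: faces of a fair six-sided die.
faces = [Fraction(k) for k in range(1, 7)]
sigma2 = variance(faces)  # population variance sigma^2

n = 3  # sample size
# Every equally likely ordered sample of n independent draws.
sample_means = [sum(s) / n for s in product(faces, repeat=n)]
var_of_mean = variance(sample_means)
```

For this population the enumeration gives a variance of the sample mean equal to the population variance divided by ''n'', in agreement with the derivation.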

:

==T-score==

{{distinguish-redirect|T-score|t-statistic{{!}}''t''-statistic}}

In educational assessment, '''T-score''' is a standard score Z shifted and scaled to have a mean of 50 and a standard deviation of 10.<ref>{{cite book|author1=John Salvia|author2=James Ysseldyke|author3=Sara Witmer|title=Assessment: In Special and Inclusive Education|url=https://books.google.com/books?id=57jdRoC4hCoC&pg=PA43|date=29 January 2009|publisher=Cengage Learning|isbn=978-0-547-13437-6|pages=43–}}</ref><ref>{{cite book|author1=Edward S. Neukrug|author2=R. Charles Fawcett|title=Essentials of Testing and Assessment: A Practical Guide for Counselors, Social Workers, and Psychologists|url=https://books.google.com/books?id=dejKAgAAQBAJ&pg=PA133|date=1 January 2014|publisher=Cengage Learning|isbn=978-1-305-16183-2|pages=133–}}</ref><ref>{{cite book|author=Randy W. Kamphaus|title=Clinical Assessment of Child and Adolescent Intelligence|url=https://books.google.com/books?id=sMSWbI23RMUC&pg=PA123|date=16 August 2005|publisher=Springer|isbn=978-0-387-26299-4|pages=123–}}</ref> In Japan, where the concept is much more widely known and used in the context of university admissions, it is called ''[[:ja:偏差値|hensachi]]''.
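The shift and scaling described for the educational T-score amounts to <math>T = 50 + 10Z</math>. A minimal sketch, with assumed function names <code>t_score</code> and <code>z_from_t</code>:

```python
def t_score(z, mean=50.0, sd=10.0):
    """Convert a standard score z to a T-score with mean 50 and SD 10."""
    return mean + sd * z


def z_from_t(t, mean=50.0, sd=10.0):
    """Invert the conversion: recover the standard score from a T-score."""
    return (t - mean) / sd


average = t_score(0.0)         # a raw score at the mean -> 50.0
above = t_score(1.5)           # 1.5 SD above the mean -> 65.0
below = t_score(-2.0)          # 2 SD below the mean -> 30.0
round_trip = z_from_t(above)   # back to 1.5
```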

In bone density measurements, the T-score is the standard score of the measurement compared to the population of healthy 30-year-old adults, and has the usual mean of 0 and standard deviation of 1.<ref>

Line 144 ⟶ 152:

==See also==

* [[Error function]]

* [[Mahalanobis distance]]

* [[Normalization (statistics)]]

* [[Omega ratio]]

* [[Standard normal deviate]]

* [[Studentized residual]]

==References==

Line 158 ⟶ 169:

== External links ==

* [https://zscorecalculator.org z-score calculator]

* [http://staff.argyll.epsb.ca/jreed/math30p/statistics/standardCurve.htm Interactive Flash on the z-scores and the probabilities of the normal curve] by Jim Reed

{{Statistics}}