Multi-way multi-group segregation and diversity indices
Background: How can we compute a segregation or diversity index from a three-way or multi-way contingency table, where each variable can take on an arbitrary finite number of values and where the index takes values between zero and one? Previous methods only exist for two-way contingency tables or dichotomous variables. A prototypical three-way case is the segregation index of a set of industries or departments given multiple explanatory variables of both sex and race. This can be further extended to other variables, such as disability, number of years of education, and former military service. Methodology/Principal Findings: We extend existing segregation indices based on Euclidean distance (square of coefficient of variation) and Boltzmann/Shannon/Theil index from two-way to multi-way contingency tables by including multiple summations. We provide several biological applications, such as indices for age polyethism and linkage disequilibrium. We also provide a new heuristic conceptualization of entropy-based indices. Higher order association measures are often independent of lower order ones, hence an overall segregation or diversity index should be the arithmetic mean of the normalized association measures at all orders. These methods are applicable when individuals selfidentify as multiple races or even multiple sexes and when individuals work part-time in multiple industries. Conclusions/Significance: The policy implications of this work are enormous, allowing people to rigorously test whether employment or biological diversity has changed.