Dealing with skewed data: an example using asthma-related costs of Medicaid clients

Clin Ther. 2001 Mar;23(3):481-98. doi: 10.1016/s0149-2918(01)80052-7.

Abstract

Background: Cost data are often nonnormally distributed because of a few very high cost values that cannot necessarily be dismissed as outliers. Researchers have not reached agreement on how to appropriately deal with skewed cost data.

Objectives: This study presents an example of skewed cost data that were collected retrospectively from the Texas Medicaid database. Common methods of dealing with skewed cost distributions are discussed. Data were analyzed using various methods, and the statistical results of each test were compared.

Methods: Prescription and medical claims data extracted from the Texas Medicaid database were analyzed using the Mann-Whitney U test and t tests of untransformed, log-transformed, and bootstrapped data.
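As a rough illustration of the t tests on untransformed and log-transformed data, the sketch below computes Welch's unequal-variance t statistic on raw and log costs. The cost values, the names `group_a` and `group_b`, and the helper `welch_t` are hypothetical and not from the study; the abstract does not say which t test variant was used, so Welch's form is an assumption suggested by the unequal variances.

```python
import math
import statistics


def welch_t(x, y):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances
    se2 = vx / nx + vy / ny  # squared standard error of the mean difference
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df


# Hypothetical per-patient costs: group A contains one very high-cost patient.
group_a = [100.0, 120.0, 130.0, 150.0, 9000.0]
group_b = [200.0, 220.0, 240.0, 260.0, 280.0]

t_raw, df_raw = welch_t(group_a, group_b)
t_log, df_log = welch_t(
    [math.log(c) for c in group_a],
    [math.log(c) for c in group_b],
)
print(f"raw costs: t = {t_raw:.3f}, df = {df_raw:.2f}")
print(f"log costs: t = {t_log:.3f}, df = {df_log:.2f}")
```

Both tests still assume approximately normal data on the scale being analyzed, and the log-scale test compares geometric rather than arithmetic means.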

Results: All distributions of untransformed cost data were nonnormal, and the comparison groups had unequal variances. The Mann-Whitney U test negated the effect of the high-cost patients and gave a significant result for overall cost differences between groups, but in the direction opposite to that of the means. The t tests on raw and log-transformed data may not have been optimal because the distributions of both raw costs and log-costs were nonnormal.
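A small sketch can show how a rank-based comparison and a comparison of means can point in opposite directions. The data and the helper `mann_whitney_u` are hypothetical; the code computes only the U statistic by direct pairwise counting, not a p-value, and does not reproduce the study's results.

```python
import statistics


def mann_whitney_u(x, y):
    """U statistic for x: count of (x_i, y_j) pairs with x_i > y_j; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical costs: group A is mostly cheaper but has one very expensive patient.
group_a = [100.0, 120.0, 130.0, 150.0, 9000.0]
group_b = [200.0, 220.0, 240.0, 260.0, 280.0]

u_a = mann_whitney_u(group_a, group_b)
u_b = len(group_a) * len(group_b) - u_a  # U statistics sum to n_a * n_b
mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)

print(f"U_A = {u_a}, U_B = {u_b}")
print(f"mean_A = {mean_a}, mean_B = {mean_b}")
```

In this toy data, group A wins only 5 of the 25 pairwise comparisons, so the ranks favor group B as the costlier group, while the single 9000 value makes group A's mean cost far higher.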

Conclusions: The bootstrap method does not need to meet the assumptions of normality and equal variances. In analyses of small sample sizes with skewed cost data, the bootstrap method may offer an alternative to the more traditional nonparametric or log-transformation techniques.
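The bootstrap idea can be sketched as follows: resample each group with replacement many times, recompute the difference in mean cost each time, and read an interval off the resulting distribution. This is a minimal percentile-interval sketch under assumed settings (2000 resamples, a fixed seed, hypothetical data and function name); the abstract does not specify the study's exact bootstrap procedure, so this should not be read as the authors' method.

```python
import random
import statistics


def bootstrap_mean_diff(x, y, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(x) - mean(y)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group independently, with replacement, at its own size.
        rx = rng.choices(x, k=len(x))
        ry = rng.choices(y, k=len(y))
        diffs.append(statistics.mean(rx) - statistics.mean(ry))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    observed = statistics.mean(x) - statistics.mean(y)
    return observed, lower, upper


# Hypothetical per-patient costs with one very high-cost patient in group A.
group_a = [100.0, 120.0, 130.0, 150.0, 9000.0]
group_b = [200.0, 220.0, 240.0, 260.0, 280.0]

observed, lower, upper = bootstrap_mean_diff(group_a, group_b)
print(f"observed difference in means: {observed:.1f}")
print(f"95% percentile interval: ({lower:.1f}, {upper:.1f})")
```

The comparison stays on the arithmetic mean, which is usually the quantity of interest for cost data, and no normality or equal-variance assumption is imposed. With samples this small, though, the interval depends heavily on how often the single extreme value is drawn.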

MeSH terms

  • Asthma / drug therapy*
  • Health Care Costs*
  • Humans
  • Medicaid*