Student: Statistical Hypothesis Testing and Clusters Essay

Submitted By xcheng5

Using scan-type cluster detection methods to study the spatial pattern of a rare disease (leukemia) in the United States
Xi Cheng #37413680

1. Introduction

Cluster analysis is one of the most important topics in spatial statistics. It addresses the question of whether incidences of a phenomenon are clustered or randomly distributed over space. This analysis has been actively pursued in recent decades to detect clustering in various phenomena. The identification of spatial clusters of disease incidence has been an important component of health studies because of its effectiveness in detecting and monitoring potential public health threats.

Disease incidences can be treated as point objects. The clustering of point objects can be tested using methods such as k-means, k-medians, and the nearest neighbor statistic. However, on a map of points, an apparent cluster in a particular area could be misleading, because it may be explained simply by a clustering of the population itself. Thus, researchers are typically interested in clusters of disease incidences only after adjusting for spatial variations in the density of the background population (Rogerson, 2009).
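The population adjustment described above can be illustrated with a minimal sketch. It assumes the simplest null model, in which every person has the same risk, so the expected count in a region is proportional to its population. The region populations and case counts below are hypothetical values chosen for illustration and are not taken from the study data; the function names are likewise illustrative.

```python
# Hypothetical populations and case counts for three regions
# (illustrative values only, not from the leukemia data set).
population = [1000, 4000, 5000]
cases = [3, 9, 8]


def expected_counts(cases, population):
    """Expected cases per region if risk were uniform across the population."""
    overall_rate = sum(cases) / sum(population)
    return [overall_rate * p for p in population]


def incidence_ratios(cases, population):
    """Observed/expected ratio per region; values above 1 indicate excess cases."""
    expected = expected_counts(cases, population)
    return [c / e for c, e in zip(cases, expected)]


expected = expected_counts(cases, population)
ratios = incidence_ratios(cases, population)
```

In this sketch the second region has the most cases, but the first region has the highest observed/expected ratio, which is the kind of distinction the population adjustment is meant to expose.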

There are generally three types of cluster detection methods. The first type is the global test (Moran’s I statistic, Geary’s C statistic), which uses a single quantity to summarize the degree to which an observed spatial pattern deviates from a random pattern. The observed value of the global statistic is compared with its expected value under the null hypothesis of spatial randomness, and the comparison leads to rejection of, or failure to reject, the null. The problem with a global test is that it gives no direct indication of the size and location of the regions that are inconsistent with the null hypothesis of randomness. The second type is the local test (local Moran statistic, Tango’s CF statistic, Getis’ Gi statistic), which is designed to test whether observed counts are raised in the vicinity of a particular location, relative to the expected counts. This type of test is typically used when the global test proves significant and there are some locations of interest, such as an environmental hazard. Still, even if the global test is not significant, significant local clusters may exist. Last but not least, scan-type tests (Geographical Analysis Machine, Besag and Newell’s test, Kulldorff’s scan statistic) are cluster detection statistics designed to search for spatial clusters with little or no prior information about their location. This kind of test scans the study area to find sub-regions that constitute spatial clusters. Often, such scanning is done not only across all locations but also at different spatial scales (Rogerson, 2009).
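As an illustration of the first type, a global test based on Moran's I can be sketched as follows. This is a minimal sketch, not the implementation used in this project: it assumes a binary adjacency weight matrix and judges significance by random permutation of the values rather than by the analytical expectation. The function names, the four-region chain layout, and the example values are all hypothetical.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation.

    Values are shuffled across regions to simulate spatial randomness.
    """
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical layout: four regions in a row, each adjacent to its neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend_i = morans_i([1, 2, 3, 4], chain)        # similar values are adjacent
alternating_i = morans_i([1, 4, 1, 4], chain)  # dissimilar values are adjacent
observed, p_value = permutation_p_value([1, 2, 3, 4], chain)
```

The single number returned says whether similar values tend to be neighbors overall, but, as noted above, it says nothing about where any cluster is located.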

The objective of this project is to use the scan-type tests implemented in R packages to study and compare the spatial pattern of leukemia in the mainland United States under different constraints. The project provides experience with cluster detection in R and helps in understanding both the methods themselves and the effects of their constraints. The methods employed in this project could be extended to other studies. In addition, the results may be able to reveal certain characteristics of the disease.

This article is organized as follows. Section 2 discusses the three methods employed in this project and their implementation. Section 3 describes the data and the study area. Section 4 presents the usage of the methods and the results. Section 5 concludes the study.

2. Methods

2.1 Openshaw et al.'s (1987) Geographical Analysis Machine (GAM)

GAM places hypothetical circles with an initial radius on all the intersections of a predefined grid over the study area. An upper bound on the radius is set a priori. For each circle, the number of incidences in the circle is counted and compared with the expected number of incidences under the null hypothesis that the incidences are