Essay about UFO Statistical Analysis

Submitted By phemy5485

Following the Footprints of UFOs
Bilikis Osomo

The UFO Dataset


A collection of UFO incident reports recorded across the US and
Canada. Each incident is recorded with the following information:


Date and Time



City and State



Shape of object



Duration of incident



Summary of incident



Over 93,000 incidents have been reported since 1998.



The data set is available from www.nuforc.org

Problem Statement


Is there a predictable pattern to the incident occurrence based on time, location, and duration?



Is there a correlation between the color and shape of the object observed? 

Through text mining, which words do witnesses use most frequently to describe a UFO?



By studying the distribution of time between UFO occurrences, can we predict the chances of the next UFO appearance?



Does a state's population have anything to do with the chances of a UFO
being reported?



Associate the different parameters to bring meaningful insights and patterns.

Challenges with the UFO Dataset


Inconsistent date and time formats: The time of the incident was not recorded for about 2,600 incidents. We imputed the missing values in these cases.



Inconsistent place information: Around 6,500 records had no mention of State. These records were included in analyses that did not use the
State parameter and excluded from all others.



Missing shape information: Around 1,500 records were missing the shape of the UFO. We imputed these missing values by setting the shape to
“Unknown”.



Inconsistent durations: No uniform format was followed in recording this value. We categorized each record by unit, preserving the duration detail as “In seconds”, “In minutes”, or “In hours”.



Extracting from summary: A string search was applied to this column to identify the color of the UFO by matching the text against a list of known colors.
A new “color” column was created from the search results.
Records with no mention of a color are set to “Unknown”.
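
The color search and the duration categorization described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the color list, the function names extract_color and duration_unit, the unit keywords, and the “Unknown” fallback for durations are all hypothetical, since the essay does not give the exact list or rules it used.

```r
# Hypothetical color list; the source does not say which colors it matched.
known_colors <- c("red", "orange", "yellow", "green", "blue",
                  "white", "silver", "black")

# Return the first known color mentioned in each summary, or "Unknown".
extract_color <- function(summary_text, colors = known_colors) {
  text <- tolower(summary_text)
  # Word boundaries stop "red" from matching inside words such as "hovered".
  pattern <- paste0("\\b(", paste(colors, collapse = "|"), ")\\b")
  m <- regexpr(pattern, text, perl = TRUE)
  found <- !is.na(m) & m > -1
  result <- rep("Unknown", length(text))
  result[found] <- regmatches(text, m)
  result
}

# Map a free-text duration to one of the three categories named above.
# The "Unknown" fallback is an assumption for values with no recognizable unit.
duration_unit <- function(duration_text) {
  text <- tolower(duration_text)
  ifelse(grepl("sec", text), "In seconds",
    ifelse(grepl("min", text), "In minutes",
      ifelse(grepl("hour|hr", text), "In hours", "Unknown")))
}

# Made-up example inputs, not records from the dataset.
summaries <- c("A bright orange ball hovered silently",
               "Object hovered, then vanished",
               "Silver disc over the lake")
colors_found <- extract_color(summaries)
units_found <- duration_unit(c("30 seconds", "5 min", "1.5 hrs", ""))
```

A plain substring search would tag the second summary as “red” because of the word “hovered”, which is why the sketch matches whole words only.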

States and Shapes Summary
UFO incidents by states

UFO incidents by Shapes

State Population Vs UFO Reporting


A state's population has a very strong effect on the number of UFOs reported.

Population of US states taken from https://www.census.gov.



Population and UFO incidents are mapped spatially in the maps below using ggplot.



The states WA and NV slightly contradict this claim: they show a low number of reports.
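
One way to put a number on the population claim is a correlation between state population and report count, plus a per-capita rate to spot states that deviate from the trend. The sketch below uses placeholder values only. The state labels S1 to S5 and every number in the data frame are made up for illustration and are not taken from the census or NUFORC data.

```r
# Placeholder values for illustration only; NOT real census or NUFORC figures.
states <- data.frame(
  state      = c("S1", "S2", "S3", "S4", "S5"),
  population = c(1.0, 2.5, 5.0, 10.0, 20.0),  # millions (made up)
  reports    = c(150, 300, 700, 1200, 2600)   # incident counts (made up)
)

# Pearson correlation between population and number of reports.
pop_report_cor <- cor(states$population, states$reports)

# Reports per million residents; states far below the median of this
# column would be the exceptions to the population trend.
states$reports_per_million <- states$reports / states$population
below_median <- states$state[
  states$reports_per_million < median(states$reports_per_million)
]
```

With the real data, the same two columns would come from joining the incident counts per state to the census population table.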

Chances of Future UFO Appearance


Fitting the time series to a standard probability distribution.



We observed that the time between incidents closely follows an exponential distribution.

By estimating the rate parameter of the exponential distribution, we computed the chances of a UFO appearing in a given state within a set period, say the next
24 hours.



Based on past occurrences, the following states have the highest chances of a UFO appearing in the next 24 hours: CA, TX, FL, and
AZ, with a 96% chance.



The remaining states together have only a 4% chance of a UFO appearing in the next 24 hours.



This closely follows the Pareto principle (the 80-20 rule): a large share of the reports comes from a small fraction of states.
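
The steps above (time between occurrences, rate estimate, probability for the next 24 hours) can be sketched in base R. The timestamps are hypothetical and exist only to make the example runnable. The rate is estimated as one over the mean gap, which is the maximum likelihood estimate for an exponential distribution. The essay's own fit appears to use fitdist instead.

```r
# Hypothetical report timestamps for one state (illustrative values only).
report_times <- as.POSIXct(
  c("2014-01-01 20:00", "2014-01-02 08:00", "2014-01-02 20:00",
    "2014-01-04 08:00", "2014-01-04 20:00"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

# Gaps between consecutive reports, in hours.
gaps_hours <- as.numeric(diff(sort(report_times)), units = "hours")

# Maximum likelihood estimate of the exponential rate (events per hour).
rate_hat <- 1 / mean(gaps_hours)

# Probability that the next report arrives within 24 hours:
# the exponential CDF evaluated at 24.
p_next_24h <- 1 - exp(-rate_hat * 24)
```

Running the same computation per state and ranking the results would give the kind of state comparison described above.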

Time Series Analysis – Case Study of CA


fitdist with a normal distribution yields a log-likelihood of -42482, and fitdist with an exponential
distribution yields a log-likelihood of -35004. A larger log-likelihood indicates a better fit, so the
exponential distribution is the better of the two for the data considered.
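
The comparison of the two fits can be reproduced in outline with base R alone. The sketch below runs on simulated gaps, not the CA data, and computes the maximized log-likelihoods directly rather than calling fitdist. The sample size, seed, and simulation rate are arbitrary assumptions.

```r
# Simulated inter-arrival times in hours (NOT the CA data; arbitrary settings).
set.seed(1)
x <- rexp(1000, rate = 0.06)

# Exponential fit: the MLE of the rate is 1 / mean.
rate_hat <- 1 / mean(x)
loglik_exp <- sum(dexp(x, rate = rate_hat, log = TRUE))

# Normal fit: the MLEs are the sample mean and the n-denominator
# standard deviation.
mu_hat <- mean(x)
sd_hat <- sqrt(mean((x - mu_hat)^2))
loglik_norm <- sum(dnorm(x, mean = mu_hat, sd = sd_hat, log = TRUE))

# AIC = 2 * (number of parameters) - 2 * log-likelihood; smaller is better.
aic_exp  <- 2 * 1 - 2 * loglik_exp
aic_norm <- 2 * 2 - 2 * loglik_norm

better_fit <- if (loglik_exp > loglik_norm) "exponential" else "normal"
```

Because the normal fit uses two parameters and the exponential only one, AIC is a fairer yardstick than the raw log-likelihood. In this simulation both criteria favor the exponential.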


The probability of an event occurring in the next 24 hours is given by the cumulative distribution function of the exponential distribution:

P(X <= 24) = 1 - e^(-rate*24)


For the State - CA

> summary(fitexp)
Fitting of the distribution ' exp ' by maximum likelihood
Parameters :
       estimate   Std. Error
rate 0.06340195 0.0006567898
Loglikelihood:  -35004.44   AIC:  70010.88   BIC:  70018.02

> 1-(exp(-0.06340195*24))
0.781648 (roughly a 78% probability that an incident occurs in the next 24 hours in CA)
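
As a cross-check, the same figure can be obtained from R's built-in exponential distribution functions. The sketch assumes the rate is expressed in events per hour, as the multiplication by 24 suggests. The variable names are illustrative.

```r
# Rate reported by the CA fit above, assumed to be in events per hour.
rate_ca <- 0.06340195

# Exponential CDF at 24 hours; equivalent to 1 - exp(-rate_ca * 24).
p_24h <- pexp(24, rate = rate_ca)

# Mean time between reports implied by the fitted rate, in hours.
mean_gap_hours <- 1 / rate_ca

# Hours needed to reach a 90% chance of at least one report,
# from the exponential quantile function.
hours_for_90pct <- qexp(0.90, rate = rate_ca)
```

The implied mean gap of just under 16 hours between reports is consistent with a 24-hour probability well above one half.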

Fitting Distribution –