Essay Final Project 1

Submitted By lilywang0314
Words: 1737
Pages: 7

Consumer Complaints: Financial Products and Regional Analysis

Introduction
As a result of the financial crisis, the financial services industry has been under the spotlight in recent years. It has been alleged that financial institutions have poor business practices, leading to increased risk being transferred to consumers. These poor business practices would be expected to be reflected in consumer complaint data for these services. Public unrest with the financial services industry has also mainly been focused on certain products, such as mortgages, and large public financial corporations, such as Bank of America, or J.P. Morgan Chase.

The goal of our analysis is to understand the actual underlying structure and trends of consumer complaints regarding consumer financial instruments in order to understand if these assumptions hold true. This includes which financial products receive the most complaints, which geographic areas these complaints originate from, and whether a link between company size and consumer complaints can be identified.

Related Work
While multiple studies have been conducted to analyze consumer complaints in relation to behavior, strategy, satisfaction, and response; our approach to review consumer complaint data through financial product geographic segmentation can be deemed unique.

Processes
To fulfill the goals of this analysis, our team explored the following three research questions:

1. For each year in the consumer complaint dataset, which three products recorded the most complaints?

2. What region has the highest proportion of complaints in relation to the population of complaints for each year in the consumer complaint dataset? Within this region, which state has the highest proportion of complaints, and within this state, which zip code has the highest proportion of complaints?

3. Can a correlation between the market capitalization of the company and the rate of complaints received be identified?

To address these questions, we wrote multiple python programs utilizing the pandas library to analyze and visualize the data set.

Prior to starting our analysis of any of our purposed questions, our first step was to confirm data integrity and perform any data cleansing that would be necessary. Right away we discovered that our dataset did not contain complete data for all the years listed. The team concluded that to ensure our analysis was not skewed; the original dataset had to be reduced to only the available years of complete data, 2012 and 2013. From this point, we were able to dig into the data and start answering questions.

1. For each year in the consumer complaint dataset, which three products recorded the most complaints?

The team utilized Python to import the entire dataset into a pandas DataFrame. From the original import we created two additional DataFrames which contained the separated 2012 and 2013 complaint data. We then performed a groupby sum function on both DataFrames, with the grouped criteria being the financial product (ex. Mortgage, Credit Card, etc.). This provided a very clear picture of the top three financial products in each year that received the most complaints. We then chose to graphically display these results by plotting a bar graph. To ensue the clarity of our results, we differentiated the top three complaint products by color.

2. What region has the highest proportion of complaints in relation to the population of complaints for each year in the consumer complaint dataset? Within this region, which state has the highest proportion of complaints, and within this state, which zip code has the highest proportion of complaints?

In order to effectively perform analysis for our second inquiry, we needed to expand our data to encompass population demographics. The original complaint data set provided us with the state and zip code where the complaint was received. This allowed us to merge our existing DataFrame with the following additional