Introduction to the Data

Background of the SBA(FOIA):

The U.S. Small Business Administration is an independent agency of the United States government that provides support to entrepreneurs and small businesses. The SBA provides loans through banks, credit unions and lenders who partner with the SBA.

Background of Annual Survey of Manufacturers (ASM):

Focuses on the manufacturing sector as opposed to the SBA which includes all industries. This dataset gives insight to the manufacturing sector and includes data on expenses, capital expenditures (cost a company obtains to maintain, gain and improve assets) and inventory among other things.

Part 1: FOIA & ASM Merged Data

First we are going to rename the Geographic Area Name to State in the asm data. We also want to replace the full name of the states with their abbreviations. For example Alabama with AL

##  [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
## [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
## [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
## [46] "VT" "VA" "WA" "WI" "WY"

We are going to group the foia data by NaicsIndustry, BorrState and Approval Fiscal Year to only include the years in the asm dataset (2018,2019,2020,2021) We are going to filter out PR and GU and do some summarizing on the 50 states.

We also renamed the borrstate column to State and approval fiscal year to year. This is to ensure the left-join works as intended.

## `summarise()` has grouped output by 'NaicsIndustry', 'BorrState'. You can
## override using the `.groups` argument.

Now we are going to do a left join, joining by state and year.We are going to drop the naics description since they are all from the manufacturing industry.

Filtering to see what states most of the NA values are in We see that they are in Arkansas, Delaware, DC, Mississippi,Montana and Wyoming we are just going to drop these values. Since multiple columns are missing values.

##   State Year Total_Fringe_Benefits Total_Materials_Cost Cost_Materials_Used
## 1     0    0                     0                    0                   0
##   Cost_Resales Inventories_Beginning_Year Finished_Goods_Beginning
## 1            0                          0                       15
##   Work_In_Process_Beginning Materials_Supplies_Beginning Inventories_End_Year
## 1                        15                            0                    0
##   Finished_Goods_End Work_In_Process_End Materials_Supplies_End
## 1                  5                   5                      0
##   Total_Capital_Expenditures Capital_Buildings Capital_Machinery
## 1                          0                 3                 3
##   Capital_Automobiles Capital_Computers Capital_Other sum_gross_approval
## 1                   3                 3             1                  0
##   sum_jobs_supported sum_third_party_dollars
## 1                  0                       0

Graphs for the asm and foia joined dataset

First we are going to make a graph on the sum of jobs supported by each state. We are going to separate by year, and make separate graphs for each of the years so we can see how, over time,these loans are supporting jobs in the manufacturing industry for each respective state.

## Warning in sf_column %in% names(g): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 172 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

California consistently has the most jobs supported throughout the years. The explanation of this could be due to the fact that California has the highest GDP. https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_GDP

Texas also consistently has been supporting a good amount of jobs throughout the years

Also, in 2021 across the US there were more jobs being supported compared to previous years

Some other questions we can want to investigate

Is there a relationship between Total Fringe Benefits and the amount of jobs supported?

What Regions have the highest Total Fringe Benefits?

We are going to add another column named region and group the states by region.

We first want to look at the distribution of Total Fringe Benefits to see if the data is skewed a particular way.

Knowing the data is heavily right skewed we are just going to log the total fringe benefits so that we can see the relative differences between regions.

According to the box plot above the west has the widest range of fringe benefits, this shows that there is more variability and tells us that the benefits vary greatly in this region.The Midwest has the highest median of total fringe benefits which tells us employees in the midwest are more likely to receive more benefits than other regions.The South has the least variability in fringe benefits which tells us that these benefits are consistent in the region.

We also want to see how the average fringe benefits change overtime.

According to the graph the midwest and west follow a similar trend with Fringe benefits decreasing from 2018 hitting its lowest in 2020. Then increasing in 2021.

The northeast, southeast and south all follow a similar trend as well in fringe benefits. With Fringe benefits slightly increasing from 2018 to 2019, then decreasing in 2020 to then increasing in 2021.

## `summarise()` has grouped output by 'region'. You can override using the
## `.groups` argument.

## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays
## Warning: line.color doesn't (yet) support data arrays

Part 2: FOIA Individual Data

This part of the report analyzes FOIA data, focusing on loans in the manufacturing industry.

The first step is filtering the data to focus on the manufacturing industry and selecting relevant columns.

## Rows: 12,444
## Columns: 6
## $ ApprovalFiscalYear   <int> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2…
## $ ProjectState         <chr> "FL", "MA", "WI", "NJ", "TN", "TX", "CA", "UT", "…
## $ TermInMonths         <int> 240, 240, 120, 240, 240, 240, 240, 240, 240, 240,…
## $ GrossApproval        <int> 2019000, 211000, 174000, 446000, 762000, 899000, …
## $ JobsSupported        <int> 21, 4, 10, 7, 50, 10, 20, 8, 15, 15, 20, 0, 12, 5…
## $ GrossChargeOffAmount <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

Trends in Loan Metrics

To analyze loan trends, the data is grouped by fiscal year. We calculate the total gross approval and average loan term for each year, to create a line graph of loan trends over the years.

We will rescale the total loan amount to millions to make the values easier to interpret.

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

The graph above shows trends in total loan amounts and average loan terms over time, with peaks highlighted and 2020 marked for the COVID-19 impact. As we can see, the highest peak in total loan amount was in 2021, which aligns with the aftermath of the COVID-19 pandemic. This increase was probably due to recovery programs and increased borrowing to help manufacturing businesses stay afloat during this hard time.

Regional Analysis of Loans

For our next visual, we grouped states into regions to avoid overcrowding and analyze average loan amounts and jobs supported more clearly.

To make the average loan amounts easier to compare across regions, we rescaled them to a log scale, which helps show differences more clearly.

## `geom_smooth()` using formula = 'y ~ x'

The scatter plot above shows the relationship between average loan amounts (on a log scale) and average jobs supported for different regions. The South and West have higher average loan amounts, which might mean larger businesses or bigger funding needs. Territories, even with smaller loans, support more jobs on average, likely due to smaller businesses needing more workers.

Charge-Off Data Analysis

For our last visual in this part, we are analyzing and visualizing the total charge-off amounts by state, rescaled to millions for easier interpretation.

We are going to highlight states with significant charge-off amounts using a color gradient, where lighter shades represent lower amounts and darker shades indicate higher amounts. This gradient will help emphasize the states with the highest loan charge-offs.

The map above shows that California, New York, and Illinois have the highest charge-off amounts, highlighted by the darkest colors. This is likely because these states have larger economies and more loans, so when businesses face challenges, the total charge-offs tend to be higher.

Part 3: ASM Individual Data

Let’s start by trying to create a map of the Total Capital Expenditures by Each State and year. In order to do this we have to first prepare the US States Map to recognize the abbreviations. Next, we have to summarize the data bu the state and year.

## `summarise()` has grouped output by 'State'. You can override using the
## `.groups` argument.

Next, lets merge the map data with the States and create the map and filter out any NA values.

## Warning in left_join(., asm_summary, by = "State"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 5 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

Finally, let’s generate an interactive map of total capital expenditure by state and year.

The interactive map above shows the states that have the highest level of Total Capital Expenditures each year. The Capital Expenditure shows how much a company invests in existing and new fixed assets to maintain or grow its business. Clearly, we see that Texas and California have the highest amount of Capital Expenditures which makes sense because they are the largest states which allows them to have access to a large population workforce. Additionally, they have favorable business regulations that allow for people to start manufacturing there.

Let’s see if there is a relationship between the total fringe benefits that employees recieve and the total capital expenditures for that year. First let’s pick which variables to include

## `geom_smooth()` using formula = 'y ~ x'

## NULL

The graph above was used to show if the total capital expenditures have any effect on the total fringe benefits, which are benefits employees recieve from their employers. In the graph, it looks like as the total capital expenditure increases so does the amount of total fringe benefits for the most part.

We are going to look at which states and which year had the highest material cost based on the state.

## `summarise()` has grouped output by 'Industry'. You can override using the
## `.groups` argument.

The bar graph above shows the total capital expenditures per year based on the industry. We see that the highest capital investments are machinery and other. We know that machinery for heavy manufacturing is essential but we don’t have an insight as to what other capital expenses are made in the other category.

Final Report

Patience Heath, Amna Dar, Sanjida Rashid

2024-12-01