larsvilhuber/jobcreationblog

title: "Replication for: How Much Do Startups Impact Employment Growth in the U.S.?"
author: Lars Vilhuber
date: December 1, 2016
output:
  html_document:
    highlight: tango
    keep_md: true
    theme: journal
    toc: true
  pdf_document:
    toc: true
csl: acm-siggraph.csl
bibliography:
  - references.bib
  - original.bib
  - references-alt.bib
nocite: @allaire2019rmarkdown, @arnold2018ggthemes, @wickham2016ggplot2, @xie2019knitr

The goal of this project is to demonstrate the feasibility of creating replicable blog posts for national statistical agencies. We pick a single blog post from the United States Census Bureau, but the general principle could be applied to many countries' national statistical agencies.

Source document

A blog post by Jim Lawrence, U.S. Census Bureau [-@lawrence2016] (archived version, locally archived version).

Source data

Data to produce a graph like this can be found at https://www.census.gov/ces/dataproducts/bds/data_firm.html. Users can look at the economy-wide data by age of the firm, where startups are firms with zero age:

Select Firm Age

Getting and manipulating the data

We illustrate how to generate Figure 1 using R -@2019language. Users who prefer JavaScript, SAS, Excel, or Python can achieve the same result with the tool of their choice. Note that we use the full CSV file at http://www2.census.gov/ces/bds/firm/bds_f_age_release.csv, but users might also want to consult the BDS API.

bdsbase <- "http://www2.census.gov/ces/bds/"
type <- "f_age"
ltype <- "firm"
# for economy-wide data
ewtype <- "f_all"

fafile <- paste("bds_",type,"_release.csv",sep="")
ewfile <- paste("bds_",ewtype,"_release.csv",sep="")

# this changes whether we read live data or Zenodo data
bds.from.source <- TRUE
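
As a quick check of how these pieces combine into a download URL, the sketch below repeats the values above and assembles the address of the by-firm-age file. Because bdsbase ends in a slash, paste() with sep="/" produces a doubled slash, which servers often tolerate but which differs from the URL quoted in the text. The helper build_url is hypothetical and not part of the original code.

```r
# Same values as defined above
bdsbase <- "http://www2.census.gov/ces/bds/"
ltype   <- "firm"
fafile  <- paste("bds_", "f_age", "_release.csv", sep = "")

# paste() with sep = "/" keeps the trailing slash of bdsbase ("bds//firm")
raw_url <- paste(bdsbase, ltype, fafile, sep = "/")

# Hypothetical helper (an assumption, not in the original):
# strip trailing slashes from the base before joining the parts
build_url <- function(base, ...) {
  paste(sub("/+$", "", base), ..., sep = "/")
}
clean_url <- build_url(bdsbase, ltype, fafile)

print(raw_url)
print(clean_url)
```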

We read in two files: the economy-wide file bds_f_all_release.csv, and the by-firm-age file bds_f_age_release.csv:

# read both files from the Census Bureau server when bds.from.source is TRUE
if ( bds.from.source ) {
  # the by-firm-age file
  conr <- gzcon(url(paste(bdsbase,ltype,fafile,sep="/")))
  txt <- readLines(conr)
  close(conr)
  bdstype <- read.csv(textConnection(txt))
  # the economy-wide file
  ewcon <- gzcon(url(paste(bdsbase,ltype,ewfile,sep="/")))
  ewtxt <- readLines(ewcon)
  close(ewcon)
  bdsew <- read.csv(textConnection(ewtxt))
}
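
The block above only covers the case where bds.from.source is TRUE. A minimal sketch of how the same switch could select between the live file and an archived local copy follows. The function read_bds, its arguments, and the temporary demo file are all assumptions made for illustration; the location of any archived copy is not given here, so none is assumed.

```r
# Hypothetical helper (not in the original): read a BDS CSV either from its
# URL or from a local copy, depending on the from_source switch.
read_bds <- function(url, local_path, from_source = TRUE) {
  src <- if (from_source) url else local_path
  read.csv(src, stringsAsFactors = FALSE)
}

# Demonstration with a tiny made-up file standing in for an archived copy
tmp <- tempfile(fileext = ".csv")
writeLines(c("year2,emp", "2004,100", "2005,110"), tmp)

demo <- read_bds(url = NA_character_, local_path = tmp, from_source = FALSE)
print(demo)
```

In the real workflow the call would pass the URL built from bdsbase, ltype, and fafile, together with bds.from.source as the switch.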

We now compute the share of total U.S. employment (emp, from the economy-wide file) that is accounted for by job creation from startups (Job_Creation where fage4 = "a) 0"), expressed as a percentage:

analysis <- bdsew[,c("year2","emp")]
analysis <- merge(x = analysis, y=subset(bdstype,fage4=="a) 0")[,c("year2","Job_Creation")], by="year2")
analysis$JCR_startups <- analysis$Job_Creation * 100 / analysis$emp
# properly name everything
names(analysis) <- c("Year","Employment","Job Creation by Startups", "Job Creation Rate by Startups")
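
To make the arithmetic concrete, here is the same subset, merge, and rate computation on a toy data set. All numbers are made up for illustration and are not BDS values; the object names ending in _toy are hypothetical.

```r
# Toy inputs with made-up numbers (NOT actual BDS values)
bdsew_toy <- data.frame(year2 = c(2004, 2005), emp = c(1000, 1200))
bdstype_toy <- data.frame(
  year2 = c(2004, 2004, 2005, 2005),
  fage4 = c("a) 0", "b) 1", "a) 0", "b) 1"),
  Job_Creation = c(30, 50, 24, 60),
  stringsAsFactors = FALSE
)

# Keep only startups (firm age zero), then attach them to total employment
startups <- subset(bdstype_toy, fage4 == "a) 0")[, c("year2", "Job_Creation")]
toy <- merge(x = bdsew_toy, y = startups, by = "year2")

# Job creation by startups as a percentage of total employment
toy$JCR_startups <- toy$Job_Creation * 100 / toy$emp
print(toy)
```

With these inputs the rate is 30 * 100 / 1000 = 3 for 2004 and 24 * 100 / 1200 = 2 for 2005.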

Create Figure 1

Now we simply plot this for the time period 2004-2014:
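
The plotting code itself is not reproduced here. The citation list includes ggplot2 and ggthemes, which suggests the original figure relies on them, but that is an inference. The sketch below uses base R graphics instead, to stay dependency-free, and runs on placeholder data with made-up values; in the real workflow the analysis data frame built above would take the place of analysis_toy.

```r
# Placeholder data with made-up values (NOT actual BDS results); the column
# names mirror those assigned to `analysis` above.
analysis_toy <- data.frame(
  Year = 2000:2016,
  `Job Creation Rate by Startups` = seq(3.0, 2.2, length.out = 17),
  check.names = FALSE
)

# Restrict to the period shown in Figure 1
fig1 <- subset(analysis_toy, Year >= 2004 & Year <= 2014)

# Line plot of the startup job creation rate over time
plot(fig1$Year, fig1[["Job Creation Rate by Startups"]],
     type = "b",
     xlab = "Year",
     ylab = "Job creation by startups (% of employment)",
     main = "Job Creation Rate by Startups, 2004-2014")
```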

Compare to original image:

original image

References