I was reading a paper recently (Autor, Dorn, Hanson, and Song; QJE 2014) that reported some statistics regarding the time series of the manufacturing share of employment in the US. ADHS used the County Business Patterns dataset to compute these statistics.
The reported numbers didn’t seem in line with what I thought I remembered
seeing when I looked at the
MANEMP and
PAYEMS time series from FRED,
both of which originally come from the US Bureau of Labor statistics.
I wanted to see if computing the manufacturing share of employment from these two sources lead to qualitatively different answers or if I was just remembering wrong. The figure I ended up with
From this picture it seems I was just remembering wrong. Overall the two lines are quite close, which makes me confident in the similarity of the two sources.
The spike in the County Business Patterns line between 1997 and 1998 is a result of the Census Bureau switching from tracking industries using the SIC code to tracking them with NAICS codes. You can see a few lines of code in the snippet below that try to reconcile the differences between the two systems, but it appears that I wasn’t able to do a perfect job.
The Python code I used to build the figure is included below. Note that in
order to run it you need to install the currently (as of 2017-05-04)
un-published Python package uscensus. To do this enter the following at your
terminal or command prompt:
pip install git+git://github.com/sglyon/uscensus.git
The code:
import uscensus
import pandas_datareader
import pandas as pd
import numpy as np
import plotly.plotly as py
import cufflinks as cf
# Average number of employees and manufacturing employees each year from
# 1990-2010 as reported by FRED
fred = pandas_datareader.DataReader(
["PAYEMS", "MANEMP"],
data_source="fred",
start="1990/01/01",
end="2010/01/01").resample("A").mean()
fred["man_share"] = fred["MANEMP"] / fred["PAYEMS"]
# collect similar results from the census
from_census = {"man": {}, "total": {}}
sic_codes = list(np.unique(
list(uscensus.util.naics2002_to_sics("31")) +
list(uscensus.util.naics2002_to_sics("32")) +
list(uscensus.util.naics2002_to_sics("33"))
))
for year in range(1990, 2011):
cbp = uscensus.CountyBusinessPatterns(year)
from_census["total"][year] = cbp.get(["EMP"], us=True)["EMP"][0]
industry_code = uscensus.util.cbp_industry_var[year]
if industry_code == "SIC":
# SIC manufacturing is anything starting with 2 or 3
# https://www.osha.gov/pls/imis/sic_manual.html
# in this case we need to do two queries, one to pick up SIC=2*
# and another for SIC=3*
df1 = cbp.get(["EMP"], us=True, SIC="2*")
df2 = cbp.get(["EMP"], us=True, SIC="3*")
df = pd.concat([df1, df2])
# now restrict ourselves to the sic codes in the Census' concordance
val = df[df.SIC.isin(sic_codes)]["EMP"].sum()
else:
# some form of NAICS. Man code is always 31-33 for all NAICS variants
# https://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2007
kwarg = {industry_code: "31-33"}
val = cbp.get(["EMP"], us=True, **kwarg)["EMP"][0]
from_census["man"][year] = val
census = pd.DataFrame(from_census)
census["man_share"] = census["man"] / census["total"]
compare = pd.DataFrame(
{
"BLS": fred["man_share"].values,
"Community Business Patterns": census["man_share"].values
}, index=fred.index
)
compare.iplot(filename="man-share_cbp_vs_bls")
References
Autor, D. H., Dorn, D., Hanson, G. H., & Song, J. (2014). Trade Adjustment: Worker-Level Evidence. The Quarterly Journal of Economics, 129(4), 1799–1860. http://doi.org/10.1093/qje/qju026