Employee counts

May 4, 2017 · 3 minute read

I was reading a paper recently (Autor, Dorn, Hanson, and Song; QJE 2014) that reported some statistics regarding the time series of the manufacturing share of employment in the US. ADHS used the County Business Patterns dataset to compute these statistics.

The reported numbers didn’t seem in line with what I thought I remembered seeing when I looked at the MANEMP and PAYEMS time series from FRED, both of which originally come from the US Bureau of Labor statistics.

I wanted to see if computing the manufacturing share of employment from these two sources lead to qualitatively different answers or if I was just remembering wrong. The figure I ended up with

From this picture it seems I was just remembering wrong. Overall the two lines are quite close, which makes me confident in the similarity of the two sources.

The spike in the County Business Patterns line between 1997 and 1998 is a result of the Census Bureau switching from tracking industries using the SIC code to tracking them with NAICS codes. You can see a few lines of code in the snippet below that try to reconcile the differences between the two systems, but it appears that I wasn’t able to do a perfect job.

The Python code I used to build the figure is included below. Note that in order to run it you need to install the currently (as of 2017-05-04) un-published Python package uscensus. To do this enter the following at your terminal or command prompt:

pip install git+git://github.com/sglyon/uscensus.git

The code:

import uscensus
import pandas_datareader
import pandas as pd
import numpy as np
import plotly.plotly as py
import cufflinks as cf

# Average number of employees and manufacturing employees each year from
# 1990-2010 as reported by FRED
fred = pandas_datareader.DataReader(
    ["PAYEMS", "MANEMP"],
    data_source="fred",
    start="1990/01/01",
    end="2010/01/01").resample("A").mean()
fred["man_share"] = fred["MANEMP"] / fred["PAYEMS"]

# collect similar results from the census
from_census = {"man": {}, "total": {}}
sic_codes = list(np.unique(
    list(uscensus.util.naics2002_to_sics("31")) +
    list(uscensus.util.naics2002_to_sics("32")) +
    list(uscensus.util.naics2002_to_sics("33"))
))
for year in range(1990, 2011):
    cbp = uscensus.CountyBusinessPatterns(year)
    from_census["total"][year] = cbp.get(["EMP"], us=True)["EMP"][0]

    industry_code = uscensus.util.cbp_industry_var[year]
    if industry_code == "SIC":
        # SIC manufacturing is anything starting with 2 or 3
        # https://www.osha.gov/pls/imis/sic_manual.html
        # in this case we need to do two queries, one to pick up SIC=2*
        # and another for SIC=3*
        df1 = cbp.get(["EMP"], us=True, SIC="2*")
        df2 = cbp.get(["EMP"], us=True, SIC="3*")
        df = pd.concat([df1, df2])

        # now restrict ourselves to the sic codes in the Census' concordance
        val = df[df.SIC.isin(sic_codes)]["EMP"].sum()
    else:
        # some form of NAICS. Man code is always 31-33 for all NAICS variants
        # https://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2007
        kwarg = {industry_code: "31-33"}
        val = cbp.get(["EMP"], us=True, **kwarg)["EMP"][0]

    from_census["man"][year] = val

census = pd.DataFrame(from_census)
census["man_share"] = census["man"] / census["total"]

compare = pd.DataFrame(
    {
        "BLS": fred["man_share"].values,
        "Community Business Patterns": census["man_share"].values
    }, index=fred.index
)
compare.iplot(filename="man-share_cbp_vs_bls")

References

Autor, D. H., Dorn, D., Hanson, G. H., & Song, J. (2014). Trade Adjustment: Worker-Level Evidence. The Quarterly Journal of Economics, 129(4), 1799–1860. http://doi.org/10.1093/qje/qju026