****************************************************************
* Unemployment Insurance In Survey and Administrative Data
* Jeff Larrimore, Jacob Mortenson, and David Splinter
*
* CPS Estimates
****************************************************************

global datadir = "YOUR DIRECTORY"
* Save the CPS data and imputation data in the above directory

***
*Step 1: read, format, and save the IRS imputation data
clear
import excel "$datadir/LMS-UI-data.xlsx", sheet(Imputations) cellrange(A5:F2204)
rename A inc_centile
rename B taxyr
rename C ui_sum
rename D ui_n
rename E ui_mean
rename F ui_sd
save "$datadir/IRS_centile_ui.dta", replace

***
*Step 2: create a dummy file with every $100 income bin from 0 to 50,000 so that there are no gaps in output
clear
set obs 501
gen incunemp_bin = 100*(_n - 1)
gen asecwt = 1
gen incunemp = 1
gen temp = 1
gen year = 0
tempfile dummy
save `dummy'

***
*Step 3: Set up the CPS data
use "$datadir/cps_00055.dta", clear
replace asecwt = asecwtcvd if asecwtcvd!=.
keep if age>=16

*Replace NIU values with zeros for individual income sources
replace incunemp = 0 if incunemp==999999 | incunemp==.
replace increti1 = 0 if increti1 == 999999 | increti1 == .
replace increti2 = 0 if increti2 == 999999 | increti2 == .
replace incret1 = 0 if incret1 == 9999999 | incret1==.
replace incret2 = 0 if incret2 == 9999999 | incret2==.
replace incsurv1 = 0 if incsurv1 == 99999999 | incsurv1==.
replace incsurv2 = 0 if incsurv2 == 99999999 | incsurv2==.
replace incpens = 0 if incpens==.
replace incrann = 0 if incrann==.
replace incalim = 0 if incalim==.

*Verify that everything looks correct and that the sum of income components matches total personal income reported by Census
*gen testtotal = incwage + incbus + incfarm + incint + incdivid + incrent + incchild + incother + incsurv1 + incsurv2 + inceduc + incasist + incssi + incdisa1 + incdisa2 + incunemp + incwkcom + incvet + incss + incwelfr + incpens + incrann + incalim + incretir
gen laborearn = incwage + incbus + incfarm
gen privinc = laborearn + incrent + incint + incdivid + incother + incretir + incalim + incsurv1 + incsurv2 + incpens + incrann

*Preserving this, but in the IPUMS data, none of the survivors' benefits are workers' compensation.
*replace privinc = privinc - incsurv1 if srcsurv1==6 // survivors' benefits from workers' comp. are non-taxable
*replace privinc = privinc - incsurv2 if srcsurv2==6 // survivors' benefits from workers' comp. are non-taxable

*** taxform_inc IS THE INCOME THAT WE ARE FOCUSING ON IN THIS PAPER
*** NOTE THAT IT EXCLUDES UNEMPLOYMENT INSURANCE (deviation from other papers where we use taxform_inc)
gen priv_plus_ss_di = privinc + incss + incdisa1 + incdisa2
replace priv_plus_ss_di = priv_plus_ss_di - incdisa1 if srcdisa1==1 // disability benefits from workers' comp. are non-taxable
replace priv_plus_ss_di = priv_plus_ss_di - incdisa2 if srcdisa2==1 // disability benefits from workers' comp. are non-taxable
gen taxform_inc = priv_plus_ss_di
**gen taxform_inc = priv_plus_ss_di + incunemp

*Determine spouse's income, and link together individual and spouse
preserve
keep taxform_inc incunemp year serial pernum
rename pernum sploc
rename taxform_inc sp_taxform_inc
rename incunemp sp_incunemp
tempfile spouse_inc
save `spouse_inc'
tab year
restore
count
joinby year serial sploc using `spouse_inc', unmatched(master)
count
tab year _merge
gen married = 0
replace married = 1 if _merge==3
replace sp_taxform_inc = 0 if sp_taxform_inc==.
replace sp_incunemp = 0 if sp_incunemp==.
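
* Worked example of the spouse link above (hypothetical household, for illustration only):
* in household serial 42, person pernum 1 has sploc 2 and person pernum 2 has sploc 1.
* The spouse file renames pernum to sploc, so its record (serial 42, sploc 2) carries
* person 2's taxform_inc and incunemp. joinby on year serial sploc attaches that record
* to person 1 (whose sploc is 2), and symmetrically attaches person 1's income to person 2.
* People with no spouse present (sploc 0 in IPUMS) match no record and keep _merge==1,
* so married stays 0 and sp_taxform_inc and sp_incunemp are set to 0.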

*Reminder: taxform_inc already EXCLUDES unemployment income
gen tu_unemp = incunemp + sp_incunemp
gen tu_totalinc = taxform_inc + sp_taxform_inc

*Equal-split income
gen split_tu_totalinc = tu_totalinc
replace split_tu_totalinc = tu_totalinc/2 if married==1

*****
*Step 4: Create figures based on the raw data

* THIS LIMITS US TO PEOPLE WITH UNEMPLOYMENT INCOME
*Figure 2
preserve
drop if incunemp==0
gen incunemp_bin = round(incunemp, 100)
replace incunemp_bin = 50000 if incunemp_bin>50000 & incunemp_bin!=.
collapse (sum) asecwt, by(incunemp_bin year)
append using `dummy'
rename year svyyear
tab incunemp_bin svyyear [iw=asecwt] if svyyear==0 | svyyear>=2020, matcell(x)
restore

*Figure 3
local centiles = 10
set seed 19893871
gen rand1 = runiform()
sort year split_tu_totalinc rand1
by year: gen double runningwgt = sum(asecwt)
by year: egen double totalwgt = total(asecwt)
gen inc_centile = runningwgt/totalwgt
replace inc_centile = ceil(inc_centile*`centiles')
drop runningwgt totalwgt

*f3_wgt converts the weights to billionths so that the output is in billions of dollars
gen f3_wgt = asecwt/1000000000
tabstat incunemp if year==2010 [aw=f3_wgt], by(inc_centile) stat(sum)
tabstat incunemp if year==2020 [aw=f3_wgt], by(inc_centile) stat(sum)
tabstat incunemp if year==2021 [aw=f3_wgt], by(inc_centile) stat(sum)
tabstat taxform_inc if year==2021 [aw=f3_wgt], by(inc_centile) stat(sum)
*Number of people in each decile
tab inc_centile if year==2021 [iw=asecwt]
drop inc_centile

*****
*Step 5: set up the data for imputation (identifying single centiles)
local centiles = 100
sort year split_tu_totalinc rand1
by year: gen double runningwgt = sum(asecwt)
by year: egen double totalwgt = total(asecwt)
gen inc_centile = runningwgt/totalwgt
replace inc_centile = ceil(inc_centile*`centiles')
drop runningwgt totalwgt

***
* Step 6: MERGE IN IRS DATA AND RUN POVERTY CALCULATIONS
gen povfam = famid
replace povfam = 1 if famid!=1 & ftype==3

*NOTE THAT FINC BASICALLY PERFECTLY MATCHES OFFTOTVAL, SO CAN USE POVFAM AND INCTOT FOR MY BASELINE COMPUTATIONS
*sort year serial povfam
*by year serial povfam: egen finc = total(inctot)
*gen test = finc - offtotval
*tab year
*joinby year serial pernum using `centile_merge', unmatched(master)
*tab year

keep if year>=2021
gen taxyr = year - 1
gen anyui = (incunemp>0)

*Merge in IRS UI data
sort taxyr inc_centile
joinby taxyr inc_centile using "$datadir/IRS_centile_ui.dta", unmatched(master) _merge(IRS_cent_merge)

*Determine how many people and recipients are in each centile
sort taxyr inc_centile
gen temp = anyui * asecwt
by taxyr inc_centile: egen double ui_n_cps_raw = total(temp)
drop temp
by taxyr inc_centile: egen double cps_n = total(asecwt)

*Count the number of missing recipients (ui_n is recipients from IRS) and determine how many to impute
gen missing_recipients = ui_n - ui_n_cps_raw
replace missing_recipients = 0 if missing_recipients<0
gen impute_share = missing_recipients / (cps_n - ui_n_cps_raw)
replace impute_share = 0 if impute_share<0

*Randomly impute recipients in each centile. anyui_impute flags those who have either original or imputed UI.
*Stata treats missing as larger than any number (rand<=. is true), so exclude missing impute_share explicitly.
set seed 1983716767
gen rand = runiform()
gen anyui_impute = 0
replace anyui_impute = 1 if anyui==1
replace anyui_impute = 1 if anyui==0 & rand<=impute_share & !missing(impute_share)
gen temp = anyui_impute * asecwt if anyui==0
sort taxyr inc_centile
by taxyr inc_centile: egen double imputed_count = total(temp)
drop temp

*****Now determine how much UI to impute.
*If UI was reported in the CPS, keep it as is. If imputed, divide up the remaining dollars.
*How much UI $ is missing in each centile
sort taxyr inc_centile
gen temp = incunemp * asecwt
by taxyr inc_centile: egen double centile_ui_dollars_cps = total(temp)
drop temp
gen missing_dollars = ui_sum - centile_ui_dollars_cps
gen impute_mean = 0
replace impute_mean = missing_dollars/imputed_count if anyui_impute == 1 & anyui==0

*Give UI to imputed recipients with the mean and sd that match the IRS data.
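* Worked example of the centile and imputation rules above (hypothetical numbers, for illustration only):
* Centiles: if four people in a year each have weight 1 and are sorted by split_tu_totalinc,
*   runningwgt/totalwgt = .25, .50, .75, 1, so ceil(100*share) assigns centiles 25, 50, 75, 100.
* Recipients: if in one centile ui_n = 500 (IRS), ui_n_cps_raw = 300 and cps_n = 2,300 (weighted CPS),
*   then missing_recipients = 500 - 300 = 200 and impute_share = 200/(2,300 - 300) = .10,
*   so each CPS non-recipient in that centile is flagged with probability .10.
* Dollars: if ui_sum = 3,000,000 and centile_ui_dollars_cps = 1,800,000, then missing_dollars = 1,200,000;
*   with imputed_count = 200, impute_mean = 1,200,000/200 = 6,000 per imputed recipient.
* Each imputed amount is then drawn from a normal distribution with mean impute_mean and sd ui_sd,
* with a $100 floor.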
gen incunemp_impute = 0
set seed 19893871
forval i = 1/100 {
    noi disp `i'
    quietly {
        forval yr = 2019/2020 {
            quietly sum impute_mean if inc_centile==`i' & anyui_impute== 1 & anyui==0 & taxyr == `yr', meanonly
            if r(N)>0 {
                local mean = r(mean)
                quietly sum ui_sd if inc_centile==`i' & anyui_impute == 1 & anyui==0 & taxyr == `yr', meanonly
                local sd = r(mean)
                drawnorm temp, means(`mean') sds(`sd')
                replace incunemp_impute = temp if anyui_impute == 1 & anyui==0 & inc_centile==`i' & taxyr == `yr'
                drop temp
            }
        }
    }
}

*Place a $100 lower bound on imputed UI
replace incunemp_impute = 100 if incunemp_impute<100 & anyui_impute == 1 & anyui==0
*And replace CPS reported values with their actuals
replace incunemp_impute = incunemp if anyui==1

*Figure 2 - WITH IMPUTATIONS
preserve
drop if anyui_impute==0
gen incunemp_bin = round(incunemp_impute, 100)
replace incunemp_bin = 50000 if incunemp_bin>50000 & incunemp_bin!=.
*gen temp = incunemp * asecwt
gen temp = asecwt
collapse (sum) temp, by(incunemp_bin year)
append using `dummy'
rename year svyyear
tab incunemp_bin svyyear [iw=temp] if svyyear==0 | svyyear>=2020, matcell(x)
restore

*NOTE THAT FINC BASICALLY PERFECTLY MATCHES OFFTOTVAL, SO CAN USE POVFAM AND INCTOT FOR MY BASELINE COMPUTATIONS
gen inctot_imp = inctot - incunemp + incunemp_impute
sort year serial povfam
by year serial povfam: egen finc_impute = total(inctot_imp)
gen pov_impute = (finc_impute<=offcutoff)
preserve
keep year serial povfam pov_impute finc_impute incunemp incunemp_impute anyui anyui_impute asecwt
collapse (mean) finc_impute pov_impute asecwt (sum) incunemp incunemp_impute anyui anyui_impute, by(year serial povfam)
rename incunemp funemp
rename incunemp_impute funemp_impute
sort year serial povfam
tempfile pov_status
save `pov_status'
restore

use "$datadir/cps_00055.dta", clear
replace asecwt = asecwtcvd if asecwtcvd!=.
gen povfam = famid
replace povfam = 1 if famid!=1 & ftype==3
keep if year>=2020
count
joinby year serial povfam using `pov_status', unmatched(master)
count
tab year offpov [aw=asecwt], row nofreq
tab year pov_impute [aw=asecwt], row nofreq

*sort year serial povfam
*gen u18 = (age<18)
*by year serial povfam: egen kids = total(u18)
*tab year if funemp_impute>0 [iw=asecwt]
*tab year if funemp_impute>0 & kids==0 [iw=asecwt]
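
* Worked example of the poverty recalculation in Step 6 (hypothetical family, for illustration only):
* a two-person family (same year serial povfam) has inctot of 12,000 and 4,000, with no reported UI.
* If the second person is an imputed recipient with incunemp_impute = 3,000, then
*   inctot_imp = 4,000 - 0 + 3,000 = 7,000 and finc_impute = 12,000 + 7,000 = 19,000.
* With a hypothetical offcutoff of 18,000, the family would be poor on reported income (16,000),
* but pov_impute = 0 after imputation because 19,000 > 18,000.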