Issue with how weighted decile bins are created for distribution and difference tables #2460

Open
Peter-Metz opened this issue Aug 18, 2020 · 0 comments


With the current code in utils.py, weighted deciles cannot be created if too many filers have zero or negative values for the relevant income measure (e.g., expanded income). The add_quantile_table_row_variable() function creates the decile bins in two steps:

  1. After sorting the records by the relevant income measure, the records are broken up into 10 bins, each with an equal number of people or filing units.
  2. If decile_details=True, which is the default option for difference and distribution tables, the original deciles are further broken down to include a group of filers with negative income and a group with 0 income.

The problem arises if more than 10% of records have <=0 income: the bin edges then no longer increase monotonically, and pandas raises an error when pd.cut() is called.
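To illustrate the failure, here is a minimal sketch of the two-step logic. It is not the actual code in utils.py: the helper name `decile_detail_edges`, the toy data, and the equal weights are all assumptions made for the example. It builds ten equal-weight bins on the cumulative-weight axis, inserts the extra edges for the negative and zero-income groups, and passes the result to `pd.cut()`.

```python
import numpy as np
import pandas as pd


def decile_detail_edges(income, weight):
    """Hypothetical helper: cumulative-weight bin edges for detailed deciles."""
    order = np.argsort(income, kind="stable")
    income, weight = income[order], weight[order]
    cumsum = np.cumsum(weight)
    total = cumsum[-1]
    # Step 1: ten bins of equal total weight.
    edges = list(np.linspace(0.0, total, 11))
    edges[0], edges[-1] = -np.inf, np.inf
    # Step 2: add edges at the top of the negative and zero-income groups.
    neg_wght = weight[income < 0].sum()
    zer_wght = weight[income == 0].sum()
    edges.insert(1, neg_wght + zer_wght)
    edges.insert(1, neg_wght)
    return cumsum, edges


weight = np.ones(100)  # toy data: 100 equally weighted records

# 8% of records have <=0 income: the edges stay increasing and pd.cut() works.
income_ok = np.concatenate([-np.ones(5), np.zeros(3), np.arange(1.0, 93.0)])
cumsum_ok, edges_ok = decile_detail_edges(income_ok, weight)
rows_ok = pd.cut(cumsum_ok, edges_ok, labels=False)

# 15% of records have <=0 income: the zero-income edge (15) now sits above
# the top of the first decile (10), so the edge list is no longer increasing.
income_bad = np.concatenate([-np.ones(3), np.zeros(12), np.arange(1.0, 86.0)])
cumsum_bad, edges_bad = decile_detail_edges(income_bad, weight)
try:
    pd.cut(cumsum_bad, edges_bad, labels=False)
except ValueError as err:
    print("pd.cut failed:", err)
```

In the second case the inserted zero-income edge lands past the first decile edge, which is the situation described above.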

This exact problem arose in #2444 when trying to run recipe05 with the new CPS. Since recipe05 replaces expanded income with "market income" (i.e., it does not include benefits received), the share of filers with <=0 income exceeded 10% of the records.

A short-term solution would be to modify recipe05. A more permanent solution would involve modifying the bin creation in utils.py.
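As one possible first step toward either fix, the condition can be detected before any bins are built. The sketch below is only an illustration, not a proposed patch to utils.py: the function names and the `num_quantiles` parameter are hypothetical. It computes the weighted share of records with <=0 income and compares it to the width of one quantile.

```python
import numpy as np


def nonpositive_weight_share(income, weight):
    """Weighted share of records whose income measure is <= 0."""
    income = np.asarray(income, dtype=float)
    weight = np.asarray(weight, dtype=float)
    return weight[income <= 0].sum() / weight.sum()


def decile_details_feasible(income, weight, num_quantiles=10):
    """True if the negative and zero-income groups fit inside the bottom quantile."""
    return nonpositive_weight_share(income, weight) < 1.0 / num_quantiles
```

A caller could use such a check to raise a descriptive error, or to fall back to plain deciles, instead of letting pd.cut() fail.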
