Programmatically Identify health system website #2

Open
dr00b opened this issue Jun 7, 2024 · 1 comment

@dr00b
Collaborator

dr00b commented Jun 7, 2024

Starting from the enrollment data set, write logic that attributes each individual hospital to a public-facing health system web domain.

Deliberately not saying "LLM", as we should start simple and add complexity as necessary.
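In the spirit of starting simple, here is a minimal non-LLM sketch of the attribution step. It assumes a list of candidate health system domains already exists (for example, seeded by hand) and scores each candidate by the share of distinctive words in the hospital name that appear in the domain's first label. `GENERIC_WORDS`, `attribute_hospital`, the 0.5 threshold, and the sample names and domains are all hypothetical; none of them come from the enrollment data set.

```python
from __future__ import annotations

import re

# Hypothetical list of words too generic to identify a specific health system.
GENERIC_WORDS = {
    "the", "of", "and", "st", "saint", "inc", "llc",
    "hospital", "hospitals", "medical", "center", "health", "healthcare",
    "regional", "community", "memorial", "general",
}


def distinctive_tokens(name: str) -> list[str]:
    """Lowercase alphanumeric tokens of a facility name, minus generic words."""
    return [t for t in re.findall(r"[a-z0-9]+", name.lower()) if t not in GENERIC_WORDS]


def match_score(name: str, domain: str) -> float:
    """Fraction of distinctive name tokens found inside the domain's first label."""
    tokens = distinctive_tokens(name)
    if not tokens:
        return 0.0
    label = domain.lower().removeprefix("www.").split(".")[0]
    return sum(token in label for token in tokens) / len(tokens)


def attribute_hospital(
    name: str, candidate_domains: list[str], threshold: float = 0.5
) -> str | None:
    """Return the best-scoring candidate domain, or None if nothing clears the threshold."""
    best = max(candidate_domains, key=lambda d: match_score(name, d), default=None)
    if best is None or match_score(name, best) < threshold:
        return None
    return best


candidates = ["northwindhealth.org", "lakeviewcare.org"]
print(attribute_hospital("Northwind Health Hospital - Eastside", candidates))
print(attribute_hospital("St. Elsewhere Hospital", candidates))
```

Substring matching keeps this cheap, but it will misfire on short or shared tokens, and hospitals whose names share nothing with their system's domain fall through to `None`. Those misses are where extra signals (or eventually an LLM) might earn their keep.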

@dr00b added the enhancement (New feature or request) and unstructured data acquisition labels, then removed the enhancement label
@dr00b
Collaborator Author

dr00b commented Jun 11, 2024

@jaanli corresponded with AHD; they maintain this mapping, as does any other vendor doing price transparency data loads. We should acquire one of those mappings as a validation set.
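Once a vendor mapping is in hand, scoring our output against it might look like the sketch below. It assumes both mappings are keyed by the same hospital identifier (which identifier to join on is an open choice) and that our attribution logic may abstain by returning `None`. The function names and sample data are illustrative only.

```python
from __future__ import annotations


def normalize_domain(domain: str) -> str:
    """Lowercase and drop a trailing dot and a leading 'www.' so equivalent spellings match."""
    return domain.strip().lower().rstrip(".").removeprefix("www.")


def evaluate_mapping(
    predicted: dict[str, str | None],
    reference: dict[str, str],
) -> dict[str, float]:
    """Score a hospital -> domain mapping against a reference (validation) mapping.

    Keys are hospital identifiers (hypothetical choice of key). A predicted
    value of None means the attribution logic abstained for that hospital.
    """
    shared = [h for h in reference if h in predicted]
    attempted = [h for h in shared if predicted[h] is not None]
    correct = [
        h for h in attempted
        if normalize_domain(predicted[h]) == normalize_domain(reference[h])
    ]
    return {
        # Share of reference hospitals (also present in our output) that we answered.
        "coverage": len(attempted) / len(shared) if shared else 0.0,
        # Share of our answers that agree with the reference.
        "precision": len(correct) / len(attempted) if attempted else 0.0,
    }


predicted = {"A": "www.NorthwindHealth.org", "B": None, "C": "wrong.org"}
reference = {"A": "northwindhealth.org", "B": "x.org", "C": "right.org", "D": "d.org"}
print(evaluate_mapping(predicted, reference))
```

Reporting coverage and precision separately shows when a stricter threshold trades answered hospitals for correct ones.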

This seems like a fairly tractable problem: we could maintain a definitive set and let the researchers focus on the actual content of the websites. Start with hospitals. If that works, take a whack at HHAs (home health agencies), nursing homes, and health plans, in order of increasing complexity and LLC obfuscation.

"root domain" becomes an entity type in the network database.

There are lots of deets in there; sharing some hackmd.io notes.
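If "root domain" becomes an entity type, hospital URLs need to be collapsed to a registrable domain before they are grouped. The following is an assumption-laden, stdlib-only sketch: it keeps the last two host labels, which is wrong for multi-label public suffixes such as `.co.uk`; a real implementation would consult the Public Suffix List. The input dict of hospital IDs to URLs and all function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlsplit


def root_domain(url: str) -> str | None:
    """Naive registrable-domain extraction: keep the last two host labels.

    Assumption: ignores the Public Suffix List, so a host like
    'example.co.uk' is incorrectly reduced to 'co.uk'.
    """
    if "://" not in url:
        url = "https://" + url  # urlsplit only fills in the host when a scheme is present
    host = urlsplit(url).hostname  # lowercased, with port and userinfo stripped
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else None


def group_by_root_domain(hospital_urls: dict[str, str]) -> dict[str, set[str]]:
    """Build root-domain entities: each root domain maps to the hospitals under it."""
    entities: dict[str, set[str]] = defaultdict(set)
    for hospital_id, url in hospital_urls.items():
        domain = root_domain(url)
        if domain is not None:
            entities[domain].add(hospital_id)
    return dict(entities)


urls = {
    "h1": "https://www.northwindhealth.org/locations",
    "h2": "northwindhealth.org",
    "h3": "https://Eastside.LakeviewCare.org:443/",
}
print(group_by_root_domain(urls))
```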

@dr00b changed the title from "Identify health system website" to "Programmatically Identify health system website" on Jun 11, 2024