-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WC2-431: Decipher how to aggregate ETL based on current tableau #1833
base: main
Are you sure you want to change the base?
WC2-431: Decipher how to aggregate ETL based on current tableau #1833
Conversation
…on-current-tableau
…on-current-tableau
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally and working. However, I think variable naming could be better to improve the overall code readability. As well as (but more difficult) avoiding function side-effects.
from operator import itemgetter | ||
|
||
|
||
class AGGREGATE_JOURNEY: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this is not CamelCase?
|
||
monlthly_journey.save() | ||
|
||
def journey_with_visit_and_steps_per_visit(self, account, program): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
programme
?
monlthly_journey.save() | ||
|
||
def journey_with_visit_and_steps_per_visit(self, account, program): | ||
aggregated_journey = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some variable names are pretty confusing. I think this would be better called aggregated_journeys
to better reflect it's a list, what do you think?
) | ||
data_by_journey = groupby(list(journeys), key=itemgetter("visit__org_unit_id")) | ||
|
||
for org_unit, journey in data_by_journey: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
journeys
data_by_journey = groupby(list(journeys), key=itemgetter("visit__org_unit_id")) | ||
|
||
for org_unit, journey in data_by_journey: | ||
visit_by_period = groupby(journey, key=itemgetter("period")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visits_by_period
for org_unit, journey in data_by_journey: | ||
visit_by_period = groupby(journey, key=itemgetter("period")) | ||
assistance = {"rutf_quantity": 0, "rusf_quantity": 0, "csb_quantity": 0} | ||
aggregated_journey = AGGREGATE_JOURNEY().group_by_period( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregated_journeys
number_visits = models.IntegerField(default=0) | ||
given_sachet_rusf = models.FloatField(null=True, blank=True) | ||
given_sachet_rutf = models.FloatField(null=True, blank=True) | ||
given_quantity_csb = models.FloatField(null=True, blank=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these 3 quantity fields are floats? Intuitively I would think an integer makes more sense?
row["number_visits"] = len(all_visits) | ||
return row | ||
|
||
def group_by_period(self, visit_by_period, org_unit, all_journey, assistance): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visits_by_period
, all_journeys
@@ -637,3 +641,72 @@ def save_entity_journey(self, journey, beneficiary, record, entity_type): | |||
journey.save() | |||
|
|||
return journey | |||
|
|||
def save_monthly_journey(self, journey, account): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like journey
is an instance of Journey
, but in reality it is a dict representing 1 monthly statistic. Maybe the naming can be improved?
row["given_sachet_rutf"] = assistance.get("rutf_quantity", 0) | ||
row["given_quantity_csb"] = assistance.get("csb_quantity", 0) | ||
all_journey.append(row) | ||
return all_journey |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems is working, but I find it pretty hard to follow to be honest. I think the main reason is that there are certain variables like assistance
and row
that are changed in place, and other methods that depend on these changes. I'm not sure if that's fixable, but I think we should avoid manipulating objects in place, and instead have functions return copies. We should strive for functions without side-effects, which makes the code easier to understand.
What problem is this PR solving? Explain here in one sentence.
Related JIRA tickets : WC2-431
Self proofreading checklist
Changes
Aggregated data by period(month and year) and org unit.
How to test
From the local instance with wfp coda database, run ETL script to generate data for:
docker compose run iaso manage etl_ssd
docker compose run iaso manage etl_ng
Then login on django admin and check on Monthly statistics table. You should see data aggregated by period and org unit. You can also filter the data per account(country).
Print screen / video
Administration-de-Wfp-Iaso.webm
Follow the Conventional Commits specification
The merge message of a pull request must follow the Conventional Commits specification.
This convention helps to automatically generate release notes.
Use lowercase for consistency.
Example:
Note that the Jira reference is preceded by a line break.
Both the line break and the Jira reference are entered in the Add an optional extended description… field.