Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WC2-431: Decipher how to aggregate ETL based on current tableau #1833

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

butofleury
Copy link
Member

@butofleury butofleury commented Nov 27, 2024

What problem is this PR solving? Explain here in one sentence.

Related JIRA tickets : WC2-431

Self proofreading checklist

  • Did I use eslint and black formatters
  • Is my code clear enough and well documented
  • Are my typescript files well typed
  • New translations have been added or updated if new strings have been introduced in the frontend
  • My migrations file are included
  • Are there enough tests
  • Documentation has been included (for new feature)

Changes

Aggregated data by period(month and year) and org unit.

How to test

From the local instance with wfp coda database, run ETL script to generate data for:

  • South Sudan: docker compose run iaso manage etl_ssd
  • Nigeria: docker compose run iaso manage etl_ng

Then login on django admin and check on Monthly statistics table. You should see data aggregated by period and org unit. You can also filter the data per account(country).

Print screen / video

Administration-de-Wfp-Iaso.webm

Follow the Conventional Commits specification

The merge message of a pull request must follow the Conventional Commits specification.

This convention helps to automatically generate release notes.

Use lowercase for consistency.

Example:

fix: empty instance pop up

Refs: IA-3665

Note that the Jira reference is preceded by a line break.

Both the line break and the Jira reference are entered in the Add an optional extended description… field.

Copy link
Contributor

@bramj bramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally and working. However, I think variable naming could be better to improve the overall code readability. As well as (but more difficult) avoiding function side-effects.

from operator import itemgetter


class AGGREGATE_JOURNEY:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why this is not CamelCase?


monlthly_journey.save()

def journey_with_visit_and_steps_per_visit(self, account, program):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

programme?

monlthly_journey.save()

def journey_with_visit_and_steps_per_visit(self, account, program):
aggregated_journey = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some variable names are pretty confusing. I think this would be better called aggregated_journeys to better reflect it's a list, what do you think?

)
data_by_journey = groupby(list(journeys), key=itemgetter("visit__org_unit_id"))

for org_unit, journey in data_by_journey:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

journeys

data_by_journey = groupby(list(journeys), key=itemgetter("visit__org_unit_id"))

for org_unit, journey in data_by_journey:
visit_by_period = groupby(journey, key=itemgetter("period"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visits_by_period

for org_unit, journey in data_by_journey:
visit_by_period = groupby(journey, key=itemgetter("period"))
assistance = {"rutf_quantity": 0, "rusf_quantity": 0, "csb_quantity": 0}
aggregated_journey = AGGREGATE_JOURNEY().group_by_period(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregated_journeys

number_visits = models.IntegerField(default=0)
given_sachet_rusf = models.FloatField(null=True, blank=True)
given_sachet_rutf = models.FloatField(null=True, blank=True)
given_quantity_csb = models.FloatField(null=True, blank=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these 3 quantity fields are floats? Intuitively I would think an integer makes more sense?

row["number_visits"] = len(all_visits)
return row

def group_by_period(self, visit_by_period, org_unit, all_journey, assistance):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visits_by_period, all_journeys

@@ -637,3 +641,72 @@ def save_entity_journey(self, journey, beneficiary, record, entity_type):
journey.save()

return journey

def save_monthly_journey(self, journey, account):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like journey is an instance of Journey, but in reality it is a dict representing 1 monthly statistic. Maybe the naming can be improved?

row["given_sachet_rutf"] = assistance.get("rutf_quantity", 0)
row["given_quantity_csb"] = assistance.get("csb_quantity", 0)
all_journey.append(row)
return all_journey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code seems is working, but I find it pretty hard to follow to be honest. I think the main reason is that there are certain variables like assistance and row that are changed in place, and other methods that depend on these changes. I'm not sure if that's fixable, but I think we should avoid manipulating objects in place, and instead have functions return copies. We should strive for functions without side-effects, which makes the code easier to understand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants