This repository has been archived by the owner on Jul 26, 2023. It is now read-only.

UnableToAnonymizeDataException occurs even though hierarchies have been set #29

2Max4 opened this issue Jan 21, 2020 · 5 comments

2Max4 commented Jan 21, 2020

Dear developers,

I am facing the problem that ARX cannot de-identify my test data, even though I have set up the hierarchies. I tested it with the Titanic dataset (see code below).

I could not figure out where I misconfigured PyARXaaS, so I would be glad for any advice.

Error: RequestException: {"timestamp":"2020-01-21T16:01:55.686+0000","message":"no.nav.arxaas.exception.UnableToAnonymizeDataException: Could not fulfill the privacy criterion set, Unable to anonymize the dataset with the provided attributes and hierarchies. A common cause of this error is more than one QUASIIDENTIFYING attribute without a hierarchy","details":"uri=/api/anonymize"}

My Code:

import pandas as pd
from pyarxaas import Dataset
from pyarxaas import AttributeType
from pyarxaas import ARXaaS

df = pd.read_csv("titanic.csv")
df.head()

# Convert Age and Fare to int, because float values cannot be anonymized with a hierarchy
df["Age"] = df["Age"].astype("int")
df["Fare"] = df["Fare"].astype("int")
df.dtypes

dataset = Dataset.from_pandas(df)

#dataset.set_attribute_type(AttributeType.QUASIIDENTIFYING,"Sex","Age","Fare")
dataset.set_attribute_type(AttributeType.QUASIIDENTIFYING,"Age","Fare")
dataset.set_attribute_type(AttributeType.IDENTIFYING, "Name")

arxaas = ARXaaS("http://########:8080")
riskprofile = arxaas.risk_profile(dataset)
riskprofile.re_identification_risk_dataframe()

from pyarxaas.hierarchy import IntervalHierarchyBuilder

# Set interval hierarchies for the attribute Age
age_intervals = IntervalHierarchyBuilder()

age_intervals.add_interval(0,18, "child")
age_intervals.add_interval(18,30, "young-adult")
age_intervals.add_interval(30,60, "adult")
age_intervals.add_interval(60,120, "old")

#this step is optional
age_intervals.level(0)\
    .add_group(2, "young")\
    .add_group(2, "adult")

# Set interval hierarchies for the attribute Fare
fare_intervals = IntervalHierarchyBuilder()

fare_intervals.add_interval(0,100, "low")
fare_intervals.add_interval(100,200, "high_low")
fare_intervals.add_interval(200,800,"mid")
fare_intervals.add_interval(800,80000, "high")

# Create the interval hierarchies - note: hierarchy can only take int as input value, not float!
interval_hierarchy = arxaas.hierarchy(age_intervals, df["Age"].tolist())
fare_hierarchy = arxaas.hierarchy(fare_intervals, df["Fare"].tolist())

dataset.set_hierarchy("Age",interval_hierarchy)
dataset.set_hierarchy("Fare",fare_hierarchy)

# Import and configure privacy models
from pyarxaas import privacy_models
k = 2
kanon = privacy_models.KAnonymity(k)

# Make API-Call to anonymize the given dataset with the provided and added hierarchies
anonym = arxaas.anonymize(dataset,[kanon])
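To make the effect of the interval hierarchy easier to see, here is a minimal plain-Python sketch of the first generalization level for the Age column. It reuses the interval bounds and labels from the code above; treating each interval as lower-inclusive and upper-exclusive is an assumption, and the helper `generalize` is hypothetical, not part of PyARXaaS.

```python
from bisect import bisect_right

# Interval bounds and labels copied from the post above.
# Assumption: each interval is [lower, upper), i.e. the upper bound is exclusive.
AGE_INTERVALS = [
    (0, 18, "child"),
    (18, 30, "young-adult"),
    (30, 60, "adult"),
    (60, 120, "old"),
]


def generalize(value, intervals):
    """Return the label of the interval containing value, or "*" if none matches."""
    lowers = [low for low, _, _ in intervals]
    idx = bisect_right(lowers, value) - 1  # last interval whose lower bound <= value
    if idx >= 0:
        low, high, label = intervals[idx]
        if low <= value < high:
            return label
    return "*"  # value falls outside every interval, so it is fully suppressed


ages = [22, 38, 4, 60, 17, 18]
labels = [generalize(age, AGE_INTERVALS) for age in ages]
print(labels)
```

Values that fall outside all intervals are mapped to "*" in this sketch; how the service treats such values is not stated in this thread.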

Error:
`RequestException Traceback (most recent call last)
in ()
1 # Make API-Call to anonymize the given dataset with the provided and added hierarchies
----> 2 anonym = arxaas.anonymize(dataset,[kanon])

~/anaconda3/envs/python3/lib/python3.6/site-packages/pyarxaas/arxaas.py in anonymize(self, dataset, privacy_models, suppression_limit)
28 """
29 request_payload = self._anonymize_payload(dataset, privacy_models, suppression_limit)
---> 30 response = self._anonymize(request_payload)
31 return self._anonymize_result(response)
32

~/anaconda3/envs/python3/lib/python3.6/site-packages/pyarxaas/arxaas.py in _anonymize(self, payload)
52 :return:
53 """
---> 54 response = self._connector.anonymize_data(payload)
55 return response
56

~/anaconda3/envs/python3/lib/python3.6/site-packages/uplink/builder.py in __call__(self, *args, **kwargs)
75 )
76 self._request_definition.define_request(builder, args, kwargs)
---> 77 return self._request_preparer.prepare_request(builder)
78
79

~/anaconda3/envs/python3/lib/python3.6/site-packages/uplink/builder.py in prepare_request(self, request_builder)
57 self.apply_hooks(chain, request_builder, sender)
58 return sender.send(
---> 59 request_builder.method, request_builder.url, request_builder.info
60 )
61

~/anaconda3/envs/python3/lib/python3.6/site-packages/uplink/clients/requests_.py in send(self, method, url, extras)
55 response = self._session.request(method=method, url=url, **extras)
56 if self._callback is not None:
---> 57 response = self._callback(response)
58 return response
59

~/anaconda3/envs/python3/lib/python3.6/site-packages/uplink/hooks.py in wrapper(_, *args, **kwargs)
15 def wrapper(_, *args, **kwargs):
16 # Expects that consumer is the first argument
---> 17 return hook(*args, **kwargs)
18
19 return wrapper

~/anaconda3/envs/python3/lib/python3.6/site-packages/pyarxaas/arxaas_connector.py in raise_for_status(response)
14 """
15 if 400 <= response.status_code < 500:
---> 16 raise RequestException(response.text)
17 if response.status_code >= 500:
18 raise HTTPError(response.text)

RequestException: {"timestamp":"2020-01-21T16:01:55.686+0000","message":"no.nav.arxaas.exception.UnableToAnonymizeDataException: Could not fulfill the privacy criterion set, Unable to anonymize the dataset with the provided attributes and hierarchies. A common cause of this error is more than one QUASIIDENTIFYING attribute without a hierarchy","details":"uri=/api/anonymize"}`


sonhal commented Jan 22, 2020

Hi Max, glad you are trying out the project! To better help you, could you maybe supply a small sample of the dataset you are trying to anonymize?

As for possible fixes: if there are columns set as QUASIIDENTIFYING (the default) that don't have a hierarchy, the anonymization might fail. Try adding a hierarchy to the Sex column; a basic [["male", "*"], ["female", "*"]] should suffice.

Best regards,
The ARXaaS team
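A minimal sketch of how such a suppression-only hierarchy could be built for any categorical column, assuming a hierarchy is simply a list of rows of the form [value, generalized value]. The helper `suppression_hierarchy` is hypothetical and not part of PyARXaaS.

```python
def suppression_hierarchy(values, top="*"):
    """Build a two-level hierarchy in which every distinct value generalizes to `top`."""
    distinct = sorted({str(value) for value in values})
    return [[value, top] for value in distinct]


# Stand-in for df["Sex"]; only the distinct values matter.
sex_column = ["male", "female", "female", "male"]
sex_hierarchy = suppression_hierarchy(sex_column)
print(sex_hierarchy)
```

The result has the same shape as the list suggested above, so it could presumably be passed on with `dataset.set_hierarchy("Sex", sex_hierarchy)`, the same call the original code uses for Age and Fare.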


JeremiahUy commented Jan 23, 2020

Hi Max

It is also possible that the dataset has more columns than the few you have configured (name, age and sex). The application treats every other column as QUASIIDENTIFYING (the default) if no attribute type is set for it.

Best regards,
The ARXaaS team
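One way to spot the problem before calling the service is to list every column that ends up as QUASIIDENTIFYING without a hierarchy. The sketch below is plain Python and works on simple dictionaries rather than a PyARXaaS `Dataset`. The helper name is hypothetical, the attribute types are represented as plain strings, and the column names are the ones mentioned in this thread.

```python
def columns_missing_hierarchy(columns, attribute_types, hierarchies,
                              default="QUASIIDENTIFYING"):
    """Return columns that are quasi-identifying (explicitly or by default) but have no hierarchy."""
    missing = []
    for column in columns:
        column_type = attribute_types.get(column, default)
        if column_type == "QUASIIDENTIFYING" and column not in hierarchies:
            missing.append(column)
    return missing


columns = [
    "Survived", "Pclass", "Name", "Sex", "Age",
    "Siblings/Spouses Aboard", "Parents/Children Aboard", "Fare",
]
# Attribute types set explicitly in the original code; everything else falls back to the default.
attribute_types = {
    "Age": "QUASIIDENTIFYING",
    "Fare": "QUASIIDENTIFYING",
    "Name": "IDENTIFYING",
}
# Columns for which a hierarchy was set in the original code.
hierarchies = {"Age", "Fare"}

missing = columns_missing_hierarchy(columns, attribute_types, hierarchies)
print(missing)
```

Every column in the printed list needs either a hierarchy or a different attribute type, or has to be removed from the data before anonymization.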


2Max4 commented Jan 23, 2020

Hey everyone,
Thanks for your fast support!
@sonhal The dataset I am using is the following one: Titanic Dataset

I guess the issue might be that I didn't create generalizations for all the other columns, which, as I just learned, are set to QUASIIDENTIFYING by default.

I will retry it and give you feedback asap.


2Max4 commented Jan 23, 2020

Hey everyone,

I just added:

df = df.drop(columns=["Survived", "Pclass", "Sex", "Siblings/Spouses Aboard","Parents/Children Aboard" ])

Now I no longer get an error message, and it seems to work fine:
[image]

Does IDENTIFYING automatically set the attribute's values to "*"?

I also figured out that "sensitive" attributes require a generalization too, which does make sense. However, I have sensitive attributes for which I do not have a generalization. So I either want them to satisfy l-diversity (say, l = 2 to begin with) or drop them completely. Do you have any idea how this could be implemented?

Since I will be working with your project for some time (I need it for my master's degree), I am sure more questions will come up. Is this the right place to post them, or is there another communication channel you would prefer?

Greetings
Max

JeremiahUy commented

Hi Max

All attributes of type IDENTIFYING will automatically have their values set to "*", as these directly identify a person.
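As an illustration of that behaviour only, here is a plain-Python sketch that suppresses identifying columns in a list of records. It is not how ARXaaS implements it; the helper `suppress_identifying` and the sample rows are hypothetical.

```python
def suppress_identifying(rows, identifying_columns):
    """Return copies of the rows with every identifying column replaced by "*"."""
    return [
        {column: ("*" if column in identifying_columns else value)
         for column, value in row.items()}
        for row in rows
    ]


# Hypothetical sample records, not taken from the real dataset.
rows = [
    {"Name": "Person A", "Age": 22, "Fare": 7},
    {"Name": "Person B", "Age": 38, "Fare": 71},
]
suppressed = suppress_identifying(rows, {"Name"})
print(suppressed)
```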

As of right now, none of the implemented privacy models for sensitive attributes take a generalisation hierarchy. You only need to specify which privacy model to apply to a sensitive attribute, along with the model's parameter value.

Generalisation will only occur on QUASIIDENTIFYING attributes. If you need more features then we recommend using ARX directly, as we have not yet fully implemented all the features within ARX.
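To make the l-diversity requirement from the question concrete, here is a minimal plain-Python sketch of a distinct l-diversity check: every group of rows that share the same quasi-identifier values must contain at least l distinct values of the sensitive attribute. The helper name and the sample rows are hypothetical, and this is an independent illustration, not the check ARX performs internally.

```python
from collections import defaultdict


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Check that each equivalence class holds at least l distinct sensitive values."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in quasi_identifiers)
        classes[key].add(row[sensitive])
    return all(len(values) >= l for values in classes.values())


# Hypothetical, already generalized records; "Survived" plays the sensitive attribute.
rows = [
    {"Age": "child", "Fare": "low", "Survived": 0},
    {"Age": "child", "Fare": "low", "Survived": 1},
    {"Age": "adult", "Fare": "low", "Survived": 0},
    {"Age": "adult", "Fare": "low", "Survived": 0},
]
print(is_distinct_l_diverse(rows, ["Age", "Fare"], "Survived", 2))
```

In this sample the ("adult", "low") class contains only one distinct sensitive value, so the check fails for l = 2. In PyARXaaS the corresponding model is, as far as I know, passed to `arxaas.anonymize` alongside `KAnonymity`; the exact class name and arguments should be checked against the library documentation.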

As for future questions, this will be the right place to ask/post them :D

Best regards,
The ARXaaS team
