[FEATURE REQUEST #32] On-Premises S3 / S3 Compatible... #389
base: main
Conversation
spec/polaris-management-service.yml
@@ -901,6 +903,58 @@ components:
  required:
    - roleArn

  S3StorageConfigInfo:
Is this specific to s3compat, or is it also meant to be used for s3 itself?
This implementation focuses on on-premises S3 because the AWS class already exists.
However, as a next step I can try to make it work seamlessly with AWS too:
- I think roleArn is not mandatory for AWS S3, so that scenario can be left to the existing implementation
- Using an access key and secret key should work with AWS S3 too
- I have overridden the AWS STS endpoint with the S3 endpoint. I could add an STS endpoint property: if the property is empty, the STS client calls the default AWS STS endpoint, otherwise it calls the configured endpoint (or a boolean with a clear and explicit description); see the sketch below
- Add a region property (maybe a little more reflection is needed to avoid conflicts)
- I have removed the cross-region tweak of the AWS FileIOClientFactory; it can be kept to ensure full compatibility
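A minimal sketch of the STS endpoint idea, assuming a hypothetical getStsEndpoint() property on the storage config (not part of this PR):

```java
import java.net.URI;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.StsClientBuilder;

StsClientBuilder stsBuilder = StsClient.builder();
// getStsEndpoint() is an assumed property name, used here only for illustration.
String stsEndpoint = storageConfig.getStsEndpoint();
if (stsEndpoint != null && !stsEndpoint.isEmpty()) {
  // custom endpoint configured: point the STS client at it
  stsBuilder.endpointOverride(URI.create(stsEndpoint));
}
// otherwise the client resolves the default AWS STS endpoint
StsClient stsClient = stsBuilder.build();
```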
If we can unify these, I think that would be ideal. But I don't know enough about how S3 vs S3Compatible are similar/different to say how possible that is.
I think it will not be too hard to unify in a next step. I lack AWS access to run tests, but from what I know or have seen:
- The STS endpoint is AWS-specific; in S3-compatible solutions, when it is available, it is merged with the S3 endpoint.
- Region is available in S3-compatible solutions, but it is not used much and is mostly implemented to stay compliant with AWS SDK clients.
- Cross-region access is more a Polaris and Iceberg concern, via Iceberg's overloaded S3FileIO, but [AWS] S3FileIO - Add Cross-Region Bucket Access iceberg#11259 was merged 3 weeks ago, so the next Iceberg version should cover it.
MinIO claims that its API is 100% compatible with the AWS S3 API, and the same is nearly true for many alternatives.
- The S3-compatible implementation could easily offer an optional roleArn parameter like the mandatory one in the existing AWS class, with a looser regexp pattern to allow more flexibility for implementations where "aws" inside the string is replaced by the product name (for example "ecs" for Dell ECS); see the sketch below. It could help with a smooth transition.
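A rough sketch of what a more permissive pattern could look like; this is illustrative only, and the exact ARN layout of each vendor would need checking:

```java
import java.util.regex.Pattern;

// Accept any partition token (e.g. "aws", "ecs", ...) instead of only "aws".
Pattern ROLE_ARN_PATTERN =
    Pattern.compile("^arn:[a-z0-9-]+:iam::[^:]+:role/.+$");

boolean matches =
    ROLE_ARN_PATTERN.matcher("arn:ecs:iam::123456789012:role/polaris-role").matches();
```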
enum:
  - TOKEN_WITH_ASSUME_ROLE
  - KEYS_SAME_AS_CATALOG
  - KEYS_DEDICATED_TO_CLIENT
How does this work? What identifies a client?
Here, a client is anything trying to obtain keys (or a security token) from this catalog (Spark, Trino, ...). There is no particular distinction of identity.
Is this not the right term to use in the context of Polaris?
I think the term is correct, I was just stuck trying to understand how the service will track which keys are dedicated to which client.
OK.
It's simply one key for the catalog itself, and another unique key shared by all clients, whoever they are. I leave client distinction to the principal/role/privilege level; I think it is hard, at the storage/credential class level, to pin a pair of keys to each different client.
It is a basic way, when SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION is true and there is no temporary token, to avoid divulging the internal catalog key and instead serve a key that can be deactivated or rotated for security reasons without breaking the catalog itself.
After discussing with MonkeyCanCode here (Prod Deployment credentials): the advantage of this proposal is that you do not have to rely on the main credentials provided at the global Polaris service level.
Today, if you revoke the Polaris service credentials for AWS, all catalogs with AWS storage are instantly broken.
In this implementation each catalog is independent. It is the same idea for client keys: the catalog does not break when client keys are revoked or rotated for security reasons.
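A minimal sketch of how the three strategies could map to what gets vended; getCredsVendingStrategy() is an illustrative getter name, simplified from the PR code:

```java
switch (storageConfig.getCredsVendingStrategy()) { // assumed getter name
  case TOKEN_WITH_ASSUME_ROLE:
    // call STS AssumeRole and vend the temporary access key / secret key / session token
    break;
  case KEYS_SAME_AS_CATALOG:
    // vend the catalog's own access/secret key pair
    break;
  case KEYS_DEDICATED_TO_CLIENT:
    // vend a separate key pair reserved for clients, so it can be rotated
    // or revoked without breaking the catalog itself
    break;
}
```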
you do not have to rely on the main credentials provided at the global Polaris service level.
I think this is the key point here. I agree that the experience you describe is bad, but I'm not sure that fixing it should be a blocker for s3compat support (or that this is the right fix).
Would you be okay saving this for later, or carving it out into a different PR? In my view relying on the global credentials in production is universally a bad idea, regardless of what STORAGE_TYPE you're using.
dd8d860 to e2c296b (compare)
@@ -23,6 +23,8 @@ public enum PolarisCredentialProperty {
  AWS_KEY_ID(String.class, "s3.access-key-id", "the aws access key id"),
  AWS_SECRET_KEY(String.class, "s3.secret-access-key", "the aws access key secret"),
  AWS_TOKEN(String.class, "s3.session-token", "the aws scoped access token"),
  AWS_ENDPOINT(String.class, "s3.endpoint", "the aws s3 endpoint"),
  AWS_PATH_STYLE_ACCESS(Boolean.class, "s3.path-style-access", "the aws s3 path style access"),
whether or not to use path-style access
Many S3-compatible solutions are deployed without network devices or configuration in front of them that would support dynamic host names including the bucket.
A TLS certificate from a private CA can also be a challenge for dynamic host names, and "*.domain" wildcards can be forbidden by some enterprise security policies.
Path style is useful in many cases. In an ideal world, I agree it should stay deprecated...
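For illustration, a sketch of what path-style access means at the SDK level; the endpoint URL is a placeholder:

```java
import java.net.URI;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

// Requests go to https://s3.example.local/bucket/key (path style)
// instead of https://bucket.s3.example.local/key (virtual-hosted style).
S3Client s3 = S3Client.builder()
    .endpointOverride(URI.create("https://s3.example.local"))
    .forcePathStyle(true)
    .region(Region.US_EAST_1) // placeholder; many S3-compatible stores ignore it
    .build();
```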
I guess I don't understand what isn't supported today with catalog-properties. E.g., in https://github.com/apache/polaris/blob/main/polaris-service/src/test/java/org/apache/polaris/service/catalog/PolarisSparkIntegrationTest.java , we use S3MockContainer as an S3 endpoint, which requires the same path-style access and custom endpoint configuration as what's included here. Can we not follow the same pattern for minio?
As a rule, I think vending static credentials is not a good idea. Some customization for how the STS client is instantiated, possibly with support for custom profiles for different catalogs, could make sense. But I think, ultimately, the credentials returned should always be a temporary session token. Even if we just call GetSessionToken without requiring an IAM role, it would be vastly more secure than sending raw credentials.
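For reference, a minimal sketch of the GetSessionToken call mentioned here (AWS SDK v2); whether the backing store supports it is discussed further below:

```java
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.Credentials;
import software.amazon.awssdk.services.sts.model.GetSessionTokenRequest;

StsClient sts = StsClient.builder().build();
// Vend a temporary session token instead of returning the raw keys.
Credentials creds =
    sts.getSessionToken(GetSessionTokenRequest.builder().durationSeconds(3600).build())
        .credentials();
// creds.accessKeyId(), creds.secretAccessKey(), creds.sessionToken()
```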
propertiesMap.put(PolarisCredentialProperty.AWS_ENDPOINT, storageConfig.getS3Endpoint());
propertiesMap.put(
    PolarisCredentialProperty.AWS_PATH_STYLE_ACCESS,
    storageConfig.getS3PathStyleAccess().toString());
These are catalog properties, not credential-vending properties. These should be set at the catalog-level when it is created. Those properties would then be passed into the FileIO when it is constructed.
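A minimal sketch of that approach, using the Iceberg property keys already referenced in this PR; the endpoint value is a placeholder:

```java
import java.util.Map;
import org.apache.iceberg.aws.s3.S3FileIO;

// Set at catalog creation time rather than vended with credentials.
Map<String, String> catalogProps = Map.of(
    "s3.endpoint", "https://s3.example.local",
    "s3.path-style-access", "true");

S3FileIO fileIO = new S3FileIO();
fileIO.initialize(catalogProps); // Iceberg reads these keys when building its S3 client
```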
To be refactored to satisfy the requested change: boolean "SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION".
I will try to find a way to move it into the catalog properties, but catalog properties are not forwarded to "S3CompatibleCredentialsStorageIntegration.java"; by default, only storage properties are.
public void createStsClient(S3CompatibleStorageConfigurationInfo s3storageConfig) {

  LOGGER.debug("S3Compatible - createStsClient()");
  StsClientBuilder stsBuilder = software.amazon.awssdk.services.sts.StsClient.builder();
Why is this constructed here rather than being passed in as a constructor parameter?
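A sketch of what constructor injection could look like here, simplified from the PR's class:

```java
import software.amazon.awssdk.services.sts.StsClient;

// Simplified sketch: inject the StsClient so tests can pass a mock or a
// preconfigured client instead of the class building one internally.
public class S3CompatibleCredentialsStorageIntegration {
  private final StsClient stsClient;

  public S3CompatibleCredentialsStorageIntegration(StsClient stsClient) {
    this.stsClient = stsClient;
  }
}
```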
Like the AWS class?
I find it odd to build something tied to a storage type outside the class and pass it in through the constructor. The Azure class seems to keep it inside the class too.
No?
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.awssdk.services.sts.model.AssumeRoleResponse;

/** Credential vendor that supports generating */
comment seems to just ...
String cli = System.getenv(storageConfig.getS3CredentialsClientAccessKeyId());
String cls = System.getenv(storageConfig.getS3CredentialsClientSecretAccessKey());
Seems you could rely on the DefaultCredentialsProvider
and maybe allow profiles to be specified? This would allow for env variables, but also file configuration or other means of retrieving credentials.
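A sketch of that idea; profileName is a hypothetical, optional setting:

```java
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.services.sts.StsClient;

// Resolve keys via the SDK default chain (env vars, ~/.aws files, container or
// instance credentials), optionally pinned to a named profile.
AwsCredentialsProvider provider =
    (profileName == null || profileName.isEmpty())
        ? DefaultCredentialsProvider.create()
        : ProfileCredentialsProvider.create(profileName);

StsClient sts = StsClient.builder().credentialsProvider(provider).build();
```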
I have not found how to reconcile this with the createCatalog() REST API, and I am not happy with the idea of keeping the catalog's credentials fully and exclusively at the Polaris service level.
A bad compromise?
I have not found how the endpoint or other properties can be set from the REST API when creating or updating a catalog; only roleArn is accepted today, as a mandatory parameter, in the AWS storage type.
I agree; by default it is now STS "AssumeRole" without any role. Raw credentials are a poor fallback for when STS is not available. There are enterprise contexts where STS, AssumeRole, etc. are not allowed and only a pair of keys is available. For example, Dell ECS requires an additional policy to enable STS AssumeRole; it is not activated out of the box. I tried to be explicit about this degraded security pattern. "GetSessionToken" is part of the STS API, not the S3 API, and it is not available in MinIO.
Description:
This is a proposal for a Polaris core storage implementation, copied from the AWS one, without the required roleArn parameter, which is replaced by endpoint, path-style, and credentials parameters.
You can choose a strategy for the clients calling the catalog:
this disables Polaris credential "SubScoping", but it is an option if you do not care much about security.
There is also the possibility, during creation of the catalog (CLI, curl API), to choose between 2 strategies to communicate the S3 access and secret keys:
Polaris needs these variables to be available inside its running context. A Kubernetes production deployment should be well suited to this configuration (envFromSecrets deployed to the Polaris pods).
Let me know your opinion about this design proposal.
Thank you
Included Changes:
Type of change:
Checklist: