Use block size from HDFS configuration for Large Files calculation #62
Comments
What about making the file size definitions user-configurable here? It's reasonable to expect differing opinions from users on what constitutes a particular file size. Currently the file sizes are defined:
Good idea. Maybe a web UI where the user can select different filters to sort/group the files would be a better interface.
Hmm, yes, a good idea @americanyorkie -- something to keep in mind, though, is that those are cached results you see -- so while it is possible to change them, the result may not be reflected until the next SuggestionsEngine run. Still, this is probably fine. I can see an admin-only REST endpoint that would set these; for example, a naive one like the sketch below. Thoughts? I don't think it will be possible to have different settings per user, though.
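A minimal sketch of what such an admin-only endpoint could look like, using the JDK's built-in HTTP server purely for illustration; the path, parameter names, and threshold fields are all hypothetical, not NNA's actual API:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical admin-only handler: GET /setFileSizeCutoffs?small=1048576&medium=134217728 */
class SetFileSizeCutoffsHandler implements HttpHandler {
  // Thresholds read by the SuggestionsEngine on its next run;
  // reported counts stay cached until then.
  static volatile long smallCutoff = 1024L * 1024;         // 1 MB default
  static volatile long mediumCutoff = 128L * 1024 * 1024;  // 128 MB default

  @Override
  public void handle(HttpExchange exchange) throws IOException {
    String query = exchange.getRequestURI().getQuery(); // "small=...&medium=..."
    if (query != null) {
      for (String pair : query.split("&")) {
        String[] kv = pair.split("=", 2);
        if (kv.length != 2) continue;
        if (kv[0].equals("small")) smallCutoff = Long.parseLong(kv[1]);
        if (kv[0].equals("medium")) mediumCutoff = Long.parseLong(kv[1]);
      }
    }
    byte[] body = "OK\n".getBytes();
    exchange.sendResponseHeaders(200, body.length);
    try (OutputStream os = exchange.getResponseBody()) {
      os.write(body);
    }
  }
}
```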
I still think this is best fetched from the HDFS configuration file (hdfs-site.xml), as that should be the same value used by the active NameNode. If a different value is desired, it can be changed in just the NNA host's hdfs-site.xml. Changing this value on the fly will not be good for NNA, so it needs to be a hard value decided at bootstrap time (see the sketch below).
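A minimal sketch of that bootstrap-time read, assuming NNA's Hadoop Configuration object is at hand (the class and method names here are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

// Illustrative: resolve dfs.blocksize once at bootstrap and treat it as
// fixed for the lifetime of the NNA process.
public final class BlockSizeAtBootstrap {
  private BlockSizeAtBootstrap() {}

  public static long resolve(Configuration conf) {
    // getLongBytes parses plain byte counts as well as suffixed values
    // such as "128m" or "1g" from hdfs-site.xml.
    return conf.getLongBytes(
        DFSConfigKeys.DFS_BLOCK_SIZE_KEY,       // "dfs.blocksize"
        DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT);  // 128 MB
  }
}
```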
An additional justification: once NNA bootstraps from a cluster NameNode (Observer or Standby), it will already have the expected configuration anyway.
More thoughts on this one -- I think we should give a statistic by which we measure tiny, small, and medium files, and I think ratios are probably the best measure here. If we were to retain the same hardcoded values then... assuming Large files = greater than blocksize, the resulting ratios aren't very intuitive, however (see the arithmetic below). Might be better to stick with the hardcoded 1KB and 1MB sizes. Just dumping thoughts.
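For concreteness, a worked example (my own arithmetic, not from the thread) of why the ratios are unintuitive if the existing cutoffs were re-expressed against a 128 MB blocksize:

```java
// Worked example: re-expressing the existing hardcoded cutoffs as
// ratios of a 128 MB blocksize yields awkward fractions.
long blockSize = 128L * 1024 * 1024;              // 134217728 bytes
double tinyRatio = 1024.0 / blockSize;            // 1 KB -> 1/131072 ~ 0.0000076
double smallRatio = (1024.0 * 1024) / blockSize;  // 1 MB -> 1/128   ~ 0.0078
// "Files under 1/131072 of the blocksize" is far less intuitive than
// "files under 1 KB", which supports keeping the hardcoded sizes.
```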
In NNA today, particularly if you look around here:
NNAnalytics/src/main/java/org/apache/hadoop/hdfs/server/namenode/cache/SuggestionsEngine.java
Lines 161 to 165 in b17e8e6
You will see that NNA uses a hardcoded cutoff of 128 megabytes (the default HDFS block size) to distinguish between "Medium Files" and "Large Files".
We should instead utilize the byte count from the dfs.blocksize value found in hdfs-site.xml (the Configuration object in NNA, programmatically) that was passed into NNA from the source cluster.
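A hedged sketch of how the classification could consume that configured value; the class and method names are illustrative, not the actual SuggestionsEngine code:

```java
// Illustrative: cutoffs derived from the configured blocksize instead of
// a hardcoded 128 MB constant; tiny/small keep the hardcoded 1 KB / 1 MB
// sizes discussed above.
public class FileSizeClassifier {
  private final long blockSize; // resolved from dfs.blocksize at bootstrap

  public FileSizeClassifier(long blockSize) {
    this.blockSize = blockSize;
  }

  public String classify(long fileSizeBytes) {
    if (fileSizeBytes <= 1024L) return "tiny";          // <= 1 KB
    if (fileSizeBytes <= 1024L * 1024) return "small";  // <= 1 MB
    if (fileSizeBytes <= blockSize) return "medium";    // <= blocksize
    return "large";                                     // > blocksize
  }
}
```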