Mitigate performance issues through cache configuration and other improvements. #215
base: master
I've reviewed and provided feedback to Aaron directly. LGTM so far.
This reverts commit 7605b70.
Awesome! Changes look great.
Thank you @naioja for your tweaks for NSG with Standard Load Balancer. I have merged the current changes from master and resolved the merge conflict. I have also included your suggested snippet to ensure the alternative_component_cache directory exists!
This PR mitigates performance and transient reliability issues that we identified during load testing with JMeter, using the Latency-Sensitive Stress Testing exam (time-gated-exam.jmx) with tweaks and updates for the latest version. The changes are as follows:
Sets the Moodle localcachedir to /tmp/localcachedir
During testing of the Large size deployment, which defaults to Azure Premium Files as the external file share, we identified files in the /moodle/moodledata directory that caused increased latency. The first is the localcachedir directory, for which Moodle recommends a fast local file system when Moodle is clustered.
Sets alternative_component_cache to /var/www/html/moodle/core_component.php
This change works in conjunction with localcachedir and provides significant performance improvements when moodledata is located on an external file share such as Azure Premium Files (see related issue "caching problem with gluster" #126 regarding GlusterFS). We chose this directory because it must already exist and the web server must have permissions to write to it.
Increases default osDisk size from 30 GB (120 IOPS/3,500 Burst IOPS/25 MB/sec) to 256 GB (1,100 IOPS/3,500 Burst IOPS/125 MB/sec)
During load testing we believe we may have hit IOPS and/or throughput limits at the disk level, the VM level, or both, which can cause a VM to become unavailable. Updates to Disk and VM metrics will make this clearer. To mitigate this, we chose a Premium SSD size with significantly more IOPS and throughput.
We initially chose 1,024 GB (5,000 IOPS/200 MB/sec) because this is the first size that does not rely on the 3,500 "Burst" IOPS. Latency also decreased as the disk size was increased. However, a smaller size such as 256 GB (1,100 IOPS/3,500 Burst IOPS/125 MB/sec) may be suitable, so this PR changes the default from 30 GB to 256 GB.
We applied this change to both the Virtual Machine Scale Set (VMSS) that handles the web traffic, as well as the Controller VM we use for JMeter testing (after resizing to match the VMSS), in order to maintain parity in terms of IOPS and throughput.
Defaults Load Balancer and Public IP to the Standard SKU.
We upgraded our Load Balancer and Public IP to the Standard SKU to enable the multi-dimensional metrics and alerts, particularly "SNAT connections". These help us avoid issues such as SNAT port exhaustion and confirm that we are not experiencing them.
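For reference, the two cache settings listed above correspond to standard Moodle options in config.php. The fragment below is only a sketch of what the resulting configuration might look like, using the paths named in this PR. How the deployment scripts actually write these values is not shown here, so treat the layout as an assumption.

```php
<?php
// Hypothetical excerpt from Moodle's config.php (illustration only).
// In a stock config.php, settings like these sit before the
// require_once(__DIR__ . '/lib/setup.php') line.

// Keep the node-local cache on the VM's own disk instead of the
// shared moodledata file share (path taken from this PR).
$CFG->localcachedir = '/tmp/localcachedir';

// Keep the core component cache file off the shared file share as well
// (path taken from this PR).
$CFG->alternative_component_cache = '/var/www/html/moodle/core_component.php';
```

Both locations need to be writable by the web server user on each node. Since each node in the scale set has its own local copy, the caches are not shared between nodes.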
These changes have been tested to deploy successfully against the current master, though load testing was performed against an earlier commit.
(Special thanks to @iennae for feedback and insights throughout!)