-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quantify and expand coverage of TLC simulation mode #6537
Comments
Quantify current coverage with five nodesMain observation: The The following plots display statistics gathered from a long-running simulation (note that the overall overhead associated with extended statistics ( Python code used to generate the plot```python import matplotlib.pyplot as plt import pandas as pd data = pd.read_json("SIMccfraft.ndjson", lines=True) json_data = data.iloc[-1].to_dict() actions_df = pd.DataFrame(json_data["actions"].items(), columns=["Action", "Frequency"]) actions_df_sorted = actions_df.sort_values(by="Frequency", ascending=False) plt.figure(figsize=(10, 6)) plt.bar(actions_df_sorted["Action"], actions_df_sorted["Frequency"]) plt.yscale('log') plt.xticks(rotation=90) plt.xlabel('Actions') plt.ylabel('Frequency (Log scale)') plt.title('Action Frequencies with Logarithmic Y-Axis') plt.tight_layout() plt.show() ```Python code used to generate the plot```python import matplotlib.pyplot as plt import pandas as pd data = pd.read_json("SIMccfraft.ndjson", lines=True) plt.plot(data['duration'], data['distinct'], marker='x', linestyle='--', color='black', label='distinct') plt.plot(data['duration'], data['generated'], marker='o', linestyle='-', color='blue', label='generated') plt.title('Distinct states over Duration') plt.xlabel('Duration') plt.ylabel('Count') plt.grid(True) plt.legend() plt.show() ```Variables with finitely many distinct values:
Python code used to generate the plot```python import matplotlib.pyplot as plt import pandas as pd import matplotlib.cm as cm import numpy as np data = pd.read_json("SIMccfraft.ndjson", lines=True) plt.figure(figsize=(10, 6)) i = 1 colors = cm.rainbow(np.linspace(0, 1, len(data['distinctvalues'].iloc[-1].keys()))) for value, color in zip(data['distinctvalues'].iloc[-1].keys(),colors): plt.subplot(4, 4, i) plt.plot(data['duration'], data['distinctvalues'].apply(lambda x: x.get(value, None)), marker='o', linestyle='-', label=value, color=color) plt.xlabel('Duration') plt.ylabel('Count') plt.grid(True) plt.legend() i += 1 plt.show() ```@lemmy ➜ /workspaces/CCF (main) $ git diff
diff --git a/tla/consensus/SIMccfraft.cfg b/tla/consensus/SIMccfraft.cfg
index 675b9e322..de6948a74 100644
--- a/tla/consensus/SIMccfraft.cfg
+++ b/tla/consensus/SIMccfraft.cfg
@@ -49,12 +49,15 @@ CONSTANTS
Extend <- [abs]ABSExtend
CopyMaxAndExtend <- [abs]ABSCopyMaxAndExtend
-CONSTRAINT
- StopAfter
+\* CONSTRAINT
+\* StopAfter
CHECK_DEADLOCK
FALSE
+_PERIODIC
+ PrintStats
+
PROPERTIES
CommittedLogAppendOnlyProp
MonotonicTermProp
diff --git a/tla/consensus/SIMccfraft.tla b/tla/consensus/SIMccfraft.tla
index 3d86eb251..5663f08d2 100644
--- a/tla/consensus/SIMccfraft.tla
+++ b/tla/consensus/SIMccfraft.tla
@@ -45,6 +45,9 @@ DebugInvUpToDepth ==
\* -depth after generating the first trace.
TLCGet("level") < TLCGet("config").depth
+PrintStats ==
+ Serialize(<<TLCGet("stats")>>, "SIMccfraft.ndjson", [format |-> "NDJSON", charset |-> "UTF-8", openOptions |-> <<"WRITE", "CREATE", "APPEND">>])
+
----
\* Refinement Slightly better coverage with |
This comment was marked as resolved.
This comment was marked as resolved.
@lemmy increasing the depth does sound like a good step to improve coverage. RequestVote seems overly represented, even if we think that we want to focus on fault-heavy scenarios. Similarly, it looks like (and this is very rough from eyeballing the graph) that request votes is about two orders of magnitude more frequent than BecomeLeader. That's a lot of failed elections! My sense is that we should aim to diminish timeout frequency, by maybe an order of magnitude or so, to get to stage where we are generally progressing further. It's still be quite a lot more faults than a typical system. |
This comment has been minimized.
This comment has been minimized.
@lemmy perhaps I missing something here, but this does not seem like an improvement? It looks like there are fewer distinct states in proportion than before, the frequency of RequestVote has increased even more, and it looks like no successful retirement happened (I notice the run is shorter, but I am not sure that explains it). |
The plots above display the state-space coverage without a de-prioritization of the actions With the changes introduced in 47eb0d9, state-space coverage returned to its previous level (see below), while also checking the two liveness properties: The plots appear more irregular because the data structure used to store the statistics, in this case HyperLogLog, is probabilistic and the duration is less. |
Ah, got it, that's excellent! |
Related: tlaplus/tlaplus#1065 |
TLC's simulation mode is generally effective at detecting regressions while being less resource-intensive than full model checking, which helps reduce the strain on our CI resources. However, the actual coverage of simulation mode is unknown. Additionally, increasing coverage is highly desirable.
The text was updated successfully, but these errors were encountered: