Issue with Model and Config File Mismatch in Wenet Conformer #2645
Comments
Use bitransformerdecoder instead of transformerdecoder.
Hi Xingchen, Thank you!
Hi Xingchen,
1. This is the key part of my checking code.
2. The following are the printed missing keys and unexpected keys.
Hi Xingchen, thank you! That solved my decoder problem, but the missing keys are the same as before, and the unexpected keys are now related to cmvn.
My questions are:
Thank you!
Hi Xingchen,
(1) Regarding the missing keys: I traced bin/export_jit.py to utils/init_model.py and found that these missing keys are detected when the model is loaded via the load_checkpoint function imported from utils/checkpoint.py. I therefore assume the missing keys should not be an issue once the pt file is loaded using export_jit.py.
(2) Regarding the unexpected keys: since global_cmvn contains "mean_stat", "var_stat", and "frame_num", do you think I should perform the following conversion in my fine-tuning program, based on my understanding of what these parameters mean? (See the sketch below.)
Thank you!
Describe the bug
There is a mismatch between the train.yaml configuration file and the loaded model weights (final.pt) when using the Wenet pretrained model wenetspeech_u2pp_conformer_exp. Specifically, when attempting to load the weights with the given configuration, several missing and unexpected keys are reported, which may indicate an inconsistency between the model architecture defined in the YAML file and the actual pretrained weights.

To Reproduce
Steps to reproduce the behavior:
1. Download the wenetspeech_u2pp_conformer_exp.tar.gz model from the Wenet pretrained models page (https://wenet.org.cn/wenet/pretrained_models.en.html).
2. Edit the train.yaml file to adjust the paths for units.txt and global_cmvn (the default paths do not account for the fact that the yaml file sits in the same directory as units.txt and global_cmvn).
3. Build the model from train.yaml and load final.pt, as sketched below.
Expected behavior
The script should load the pretrained weights without any missing or unexpected keys, indicating a consistent configuration between train.yaml and final.pt.

Screenshots
When I run the code above, it prints the missing keys and unexpected keys shown below:
Missing keys: ['encoder.embed.pos_enc.pe', 'decoder.embed.0.weight', 'decoder.embed.1.pe', 'decoder.after_norm.weight', 'decoder.after_norm.bias', 'decoder.output_layer.weight', 'decoder.output_layer.bias', 'decoder.decoders.0.self_attn.linear_q.weight', 'decoder.decoders.0.self_attn.linear_q.bias', 'decoder.decoders.0.self_attn.linear_k.weight', 'decoder.decoders.0.self_attn.linear_k.bias', 'decoder.decoders.0.self_attn.linear_v.weight', 'decoder.decoders.0.self_attn.linear_v.bias', 'decoder.decoders.0.self_attn.linear_out.weight', 'decoder.decoders.0.self_attn.linear_out.bias', 'decoder.decoders.0.src_attn.linear_q.weight', 'decoder.decoders.0.src_attn.linear_q.bias', 'decoder.decoders.0.src_attn.linear_k.weight', 'decoder.decoders.0.src_attn.linear_k.bias', 'decoder.decoders.0.src_attn.linear_v.weight', 'decoder.decoders.0.src_attn.linear_v.bias', 'decoder.decoders.0.src_attn.linear_out.weight', 'decoder.decoders.0.src_attn.linear_out.bias', 'decoder.decoders.0.feed_forward.w_1.weight', 'decoder.decoders.0.feed_forward.w_1.bias', 'decoder.decoders.0.feed_forward.w_2.weight', 'decoder.decoders.0.feed_forward.w_2.bias', 'decoder.decoders.0.norm1.weight', 'decoder.decoders.0.norm1.bias', 'decoder.decoders.0.norm2.weight', 'decoder.decoders.0.norm2.bias', 'decoder.decoders.0.norm3.weight', 'decoder.decoders.0.norm3.bias', 'decoder.decoders.1.self_attn.linear_q.weight', 'decoder.decoders.1.self_attn.linear_q.bias', 'decoder.decoders.1.self_attn.linear_k.weight', 'decoder.decoders.1.self_attn.linear_k.bias', 'decoder.decoders.1.self_attn.linear_v.weight', 'decoder.decoders.1.self_attn.linear_v.bias', 'decoder.decoders.1.self_attn.linear_out.weight', 'decoder.decoders.1.self_attn.linear_out.bias', 'decoder.decoders.1.src_attn.linear_q.weight', 'decoder.decoders.1.src_attn.linear_q.bias', 'decoder.decoders.1.src_attn.linear_k.weight', 'decoder.decoders.1.src_attn.linear_k.bias', 'decoder.decoders.1.src_attn.linear_v.weight', 'decoder.decoders.1.src_attn.linear_v.bias', 'decoder.decoders.1.src_attn.linear_out.weight', 'decoder.decoders.1.src_attn.linear_out.bias', 'decoder.decoders.1.feed_forward.w_1.weight', 'decoder.decoders.1.feed_forward.w_1.bias', 'decoder.decoders.1.feed_forward.w_2.weight', 'decoder.decoders.1.feed_forward.w_2.bias', 'decoder.decoders.1.norm1.weight', 'decoder.decoders.1.norm1.bias', 'decoder.decoders.1.norm2.weight', 'decoder.decoders.1.norm2.bias', 'decoder.decoders.1.norm3.weight', 'decoder.decoders.1.norm3.bias', 'decoder.decoders.2.self_attn.linear_q.weight', 'decoder.decoders.2.self_attn.linear_q.bias', 'decoder.decoders.2.self_attn.linear_k.weight', 'decoder.decoders.2.self_attn.linear_k.bias', 'decoder.decoders.2.self_attn.linear_v.weight', 'decoder.decoders.2.self_attn.linear_v.bias', 'decoder.decoders.2.self_attn.linear_out.weight', 'decoder.decoders.2.self_attn.linear_out.bias', 'decoder.decoders.2.src_attn.linear_q.weight', 'decoder.decoders.2.src_attn.linear_q.bias', 'decoder.decoders.2.src_attn.linear_k.weight', 'decoder.decoders.2.src_attn.linear_k.bias', 'decoder.decoders.2.src_attn.linear_v.weight', 'decoder.decoders.2.src_attn.linear_v.bias', 'decoder.decoders.2.src_attn.linear_out.weight', 'decoder.decoders.2.src_attn.linear_out.bias', 'decoder.decoders.2.feed_forward.w_1.weight', 'decoder.decoders.2.feed_forward.w_1.bias', 'decoder.decoders.2.feed_forward.w_2.weight', 'decoder.decoders.2.feed_forward.w_2.bias', 'decoder.decoders.2.norm1.weight', 'decoder.decoders.2.norm1.bias', 'decoder.decoders.2.norm2.weight', 'decoder.decoders.2.norm2.bias', 
'decoder.decoders.2.norm3.weight', 'decoder.decoders.2.norm3.bias']
Unexpected keys: ['encoder.global_cmvn.mean', 'encoder.global_cmvn.istd', 'decoder.left_decoder.embed.0.weight', 'decoder.left_decoder.after_norm.weight', 'decoder.left_decoder.after_norm.bias', 'decoder.left_decoder.output_layer.weight', 'decoder.left_decoder.output_layer.bias', 'decoder.left_decoder.decoders.0.self_attn.linear_q.weight', 'decoder.left_decoder.decoders.0.self_attn.linear_q.bias', 'decoder.left_decoder.decoders.0.self_attn.linear_k.weight', 'decoder.left_decoder.decoders.0.self_attn.linear_k.bias', 'decoder.left_decoder.decoders.0.self_attn.linear_v.weight', 'decoder.left_decoder.decoders.0.self_attn.linear_v.bias', 'decoder.left_decoder.decoders.0.self_attn.linear_out.weight', 'decoder.left_decoder.decoders.0.self_attn.linear_out.bias', 'decoder.left_decoder.decoders.0.src_attn.linear_q.weight', 'decoder.left_decoder.decoders.0.src_attn.linear_q.bias', 'decoder.left_decoder.decoders.0.src_attn.linear_k.weight', 'decoder.left_decoder.decoders.0.src_attn.linear_k.bias', 'decoder.left_decoder.decoders.0.src_attn.linear_v.weight', 'decoder.left_decoder.decoders.0.src_attn.linear_v.bias', 'decoder.left_decoder.decoders.0.src_attn.linear_out.weight', 'decoder.left_decoder.decoders.0.src_attn.linear_out.bias', 'decoder.left_decoder.decoders.0.feed_forward.w_1.weight', 'decoder.left_decoder.decoders.0.feed_forward.w_1.bias', 'decoder.left_decoder.decoders.0.feed_forward.w_2.weight', 'decoder.left_decoder.decoders.0.feed_forward.w_2.bias', 'decoder.left_decoder.decoders.0.norm1.weight', 'decoder.left_decoder.decoders.0.norm1.bias', 'decoder.left_decoder.decoders.0.norm2.weight', 'decoder.left_decoder.decoders.0.norm2.bias', 'decoder.left_decoder.decoders.0.norm3.weight', 'decoder.left_decoder.decoders.0.norm3.bias', 'decoder.left_decoder.decoders.1.self_attn.linear_q.weight', 'decoder.left_decoder.decoders.1.self_attn.linear_q.bias', 'decoder.left_decoder.decoders.1.self_attn.linear_k.weight', 'decoder.left_decoder.decoders.1.self_attn.linear_k.bias', 'decoder.left_decoder.decoders.1.self_attn.linear_v.weight', 'decoder.left_decoder.decoders.1.self_attn.linear_v.bias', 'decoder.left_decoder.decoders.1.self_attn.linear_out.weight', 'decoder.left_decoder.decoders.1.self_attn.linear_out.bias', 'decoder.left_decoder.decoders.1.src_attn.linear_q.weight', 'decoder.left_decoder.decoders.1.src_attn.linear_q.bias', 'decoder.left_decoder.decoders.1.src_attn.linear_k.weight', 'decoder.left_decoder.decoders.1.src_attn.linear_k.bias', 'decoder.left_decoder.decoders.1.src_attn.linear_v.weight', 'decoder.left_decoder.decoders.1.src_attn.linear_v.bias', 'decoder.left_decoder.decoders.1.src_attn.linear_out.weight', 'decoder.left_decoder.decoders.1.src_attn.linear_out.bias', 'decoder.left_decoder.decoders.1.feed_forward.w_1.weight', 'decoder.left_decoder.decoders.1.feed_forward.w_1.bias', 'decoder.left_decoder.decoders.1.feed_forward.w_2.weight', 'decoder.left_decoder.decoders.1.feed_forward.w_2.bias', 'decoder.left_decoder.decoders.1.norm1.weight', 'decoder.left_decoder.decoders.1.norm1.bias', 'decoder.left_decoder.decoders.1.norm2.weight', 'decoder.left_decoder.decoders.1.norm2.bias', 'decoder.left_decoder.decoders.1.norm3.weight', 'decoder.left_decoder.decoders.1.norm3.bias', 'decoder.left_decoder.decoders.2.self_attn.linear_q.weight', 'decoder.left_decoder.decoders.2.self_attn.linear_q.bias', 'decoder.left_decoder.decoders.2.self_attn.linear_k.weight', 'decoder.left_decoder.decoders.2.self_attn.linear_k.bias', 'decoder.left_decoder.decoders.2.self_attn.linear_v.weight', 
'decoder.left_decoder.decoders.2.self_attn.linear_v.bias', 'decoder.left_decoder.decoders.2.self_attn.linear_out.weight', 'decoder.left_decoder.decoders.2.self_attn.linear_out.bias', 'decoder.left_decoder.decoders.2.src_attn.linear_q.weight', 'decoder.left_decoder.decoders.2.src_attn.linear_q.bias', 'decoder.left_decoder.decoders.2.src_attn.linear_k.weight', 'decoder.left_decoder.decoders.2.src_attn.linear_k.bias', 'decoder.left_decoder.decoders.2.src_attn.linear_v.weight', 'decoder.left_decoder.decoders.2.src_attn.linear_v.bias', 'decoder.left_decoder.decoders.2.src_attn.linear_out.weight', 'decoder.left_decoder.decoders.2.src_attn.linear_out.bias', 'decoder.left_decoder.decoders.2.feed_forward.w_1.weight', 'decoder.left_decoder.decoders.2.feed_forward.w_1.bias', 'decoder.left_decoder.decoders.2.feed_forward.w_2.weight', 'decoder.left_decoder.decoders.2.feed_forward.w_2.bias', 'decoder.left_decoder.decoders.2.norm1.weight', 'decoder.left_decoder.decoders.2.norm1.bias', 'decoder.left_decoder.decoders.2.norm2.weight', 'decoder.left_decoder.decoders.2.norm2.bias', 'decoder.left_decoder.decoders.2.norm3.weight', 'decoder.left_decoder.decoders.2.norm3.bias', 'decoder.right_decoder.embed.0.weight', 'decoder.right_decoder.after_norm.weight', 'decoder.right_decoder.after_norm.bias', 'decoder.right_decoder.output_layer.weight', 'decoder.right_decoder.output_layer.bias', 'decoder.right_decoder.decoders.0.self_attn.linear_q.weight', 'decoder.right_decoder.decoders.0.self_attn.linear_q.bias', 'decoder.right_decoder.decoders.0.self_attn.linear_k.weight', 'decoder.right_decoder.decoders.0.self_attn.linear_k.bias', 'decoder.right_decoder.decoders.0.self_attn.linear_v.weight', 'decoder.right_decoder.decoders.0.self_attn.linear_v.bias', 'decoder.right_decoder.decoders.0.self_attn.linear_out.weight', 'decoder.right_decoder.decoders.0.self_attn.linear_out.bias', 'decoder.right_decoder.decoders.0.src_attn.linear_q.weight', 'decoder.right_decoder.decoders.0.src_attn.linear_q.bias', 'decoder.right_decoder.decoders.0.src_attn.linear_k.weight', 'decoder.right_decoder.decoders.0.src_attn.linear_k.bias', 'decoder.right_decoder.decoders.0.src_attn.linear_v.weight', 'decoder.right_decoder.decoders.0.src_attn.linear_v.bias', 'decoder.right_decoder.decoders.0.src_attn.linear_out.weight', 'decoder.right_decoder.decoders.0.src_attn.linear_out.bias', 'decoder.right_decoder.decoders.0.feed_forward.w_1.weight', 'decoder.right_decoder.decoders.0.feed_forward.w_1.bias', 'decoder.right_decoder.decoders.0.feed_forward.w_2.weight', 'decoder.right_decoder.decoders.0.feed_forward.w_2.bias', 'decoder.right_decoder.decoders.0.norm1.weight', 'decoder.right_decoder.decoders.0.norm1.bias', 'decoder.right_decoder.decoders.0.norm2.weight', 'decoder.right_decoder.decoders.0.norm2.bias', 'decoder.right_decoder.decoders.0.norm3.weight', 'decoder.right_decoder.decoders.0.norm3.bias', 'decoder.right_decoder.decoders.1.self_attn.linear_q.weight', 'decoder.right_decoder.decoders.1.self_attn.linear_q.bias', 'decoder.right_decoder.decoders.1.self_attn.linear_k.weight', 'decoder.right_decoder.decoders.1.self_attn.linear_k.bias', 'decoder.right_decoder.decoders.1.self_attn.linear_v.weight', 'decoder.right_decoder.decoders.1.self_attn.linear_v.bias', 'decoder.right_decoder.decoders.1.self_attn.linear_out.weight', 'decoder.right_decoder.decoders.1.self_attn.linear_out.bias', 'decoder.right_decoder.decoders.1.src_attn.linear_q.weight', 'decoder.right_decoder.decoders.1.src_attn.linear_q.bias', 
'decoder.right_decoder.decoders.1.src_attn.linear_k.weight', 'decoder.right_decoder.decoders.1.src_attn.linear_k.bias', 'decoder.right_decoder.decoders.1.src_attn.linear_v.weight', 'decoder.right_decoder.decoders.1.src_attn.linear_v.bias', 'decoder.right_decoder.decoders.1.src_attn.linear_out.weight', 'decoder.right_decoder.decoders.1.src_attn.linear_out.bias', 'decoder.right_decoder.decoders.1.feed_forward.w_1.weight', 'decoder.right_decoder.decoders.1.feed_forward.w_1.bias', 'decoder.right_decoder.decoders.1.feed_forward.w_2.weight', 'decoder.right_decoder.decoders.1.feed_forward.w_2.bias', 'decoder.right_decoder.decoders.1.norm1.weight', 'decoder.right_decoder.decoders.1.norm1.bias', 'decoder.right_decoder.decoders.1.norm2.weight', 'decoder.right_decoder.decoders.1.norm2.bias', 'decoder.right_decoder.decoders.1.norm3.weight', 'decoder.right_decoder.decoders.1.norm3.bias', 'decoder.right_decoder.decoders.2.self_attn.linear_q.weight', 'decoder.right_decoder.decoders.2.self_attn.linear_q.bias', 'decoder.right_decoder.decoders.2.self_attn.linear_k.weight', 'decoder.right_decoder.decoders.2.self_attn.linear_k.bias', 'decoder.right_decoder.decoders.2.self_attn.linear_v.weight', 'decoder.right_decoder.decoders.2.self_attn.linear_v.bias', 'decoder.right_decoder.decoders.2.self_attn.linear_out.weight', 'decoder.right_decoder.decoders.2.self_attn.linear_out.bias', 'decoder.right_decoder.decoders.2.src_attn.linear_q.weight', 'decoder.right_decoder.decoders.2.src_attn.linear_q.bias', 'decoder.right_decoder.decoders.2.src_attn.linear_k.weight', 'decoder.right_decoder.decoders.2.src_attn.linear_k.bias', 'decoder.right_decoder.decoders.2.src_attn.linear_v.weight', 'decoder.right_decoder.decoders.2.src_attn.linear_v.bias', 'decoder.right_decoder.decoders.2.src_attn.linear_out.weight', 'decoder.right_decoder.decoders.2.src_attn.linear_out.bias', 'decoder.right_decoder.decoders.2.feed_forward.w_1.weight', 'decoder.right_decoder.decoders.2.feed_forward.w_1.bias', 'decoder.right_decoder.decoders.2.feed_forward.w_2.weight', 'decoder.right_decoder.decoders.2.feed_forward.w_2.bias', 'decoder.right_decoder.decoders.2.norm1.weight', 'decoder.right_decoder.decoders.2.norm1.bias', 'decoder.right_decoder.decoders.2.norm2.weight', 'decoder.right_decoder.decoders.2.norm2.bias', 'decoder.right_decoder.decoders.2.norm3.weight', 'decoder.right_decoder.decoders.2.norm3.bias']
Additional context
- The missing keys include 'encoder.embed.pos_enc.pe', 'decoder.embed.0.weight', and other weights related to the decoder structure.
- The unexpected keys include 'encoder.global_cmvn.mean', 'decoder.left_decoder.embed.0.weight', and others.
- The mismatch suggests that final.pt was trained with a different architecture than the one described in the train.yaml file. I am wondering if there is an updated version of the model, or if this is a known issue that can be safely ignored.

Thank you!