
No gradients during backpropagation (反向传播不带梯度) #8

Open

guanhdrmq opened this issue Dec 10, 2023 · 19 comments
@guanhdrmq
Hello. When backpropagating into cross_modal_image_layers and cross_modal_text_layers, why are there no gradients? When I set up gradient recording in BERT's modeling_bert, it shows there are none. How can I get the gradients of each layer, e.g. the feature map and gradient of the last layer of cross_modal_image_layers? Thanks.

@GoGoJoestar (Contributor)

Could you explain specifically what you did for "setting gradients in BERT's modeling_bert"?

@guanhdrmq (Author)

The gradient issue is solved. Another question: how do I get the length of the image feature input? If the image is 384*384 and the patch size is 16*16, the number of patches should be 576. How do I obtain the length of the image input features? Thanks.

@GoGoJoestar (Contributor)

Suppose the image, after encoding by the vision model (ViT), has shape [batch_size, vision_length, hidden_size]. The second dimension, vision_length, is the length of the image features; it consists of the global image feature concatenated with the per-patch features, so length = 1 + number of patches. For a 384*384 image with 16*16 patches, the patch count is (384 / 16) ^ 2 = 576, so vision_length = 1 + 576 = 577.
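A quick sanity check of this arithmetic (plain Python, nothing VLE-specific; image_size and patch_size are the usual ViT-style config fields):

image_size, patch_size = 384, 16
num_patches = (image_size // patch_size) ** 2  # (384 / 16) ^ 2 = 576
vision_length = 1 + num_patches                # global image feature + patches = 577
print(num_patches, vision_length)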

@guanhdrmq (Author)

Great. How do I get the q and k values of the image cross attention and text cross attention? The underlying calls all go through huggingface's BERT modeling_bert code, and the VLE fusion part wasn't refactored. Thanks.

@GoGoJoestar (Contributor)

We didn't modify the internals of the cross attention. If you want to get the query and key from it, consider overriding huggingface's BertAttention and the related code in models/VLE/modeling_vle.py.
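An alternative that avoids rewriting BertAttention is to attach forward hooks to the query/key projection layers. A minimal sketch, assuming each cross-modal layer follows huggingface's standard BertLayer structure (so crossattention.self.query and crossattention.self.key are nn.Linear modules); verify the actual module paths with print(model) first:

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output  # projected q or k, before the multi-head reshape
    return hook

last_layer = model.cross_modal_image_layers[-1]
last_layer.crossattention.self.query.register_forward_hook(make_hook("image_q"))
last_layer.crossattention.self.key.register_forward_hook(make_hook("image_k"))
# after a forward pass, captured["image_q"] / captured["image_k"] hold the tensors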

@guanhdrmq (Author)

Thanks. One more question: how do I get the features of the last visual layer, i.e. the hidden_states?

@GoGoJoestar (Contributor)

The VLEModel output contains the final visual features; you can refer to the code below.

model = VLEModel.from_pretrained(model_name)
model_outputs = model(inputs)

# final image representations
model_outputs.image_embeds

# final text representations
model_outputs.text_embeds

@guanhdrmq (Author)

OK, thanks. One more question: within the huggingface framework, can the VLE model be split across two 2080 Ti GPUs, e.g. with device_map, or with shared CPU memory? We haven't gotten this to work yet. It does run on a single 3060 or a single 4080, though just barely. Thanks.

@GoGoJoestar (Contributor)

We haven't done this ourselves, but you can try manually specifying in device_map which card each module of the model goes on. Note that using device_map may conflict with distributed training.

device_map = {
    "vision_model": 0,
    "text_model": 0,
    "text_projection_layer": 1,
    "image_projection_layer": 1,
    "token_type_embeddings": 1,
    "cross_modal_image_layers": 1,
    "cross_modal_text_layers": 1,
    "cross_modal_image_pooler": 1,
    "cross_modal_text_pooler": 1
}

@guanhdrmq (Author)

Hello, here is our code. It only works if we comment out model.to(device); we can't load the model onto the two 2080 Tis, and it only runs on CPU. We are only doing inference, not training. How can we solve this? Thanks.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0, 1"

import torch
from torch.utils.data import DataLoader

from VLE import VLEForVQA, VLEProcessor, VLEForVQAPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if __name__ == "__main__":
    device_map = {
        "vision_model": 0,
        "text_model": 0,
        "text_projection_layer": 1,
        "image_projection_layer": 1,
        "token_type_embeddings": 1,
        "cross_modal_image_layers": 1,
        "cross_modal_text_layers": 1,
        "cross_modal_image_pooler": 1,
        "cross_modal_text_pooler": 1
    }

    model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa")

    # config (id2label/label2id) is defined elsewhere in our script
    vle_processor = VLEProcessor.from_pretrained(
        "./pretrained/vle-base-for-vqa",
        num_labels=len(config.id2label),
        id2label=config.id2label,
        label2id=config.label2id,
        output_hidden_states=True
    )
    vqa_pipeline = VLEForVQAPipeline(model=model, device_map=device_map, vle_processor=vle_processor)
    # vqa_pipeline = VLEForVQAPipeline(model=model, device=device, vle_processor=vle_processor)
    # model.to(device)
    model.eval()

    # VQADataset, questions, annotations, tensor_to_pil and config_idandlabel
    # are defined elsewhere in our script
    dataset = VQADataset(questions=questions[:5000], annotations=annotations[:5000])
    test_dataloader = DataLoader(dataset, batch_size=1, shuffle=False)

    # counters
    correct = 0.0
    total = 0

    for image, text, labels in test_dataloader:
        image = image.squeeze(0)
        # image = image.to(device)
        image = tensor_to_pil(image)
        inputs = {"image": image, "question": text[0]}
        vqa_answers = vqa_pipeline(**inputs, top_k=5)

        _, _, logits, answer_list = vqa_answers
        top_answer = answer_list[0]['answer']
        print("prediction answer:", top_answer)

        true_label_index = torch.argmax(labels)
        true_label = config_idandlabel["answer_candidates"][true_label_index]
        if top_answer == true_label:
            correct = correct + 1
        total = total + 1
        if total % 100 == 0:
            print("total:{}".format(total))

    acc = correct / total
    print("acc:{:.4f}".format(acc))

@GoGoJoestar (Contributor)

If you're using the VLEForVQA model, the module names in device_map need to be adjusted:

device_map = {
    "vle.vision_model": 0,
    "vle.text_model": 0,
    "vle.text_projection_layer": 1,
    "vle.image_projection_layer": 1,
    "vle.token_type_embeddings": 1,
    "vle.cross_modal_image_layers": 1,
    "vle.cross_modal_text_layers": 1,
    "vle.cross_modal_image_pooler": 1,
    "vle.cross_modal_text_pooler": 1,
    "vqa_classifier": 1,
}

Pass the device_map argument when loading the model:

model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa",device_map=device_map)

At this point the model is already distributed across the two cards.

As for the Pipeline, it doesn't seem to support multiple cards (device_map), only a device argument. For example, the following line passes device=0:

vqa_pipeline = VLEForVQAPipeline(model=model, device=0, vle_processor=vle_processor)

which moves the model from cards 0 and 1 entirely back onto card 0. For multi-GPU use, I suggest not using the Pipeline; you can follow the processing logic in VLEForVQAPipeline and rewrite that flow in your own code.
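A minimal pipeline-free sketch, assuming VLEProcessor follows the usual huggingface processor interface (text=..., images=..., return_tensors="pt") and that inputs should start on the card holding vision_model and text_model; check the source of VLEForVQAPipeline for the exact pre- and post-processing steps:

inputs = vle_processor(text=question, images=image, return_tensors="pt")
inputs = {k: v.to(0) for k, v in inputs.items()}  # card 0 holds the input-side modules
outputs = model(**inputs)  # accelerate's dispatch hooks move tensors between cards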

@guanhdrmq (Author)

Hello. With model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa", device_map=device_map) both 2080 Tis are indeed used, but at batch_size=1 it only gets through 4 samples before GPU memory fills up and an out of memory error is raised. Is this a limitation of our current hardware? If not, is there another solution? Thanks.

@GoGoJoestar (Contributor)

You can try the following to reduce memory usage:

  1. Adjust the device_map allocation to balance memory across the cards
  2. Use torch.no_grad
  3. Reduce the model's and processor's image size: change image_size in the model's config.json, and crop_size and size in preprocessor_config.json

@guanhdrmq (Author)

Hi. 1. We tried this yesterday without success; could you give us another device_map dict? 2. We need the gradients, so torch.no_grad probably won't work for us; we already modified the underlying BERT to record gradients. 3. We tried this, but the model's input image has to be 576*576; changing it to 384*384 raises an error. Could you try the resize on your side? Thanks.

@GoGoJoestar (Contributor) commented Dec 21, 2023

  1. For device_map, adjust which modules go on card 0 and card 1 based on the actual memory usage on each card, e.g. also set vle.vision_model to 1 (you may then need to move the image inputs to the matching device). We don't have 2080 Tis, so we can't give a more specific assignment.
  2. If you need gradients, memory usage will indeed grow a lot. Do you need all the gradients? Have you set requires_grad=False on the modules that don't need them (see the sketch after this list)?
  3. After changing size in config.json and preprocessor_config.json in the model directory, it runs on our side; what exactly is the error?
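A minimal freezing sketch, assuming you only need gradients in the cross-modal fusion layers (the vle.vision_model / vle.text_model paths come from the device_map above; adjust the frozen set to your case):

for module in (model.vle.vision_model, model.vle.text_model):
    for p in module.parameters():
        p.requires_grad = False  # no gradients computed or stored for these parameters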

@guanhdrmq (Author)

If the image is changed to 384*384, the dimensions no longer match:

Traceback (most recent call last):
File "D:\WorkSpace\workspace\multimodal_robustness\vle_vqav2_image.py", line 171, in <module>
model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa")
File "C:\Users\Admin\.conda\envs\Base\lib\site-packages\transformers\modeling_utils.py", line 3307, in from_pretrained
) = cls._load_pretrained_model(
File "C:\Users\Admin\.conda\envs\Base\lib\site-packages\transformers\modeling_utils.py", line 3756, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for VLEForVQA:
size mismatch for vle.vision_model.vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([1297, 768]) from checkpoint, the shape in current model is torch.Size([577, 768]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

@GoGoJoestar (Contributor) commented Dec 22, 2023

Try adjusting the position_embedding weights after loading the model, using the extend_position_embedding method in models/VLE/modeling_vle.py. See the code below, or lines 68~76 of examples/VQA/vqav2_train_module.py.

import torch
from torch import nn
from models.VLE.modeling_vle import extend_position_embedding

patch_size = model.config.vision_config.patch_size
# positions after resizing: one per patch plus the global token
position_length_after = (model.config.vision_config.image_size // patch_size) ** 2 + 1
position_embed_dim = model.vle.vision_model.vision_model.embeddings.position_embedding.embedding_dim

# interpolate the pretrained position embeddings to the new grid size
new_state_dict = extend_position_embedding(model.state_dict(), patch_size, model.config.vision_config.image_size)
# rebuild the embedding table and the position_ids buffer at the new length
model.vle.vision_model.vision_model.embeddings.position_embedding = nn.Embedding(position_length_after, position_embed_dim, device=model.vle.vision_model.vision_model.embeddings.position_embedding.weight.device)
model.vle.vision_model.vision_model.embeddings.register_buffer("position_ids", torch.arange(position_length_after, device=model.vle.vision_model.vision_model.embeddings.position_ids.device).expand((1, -1)))
model.load_state_dict(new_state_dict)

@guanhdrmq (Author)

Two more questions: 1. The source code uses DeBERTa-v2 in huggingface. 2. Can multi-GPU inference be done on an A100? Thanks.

@GoGoJoestar (Contributor)

  1. I don't understand your first question.
  2. Yes, multi-GPU inference works on an A100.
