
No gradients during backpropagation (反向传播不带梯度) #8

Open

guanhdrmq opened this issue Dec 10, 2023 · 19 comments
@guanhdrmq
Hello. When backpropagating into cross_modal_image_layers and cross_modal_text_layers, why are there no gradients? When I set up gradient recording in BERT's modeling_bert, it shows there are none. How can I get the gradients of each layer, e.g. the feature map and gradient of the last layer of cross_modal_image_layers? Thanks.

@GoGoJoestar (Contributor)

Could you explain specifically what you did for "setting gradients in BERT's modeling_bert"?

@guanhdrmq (Author)

The gradient issue is solved. Another question: how do I get the length of the image feature input? If the image is 384*384 and the patch size is 16*16, the number of patches should be 576. How do I obtain the length of the image input features? Thanks.

@GoGoJoestar (Contributor)

Suppose the image, after encoding by the vision model (ViT), has shape [batch_size, vision_length, hidden_size]. The second dimension, vision_length, is the length of the image features; it consists of the global image feature concatenated with the per-patch features, so length = 1 + number of patches. For a 384*384 image with 16*16 patches, the patch count is (384 / 16) ^ 2 = 576, so vision_length = 1 + 576 = 577.
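A quick sanity check of this arithmetic (plain Python, nothing VLE-specific; image_size and patch_size are the usual ViT-style config fields):

image_size, patch_size = 384, 16
num_patches = (image_size // patch_size) ** 2  # (384 / 16) ^ 2 = 576
vision_length = 1 + num_patches                # global image feature + patches = 577
print(num_patches, vision_length)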

@guanhdrmq (Author)

Great. How do I get the q and k values of the image cross attention and text cross attention? The underlying calls all go through huggingface's BERT modeling_bert code, and the VLE fusion part wasn't refactored. Thanks.

@GoGoJoestar (Contributor)

We didn't modify the internals of the cross attention. If you want to get the query and key from it, consider overriding huggingface's BertAttention and the related code in models/VLE/modeling_vle.py.
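An alternative that avoids rewriting BertAttention is to attach forward hooks to the query/key projection layers. A minimal sketch, assuming each cross-modal layer follows huggingface's standard BertLayer structure (so crossattention.self.query and crossattention.self.key are nn.Linear modules); verify the actual module paths with print(model) first:

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output  # projected q or k, before the multi-head reshape
    return hook

last_layer = model.cross_modal_image_layers[-1]
last_layer.crossattention.self.query.register_forward_hook(make_hook("image_q"))
last_layer.crossattention.self.key.register_forward_hook(make_hook("image_k"))
# after a forward pass, captured["image_q"] / captured["image_k"] hold the tensors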

@guanhdrmq (Author)

Thanks. One more question: how do I get the features of the last visual layer, i.e. the hidden_states?

@GoGoJoestar (Contributor)

The VLEModel output contains the final visual features; you can refer to the code below.

model = VLEModel.from_pretrained(model_name)
model_outputs = model(inputs)

# final image representations
model_outputs.image_embeds

# final text representations
model_outputs.text_embeds

@guanhdrmq (Author)

OK, thanks. One more question: within the huggingface framework, can the VLE model be split across two 2080 Ti GPUs, e.g. with device_map, or with shared CPU memory? We haven't gotten this to work yet. It does run on a single 3060 or a single 4080, though just barely. Thanks.

@GoGoJoestar (Contributor)

We haven't done this ourselves, but you can try manually specifying in device_map which card each module of the model goes on. Note that using device_map may conflict with distributed training.

device_map = {
    "vision_model": 0,
    "text_model": 0,
    "text_projection_layer": 1,
    "image_projection_layer": 1,
    "token_type_embeddings": 1,
    "cross_modal_image_layers": 1,
    "cross_modal_text_layers": 1,
    "cross_modal_image_pooler": 1,
    "cross_modal_text_pooler": 1
}

@guanhdrmq (Author)

Hello, here is our code. It only works if we comment out model.to(device); we can't load the model onto the two 2080 Tis, and it only runs on CPU. We are only doing inference, not training. How can we solve this? Thanks.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0, 1"

import torch
from torch.utils.data import DataLoader

from VLE import VLEForVQA, VLEProcessor, VLEForVQAPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if __name__ == "__main__":
    device_map = {
        "vision_model": 0,
        "text_model": 0,
        "text_projection_layer": 1,
        "image_projection_layer": 1,
        "token_type_embeddings": 1,
        "cross_modal_image_layers": 1,
        "cross_modal_text_layers": 1,
        "cross_modal_image_pooler": 1,
        "cross_modal_text_pooler": 1
    }

    model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa")

    # config (id2label/label2id) is defined elsewhere in our script
    vle_processor = VLEProcessor.from_pretrained(
        "./pretrained/vle-base-for-vqa",
        num_labels=len(config.id2label),
        id2label=config.id2label,
        label2id=config.label2id,
        output_hidden_states=True
    )
    vqa_pipeline = VLEForVQAPipeline(model=model, device_map=device_map, vle_processor=vle_processor)
    # vqa_pipeline = VLEForVQAPipeline(model=model, device=device, vle_processor=vle_processor)
    # model.to(device)
    model.eval()

    # VQADataset, questions, annotations, tensor_to_pil and config_idandlabel
    # are defined elsewhere in our script
    dataset = VQADataset(questions=questions[:5000], annotations=annotations[:5000])
    test_dataloader = DataLoader(dataset, batch_size=1, shuffle=False)

    # counters
    correct = 0.0
    total = 0

    for image, text, labels in test_dataloader:
        image = image.squeeze(0)
        # image = image.to(device)
        image = tensor_to_pil(image)
        inputs = {"image": image, "question": text[0]}
        vqa_answers = vqa_pipeline(**inputs, top_k=5)

        _, _, logits, answer_list = vqa_answers
        top_answer = answer_list[0]['answer']
        print("prediction answer:", top_answer)

        true_label_index = torch.argmax(labels)
        true_label = config_idandlabel["answer_candidates"][true_label_index]
        if top_answer == true_label:
            correct = correct + 1
        total = total + 1
        if total % 100 == 0:
            print("total:{}".format(total))

    acc = correct / total
    print("acc:{:.4f}".format(acc))

@GoGoJoestar (Contributor)

If you're using the VLEForVQA model, the module names in device_map need to be adjusted:

device_map = {
    "vle.vision_model": 0,
    "vle.text_model": 0,
    "vle.text_projection_layer": 1,
    "vle.image_projection_layer": 1,
    "vle.token_type_embeddings": 1,
    "vle.cross_modal_image_layers": 1,
    "vle.cross_modal_text_layers": 1,
    "vle.cross_modal_image_pooler": 1,
    "vle.cross_modal_text_pooler": 1,
    "vqa_classifier": 1,
}

Pass the device_map argument when loading the model:

model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa",device_map=device_map)

At this point the model is already distributed across the two cards.

As for the Pipeline, it doesn't seem to support multiple cards (device_map), only a device argument. For example, the following line passes device=0:

vqa_pipeline = VLEForVQAPipeline(model=model, device=0, vle_processor=vle_processor)

which moves the model from cards 0 and 1 entirely back onto card 0. For multi-GPU use, I suggest not using the Pipeline; you can follow the processing logic in VLEForVQAPipeline and rewrite that flow in your own code.
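A minimal pipeline-free sketch, assuming VLEProcessor follows the usual huggingface processor interface (text=..., images=..., return_tensors="pt") and that inputs should start on the card holding vision_model and text_model; check the source of VLEForVQAPipeline for the exact pre- and post-processing steps:

inputs = vle_processor(text=question, images=image, return_tensors="pt")
inputs = {k: v.to(0) for k, v in inputs.items()}  # card 0 holds the input-side modules
outputs = model(**inputs)  # accelerate's dispatch hooks move tensors between cards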

@guanhdrmq (Author)

Hello. With model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa", device_map=device_map) both 2080 Tis are indeed used, but at batch_size=1 it only gets through 4 samples before GPU memory fills up and an out of memory error is raised. Is this a limitation of our current hardware? If not, is there another solution? Thanks.

@GoGoJoestar (Contributor)

You can try the following to reduce memory usage:

  1. Adjust the device_map allocation to balance memory across the cards
  2. Use torch.no_grad
  3. Reduce the model's and processor's image size: change image_size in the model's config.json, and crop_size and size in preprocessor_config.json

@guanhdrmq (Author)

Hi. 1. We tried this yesterday without success; could you give us another device_map dict? 2. We need the gradients, so torch.no_grad probably won't work for us; we already modified the underlying BERT to record gradients. 3. We tried this, but the model's input image has to be 576*576; changing it to 384*384 raises an error. Could you try the resize on your side? Thanks.

@GoGoJoestar (Contributor) commented Dec 21, 2023

  1. For device_map, adjust which modules go on card 0 and card 1 based on the actual memory usage on each card, e.g. also set vle.vision_model to 1 (you may then need to move the image inputs to the matching device). We don't have 2080 Tis, so we can't give a more specific assignment.
  2. If you need gradients, memory usage will indeed grow a lot. Do you need all the gradients? Have you set requires_grad=False on the modules that don't need them (see the sketch after this list)?
  3. After changing size in config.json and preprocessor_config.json in the model directory, it runs on our side; what exactly is the error?
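A minimal freezing sketch, assuming you only need gradients in the cross-modal fusion layers (the vle.vision_model / vle.text_model paths come from the device_map above; adjust the frozen set to your case):

for module in (model.vle.vision_model, model.vle.text_model):
    for p in module.parameters():
        p.requires_grad = False  # no gradients computed or stored for these parameters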

@guanhdrmq (Author)

If the image is changed to 384*384, the dimensions no longer match:

Traceback (most recent call last):
File "D:\WorkSpace\workspace\multimodal_robustness\vle_vqav2_image.py", line 171, in <module>
model = VLEForVQA.from_pretrained("./pretrained/vle-base-for-vqa")
File "C:\Users\Admin\.conda\envs\Base\lib\site-packages\transformers\modeling_utils.py", line 3307, in from_pretrained
) = cls._load_pretrained_model(
File "C:\Users\Admin\.conda\envs\Base\lib\site-packages\transformers\modeling_utils.py", line 3756, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for VLEForVQA:
size mismatch for vle.vision_model.vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([1297, 768]) from checkpoint, the shape in current model is torch.Size([577, 768]).
You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

@GoGoJoestar (Contributor) commented Dec 22, 2023

Try adjusting the position_embedding weights after loading the model, using the extend_position_embedding method in models/VLE/modeling_vle.py. See the code below, or lines 68~76 of examples/VQA/vqav2_train_module.py.

import torch
from torch import nn
from models.VLE.modeling_vle import extend_position_embedding

patch_size = model.config.vision_config.patch_size
# positions after resizing: one per patch plus the global token
position_length_after = (model.config.vision_config.image_size // patch_size) ** 2 + 1
position_embed_dim = model.vle.vision_model.vision_model.embeddings.position_embedding.embedding_dim

# interpolate the pretrained position embeddings to the new grid size
new_state_dict = extend_position_embedding(model.state_dict(), patch_size, model.config.vision_config.image_size)
# rebuild the embedding table and the position_ids buffer at the new length
model.vle.vision_model.vision_model.embeddings.position_embedding = nn.Embedding(position_length_after, position_embed_dim, device=model.vle.vision_model.vision_model.embeddings.position_embedding.weight.device)
model.vle.vision_model.vision_model.embeddings.register_buffer("position_ids", torch.arange(position_length_after, device=model.vle.vision_model.vision_model.embeddings.position_ids.device).expand((1, -1)))
model.load_state_dict(new_state_dict)

@guanhdrmq (Author)

Two more questions: 1. The source code uses DeBERTa-v2 in huggingface. 2. Can multi-GPU inference be done on an A100? Thanks.

@GoGoJoestar (Contributor)

  1. I don't understand your first question.
  2. Yes, multi-GPU inference works on an A100.
