We run out of memory on the first forward pass of the training loop, even after decreasing the batch size to 1 and the sequence length to 256. We had already done a forward pass without the LoRA adapters on just a couple of tokens, so this is strange.
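One thing we suspect but haven't confirmed: a forward pass inside a training loop builds the autograd graph and keeps every intermediate activation alive for the backward pass, whereas our earlier standalone forward was (we think) run without gradients, so the two aren't comparable memory-wise. Here is a minimal sketch of how we could measure the gap; the toy model, layer sizes, and dimensions are placeholders, not our actual setup:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model; sizes are arbitrary placeholders.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(24)]).cuda()
x = torch.randn(1, 256, 1024, device="cuda")  # batch 1, seq len 256

def peak_mem_mib(fn):
    # Reset the peak-memory counter, run the forward, report the peak.
    torch.cuda.reset_peak_memory_stats()
    fn()
    return torch.cuda.max_memory_allocated() / 2**20

# Inference-style forward: no autograd graph, activations freed layer by layer.
with torch.no_grad():
    infer_mb = peak_mem_mib(lambda: model(x))

# Training-style forward: every intermediate activation is retained
# so the backward pass can use it.
train_mb = peak_mem_mib(lambda: model(x))

print(f"no_grad forward peak:  {infer_mb:.0f} MiB")
print(f"training forward peak: {train_mb:.0f} MiB")
```

If that gap turns out to be the cause, gradient checkpointing would presumably be the usual knob to try, trading recomputation for activation memory.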