图层工具：Llama 视觉（高级）

LayerUtility: Llama Vision(Advance) 2026年1月30日 17:42 81 人浏览

节点功能：基于 Llama 3.2 11B Vision 模型的多模态图像理解和生成文本描述。LayerUtility: Llama Vision(Advance)-

节点中英文对比

图层工具：Llama 视觉（高级）

image

text

model

system_prompt

user_prompt

max_new_tokens

do_sample

temperature

top_p

top_k

stop_strings

seed

include_prompt_in_output

cache_model

LayerUtility: Llama Vision(Advance)

image

text

model

system_prompt

user_prompt

max_new_tokens

do_sample

temperature

top_p

top_k

stop_strings

seed

include_prompt_in_output

cache_model

图层工具：Llama 视觉（高级） - 参数说明

输入参数

image

输入图像，可为单张或多张图像。

输出参数

text

每张图像的生成文本描述，按列表输出。

控件参数

model

使用的视觉语言模型，目前仅支持 Llama-3.2-11B-Vision-Instruct-nf4。

system_prompt

系统设定提示词，用于设定模型行为，如：“You are a helpful AI assistant.”。

user_prompt

用户提示词，用于描述任务目的，如“Describe this image in natural language.”。

max_new_tokens

最多生成的 token 数。

do_sample

是否启用采样。

temperature

采样温度。越高生成越发散（推荐 0.3～0.8 之间），越低越稳定。

top_p

nucleus sampling 的概率阈值。常用设为 0.9，控制生成 token 的累计概率覆盖。

top_k

限定前 k 个 token 中采样。用于控制 token 候选范围，通常设置为 20～100。

stop_strings

停止生成的关键字符串，用英文逗号分隔。

seed

随机种子。

include_prompt_in_output

是否将输入 prompt 一并包含在输出中。

cache_model

是否缓存加载的模型。

暂无节点说明