Multimodal Models

mul-tee-MOH-dl /ˌmʌltiˈməʊdl/

AI models that work with more than one type of data, or modality, as input or output – such as text, images, audio or video. A multimodal model might describe an image in words, answer questions about a diagram or generate an image from text.