Multimodal Models Application

In this notebook, you will implement an image captioning application using ViT (Vision Transformer) + GPT-2, and then explore a demo of OpenAI's CLIP using Hugging Face Transformers models.
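
As a rough preview of the CLIP part, the sketch below shows the scoring idea behind CLIP-style image-text matching: compare an image embedding with each caption embedding by cosine similarity, scale the scores, and apply a softmax over the captions. It uses only the standard library and no real model. The embedding vectors, the captions, the helper names, and the fixed `logit_scale` of 100 (standing in for CLIP's learned temperature) are all illustrative assumptions, not values from this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clip_style_probs(image_emb, text_embs, logit_scale=100.0):
    """Probability of each caption given one image embedding.

    logit_scale is an assumed constant here; in CLIP it is a learned parameter.
    """
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in text_embs]
    return softmax(logits)


# Hypothetical toy embeddings; a real model would produce these vectors.
image_embedding = [0.9, 0.1, 0.2]
captions = ["a photo of a cat", "a photo of a dog"]
caption_embeddings = [[0.8, 0.2, 0.1], [0.1, 0.9, 0.3]]

probs = clip_style_probs(image_embedding, caption_embeddings)
best_caption = captions[probs.index(max(probs))]
print(best_caption)
```

In the notebook itself, the embeddings would come from a pretrained model's image and text encoders rather than from hand-written lists.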
