
Alibaba Cloud launches open-source large vision language model

Qwen-VL is the multimodal version of Qwen-7B, the 7-billion-parameter model in Alibaba Cloud’s Tongyi Qianwen family of large language models, which is also available as open source on ModelScope.

Upgrade Staff

Published September 4, 2023

Alibaba Cloud, the digital technology and intelligence backbone of Alibaba Group, has launched two open-source large vision language models (LVLMs): Qwen-VL and its conversationally fine-tuned variant, Qwen-VL-Chat. The models can comprehend images, text, and bounding boxes in prompts, and they support multi-round question answering in both English and Chinese.

Qwen-VL is the multimodal version of Qwen-7B, the 7-billion-parameter model in Alibaba Cloud’s Tongyi Qianwen family of large language models, which is also available as open source on ModelScope. Because it understands both image inputs and text prompts in English and Chinese, Qwen-VL can perform tasks such as answering open-ended questions about images and generating image captions.

Qwen-VL-Chat handles more complex interactions, such as comparing multiple image inputs and engaging in multi-round question answering. Built with alignment techniques, this AI assistant exhibits a range of creative capabilities, including writing poetry and stories based on input images, summarizing the content of multiple pictures, and solving mathematical questions displayed in images.

Contribution to open source and inclusivity

In a bid to democratize AI technologies, Alibaba Cloud has shared the models’ code, weights, and documentation with academics, researchers, and commercial institutions worldwide. This contribution to the open-source community is accessible via Alibaba’s AI model community ModelScope and the collaborative AI platform Hugging Face. For commercial use, companies with over 100 million monthly active users can request a license from Alibaba Cloud.


The introduction of these models, with their ability to extract meaning and information from images, could change how people interact with visual content. For instance, by drawing on their image comprehension and question-answering capabilities, the models could in the future provide information assistance to visually impaired individuals during online shopping.

The Qwen-VL model was pre-trained on image and text datasets. Whereas other open-source large vision language models process and understand images at a resolution of 224×224, Qwen-VL can handle image input at a resolution of 448×448, resulting in better image recognition and comprehension.
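To put the two resolutions in perspective, the short Python sketch below works out how much more image data a 448×448 input carries than a 224×224 one. The pixel arithmetic follows directly from the figures above. The patch size of 14 is an assumption made only for illustration (a value commonly used by vision transformers) and is not stated in this article, and the helper names are hypothetical.

```python
# Illustrative arithmetic only; not Alibaba Cloud's code.

def pixel_count(side: int) -> int:
    """Number of pixels in a square image with the given side length."""
    return side * side


def patch_count(side: int, patch: int) -> int:
    """Number of non-overlapping square patches that tile a square image."""
    if side % patch != 0:
        raise ValueError("side must be a multiple of the patch size")
    return (side // patch) ** 2


# ASSUMPTION: a 14-pixel patch, chosen for illustration only.
ASSUMED_PATCH = 14

LOW_RES, HIGH_RES = 224, 448  # resolutions quoted in the article

ratio = pixel_count(HIGH_RES) / pixel_count(LOW_RES)
print(f"{HIGH_RES}x{HIGH_RES} has {ratio:.0f}x the pixels of {LOW_RES}x{LOW_RES}")
print("patches at low resolution: ", patch_count(LOW_RES, ASSUMED_PATCH))
print("patches at high resolution:", patch_count(HIGH_RES, ASSUMED_PATCH))
```

Doubling the side length quadruples the pixel count, so under the assumed patch size the model would see four times as many image patches per picture. That gives it more detail to work with, at a correspondingly higher compute cost.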

Based on various benchmarks, Qwen-VL recorded outstanding performance on several visual language tasks, including zero-shot captioning, general visual question answering, text-oriented visual question answering, and object detection.

Qwen-VL-Chat has also achieved leading results in both Chinese and English for text-image dialogue and for alignment with humans, according to Alibaba Cloud’s benchmark test. This test involved over 300 images, 800 questions, and 27 categories.

Earlier this month, Alibaba Cloud open-sourced its 7-billion-parameter LLMs, Qwen-7B and Qwen-7B-Chat, as part of its ongoing contribution to the open-source community. The two models were downloaded more than 400,000 times within a month of their launch.


