The analysis of news dissemination is of utmost importance since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content on the Web require computational methods that work at scale. Today's online news is typically disseminated in multimodal form, combining presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic "descriptive" relations between modalities, such as correspondences between words and phrases on the one hand and visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation and visual question answering, domains such as news dissemination require going further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature in which detailed taxonomies have been proposed that cover diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts.
We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics and computational social sciences that can benefit from our approach.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10185854 | PMC |
| http://dx.doi.org/10.3389/frai.2023.1125533 | DOI Listing |
Nature
January 2025
Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA.
Clinical decision-making is driven by multimodal data, including clinical notes and pathological characteristics. Artificial intelligence approaches that can effectively integrate multimodal data hold significant promise in advancing clinical care. However, the scarcity of well-annotated multimodal datasets in clinical settings has hindered the development of useful models.
Nat Med
December 2024
National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China.
In many clinical and research settings, the scarcity of high-quality medical imaging datasets has hampered the potential of artificial intelligence (AI) clinical applications. This issue is particularly pronounced in less common conditions, underrepresented populations and emerging imaging modalities, where the availability of diverse and comprehensive datasets is often inadequate. To address this challenge, we introduce a unified medical image-text generative model called MINIM that is capable of synthesizing medical images of various organs across various imaging modalities based on textual instructions.
JMIR Form Res
November 2024
Institute of Medical Information and Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China.
Background: Nutrient needs vary over the lifespan. Improving knowledge of both population groups and care providers can help with healthier food choices, thereby promoting population health and preventing diseases. Providing evidence-based food knowledge online is credible, low cost, and easily accessible.
Pest Manag Sci
February 2025
Key Laboratory of National Health and Family Planning Commission on Parasitic Disease Control and Prevention, Jiangsu Provincial Key Laboratory on Parasite and Vector Control Technology, Jiangsu Institute of Parasitic Diseases, Wuxi, China.
Vector-borne diseases (VBDs) represent a critical global public health concern, with approximately 80% of the world's population at risk of one or more VBDs. Manual disease vector identification is time-consuming and expert-dependent, hindering disease control efforts. Deep learning (DL), widely used in image, text, and audio tasks, offers automation potential for disease vector identification.
Sci Data
October 2024
School of Economics and Management, University of Chinese Academy of Sciences, Beijing, 100190, China.
In this rapidly evolving era of multimodal generation, diffusion models exhibit impressive generative capabilities, significantly advancing creative image synthesis from intricate textual prompts. Yet their effectiveness is limited in certain niche sectors, such as depicting ancient Chinese architecture. This limitation is primarily due to insufficient data, which fail to cover the unique architectural features and the corresponding text information.