Purpose: To assess the feasibility of a second-generation (44-channel) suprachoroidal retinal prosthesis for provision of functional vision in recipients with end-stage retinitis pigmentosa (RP) over 2.7 years.
Design: Prospective, single-arm, unmasked interventional clinical trial.
IEEE Trans Image Process
April 2024
Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the tasks of bounding-box-level localization and pixel-level segmentation. This task is promising due to its ability to discover objects in a generic manner. We roughly categorize existing techniques into two main directions, namely the generative solutions based on image resynthesis, and the clustering methods based on self-supervised models.
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution. The energy-based prior model is defined on the latent space of a saliency generator network that generates the saliency map based on continuous latent variables and an observed image. The parameters of the saliency generator and the energy-based prior are jointly trained via Markov chain Monte Carlo-based maximum likelihood estimation, in which sampling from the intractable posterior and prior distributions of the latent variables is performed by Langevin dynamics.
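As a rough illustration of the Langevin dynamics mentioned above, the sketch below draws samples from a distribution proportional to exp(-E(z)) using only the gradient of the energy E. The quadratic toy energy, the step size, the number of steps, and all function names are assumptions made for this example; the abstract does not give the learned energy-based prior or the sampler settings, so this is not the authors' implementation.

```python
import numpy as np

def langevin_sample(grad_energy, z0, step_size=0.1, n_steps=200, rng=None):
    """Unadjusted Langevin dynamics: gradient descent on the energy plus Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        # Half a gradient step on the energy, plus noise scaled by sqrt(step_size).
        z = z - 0.5 * step_size * grad_energy(z) + np.sqrt(step_size) * noise
    return z

# Toy energy (assumption): E(z) = 0.5 * ||z - mu||^2, i.e. a unit Gaussian centred at mu.
mu = np.array([1.0, -2.0])

def grad_quadratic_energy(z):
    return z - mu

# Run 2000 independent chains in parallel, all started at the origin.
rng = np.random.default_rng(0)
samples = langevin_sample(grad_quadratic_energy, np.zeros((2000, 2)), rng=rng)
```

With a small step size the chains settle near the target distribution, so the sample mean should lie close to `mu`. In a learned energy-based model, `grad_energy` would instead come from automatic differentiation of a network.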
Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prevents vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Linear attention, which reorders the self-attention computation, was introduced in natural language processing (NLP) to mitigate a similar issue, but directly applying existing linear attention to vision may not lead to satisfactory results.
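The reordering behind linear attention can be sketched as follows. Softmax attention builds an N x N weight matrix for N tokens, while a kernelized variant replaces the softmax with a positive feature map so that the matrix products can be regrouped and the N x N matrix is never formed. The feature map used here (elu(x) + 1) and all function names are assumptions taken from the generic NLP formulation; this is not the vision-specific method the abstract goes on to propose.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard attention: materializes an (N, N) weight matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def feature_map(x):
    """elu(x) + 1, a strictly positive feature map (an assumption for this sketch)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Computes phi(Q) (phi(K)^T V): cost grows linearly with the number of tokens."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v               # (d, d_v) summary, independent of N
    normalizer = phi_k.sum(axis=0)  # (d,)
    return (phi_q @ kv) / (phi_q @ normalizer)[:, None]

def linear_attention_quadratic(q, k, v):
    """Same result computed as (phi(Q) phi(K)^T) V, forming the (N, N) matrix explicitly."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    weights = phi_q @ phi_k.T
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v
```

The last two functions return the same values because matrix multiplication is associative; only the grouping, and therefore the cost, differs. Neither reproduces `softmax_attention` exactly, since the softmax has been replaced by the feature map.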
Purpose: Accurate mapping of phosphene locations from visual prostheses is vital to encode spatial information. This process may involve the subject pointing to evoked phosphene locations with their finger. Here, we demonstrate phosphene mapping for a retinal implant using eye movements and compare it with retinotopic electrode positions and previous results using conventional finger-based mapping.
IEEE Trans Neural Netw Learn Syst
June 2022
Conventional object detection models require large amounts of training data. In comparison, humans can recognize previously unseen objects by merely knowing their semantic description. To mimic similar behavior, zero-shot object detection (ZSD) aims to recognize and localize "unseen" object instances by using only their semantic information.
With the help of the deep learning paradigm, many point cloud networks have been invented for visual analysis. However, there is great potential for development of these networks since the given information of point cloud data has not been fully exploited. To improve the effectiveness of existing networks in analyzing point cloud data, we propose a plug-and-play module, PnP-3D, aiming to refine the fundamental point cloud feature representations by involving more local context and global bilinear response from explicit 3D space and implicit feature space.
Deep convolutional neural networks perform better on images containing spatially invariant degradations, also known as synthetic degradations; however, their performance is limited on real-degraded photographs and requires multiple-stage network modeling. To advance the practicability of restoration algorithms, this article proposes a novel single-stage blind real image restoration network (R²Net) by employing a modular architecture. We use a residual on the residual structure to ease low-frequency information flow and apply feature attention to exploit the channel dependencies.
Purpose: To report the initial safety and efficacy results of a second-generation (44-channel) suprachoroidal retinal prosthesis at 56 weeks after device activation.
Methods: Four subjects, with advanced retinitis pigmentosa and bare-light perception only, enrolled in a phase II trial (NCT03406416). A 44-channel electrode array was implanted in a suprachoroidal pocket.
Purpose: In a clinical trial (NCT03406416) of a second-generation (44-channel) suprachoroidal retinal prosthesis implanted in subjects with late-stage retinitis pigmentosa (RP), we assessed performance in real-world functional visual tasks and emotional well-being.
Methods: The Functional Low-Vision Observer Rated Assessment (FLORA) and Impact of Vision Impairment-Very Low Vision (IVI-VLV) instruments were administered to four subjects before implantation and after device fitting. The FLORA contains 13 self-reported and 35 observer-reported items ranked for ease of conducting task (impossible-easy, central tendency given as mode).
We propose the first stochastic framework to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection models treat this task as a point estimation problem by predicting a single saliency map following a deterministic learning pipeline. We argue, however, that the deterministic solution is relatively ill-posed.
Purpose: To investigate oculomotor behavior in response to dynamic stimuli in retinal implant recipients.
Methods: Three suprachoroidal retinal implant recipients performed a four-alternative forced-choice motion discrimination task over six sessions longitudinally. Stimuli were a single white bar ("moving bar") or a series of white bars ("moving grating") sweeping left, right, up, or down across a 42″ monitor.
IEEE Trans Pattern Anal Mach Intell
August 2021
The advances made in predicting visual saliency using deep neural networks come at the expense of collecting large-scale annotated data. However, pixel-wise annotation is labor-intensive and overwhelming. In this paper, we propose to learn saliency prediction from a single noisy labelling, which is easy to obtain (e.
Super-resolution convolutional neural networks have recently demonstrated high-quality restoration for single images. However, existing algorithms often require very deep architectures and long training times. Furthermore, current convolutional neural networks for super-resolution are unable to exploit features at multiple scales, weighing them equally or at a single static scale only, which limits their learning capability.
Optical colonoscopy is known as a gold standard screening method for detecting and removing cancerous polyps. During this procedure, some polyps may go undetected because of their position, because they are not covered by the camera, or because they are missed by the surgeon. In this Letter, the authors introduce a novel convolutional neural network (ConvNet) algorithm to map the internal colon surface to a 2D map (visibility map), which can be used to increase the awareness of clinicians about areas they might miss.
Retinal prostheses are designed to restore a basic sense of sight to people with profound vision loss. They require a relatively intact posterior visual pathway (optic nerve, lateral geniculate nucleus and visual cortex). Retinal implants are options for people with severe stages of retinal degenerative disease such as retinitis pigmentosa and age-related macular degeneration.
Previous studies of age-related macular degeneration (AMD) report impaired facial expression recognition even with enlarged face images. Here, we test potential benefits of caricaturing (exaggerating how the expression's shape differs from neutral) as an image enhancement procedure targeted at mid- to high-level cortical vision. Experiment 1 provides proof-of-concept using normal vision observers shown blurred images as a partial simulation of AMD.
Aims: Previous studies and community information about everyday difficulties in age-related macular degeneration (AMD) have focussed on domains such as reading and driving. Here, we provide the first in-depth examination of how impaired face perception impacts social interactions and quality of life in AMD. We also develop a Faces and Social Life in AMD brochure and information sheet, plus accompanying conversation starter, aimed at AMD patients and those who interact with them (family, friends, nursing home staff).
There are multiple well-established situations in which humans' face recognition performance is poor, including for low-resolution images, other-race faces, and in older adult observers. Here we show that caricaturing faces (that is, exaggerating their appearance away from an average face) can provide a useful applied method for improving face recognition across all these circumstances. We employ a face-name learning task offering a number of methodological advantages (e.
Patients with age-related macular degeneration (AMD) have difficulty recognising people's faces. We tested whether this could be improved using caricaturing: an image enhancement procedure derived from cortical coding in a perceptual 'face-space'. Caricaturing exaggerates the distinctive ways in which an individual's face shape differs from the average.
Purpose: Previous behavioural studies demonstrate that face caricaturing can provide an effective image enhancement method for improving poor face identity perception in low vision simulations (e.g., age-related macular degeneration, bionic eye).
IEEE Trans Image Process
March 2018
The purpose of this paper is to recover dense correspondence between non-rigid shapes for anatomical objects, which is a key element of disease diagnosis and analysis. We proposed a shape matching framework based on Markov random fields to obtain non-rigid correspondence. We constructed an energy function as the sum of two terms: a unary term and a binary term.
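For orientation, a pairwise Markov random field energy of the kind described above is commonly written in the generic form below. The symbols are assumptions introduced for this illustration (the abstract does not define them): $x_i$ is the label, here a candidate correspondence, assigned to node $i$; $\mathcal{V}$ is the set of nodes; $\mathcal{E}$ is the set of neighboring node pairs; $\psi_i$ is the unary term and $\psi_{ij}$ the binary term.

```latex
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(x_i)
              + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)
```

A correspondence is then obtained by searching for the labeling $\mathbf{x}$ that minimizes $E(\mathbf{x})$; the specific definitions of the two terms used in the paper are not given in this excerpt.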
Tactile vision substitution devices present visual images as tactile representations on the skin. In this study we have tested the performance of a prototype 96-tactor vibrotactile device using a subset of 64 tactors. We have determined the tactile spatial acuity and intensity discrimination in 14 naïve subjects.
Purpose: Retinal prostheses provide vision to blind patients by eliciting phosphenes through electrical stimulation. This study explored whether character identification and image localization could be achieved through direct multiple-electrode stimulation with a suprachoroidal retinal prosthesis.
Methods: Two of three retinitis pigmentosa patients implanted with a suprachoroidal electrode array were tested on three psychophysical tasks.
The visual prosthesis (or "bionic eye") has become a reality but provides a low resolution view of the world. Simulating prosthetic vision in normal-vision observers, previous studies report good face recognition ability using tasks that allow recognition to be achieved on the basis of information that survives low resolution well, including basic category (sex, age) and extra-face information (hairstyle, glasses). Here, we test within-category individuation for face-only information (e.