Utilizing the idea of long-term cumulative return, reinforcement learning (RL) has shown remarkable performance in various fields. We follow the formulation of landmark localization in 3D medical images as an RL problem. Whereas value-based methods have been widely used to solve RL-based localization problems, we adopt an actor-critic based direct policy search method framed in a temporal difference learning approach. In RL problems with large state and/or action spaces, learning the optimal behavior is challenging and requires many trials. To improve the learning, we introduce a partial policy-based reinforcement learning to enable solving the large problem of localization by learning the optimal policy on smaller partial domains. Independent actors efficiently learn the corresponding partial policies, each utilizing their own independent critic. The proposed policy reconstruction from the partial policies ensures a robust and efficient localization, where the sub-agents uniformly contribute to the state-transitions based on their simple partial policies mapping to binary actions. Experiments with three different localization problems in 3D CT and MR images showed that the proposed reinforcement learning requires a significantly smaller number of trials to learn the optimal behavior compared to the original behavior learning scheme in RL. It also ensures a satisfactory performance when trained on fewer images.

Download full-text PDF

Source
http://dx.doi.org/10.1109/TMI.2019.2946345DOI Listing

Publication Analysis

Top Keywords

reinforcement learning
16
partial policies
12
learning
9
partial policy-based
8
policy-based reinforcement
8
landmark localization
8
localization medical
8
medical images
8
localization problems
8
learning optimal
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!