IEEE Trans Neural Netw Learn Syst
August 2024
For on-policy reinforcement learning (RL), discretizing action space for continuous control can easily express multiple modes and is straightforward to optimize. However, without considering the inherent ordering between the discrete atomic actions, the explosion in the number of discrete actions can possess undesired properties and induce a higher variance for the policy gradient (PG) estimator. In this article, we introduce a straightforward architecture that addresses this issue by constraining the discrete policy to be unimodal using Poisson probability distributions.
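As a rough illustration of the unimodality constraint described above (a minimal sketch, not the authors' architecture): a Poisson probability mass function truncated to a finite set of ordered action bins and renormalized has a single peak. The function names, the truncation scheme, and the fixed rate value are assumptions made for this example; the abstract does not say how the rate is produced.

```python
import math


def poisson_action_probs(rate, num_actions):
    """Truncated, renormalized Poisson pmf over ordered actions 0..num_actions-1.

    Hypothetical helper: `rate` stands in for whatever a policy network
    would output; the abstract does not specify this detail.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Poisson log-pmf: k*log(rate) - rate - log(k!)
    log_p = [k * math.log(rate) - rate - math.lgamma(k + 1)
             for k in range(num_actions)]
    # Normalize in a numerically stable way (softmax over log-probabilities).
    m = max(log_p)
    weights = [math.exp(lp - m) for lp in log_p]
    total = sum(weights)
    return [w / total for w in weights]


def is_unimodal(probs):
    """True if probs rise (weakly) to one peak and then fall (weakly)."""
    peak = probs.index(max(probs))
    rising = all(probs[i] <= probs[i + 1] for i in range(peak))
    falling = all(probs[i] >= probs[i + 1] for i in range(peak, len(probs) - 1))
    return rising and falling


probs = poisson_action_probs(rate=4.3, num_actions=11)
```

Because neighboring bins receive similar probability, the ordering between atomic actions is built into the distribution, whereas an unconstrained categorical policy treats the bins as unrelated.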
IEEE Trans Neural Netw Learn Syst
October 2023
Predicting future trajectories of pairwise traffic agents in highly interactive scenarios, such as cut-in, yielding, and merging, is challenging for autonomous driving. The existing works either treat such a problem as a marginal prediction task or perform single-axis factorized joint prediction, where the former strategy produces individual predictions without considering future interaction, while the latter strategy conducts conditional trajectory-oriented prediction via agentwise interaction or achieves conditional rollout-oriented prediction via timewise interaction. In this article, we propose a novel double-axis factorized joint prediction pipeline, namely, conditional goal-oriented trajectory prediction (CGTP) framework, which models future interaction both along the agent and time axes to achieve goal and trajectory interactive prediction.
IEEE Trans Neural Netw Learn Syst
September 2023
Communication-based multiagent reinforcement learning (MARL) has shown promising results in promoting cooperation by enabling agents to exchange information. However, the existing methods have limitations in large-scale multiagent systems due to high information redundancy, and they tend to overlook the unstable training process caused by the online-trained communication protocol. In this work, we propose a novel method called neighboring variational information flow (NVIF), which enhances communication among neighboring agents by providing them with the maximum information set (MIS) containing more information than the existing methods.
Recent works have demonstrated that transformers can achieve promising performance in computer vision by exploiting the relationships among image patches with self-attention. However, they consider attention only within a single feature layer and ignore the complementarity of attention across different layers. In this article, we propose broad attention, which improves performance by incorporating the attention relationships of different layers in a vision transformer (ViT); the resulting model is called BViT.
IEEE Trans Neural Netw Learn Syst
July 2024
Existing model-based value expansion (MVE) methods typically leverage a world model for value estimation with a fixed rollout horizon to assist policy learning. However, a proper horizon setting is essential to world-model-based policy learning. Meanwhile, choosing an appropriate horizon value is time-consuming, especially for visual control tasks.
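To make the role of the fixed rollout horizon concrete, here is a minimal sketch of the standard H-step value expansion target: discounted imagined rewards plus a discounted bootstrap value at the end of the rollout. It illustrates the fixed-horizon baseline only, not the horizon-selection method of this article; the function and argument names are assumptions.

```python
def value_expansion_target(rewards, bootstrap_value, gamma):
    """H-step target: sum_{t<H} gamma^t * r_t + gamma^H * V(s_H).

    `rewards` are the H rewards predicted by a world model along one
    imagined rollout; `bootstrap_value` is the value estimate at the
    final imagined state. The horizon H is simply len(rewards).
    """
    target = 0.0
    discount = 1.0
    for r in rewards:
        target += discount * r
        discount *= gamma
    # After the loop, discount == gamma ** H.
    return target + discount * bootstrap_value


# Horizon H = 3, gamma = 0.5: 1 + 0.5 + 0.25 + 0.125 * 10
target = value_expansion_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

A longer horizon leans more on the world model's predicted rewards and less on the learned value function, which is why the choice of H matters.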
In single-agent Markov decision processes, an agent can optimize its policy based on the interaction with the environment. In multiplayer Markov games (MGs), however, the interaction is nonstationary due to the behaviors of other players, so the agent has no fixed optimization objective. The challenge becomes finding equilibrium policies for all players.
Background: Perioperative chemoradiotherapy plays an important role in locally advanced gastric cancer. Whether a preoperative strategy can improve the long-term prognosis compared with postoperative treatment is unclear. This study aimed to compare oncologic outcomes in locally advanced gastric cancer patients treated with preoperative chemoradiotherapy (pre-CRT) and postoperative chemoradiotherapy (post-CRT).
IEEE Trans Neural Netw Learn Syst
August 2023
Communicating with each other in a distributed manner and behaving as a group are essential for agents in multi-agent reinforcement learning. However, real-world multi-agent systems are restricted by limited communication bandwidth. If the bandwidth is fully occupied, some agents are not able to send messages promptly to others, causing decision delay and impairing cooperative effects.
Objective: The predictive effect of preoperative chemoradiotherapy (CRT) is low, making it difficult to guide individualized treatment. We examined a surrogate endpoint for long-term outcomes in locally advanced gastric cancer patients after preoperative CRT.
Methods: From April 2012 to April 2019, 95 patients with locally advanced gastric cancer who received preoperative concurrent CRT and who were enrolled in three prospective studies were included.
Multisensor fusion-based road segmentation plays an important role in intelligent driving systems since it provides the drivable area. The mainstream fusion methods mainly fuse features in the image space domain, which causes perspective compression of the road and degrades performance on distant road regions. Considering that the bird's eye view (BEV) of the LiDAR point cloud preserves the spatial structure in the horizontal plane, this article proposes a bidirectional fusion network (BiFNet) to fuse the image and the BEV of the point cloud.
IEEE Trans Neural Netw Learn Syst
April 2023
Multiagent reinforcement learning methods, such as VDN, QMIX, and QTRAN, that adopt the centralized training with decentralized execution (CTDE) framework have shown promising results in cooperation and competition. However, in some multiagent scenarios, the number of agents and the size of the action set vary over time. We call these unshaped scenarios, and the methods mentioned above fail to perform satisfactorily in them.
Although neural architecture search (NAS) can improve deep models, it always neglects the precious knowledge of existing models. The computational and time cost of NAS also means that we should not search from scratch, but make every attempt to reuse existing knowledge. In this article, we discuss what kind of knowledge in a model can and should be used for a new architecture design.
IEEE Trans Neural Netw Learn Syst
September 2022
Efficient neural architecture search (ENAS) achieves novel efficiency in learning high-performance architectures via parameter sharing and reinforcement learning (RL). In the architecture search phase, ENAS employs a deep scalable architecture as the search space, whose training process consumes most of the search cost. Moreover, the model training time is proportional to the depth of the deep scalable architecture.
IEEE Trans Neural Netw Learn Syst
March 2022
The Nash equilibrium is an important concept in game theory. It describes the policy under which a player is least exploitable by any opponent. We combine game theory, dynamic programming, and recent deep reinforcement learning (DRL) techniques to learn the Nash equilibrium policy online for two-player zero-sum Markov games (TZMGs).
IEEE Trans Neural Netw Learn Syst
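The notion of exploitability can be illustrated on a single-state zero-sum matrix game, which is a simplifying assumption here: the article itself addresses Markov games with many states and learns the policy with DRL. The sketch below computes the payoff a mixed strategy guarantees against a best-responding opponent; the function name and the rock-paper-scissors example are illustrative.

```python
def worst_case_payoff(payoff, row_strategy):
    """Payoff the row player secures if the column player best-responds.

    payoff[i][j] is the row player's payoff when row plays i and column
    plays j (the column player receives the negative). A Nash equilibrium
    strategy maximizes this worst-case value.
    """
    n_cols = len(payoff[0])
    expected_per_column = [
        sum(p * payoff[i][j] for i, p in enumerate(row_strategy))
        for j in range(n_cols)
    ]
    # The opponent picks the column that hurts the row player most.
    return min(expected_per_column)


# Rock-paper-scissors from the row player's point of view.
rps = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

uniform_value = worst_case_payoff(rps, [1 / 3, 1 / 3, 1 / 3])
always_rock_value = worst_case_payoff(rps, [1.0, 0.0, 0.0])
```

The uniform strategy cannot be exploited (its worst case equals the game value of 0), while always playing rock loses every round to an opponent who plays paper.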
February 2022
3-D object detection is crucial for many real-world applications and has attracted the attention of many researchers. Beyond 2-D object detection, 3-D object detection usually needs to extract appearance, depth, position, and orientation information from light detection and ranging (LiDAR) and camera sensors. However, due to more degrees of freedom and vertices, existing detection methods that directly transform from 2-D to 3-D still face several challenges, such as an explosion in the number of anchors and inefficient or hard-to-optimize objectives.
Tongue diagnosis has played a pivotal role in traditional Chinese medicine (TCM) for thousands of years. As one of the most important tongue characteristics, the tooth-marked tongue is related to spleen deficiency and can greatly contribute to symptom differentiation and treatment selection. Yet, tooth-marked tongue recognition by TCM practitioners is subjective and challenging.
IEEE Trans Neural Netw Learn Syst
June 2020
This paper investigates the automatic exploration problem in unknown environments, which is key to applying robotic systems to some social tasks. Solving this problem by stacking decision rules cannot cover the variety of environments and sensor properties. Learning-based control methods can adapt to these scenarios.
Background: The prognostic relevance of gastric tumor location has been reported and debated. Our study was conducted to examine the differences in clinicopathological features, prognostic factors, and overall survival (OS) between patients with proximal gastric cancer (PGC) and distal gastric cancer (DGC).
Patients And Methods: Patients with PGC or DGC were identified from the China National Cancer Center Gastric Cancer Database (NCCGCDB) during 1997-2017.
IEEE Trans Cybern
August 2019
This paper is concerned with the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. The data-based integral reinforcement learning (IRL) method is proposed to approximate the Nash equilibrium of NZS games iteratively. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees the convergence of the proposed method.
IEEE Trans Neural Netw Learn Syst
May 2018
The sixteen papers in this special section focus on deep reinforcement learning and adaptive dynamic programming (deep RL/ADP). Deep RL is able to output control signals directly from input images, combining the perception of deep learning (DL) with the decision making of RL or adaptive dynamic programming (ADP). This mechanism brings artificial intelligence much closer to human modes of thinking.
IEEE Trans Neural Netw Learn Syst
February 2018
The ability to detect online changes in stationarity or time variance in a data stream is a hot research topic with striking implications. In this paper, we propose a novel probability density function-free change detection test, which is based on the least squares density-difference estimation method and operates online on multidimensional inputs. The test does not require any assumption about the underlying data distribution, and is able to operate immediately after having been configured by adopting a reservoir sampling mechanism.
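For readers unfamiliar with the reservoir sampling mechanism mentioned above, the classic version (often called Algorithm R) keeps a uniform random sample of fixed size from a stream of unknown length. This is a generic sketch; how the article couples the reservoir with its density-difference test is not described in the abstract, and the function name is an assumption.

```python
import random


def reservoir_sample(stream, k, rng):
    """Return k items drawn uniformly without replacement from `stream`.

    Uses O(k) memory and a single pass, so it works on data streams
    whose length is not known in advance.
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), k=10, rng=random.Random(0))
short = reservoir_sample(range(3), k=10, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the example reproducible; a stream shorter than `k` is returned in full.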
Sum of squares (SOS) polynomials have provided a computationally tractable way to deal with inequality constraints appearing in many control problems. They can also act as approximators in the framework of adaptive dynamic programming. In this paper, an approximate solution to the optimal control of polynomial nonlinear systems is proposed.
Purpose: Lactate dehydrogenase (LDH), an indirect marker of hypoxia, is a potential prognostic factor in several malignancies. There is a lack of evidence about the prognostic value of the serum LDH level in patients from hepatitis B virus (HBV) endemic areas with hepatocellular carcinoma (HCC) receiving sorafenib treatment.
Materials And Methods: A total of 119 HBV-related HCC patients treated with sorafenib at a Chinese center were included in the study.
IEEE Trans Neural Netw Learn Syst
January 2018
In this paper, the robust control problem for a class of continuous-time nonlinear systems with unmatched uncertainties is investigated using an event-based control method. First, the robust control problem is transformed into a corresponding optimal control problem with an augmented control and an appropriate cost function. Under the event-based mechanism, we prove that the solution of the optimal control problem can asymptotically stabilize the uncertain system with an adaptive triggering condition.
IEEE Trans Neural Netw Learn Syst
March 2017
H∞ control is a powerful method for solving the disturbance attenuation problems that occur in some control systems. The design of such controllers relies on solving a zero-sum game (ZSG). In practical applications, however, the exact dynamics are mostly unknown.