This paper studies the flexible double shop scheduling problem (FDSSP), which considers the job shop and the assembly shop simultaneously and therefore must coordinate the scheduling of related tasks across both shops. To this end, a reinforcement learning algorithm with a deep temporal-difference network is proposed to minimize the makespan. First, the FDSSP is formulated as a mathematical model of the flexible job-shop scheduling problem extended with assembly constraints, and is translated into a Markov decision process that selects behavioral strategies directly from historical machining-state data. Second, ten generic state features are fed into a deep neural network to fit the state value function, and eight simple constructive heuristics serve as candidate actions for scheduling decisions; at each decision step, a greedy mechanism selects the optimal combination of actions across all machines. Finally, a deep temporal-difference reinforcement learning framework is established, and extensive comparative experiments are designed to analyze the algorithm's performance. The results show that the proposed algorithm outperforms most other methods, contributing to the solution of practical production problems in manufacturing.
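The decision loop the abstract describes (state features scored by a value network, heuristic rules as actions, greedy selection, temporal-difference updates) can be sketched minimally. This is a hedged illustration, not the paper's implementation: the class name `TDValueNet`, the linear approximator standing in for the deep network, and the heuristic-rule names are all assumptions for illustration.

```python
# Hypothetical names for illustration; the paper's exact features and heuristics differ.
N_FEATURES = 10   # the abstract's ten generic state features
HEURISTICS = ["SPT", "LPT", "FIFO", "LIFO", "MWKR", "LWKR", "EDD", "RANDOM"]  # eight candidate rules

class TDValueNet:
    """Linear stand-in for the paper's deep value network V(s)."""
    def __init__(self, n_features, lr=0.01, gamma=0.95):
        self.w = [0.0] * n_features
        self.lr, self.gamma = lr, gamma

    def value(self, state):
        # V(s) = w . s for this linear sketch
        return sum(wi * si for wi, si in zip(self.w, state))

    def td_update(self, state, reward, next_state):
        # TD(0): w <- w + lr * (r + gamma * V(s') - V(s)) * grad_w V(s)
        delta = reward + self.gamma * self.value(next_state) - self.value(state)
        for i, si in enumerate(state):
            self.w[i] += self.lr * delta * si
        return delta

def greedy_action(net, state, simulate):
    """Pick the heuristic whose simulated successor state the value net scores highest."""
    return max(HEURISTICS, key=lambda h: net.value(simulate(state, h)))
```

In the paper's setting, `simulate` would advance the shop state under a given dispatching rule and the linear model would be replaced by the deep network; the greedy step over all machines then yields the combined action for the decision step.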
Full text: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11031591 (PMC)
DOI: http://dx.doi.org/10.1038/s41598-024-59414-8
Curr Top Behav Neurosci
January 2025
Black Dog Institute, University of New South Wales, Sydney, Australia.
Anxiety disorders in children lead to substantial impairment in functioning and development. Even the most effective gold-standard treatments for childhood anxiety achieve remission rates of only about 50%, suggesting a critical need to improve current treatments. Optimising exposure, the key component of anxiety treatments, represents a promising way to do so.
Chaos
January 2025
Complex Systems Group, Department of Mathematics and Statistics, The University of Western Australia, Crawley, Western Australia 6009, Australia.
We propose a universal method based on deep reinforcement learning (specifically, soft actor-critic) to control the chimera state in coupled oscillators. The control policy is learned by maximizing the expected cumulative reward in the reinforcement learning framework. With the aid of the local order parameter, we design a class of reward functions for controlling the chimera state, specifically confining the coherent and incoherent domains to any desired lateral position among the oscillators.
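The local order parameter mentioned above measures phase coherence in a neighborhood of each oscillator, and a reward can be built from it by comparing the measured coherence profile against a desired coherent/incoherent layout. The following is a minimal sketch under assumed conventions (ring topology, a window of `delta` neighbors, a sign-weighted reward); the paper's actual reward class may differ.

```python
import cmath

def local_order_parameter(phases, i, delta=5):
    """Kuramoto-style local order parameter R_i around oscillator i on a ring:
    R_i = |(1/(2*delta+1)) * sum_{k=-delta..delta} exp(1j * phase[(i+k) mod n])|."""
    n = len(phases)
    window = [phases[(i + k) % n] for k in range(-delta, delta + 1)]
    z = sum(cmath.exp(1j * p) for p in window) / len(window)
    return abs(z)

def chimera_reward(phases, target_coherent, delta=5):
    """Hypothetical reward: +R_i where coherence is desired, -R_i where
    incoherence is desired, averaged over all oscillators."""
    n = len(phases)
    return sum((1.0 if target_coherent[i] else -1.0) * local_order_parameter(phases, i, delta)
               for i in range(n)) / n
```

For a fully synchronized ring, every `R_i` is 1, so the reward is maximized exactly when the target profile asks for coherence everywhere; shifting the `target_coherent` mask shifts where the learned policy is rewarded for confining the coherent domain.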
Data Brief
February 2025
School of Engineering and Technology, University of New South Wales, Canberra, Australia.
This dataset is generated from real-time simulations conducted in MATLAB/Simscape, focusing on the impact of smart noise signals on battery energy storage systems (BESS). Using a Deep Reinforcement Learning (DRL) agent known as Proximal Policy Optimization (PPO), noise signals in the form of subtle millivolt and milliampere variations are strategically crafted to represent realistic False Data Injection Attacks (FDIA). These signals are designed to disrupt the State of Charge (SoC) and State of Health (SoH) estimation blocks within Unscented Kalman Filters (UKF).
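The attack class described, small millivolt/milliampere offsets added to sensor readings, can be illustrated outside of MATLAB/Simscape with a toy perturbation function. This is a sketch under assumptions: the function name, the uniform-noise model, and the 5 mV / 5 mA bounds are illustrative stand-ins for the PPO-generated signals in the dataset.

```python
import random

def inject_smart_noise(voltage_v, current_a, mv_bound=0.005, ma_bound=0.005, rng=None):
    """Hypothetical FDIA-style perturbation: offsets bounded to a few millivolts
    and milliamperes, small enough to pass simple range checks yet capable of
    biasing downstream SoC/SoH estimators over time."""
    rng = rng or random.Random()
    return (voltage_v + rng.uniform(-mv_bound, mv_bound),
            current_a + rng.uniform(-ma_bound, ma_bound))
```

In the dataset itself the perturbations are produced by a trained PPO policy rather than drawn uniformly, which is what makes them "smart": the agent shapes the noise to maximize estimator error while staying within plausible measurement bounds.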
Cureus
December 2024
Interventional Cardiology, Lee Health, Fort Myers, USA.
Managing acute coronary syndrome (ACS) in patients with a recent history of gastrointestinal bleeding presents a unique and challenging clinical dilemma, necessitating a careful balance between minimizing ischemic risk and avoiding potentially life-threatening rebleeding. Standard treatment for ACS typically involves dual antiplatelet therapy (DAPT) to prevent recurrent thrombotic events. However, in patients with recent gastrointestinal hemorrhage or significant anemia, these therapies may substantially increase the risk of life-threatening bleeding, complicating the decision-making process and often leading to conservative management strategies.
ISA Trans
January 2025
College of Control Science and Engineering, Bohai University, Jinzhou 121013, Liaoning, China. Electronic address:
This paper investigates the optimal fixed-time tracking control problem for a class of nonstrict-feedback large-scale nonlinear systems with prescribed performance. In the optimal control design, new critic and actor neural-network updating laws are proposed by adopting the fixed-time technique and a simplified reinforcement learning algorithm, which together simplify the optimal control scheme and accelerate the convergence rate. Furthermore, the prescribed performance method is incorporated simultaneously, ensuring that tracking errors converge within the prescribed performance bounds in fixed time.