TY - JOUR
T1 - Collaborative Computing in Non-Terrestrial Networks
T2 - A Multi-Time-Scale Deep Reinforcement Learning Approach
AU - Cao, Yang
AU - Lien, Shao Yu
AU - Liang, Ying Chang
AU - Niyato, Dusit
AU - Shen, Xuemin
N1 - Publisher Copyright:
© 2002-2012 IEEE.
PY - 2024/5/1
Y1 - 2024/5/1
N2 - Constructing earth-fixed cells with low-earth orbit (LEO) satellites in non-terrestrial networks (NTNs) has been the most promising paradigm to enable global coverage. However, the limited computing capabilities on LEO satellites make it a critical challenge to complete resource optimization within a short duration. Although the sufficient computing capabilities of ground infrastructures can be utilized to assist the LEO satellite, different time-scale control cycles and coupled decisions between the space and ground segments still obstruct the joint optimization design for computing agents at different segments. To address these challenges, in this paper, a multi-time-scale deep reinforcement learning (DRL) scheme is developed to achieve radio resource optimization in NTNs, in which the LEO satellite and user equipment (UE) collaborate with each other to perform individual decision-making tasks with different control cycles. Specifically, the UE updates its policy toward improving the value functions of both the satellite and the UE, while the LEO satellite only performs a finite-step rollout for decision-making based on the reference decision trajectory provided by the UE. Most importantly, a rigorous analysis guaranteeing the performance convergence of the proposed scheme is provided. Comprehensive simulations are conducted to demonstrate the effectiveness of the proposed scheme in balancing transmission performance and computational complexity.
AB - Constructing earth-fixed cells with low-earth orbit (LEO) satellites in non-terrestrial networks (NTNs) has been the most promising paradigm to enable global coverage. However, the limited computing capabilities on LEO satellites make it a critical challenge to complete resource optimization within a short duration. Although the sufficient computing capabilities of ground infrastructures can be utilized to assist the LEO satellite, different time-scale control cycles and coupled decisions between the space and ground segments still obstruct the joint optimization design for computing agents at different segments. To address these challenges, in this paper, a multi-time-scale deep reinforcement learning (DRL) scheme is developed to achieve radio resource optimization in NTNs, in which the LEO satellite and user equipment (UE) collaborate with each other to perform individual decision-making tasks with different control cycles. Specifically, the UE updates its policy toward improving the value functions of both the satellite and the UE, while the LEO satellite only performs a finite-step rollout for decision-making based on the reference decision trajectory provided by the UE. Most importantly, a rigorous analysis guaranteeing the performance convergence of the proposed scheme is provided. Comprehensive simulations are conducted to demonstrate the effectiveness of the proposed scheme in balancing transmission performance and computational complexity.
KW - Non-terrestrial networks (NTNs)
KW - beam management
KW - deep reinforcement learning (DRL)
KW - earth-fixed cell
KW - multi-time-scale Markov decision processes (MMDPs)
KW - resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85181558473&partnerID=8YFLogxK
U2 - 10.1109/TWC.2023.3323554
DO - 10.1109/TWC.2023.3323554
M3 - Article
AN - SCOPUS:85181558473
SN - 1536-1276
VL - 23
SP - 4932
EP - 4949
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 5
ER -