Scalable Deep Neural Network Training: Overcoming Memory Constraints with Performance Preservation
Abstract
This paper introduces a method for training Deep Neural Networks (DNNs) with batch sizes that exceed available device memory, without compromising model performance. Conventional training must hold an entire batch in memory at once, so large batch sizes either fail outright or force workarounds that reduce efficiency and degrade training outcomes. Our proposed solution, Small-Batch Processing (SBP), addresses this by combining a batch streaming mechanism with a loss normalization strategy based on gradient accumulation. SBP splits each large batch into smaller micro-batches, processes them sequentially, and accumulates their normalized gradients before applying a single parameter update. This enables large-batch training on a single device without additional memory or additional GPUs. Experimental evaluations confirm that SBP matches the performance of traditional approaches while enabling batch sizes that were previously infeasible due to memory constraints. The technique makes large-batch DNN training practical in resource-limited environments, improving both scalability and training efficiency for deep learning applications.
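The idea behind loss normalization with gradient accumulation can be illustrated with a minimal sketch. This is not the authors' SBP implementation: the model (a one-variable linear regression with mean squared error), the helper names `mse_grad` and `accumulated_grad`, and the choice to weight each micro-batch by its share of the full batch are all assumptions made for illustration. The sketch shows only that streaming micro-batches and summing their scaled gradients reproduces the gradient of the full batch, even when the last micro-batch is smaller than the others.

```python
from typing import Sequence, Tuple


def mse_grad(w: float, b: float,
             xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Gradient of the mean squared error of y = w*x + b over (xs, ys)."""
    n = len(xs)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb


def accumulated_grad(w: float, b: float,
                     xs: Sequence[float], ys: Sequence[float],
                     micro_batch_size: int) -> Tuple[float, float]:
    """Stream micro-batches and accumulate their normalized gradients.

    Only one micro-batch is processed at a time. Each micro-batch gradient
    is scaled by (micro-batch size / full batch size), so the accumulated
    sum equals the gradient of the mean loss over the full batch.
    """
    total = len(xs)
    gw_acc, gb_acc = 0.0, 0.0
    for start in range(0, total, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        gw, gb = mse_grad(w, b, mx, my)
        scale = len(mx) / total  # loss normalization factor (assumed form)
        gw_acc += scale * gw
        gb_acc += scale * gb
    return gw_acc, gb_acc


# Toy data: 10 samples from y = 3x + 1, streamed in micro-batches of 3.
xs = [float(i) for i in range(10)]
ys = [3.0 * x + 1.0 for x in xs]

full = mse_grad(0.5, 0.0, xs, ys)
streamed = accumulated_grad(0.5, 0.0, xs, ys, micro_batch_size=3)
```

Here `full` and `streamed` agree up to floating-point rounding. A single optimizer step taken after the accumulation loop would therefore match a step computed on the whole batch, while peak memory depends on the micro-batch size rather than the full batch size.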
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.