
FedLPS: Heterogeneous Federated Learning for Multiple Tasks with Local Parameter Sharing

Yongzhe Jia, Xuyun Zhang, Amin Beheshti, Wanchun Dou
State Key Laboratory for Novel Software Technology,
Department of Computer Science and Technology, Nanjing University, China
School of Computing, Macquarie University, Australia
jiayz@smail.nju.edu.cn, {xuyun.zhang, amin.beheshti}@mq.edu.au, douwc@nju.edu.cn

Abstract

Federated Learning (FL) has emerged as a promising solution in Edge Computing (EC) environments to process the proliferation of data generated by edge devices. By collaboratively optimizing the global machine learning models on distributed edge devices, FL circumvents the need for transmitting raw data and enhances user privacy. Despite practical successes, FL still confronts significant challenges including constrained edge device resources, multiple tasks deployment, and data heterogeneity. However, existing studies focus on mitigating the FL training costs of each single task whereas neglecting the resource consumption across multiple tasks in heterogeneous FL scenarios. In this paper, we propose heterogeneous FEDerated learning with Local Parameter Sharing (FedLPS) to fill this gap. FedLPS leverages principles from transfer learning to facilitate the deployment of multiple tasks on a single device by dividing the local model into a shareable encoder and task-specific predictors. To further reduce resource consumption, a channel-wise model pruning algorithm that shrinks the footprint of local models while accounting for both data and system heterogeneity is employed in FedLPS. Additionally, a novel heterogeneous model aggregation algorithm is proposed to aggregate the heterogeneous predictors in FedLPS. We implemented the proposed FedLPS on a real FL platform and compared it with state-of-the-art (SOTA) FL frameworks. The experimental results on five popular datasets and two modern DNN models illustrate that the proposed FedLPS significantly outperforms the SOTA FL frameworks in model accuracy while reducing the computational resource consumption. Our code is available at: https://github.com/jyzgh/FedLPS.

Introduction

Over the past decade, there has been a remarkable surge in the generation of massive amounts of data from billions of Internet of Things (IoT) devices (Khan et al. 2021). In the context of Edge Computing (EC) environments, Federated Learning (FL) has emerged as a promising solution for processing such extensive data at the edge (McMahan et al. 2017; Khan et al. 2020). FL enables various edge devices to collaboratively optimize a global Machine Learning (ML) model with the assistance of an edge server. This approach
entails clients (i.e., edge devices) updating the ML model using their private local data, while the central server is responsible for aggregating these updated local models. In contrast to traditional centralized ML, FL facilitates data processing on distributed edge devices where the data is generated. In addition, the distributed learning nature of FL eliminates the need for transmitting raw data to the central server, thus avoiding unnecessary communication costs and enhancing user privacy (McMahan et al. 2017; Ezzeldin et al. 2023).
Despite the successful practical applications of FL, it still faces the following critical challenges: a) Limited resource budgets of edge devices. Popular Deep Neural Networks (DNNs) commonly possess hundreds of megabytes of parameters that need to be trained (He et al. 2016), whereas the computational resources of edge devices are limited in EC environments (Kairouz et al. 2021). Training these large DNN models on edge devices heavily hampers the learning efficiency of FL. b) Multitasking on a single device. Ideally, each edge device maintains a task-specific DNN model for each task, resulting in linear growth of training costs on edge devices. Therefore, directly deploying multiple full-size DNN models, one for each specific task, on edge devices is impractical (Fu et al. 2021; Ma et al. 2019; Wallingford et al. 2022). c) Data heterogeneity and system heterogeneity. On one hand, the numbers of data samples and the data distributions of edge devices are typically various (i.e., non-identically distributed data, non-IID data), resulting in the accuracy degradation of the global model (Gao, Yao, and Yang 2022; Luo et al. 2021). On the other hand, the system capabilities of edge devices, such as CPU, GPU, memory, battery power, etc., can also vary widely. The devices with weaker system capabilities (i.e., "stragglers") will fail to finish local training, thereby hampering the learning process of FL (Gao, Yao, and Yang 2022; Jiang et al. 2022).
Several pioneering works have made efforts to mitigate these challenges through various solutions (Caldas et al. 2018; Li et al. 2020; Wang et al. 2020b; Jiang et al. 2022; Li et al. 2021). Li et al. (Li et al. 2020) propose FedProx to address the data heterogeneity problem by incentivizing participants to preserve similarity with the global model, while addressing the system heterogeneity problem by accommodating low-end devices (i.e., devices with fewer system capabilities) to carry out a reduced number of local updates. However, FedProx does not involve model compression techniques, making it difficult to save storage resources or communication overhead. Recent works (Jiang et al. 2022; Li et al. 2021) propose to leverage model pruning techniques and personalized model aggregation to reduce the footprint of the ML model while mitigating the model accuracy loss caused by the heterogeneity problems. However, these approaches focus only on each single ML model used for a specific task and thus contribute limitedly to reducing resource consumption across multiple tasks.
In this paper, we propose heterogeneous FEDerated learning with Local Parameter Sharing (FedLPS), a novel FL framework for reducing the resource consumption of multiple tasks in the heterogeneous FL environment. Specifically, we leverage the spirit of transfer learning (Yosinski et al. 2014; Zhuang et al. 2020) to allow the multiple tasks on a single device to share partial parameters of the ML models during the FL process. In contrast to existing FL frameworks, the proposed FedLPS reduces the resource consumption of devices not only in the context of a single task but also in the context of multiple tasks. We adopt channel-wise model pruning techniques to reduce the footprint of the local models and satisfy the strict resource budgets of edge devices. Different from pioneering works that use uniform model pruning techniques (e.g., FedDrop (Caldas et al. 2018)), FedLPS generates tailored models for each participant device to mitigate the data heterogeneity problem and the system heterogeneity problem. Moreover, considering that aggregating the tailored models with the popular FL aggregation algorithms (e.g., FedAvg (McMahan et al. 2017)) leads to the degradation of model performance, we further design a heterogeneous aggregation algorithm for FedLPS to generate the aggregated global model. We summarize our contributions as follows:
  • We propose a novel FL framework FedLPS to reduce resource consumption of edge devices that deployed with multiple tasks. By dividing the local models on edge devices into shared encoders and task-specific predictors, FedLPS reduces the training cost across multiple tasks on edge devices.
  • We design a channel-wise model pruning algorithm for FedLPS to reduce the footprints of the predictors. By applying various pruning ratios, FedLPS adaptively shrinks model footprints of heterogeneous devices.
  • We present a heterogeneous model aggregation algorithm for FedLPS to aggregate heterogeneous task-specific predictors. By utilizing the knowledge within the pre-trained backbone model, FedLPS efficiently aggregates the heterogeneous predictors.
  • We implement the proposed FedLPS on a real-world FL platform, FedML, and extensively evaluate it against state-of-the-art FL frameworks. The experimental results demonstrate that FedLPS is effective in reducing the resource consumption of edge devices while realizing heightened model accuracy.

Heterogeneous Federated Learning

In the context of edge computing, federated learning is proposed to train ML models with distributed local data among edge devices (McMahan et al. 2017; Khan et al. 2020; Gao, Yao, and Yang 2022; Zhang et al. 2023). In federated learning, the raw data on edge devices will be kept locally to provide better user privacy and avoid unnecessary communication costs, and only intermediate results (e.g., parameters of models) are transmitted between the server and the devices (Khan et al. 2021). Federated learning in the edge computing environments is commonly heterogeneous in several aspects such as statistics, systems, data spaces, and models (Gao, Yao, and Yang 2022). In this paper, we focus on two of the main heterogeneous aspects: statistical heterogeneity and system heterogeneity. Statistical heterogeneity leads to non-IID distribution of data among edge devices, while system heterogeneity leads to variations in the capabilities of these devices.

Transfer Learning

Transfer learning is a promising machine learning methodology for transferring knowledge across different domains following different probability distributions (Yosinski et al. 2014; Long et al. 2017; Zhuang et al. 2020; Tan et al. 2023). Di et al. (Di et al. 2017) propose to transfer the knowledge of images that are taken from a certain location, aiming to alleviate the adverse impact caused by various conditions such as different weather and illumination conditions in transportation applications. Yu et al. (Yu et al. 2022) propose SPATL for addressing the resource consumption problem and the data heterogeneity problem in FL. In SPATL, transfer learning is adopted to address the data heterogeneity problem by transferring the knowledge of a shared encoder to the predictors on heterogeneous clients. Tu et al. (Tu et al. 2021) propose FedDL to capture the potential relationships between users and transfer knowledge between the related users in FL, aiming to improve the performance of the Human Activity Recognition (HAR) task with unbalanced and sparse user data. However, few existing works explore the transferability of ML models for multiple tasks on edge devices in FL. In contrast, we fill this gap in this paper and demonstrate the feasibility of leveraging transfer learning to reduce the resource consumption of edge devices deployed with multiple tasks.

Model Pruning

Model pruning techniques are proposed to accelerate the training and inference processes of DNNs by removing redundant parameters and structures in the DNN model (Liu et al. 2017; He, Zhang, and Sun 2017; Li et al. 2021; Jiang et al. 2022; Ye et al. 2023). Liu et al. (Liu et al. 2017) propose the network slimming scheme for Convolutional Neural Networks (CNNs), aiming to identify and remove insignificant parameters in CNNs by imposing L1 regularization on the scaling factors in batch normalization layers. Caldas et al. (Caldas et al. 2018) propose FedDrop to reduce the computational burden of local training and the corresponding communication costs of FL.
Figure 1: Overview of the proposed FedLPS framework. In FedLPS, the backbone model within each client is divided into the shared encoder and task-specific predictors. The predictors subsequently pruned to reduce resource consumption. During the training process, the encoder parameters remain frozen, while the predictor parameters are updated to handle specific tasks and transmitted between the central server and the client. To elaborate, (1) local data for each task is fed into the encoder to generate embeddings. (2) The task-specific predictors utilize these embeddings to update their parameters. (3) The client sends the updated predictors to the central server. (4) The central server aggregates the predictors that have been updated on different clients but belong to the same task. (5) The central server sends the aggregated predictors back to the clients for further training rounds.
FedDrop leverages lossy compression techniques to shrink the footprint of the ML model and generates identical compact local models for all devices. Jiang et al. (Jiang et al. 2022) propose FedMP to address the system heterogeneity problem while saving communication bandwidth. FedMP adopts a multi-armed bandit-based online learning algorithm to calculate personalized pruning ratios for heterogeneous edge devices and a Residual Recovery Synchronous Parallel (R2SP) scheme to aggregate parameters. However, most of the existing work focuses on model pruning of native ML models on edge devices, and very little work has focused on transferable models applicable to multi-task scenarios. In contrast, we design an adaptive channel-wise model pruning algorithm for the transferable models in FedLPS to reduce unnecessary resource consumption in multi-task scenarios.

Design of FedLPS

In this section, we first present an overview of the proposed FedLPS. Then, we describe how transfer learning can be used to train models for multiple tasks. Subsequently, we employ an adaptive channel-wise model pruning approach to reduce the resource consumption caused by training task-specific predictors. Finally, we present a heterogeneous model aggregation algorithm for aggregating heterogeneous predictors updated by different clients.

Overview

In this paper, we propose FedLPS to efficiently train multiple task-specific models on individual clients in the context of FL. Fig. 1 illustrates the workflow of the proposed FedLPS framework. Distinct from existing FL frameworks that optimize a specialized model for each task on the client, in FedLPS, the backbone model within each client is partitioned into a shared encoder $E$ and a task-specific predictor $P_t$ for each task $t \in \mathcal{T}$, where $\mathcal{T}$ denotes the task set. The backbone model can be trained either on a public dataset or on the local data of arbitrary tasks on the client, and it is accessible to both the client and the central server. During the federated training process, the encoder $E$ is frozen while the task-specific predictor $P_t$ is updated on the local data of task $t$.
We outline the federated training steps as follows: (1) The local data for each task $t$ is fed into the encoder $E$ to generate embeddings. (2) The task-specific predictors $P_t$ utilize these embeddings generated from the local data of task $t$ to update their parameters. (3) Each client sends the updated predictors to the central server. (4) The central server aggregates the predictors that have been updated on different clients (e.g., the predictors of the two clients in Fig. 1) but belong to the same task $t$. (5) The central server sends the aggregated predictors back to the clients for further training rounds.

Local Parameter Sharing across Multiple Tasks

Existing FL frameworks (e.g., FedAvg (McMahan et al. 2017), FedDrop (Caldas et al. 2018), Hermes (Li et al. 2021)) commonly adopt an approach where a specific model is optimized for each FL task on the client. However, this practice of optimizing multiple models on the clients leads to significant resource consumption, especially for edge devices with limited capabilities. Although some efforts (Li et al. 2020; Jiang et al. 2022) have been made to reduce the training overhead on the client, few of them focus on effectively reducing training overhead across multiple FL tasks. In this subsection, we propose a novel federated training method that leverages local parameter sharing on the clients to mitigate the resource consumption associated with training multiple models for multiple tasks, while maintaining satisfying model accuracy.
Inspired by the spirit of transfer learning (Zhuang et al. 2020; Weiss, Khoshgoftaar, and Wang 2016; Yosinski et al. 2014), we explore enabling the multiple tasks on each client to share a part of the model parameters, thus effectively reducing the training overhead. In parameter-sharing-based transfer learning, the lower layers of the neural networks capture more generalized features, making them suitable for sharing across multiple tasks. Conversely, the upper layers tend to capture higher-level abstract features, making them more task-specific. Building upon this observation, we divide the local backbone model on each client into a shareable encoder $E$ and multiple task-specific predictors $\{P_t\}_{t \in \mathcal{T}}$. The shared encoder comprises the first $l$ layers of the backbone model, while the task-specific predictors consist of the remaining $L - l$ layers, where $l$ is a tunable hyper-parameter and $L$ represents the total number of layers in the backbone model. In order to facilitate knowledge transfer across various tasks, the weights of the shared encoder $E$ are initialized using pre-trained values. Throughout the training process, the weights in the shared encoder remain frozen, ensuring consistent utilization across all tasks. The pre-trained weights can be sourced either from a backbone model trained on publicly available datasets or from client-participated FL tasks. In addition, to reduce the resource consumption caused by training multiple predictors, an adaptive channel-wise model pruning method is proposed for shrinking the footprints of the predictors, which is described in detail in the next subsection.
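To make the split concrete, the following PyTorch sketch (our illustration, not the authors' released code) divides a toy backbone into a frozen shared encoder built from its first $l$ top-level blocks and one trainable predictor per task; the tiny backbone architecture, the value of $l$, and the task names are placeholder assumptions.

```python
import copy
import torch
import torch.nn as nn

# Toy backbone with L = 6 top-level blocks; the real backbone in the paper is
# ResNet18 or ShuffleNetV2, used here only to keep the sketch self-contained.
def make_backbone(num_classes: int = 10) -> nn.Sequential:
    return nn.Sequential(
        nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU()),
        nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU()),
        nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU()),
        nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU()),
        nn.AdaptiveAvgPool2d(1),
        nn.Sequential(nn.Flatten(), nn.Linear(64, num_classes)),
    )

def split_backbone(backbone: nn.Sequential, l: int, tasks):
    encoder = backbone[:l]                                    # shared encoder: first l blocks
    for p in encoder.parameters():
        p.requires_grad = False                               # frozen throughout FL training
    predictors = {t: copy.deepcopy(backbone[l:]) for t in tasks}  # one head per task
    return encoder, predictors

encoder, predictors = split_backbone(make_backbone(), l=3, tasks=["task_a", "task_b"])
emb = encoder(torch.randn(4, 3, 32, 32))                      # shared embedding e_t
out = predictors["task_b"](emb)                               # task-specific prediction
```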
The local training algorithm based on local parameter sharing of the proposed FedLPS framework is presented in Algorithm 1. In each communication round of FL, the client first prunes each task-specific predictor $P_t$ with a pruning ratio $\rho$, aiming to reduce subsequent training costs (line 2). Then, the client conducts forward propagation on the shared encoder $E$ using the local data $D_t$ specific to each task $t$; the resulting embedding is denoted as $e_t$ (line 3). Subsequently, each pruned predictor $\tilde{P}_t$ is updated with the embedding $e_t$ (line 4). Specifically, the update operation can be formulated as follows:
$$\tilde{P}_t \leftarrow \tilde{P}_t - \eta \nabla_{\tilde{P}_t} \mathcal{L}_t(\tilde{P}_t; e_t), \quad (1)$$
where $\eta$ is the learning rate, $\mathcal{L}_t$ is the loss function of task $t$, and $\nabla_{\tilde{P}_t} \mathcal{L}_t$ denotes the local gradients of the predictor $\tilde{P}_t$.
Algorithm 1: Local Parameter Sharing-based Training Algorithm of FedLPS.
Input: Task set $\mathcal{T}$, local data $\{D_t\}_{t \in \mathcal{T}}$, pre-trained encoder $E$, pruning ratio $\rho$ for pruning predictors.
Output: Updated predictors $\{\tilde{P}_t\}_{t \in \mathcal{T}}$.
1: for task $t \in \mathcal{T}$ do
2:   Prune the predictor $P_t$ with pruning ratio $\rho$ using Eq. 2
3:   Forward propagation on the shared encoder $E$ with local data $D_t$: $e_t \leftarrow E(D_t)$
4:   Update the weights in predictor $\tilde{P}_t$ using Eq. 1
5: end for
6: return Updated predictors $\{\tilde{P}_t\}_{t \in \mathcal{T}}$
Finally, the updated task-specific predictors will be sent to the central server for aggregation.
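A rough sketch of one local round in the spirit of Algorithm 1 is given below. This is our reading of the procedure, not the released implementation: the shared encoder stays frozen, each already pruned task-specific predictor is updated on its own task's data, and plain SGD is assumed for the Eq. 1 update. The stand-in modules and random data exist only so the sketch runs end to end.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def local_round(encoder, predictors, local_data, lr=0.01, local_epochs=1):
    encoder.eval()                                     # frozen shared encoder E
    for t, loader in local_data.items():               # one pass per task t
        predictor = predictors[t]
        opt = torch.optim.SGD(predictor.parameters(), lr=lr)
        predictor.train()
        for _ in range(local_epochs):
            for x, y in loader:
                with torch.no_grad():
                    emb = encoder(x)                   # line 3: e_t = E(D_t)
                loss = F.cross_entropy(predictor(emb), y)
                opt.zero_grad()
                loss.backward()                        # gradients reach only the predictor
                opt.step()                             # line 4: Eq. 1 update
    return predictors                                  # later sent to the central server

# Tiny stand-in encoder, predictors, and data, just to make the sketch executable.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32, 16), nn.ReLU())
predictors = {"task_a": nn.Linear(16, 10), "task_b": nn.Linear(16, 5)}
loaders = {
    t: DataLoader(TensorDataset(torch.randn(64, 32), torch.randint(0, c, (64,))),
                  batch_size=16)
    for t, c in [("task_a", 10), ("task_b", 5)]
}
local_round(encoder, predictors, loaders)
```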

Adaptive Channel-wise Model Pruning

By leveraging the division of the backbone model and the freezing of the pre-trained shared encoder, FedLPS effectively mitigates the training cost associated with the shared encoder. However, the training costs incurred by the predictors continue to pose challenges for clients, since these predictors generally have larger footprints than the encoder. Thus, in this subsection, we propose an adaptive channel-wise model pruning method for FedLPS to reduce the training cost of the task-specific predictors.
In the first communication round of FL, each client prunes its predictors with a pruning ratio $\rho$. Differing from existing model pruning methods in FL that adopt a uniform pruning ratio for every client, such as FedDrop (Caldas et al. 2018), FedLPS prunes the predictors on heterogeneous clients with different pruning ratios that are determined by the clients' system capabilities. Firstly, for each task $t$ on the client, FedLPS evaluates the importance scores of the channels in each layer of the predictor $P_t$ by their L1-norm (Liu et al. 2017; Li et al. 2016). Then, a fraction $\rho$ of the channels corresponding to the smallest importance scores is removed to achieve model pruning. Specifically, the pruning operation can be formulated as follows:
$$\tilde{P}_t = M_t \odot P_t, \quad (2)$$
where $\odot$ denotes element-wise multiplication, and $M_t$ is a binary mask matrix used to determine the channels to be pruned. In the mask matrix $M_t$, elements with a value of 0 indicate channels that will be pruned, while elements with a value of 1 indicate channels that will be retained. Finally, the pruned predictors $\tilde{P}_t$ will be updated with Eq. 1.
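A minimal sketch of the channel selection behind Eq. 2 follows, assuming L1-norm filter importance as in Li et al. (2016); here pruning is realized by zeroing the masked output channels of a convolution rather than physically removing them, which is a simplification of the actual footprint reduction.

```python
import torch
import torch.nn as nn

def channel_mask(conv: nn.Conv2d, rho: float) -> torch.Tensor:
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # L1 norm per output channel
    n_prune = int(rho * scores.numel())                       # number of channels to drop
    mask = torch.ones_like(scores)
    if n_prune > 0:
        drop = torch.topk(scores, n_prune, largest=False).indices
        mask[drop] = 0.0                                       # 0 = pruned, 1 = retained
    return mask

def apply_mask(conv: nn.Conv2d, mask: torch.Tensor) -> None:
    with torch.no_grad():
        conv.weight.mul_(mask.view(-1, 1, 1, 1))               # Eq. 2: M ⊙ P on the weights
        if conv.bias is not None:
            conv.bias.mul_(mask)

conv = nn.Conv2d(64, 128, 3, padding=1)
m = channel_mask(conv, rho=0.4)          # rho plays the role of the client-specific ratio
apply_mask(conv, m)
```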

Heterogeneous Predictor Aggregation

The adaptive pruning operation on the predictors is capable of shrinking their footprints and thereby reducing the resource consumption of the heterogeneous clients. However, the adaptive pruning operation leads to multiple heterogeneous predictors that cannot be aggregated by popular FL aggregation algorithms (e.g., FedAvg (McMahan et al. 2017)). Existing heterogeneous aggregation algorithms, such as Hermes (Li et al. 2021) and FedMP (Jiang et al. 2022), aggregate the overlapped parameters of the heterogeneous models. Unfortunately, these algorithms result in a degradation in model performance because the pruned parameters contribute nothing significant to the aggregated global model. Moreover, the remaining parameters struggle to learn knowledge from other clients when the corresponding parameters are pruned on those clients. Thus, in this subsection, we propose a novel aggregation algorithm that leverages the knowledge within the pre-trained backbone model to aggregate the heterogeneous predictors.
The heterogeneous predictor aggregation algorithm of the proposed FedLPS framework is presented in Algorithm 2. In each FL communication round, the central server receives the task-specific predictors from each selected client. FedLPS first recovers the pruned parameters in each task-specific predictor $\tilde{P}_t^n$ using the backbone predictor $P^B$ extracted from the backbone model and the mask matrix $M_t^n$ of task $t$ on client $n$ (line 3). The recovery operation can be formulated as follows:
$$\hat{P}_t^n = M_t^n \odot \tilde{P}_t^n + (\mathbf{1} - M_t^n) \odot P^B, \quad (3)$$
where $\hat{P}_t^n$ represents the recovered predictor containing the updated weights of task $t$ on client $n$. Subsequently, FedLPS aggregates these recovered predictors of task $t$ by weighted averaging (line 5). Formally, the aggregation operation can be represented as follows:
$$P_t^{g} = \sum_{n} \frac{|D_t^n|}{|D_t|} \hat{P}_t^n, \quad (4)$$
where $D_t$ denotes the entire data of task $t$ across all clients and $D_t^n$ denotes the local data of task $t$ on client $n$. For each task $t$, FedLPS aggregates the predictor $P_t^{g}$ using Eq. 4 until all predictors have been aggregated.
Finally, the central server sends the aggregated global predictors $P_t^{g}$ back to the selected clients for further rounds of local training.
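The server-side steps of Eqs. 3-4 can be sketched as below. This reflects our reading of the recovery-then-average procedure, with flat parameter tensors and hypothetical client inputs standing in for real predictor weights.

```python
import torch

def aggregate_predictor(client_weights, client_masks, client_sizes, backbone_weight):
    total = float(sum(client_sizes))
    agg = torch.zeros_like(backbone_weight)
    for w, m, n in zip(client_weights, client_masks, client_sizes):
        recovered = m * w + (1.0 - m) * backbone_weight   # Eq. 3: fill pruned entries from P^B
        agg += (n / total) * recovered                     # Eq. 4: data-weighted average
    return agg

backbone = torch.randn(256)                                # one parameter tensor of the backbone predictor
masks = [torch.bernoulli(torch.full((256,), 0.8)) for _ in range(3)]   # hypothetical binary masks
weights = [m * torch.randn(256) for m in masks]            # pruned client updates (toy values)
sizes = [1200, 800, 500]                                    # |D_t^n| per client
global_w = aggregate_predictor(weights, masks, sizes, backbone)
```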

Experimental Evaluation

In this section, we first implement the proposed FedLPS framework on a real federated learning platform, FedML (He et al. 2020), and conduct a comprehensive performance comparison of FedLPS against five state-of-the-art (SOTA) frameworks in multiple-task-enabled FL environments. Next, we evaluate the effect of varying pruning ratios on the learning performance of FedLPS. Finally, we evaluate the effect of the layer number $l$ of the shared encoder on the learning performance of FedLPS.

Experimental Setting

FL environments. In our experiments, we simulate 10 heterogeneous clients and deploy 5 classification tasks on each client. Both the ResNet18 model (He et al. 2016) and the ShuffleNetV2 model (Zhang et al. 2018) are adopted to conduct these classification tasks. The heterogeneous clients are uniformly divided into 5 levels according to their system capabilities, with the capabilities decreasing linearly from the strongest level to the weakest. Only the clients in the strongest level can conduct these 5 classification tasks without resource-optimizing techniques (i.e., model pruning). All the heterogeneous clients are selected to perform FL training. The experiments are conducted on a GPU server with 2 NVIDIA RTX 3080Ti GPUs, and each experiment is executed three times for calculating average metrics.
Datasets and data partition. In our experiments, we adopt five widely recognized datasets: MNIST (LeCun et al. 1998), FashionMNIST (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011), CIFAR10 (Krizhevsky, Hinton et al. 2009), and CIFAR100 (Krizhevsky, Hinton et al. 2009), to simulate the classification tasks. In the IID setting, each dataset is equally assigned to the clients. In the non-IID setting, we use the Latent Dirichlet Allocation (LDA) (Luo et al. 2021; Wang et al. 2020a) method to build the non-IID data. In LDA, a concentration parameter is used to control the data heterogeneity, and we adopt its conventional setting in our experiments to construct the non-IID data.
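A common way to realize the LDA/Dirichlet partition described above looks like the following sketch (a standard formulation, not necessarily the authors' exact code): for every class, the sample indices are split across clients according to a draw from a Dirichlet distribution, so a smaller concentration value gives more skewed client data. The value alpha = 0.5 below is only an example, not necessarily the paper's setting.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_clients))   # class share per client
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return client_idx

labels = np.random.randint(0, 10, size=5000)   # stand-in for real dataset labels
parts = dirichlet_partition(labels, num_clients=10, alpha=0.5)
```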
Comparison frameworks. We compare the proposed FedLPS framework with FedAvg (McMahan et al. 2017), FedDrop (Caldas et al. 2018), FedProx (Li et al. 2020), Hermes (Li et al. 2021), and FedMP (Jiang et al. 2022). FedAvg is the classical FL framework that requires each client to update the entire model, and thus can involve only the most capable clients in FL training in the heterogeneous FL environment. FedDrop leverages model pruning techniques to generate a compact global model that can be updated even by the weakest clients, thus involving all clients in the FL training. FedProx encourages local models to maintain similarity with the global model by introducing a regularization term, while allowing the weaker clients to perform fewer local updates. Hermes extracts a tailored sub-model for each client by structured pruning and aggregates only the intersection of the local models. FedMP prunes each local model with a dynamic pruning ratio in each round, thereby enabling all clients to participate in the FL training.
Data partition | FL frameworks | MNIST | FashionMNIST | SVHN | CIFAR10 | CIFAR100 | Average
IID | FedAvg (McMahan et al. 2017) | 97.07 | 86.35 | 89.32 | 64.15 | 25.27 | 72.43
IID | FedDrop (Caldas et al. 2018) | 77.92 | 73.26 | 37.28 | 56.77 | 23.97 | 53.84
IID | FedProx (Li et al. 2020) | 96.63 | 86.10 | 89.41 | 73.88 | 38.65 | 76.93
IID | Hermes (Li et al. 2021) | 97.92 | 87.73 | 91.24 | 76.32 | 38.92 | 78.43
IID | FedMP (Jiang et al. 2022) | 97.08 | 87.29 | 88.45 | 74.20 | 38.78 | 77.16
IID | FedLPS (Ours) | | | | | |
Non-IID | FedAvg (McMahan et al. 2017) | 56.03 | 59.36 | 79.23 | 26.39 | 20.04 | 48.21
Non-IID | FedDrop (Caldas et al. 2018) | 77.55 | 61.81 | 36.37 | 41.91 | 23.06 | 48.14
Non-IID | FedProx (Li et al. 2020) | 93.77 | 83.16 | 82.96 | 62.14 | 37.17 | 71.84
Non-IID | Hermes (Li et al. 2021) | 95.32 | 82.53 | 86.29 | 60.09 | 38.26 | 72.50
Non-IID | FedMP (Jiang et al. 2022) | 95.60 | 82.34 | | 61.59 | 37.19 | 72.19
Table 1: Comparison of model accuracy (%) on the ShuffleNetV2 model with both IID and non-IID data.
Data partition | FL frameworks | MNIST | FashionMNIST | SVHN | CIFAR10 | CIFAR100 | Average
IID | FedAvg (McMahan et al. 2017) | 98.38 | 88.82 | 93.74 | 76.74 | 35.06 | 78.55
IID | FedDrop (Caldas et al. 2018) | 88.59 | 83.19 | 69.37 | 58.39 | 23.41 | 64.59
IID | FedProx (Li et al. 2020) | 98.60 | 90.30 | 95.05 | 85.31 | 54.37 | 84.73
IID | FedMP (Jiang et al. 2022) | 98.67 | 91.32 | 95.46 | 81.58 | 54.54 | 84.31
IID | FedLPS (Ours) | | | | | |
Non-IID | FedAvg (McMahan et al. 2017) | 65.82 | 66.38 | 84.58 | 31.52 | 27.59 | 55.18
Non-IID | FedDrop (Caldas et al. 2018) | 82.19 | 77.24 | 69.05 | 42.55 | 19.73 | 58.15
Non-IID | FedProx (Li et al. 2020) | 95.87 | 84.07 | 91.05 | 67.60 | 53.18 | 78.35
Non-IID | Hermes (Li et al. 2021) | 97.05 | 84.00 | | 71.81 | 53.53 | 79.70
Non-IID | FedMP (Jiang et al. 2022) | 96.60 | 83.78 | 87.66 | 67.42 | 42.53 | 75.60
Non-IID | FedLPS (Ours) | 91.98 | | | | |
Table 2: Comparison of model accuracy (%) on the ResNet18 model with both IID and non-IID data.
To conduct fair comparisons, we adopt the same training hyper-parameters for FedLPS and the comparison frameworks in our experiments. The training hyper-parameters are provided in the supplemental materials in detail.

Learning Performance across Multiple Tasks

We compare the model accuracy of FedLPS with FedAvg, FedDrop, FedProx, Hermes, and FedMP on both IID and non-IID data. In this experiment, the layer number $l$ of the shared encoder in FedLPS is fixed, i.e., the first $l$ layers within the backbone model are used to build the shared encoder. For the FedLPS, Hermes, and FedMP frameworks, the pruning ratios adopted by the 5 levels of clients are assigned according to their system capabilities. The backbone model is pre-trained on ImageNet (Russakovsky et al. 2015).
Table 1 shows the accuracy on the ShuffleNetV2 model with both IID and non-IID data, while Table 2 shows the accuracy on the ResNet18 model. Additionally, we provide a comparative analysis of communication overhead in the supplemental materials, assessed through the footprints of the transmitted models. On both the ShuffleNetV2 and ResNet18 models, FedLPS outperforms the comparison frameworks in terms of average model accuracy. The superiority of FedLPS can be attributed to three reasons: firstly, the pre-trained shareable encoder is capable of extracting low-level features and producing general embeddings from the local data. Secondly, the task-specific predictors are pruned elaborately to satisfy the resource constraints, and the pruned predictors are trained on each task separately, making them suitable for performing the specific tasks. Thirdly, the heterogeneous predictor aggregation algorithm used in FedLPS leverages the knowledge in the backbone model to assist the aggregation of the local predictors, thus making the predictors learn better from other clients.

Effect of Pruning Ratios

In model pruning-enabled FL frameworks, the resource consumption of training the local model is significantly reduced, whereas the model accuracy will decrease when the pruning ratio exceeds a threshold.
(a) CIFAR10; (b) CIFAR100
Figure 2: Model accuracy of Hermes, FedMP, and FedLPS on the ResNet18 model with different pruning ratios on the non-IID setting of the CIFAR10 and CIFAR100 datasets.
Thus, we investigate the effect of the pruning ratio on model accuracy in this subsection. In this experiment, the proposed FedLPS framework is compared with two SOTA model pruning-enabled frameworks, Hermes and FedMP, and the pruning ratio is set as 0.2, 0.4, 0.6, and 0.8. Fig. 2(a) and Fig. 2(b) show the model accuracy of the ResNet18 model on the non-IID setting of the CIFAR10 dataset and the CIFAR100 dataset during 100 FL rounds, respectively. FedLPS outperforms Hermes and FedMP when the pruning ratio ranges from 0.2 to 0.8 on both the CIFAR10 dataset and the CIFAR100 dataset, although all of them adopt model pruning techniques to shrink the model footprint. The detailed footprints of the predictors within FedLPS are provided in the supplemental materials.

Effect of Local Parameter Sharing

In FedLPS, the larger the layer number $l$ assigned to the shared encoder, the more training cost can be saved. In this experiment, we investigate the effect of the layer number $l$ of the shared encoder on the training cost and the model accuracy by varying $l$. The floating-point operations (FLOPs) of the model are measured to indicate the training cost. In this experiment, the ResNet18 model containing 34.36M FLOPs is adopted. Fig. 3 shows the model accuracy on the MNIST, FashionMNIST, SVHN, CIFAR10, and CIFAR100 datasets, while Table 3 reports the FLOPs of the sum of the shared encoder and all task-specific predictors (before adaptive pruning). In FedLPS, the FLOPs linearly decrease as the layer number $l$ increases, resulting in a substantial reduction in training costs. Thus, FedLPS is more suitable than existing FL frameworks for clients that are deployed with multiple tasks, as the training cost of multiple tasks can be easily reduced by building a shared encoder with $l$ layers. Take the clients deployed with 5 tasks as an example: in Fig. 3 and Table 3, FedLPS achieves high average accuracy with far fewer FLOPs in both the IID setting and the non-IID setting, whereas the SOTA frameworks realize lower average accuracy (in Table 2) while incurring the FLOPs of a full ResNet18 model for every task.
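The FLOPs accounting used in Table 3 (the shared encoder counted once plus one predictor per task) can be illustrated with a rough calculation. The encoder fraction below is a made-up example value, not a number from the paper; only the 34.36M backbone FLOPs and the 5-task setup come from the text.

```python
def total_flops(backbone_flops: float, encoder_fraction: float, num_tasks: int):
    encoder = encoder_fraction * backbone_flops            # shared encoder, counted once
    predictor = (1.0 - encoder_fraction) * backbone_flops  # task-specific part, counted per task
    shared = encoder + num_tasks * predictor               # FedLPS-style parameter sharing
    separate = num_tasks * backbone_flops                  # one full model per task
    return shared, separate

shared, separate = total_flops(backbone_flops=34.36e6, encoder_fraction=0.5, num_tasks=5)
print(f"shared: {shared / 1e6:.1f}M FLOPs vs separate: {separate / 1e6:.1f}M FLOPs")
```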
(a) IID setting; (b) Non-IID setting
Figure 3: Model accuracy of FedLPS with different layer numbers $l$ in the IID setting and the non-IID setting, where F-MNIST represents the FashionMNIST dataset.
Layer number $l$ | FLOPs | Reduction on FLOPs | Average accuracy (%)
Table 3: Total FLOPs of FedLPS with different layer numbers $l$ and corresponding model average accuracy on both IID data and non-IID data. (The accuracy on IID data is outside the parentheses and the accuracy on non-IID data is inside the parentheses.)

Conclusion

In this paper, we have proposed FedLPS for multiple-task-enabled heterogeneous FL environments, aiming to reduce the resource consumption of the clients during the FL training process while maintaining satisfying model accuracy. FedLPS realizes local parameter sharing by dividing the local model into a shareable encoder and multiple task-specific predictors, thus reducing the training cost across multiple tasks on individual clients. To tackle the system heterogeneity problem, an adaptive channel-wise model pruning method is proposed for FedLPS to allow the heterogeneous clients to participate in the FL training with heterogeneous task-specific predictors. Furthermore, a novel aggregation algorithm is proposed for FedLPS to efficiently aggregate the heterogeneous predictors with the assistance of the knowledge within the pre-trained backbone model. The comparison results on five popular datasets and two modern DNN models demonstrate the superiority of FedLPS in terms of both average model accuracy and resource consumption.
Limitations and Prospects for Future Research. The hyper-parameters employed in FedLPS currently lack the flexibility of dynamic adjustment during FL training; a promising future direction involves delving into the nuanced interactions among the layer number $l$, the pruning ratio $\rho$, and the ensuing model accuracy. Moreover, the scope of this study is confined to utilizing FedLPS exclusively for classification tasks. To extend the applicability of FedLPS, an exciting direction involves comprehensively exploring its performance across diverse task domains.

Acknowledgments

This research is supported in part by the National Key Research and Development Program of China No. 2020YFB1707601 and the National Natural Science Foundation of China No. 92267104.

References

Caldas, S.; Konečny, J.; McMahan, H. B.; and Talwalkar, A. 2018. Expanding the reach of federated learning by reducing client resource requirements. arXiv preprint arXiv:1812.07210.
Di, S.; Zhang, H.; Li, C.-G.; Mei, X.; Prokhorov, D.; and Ling, H. 2017. Cross-domain traffic scene understanding: A dense correspondence-based transfer learning approach. IEEE transactions on intelligent transportation systems, 19(3): 745-757.
Ezzeldin, Y. H.; Yan, S.; He, C.; Ferrara, E.; and Avestimehr, A. S. 2023. Fairfed: Enabling group fairness in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 7494-7502.
Fu, C.; Huang, H.; Chen, X.; Tian, Y.; and Zhao, J. 2021. Learn-to-share: A hardware-friendly transfer learning framework exploiting computation and parameter sharing. In International Conference on Machine Learning, 3469-3479. PMLR.
Gao, D.; Yao, X.; and Yang, Q. 2022. A Survey on Heterogeneous Federated Learning. arXiv preprint arXiv:2210.04505.
He, C.; Li, S.; So, J.; Zhang, M.; Wang, H.; Wang, X.; Vepakomma, P.; Singh, A.; Qiu, H.; Shen, L.; Zhao, P.; Kang, Y.; Liu, Y.; Raskar, R.; Yang, Q.; Annavaram, M.; and Avestimehr, S. 2020. FedML: A Research Library and Benchmark for Federated Machine Learning. arXiv preprint arXiv:2007.13518.
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770-778.
He, Y.; Zhang, X.; and Sun, J. 2017. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE international conference on computer vision, 1389-1397.
Jiang, Z.; Xu, Y.; Xu, H.; Wang, Z.; Qiao, C.; and Zhao, Y. 2022. FedMP: Federated Learning through Adaptive Model Pruning in Heterogeneous Edge Computing. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), 767-779.
Kairouz, P.; McMahan, H. B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A. N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. 2021. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1-2): 1-210.
Khan, L. U.; Pandey, S. R.; Tran, N. H.; Saad, W.; Han, Z.; Nguyen, M. N.; and Hong, C. S. 2020. Federated learning for edge networks: Resource optimization and incentive mechanism. IEEE Communications Magazine, 58(10): 8893.

Khan, L. U.; Saad, W.; Han, Z.; Hossain, E.; and Hong, C. S. 2021. Federated Learning for Internet of Things: Recent Advances, Taxonomy, and Open Challenges. IEEE Communications Surveys & Tutorials, 23(3): 1759-1799.
Krizhevsky, A.; Hinton, G.; et al. 2009. Learning multiple layers of features from tiny images. Technical report.
LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324.
Li, A.; Sun, J.; Li, P.; Pu, Y.; Li, H.; and Chen, Y. 2021. Hermes: an efficient federated learning framework for heterogeneous mobile clients. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, 420-437.
Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; and Graf, H. P. 2016. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710.
Li, T.; Sahu, A. K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; and Smith, V. 2020. Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2: 429-450.
Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; and Zhang, C. 2017. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE international conference on computer vision, 2736-2744.
Long, M.; Zhu, H.; Wang, J.; and Jordan, M. I. 2017. Deep transfer learning with joint adaptation networks. In International conference on machine learning, 2208-2217. PMLR.
Luo, M.; Chen, F.; Hu, D.; Zhang, Y.; Liang, J.; and Feng, J. 2021. No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data. In Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P.; and Vaughan, J. W., eds., Advances in Neural Information Processing Systems, volume 34, 5972-5984. Curran Associates, Inc.
Ma, J.; Zhao, Z.; Chen, J.; Li, A.; Hong, L.; and Chi, E. H. 2019. Snr: Sub-network routing for flexible parameter sharing in multi-task learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 216-223.
McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; and y Arcas, B. A. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, 1273-1282. PMLR.
Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; and Ng, A. Y. 2011. Reading Digits in Natural Images with Unsupervised Feature Learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011.
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. S.; Berg, A. C.; and Li, F. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3): 211-252.
Tan, Y.; Liu, Y.; Long, G.; Jiang, J.; Lu, Q.; and Zhang, C. 2023. Federated learning on non-iid graphs via structural knowledge sharing. In Proceedings of the AAAI conference on artificial intelligence, volume 37, 9953-9961.
Tu, L.; Ouyang, X.; Zhou, J.; He, Y.; and Xing, G. 2021. Feddl: Federated learning via dynamic layer sharing for human activity recognition. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, 15-28.
Wallingford, M.; Li, H.; Achille, A.; Ravichandran, A.; Fowlkes, C.; Bhotika, R.; and Soatto, S. 2022. Task adaptive parameter sharing for multi-task learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7561-7570.
Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D. S.; and Khazaeni, Y. 2020a. Federated Learning with Matched Averaging. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; and Poor, H. V. 2020b. Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization. In Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.; and Lin, H., eds., Advances in Neural Information Processing Systems, volume 33, 7611-7623. Curran Associates, Inc.
Weiss, K.; Khoshgoftaar, T. M.; and Wang, D. 2016. A survey of transfer learning. Journal of Big data, 3(1): 1-40.
Xiao, H.; Rasul, K.; and Vollgraf, R. 2017. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
Ye, H.; Zhang, B.; Chen, T.; Fan, J.; and Wang, B. 2023. Performance-aware Approximation of Global Channel Pruning for Multitask CNNs. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Yosinski, J.; Clune, J.; Bengio, Y.; and Lipson, H. 2014. How transferable are features in deep neural networks? In Ghahramani, Z.; Welling, M.; Cortes, C.; Lawrence, N.; and Weinberger, K., eds., Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc.
Yu, S.; Nguyen, P.; Abebe, W.; Qian, W.; Anwar, A.; and Jannesari, A. 2022. SPATL: salient parameter aggregation and transfer learning for heterogeneous federated learning. In 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 495-508. IEEE Computer Society.
Zhang, J.; Hua, Y.; Wang, H.; Song, T.; Xue, Z.; Ma, R.; and Guan, H. 2023. Fedala: Adaptive local aggregation for personalized federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 11237-11244.
Zhang, X.; Zhou, X.; Lin, M.; and Sun, J. 2018. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition, 6848-6856.
Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; and He, Q. 2020. A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1): 43-76.

  1. *Corresponding authors.
    Copyright 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
  2. In this work, we make the fundamental assumption that the model structure remains consistent across all tasks, and this work does not focus on addressing the issue of model heterogeneity.
    For presentation convenience, we omit the client superscript $n$ when describing the local operation of each client.
  3. It is noteworthy that these updated weights differ from the initial predictor, despite using the same symbol.