1: The Chinese University of Hong Kong, Shenzhen
2: University of Michigan, Ann Arbor
3: The Chinese University of Hong Kong
4: University of Illinois Urbana-Champaign
5: Uber ATG, San Francisco

SemanticAdv: Generating Adversarial Examples via Attribute-conditioned Image Editing

Haonan Qiu^{1*}, Chaowei Xiao^{2*}, Lei Yang^{3*}, Xinchen Yan^{2,5,†}, Honglak Lee^{2}, Bo Li^{4}

* Alphabetical ordering; the first three authors contributed equally.
† Work partially done as a PhD student at the University of Michigan.
Abstract

Deep neural networks (DNNs) have achieved great successes in various vision applications due to their strong expressive power. However, recent studies have shown that DNNs are vulnerable to adversarial examples, which are manipulated instances designed to mislead DNNs into making incorrect predictions. Currently, most such adversarial examples try to guarantee "subtle perturbation" by limiting the $L_p$ norm of the perturbation. In this paper, we propose SemanticAdv to generate a new type of semantically realistic adversarial examples via attribute-conditioned image editing. Compared to existing methods, our SemanticAdv enables fine-grained analysis and evaluation of DNNs with input variations in the attribute space. We conduct comprehensive experiments to show that our adversarial examples not only exhibit semantically meaningful appearances but also achieve high targeted attack success rates under both whitebox and blackbox settings. Moreover, we show that the existing pixel-based and attribute-based defense methods fail to defend against SemanticAdv. We demonstrate the applicability of SemanticAdv on both face recognition and general street-view images to show its generalization. Such non-$L_p$-bounded adversarial examples with controlled attribute manipulation can shed light on further understanding of the vulnerabilities of DNNs as well as novel defense approaches.

1 Introduction

Deep neural networks (DNNs) have demonstrated great successes in advancing the state-of-the-art performance in various vision tasks [36, 61, 64, 23, 59, 41, 77, 11] and have been widely used in many safety-critical applications such as face verification and autonomous driving [79]. At the same time, several studies [65, 21, 45, 51, 10, 71, 72, 70] have revealed the vulnerability of DNNs against input variations. For example, carefully crafted $L_p$ bounded perturbations added to pristine input images can introduce arbitrary prediction errors during testing time. While being visually imperceptible, $L_p$ bounded adversarial attacks have certain limitations as they only capture the variations in the raw pixel space and cannot guarantee the semantic realism of the generated instances. Recent works [72, 30, 69] have shown the limitations of only measuring and evaluating $L_p$ bounded perturbations (e.g., they cannot handle variations in lighting conditions). Therefore, the failure modes of deep neural networks beyond raw pixel variations, including semantic perturbations, require further understanding and exploration.

In this work, we focus on studying how DNNs respond to semantically meaningful perturbations in the visual attribute space. In the visual recognition literature, visual attributes [19, 37, 52] are human-designated properties observable in images (e.g., black hair and blonde hair). As illustrated in Figure 1 (left), given an input image with known attributes, we would like to craft semantically meaningful (attribute-conditioned) adversarial examples via image editing along a single attribute or a subset of attributes while keeping the rest unchanged. Compared to traditional $L_p$ bounded adversarial perturbations or semantic perturbations on global color and texture [5], such attribute-based image editing enables users to conduct a fine-grained analysis and evaluation of DNN models by removing one or a set of visual aspects or adding one object into the scene. We believe our attribute-conditioned image editing is a natural way of introducing semantic perturbations, and it preserves clear interpretability, e.g., wearing a new pair of glasses or having the hair dyed a different color.

Figure 1: Pipeline of SemanticAdv. Left: Each row shows a pair of images that differ in only one semantic aspect. One of them is sampled from the ground-truth dataset, while the other one is created by our conditional image generator, which is adversarial to the recognition model (e.g., face identification network and semantic segmentation network). Right: Overview of the proposed attribute-conditioned SemanticAdv against the face identity verification model.

To facilitate the generation of semantic adversarial perturbations along a single attribute dimension, we take advantage of the disentangled representation in deep image generative models [55, 31, 6, 75, 12, 3, 76, 28]. Such disentangled representation allows us to explore the variations of a specific semantic factor while keeping the other factors unchanged. As illustrated in Figure 1 (right), we first leverage an attribute-conditioned image editing model [12] to construct a new instance which is very similar to the source except for one semantic aspect (the source image is given as input). Given such a pair of images, we synthesize the adversarial example by interpolating between the pair of images in the feature-map space. As the interpolation is constrained by the image pair, the appearance of the resulting semantic adversarial example resembles both of them.

To validate the effectiveness of our proposed SemanticAdv via attribute-conditioned image editing, we consider two real-world tasks, including face verification and landmark detection. We conduct both qualitative and quantitative evaluations on the CelebA dataset [40]. The results show that our SemanticAdv not only achieves a high targeted attack success rate but also preserves the semantic meaning of the corresponding input images. To further demonstrate the applicability of our SemanticAdv beyond the face domain, we extend the framework to generate adversarial street-view images. We treat semantic layouts as input attributes and use the layout-conditioned image editing model [24] pre-trained on the Cityscape dataset [14]. Our results show that a well-trained semantic segmentation model can be successfully attacked so that it neglects a pedestrian when we insert another object beside it using our image editing model. In addition, we show that an existing adversarial training-based defense method is less effective against our attack method, which motivates further defense strategies against such semantic adversarial examples.

Our contributions are summarized as follows: (1) We propose a novel method, SemanticAdv, to generate semantically meaningful adversarial examples via attribute-conditioned image editing based on feature-space interpolation. Compared to existing adversarial attacks, our method enables fine-grained attribute analysis as well as further evaluation of the vulnerabilities of DNN models. Such semantic adversarial examples also provide explainable analysis for different attributes in terms of their robustness and editing flexibility. (2) We conduct extensive experiments and show that the proposed feature-space interpolation strategy can generate high-quality attribute-conditioned adversarial examples more effectively than the simple attribute-space interpolation. Additionally, our SemanticAdv exhibits high attack transferability as well as a 67.7% query-free black-box attack success rate on a real-world face verification platform. (3) We empirically show that, compared to $L_p$ attacks, the existing per-pixel based as well as attribute-based defense methods fail to defend against our SemanticAdv, which indicates that such semantic adversarial examples identify a certain unexplored vulnerable landscape of DNNs. (4) To demonstrate the applicability and generalization of SemanticAdv beyond the face recognition domain, we extend the framework to generate adversarial street-view images that fool semantic segmentation models effectively.

2 Related Work

Semantic image editing.

Semantic image synthesis and manipulation is a popular research topic in machine learning, graphics and vision. Thanks to recent advances in deep generative models [34, 20, 50] and the empirical analysis of deep classification networks [36, 61, 64], the past few years have witnessed tremendous breakthroughs towards high-fidelity pure image generation [55, 31, 6], attribute-to-image generation [75, 12], text-to-image generation [44, 56, 49, 48, 78, 28], and image-to-image translation [26, 81, 39, 68, 24].

Adversarial examples.

Generating $L_p$ bounded adversarial perturbations has been extensively studied recently [65, 21, 45, 51, 10, 71]. To further explore diverse adversarial attacks and potentially help inspire defense mechanisms, it is important to generate the so-called "unrestricted" adversarial examples, which contain perturbations of unrestricted magnitude while still preserving perceptual realism [7]. Recently, [72, 18] proposed to spatially transform image patches instead of adding pixel-wise perturbations, but such spatial transformations do not consider semantic information. Our proposed SemanticAdv focuses on generating unrestricted perturbations with semantically meaningful patterns guided by visual attributes.

Relevant to our work, [62] proposed to synthesize adversarial examples with an unconditional generative model, and [5] studied semantic transformations in only the color or texture space. Compared to these works, SemanticAdv is able to generate adversarial examples in a controllable fashion using specific visual attributes by performing manipulation in the feature space. We further analyze the robustness of the recognition system by generating adversarial examples guided by different visual attributes. Concurrent to our work, [29] proposed to generate semantic-based attacks against a restricted binary classifier, while our attack is able to mislead the model towards arbitrary adversarial targets. They conduct the manipulation within the attribute space, which is less flexible and effective than our proposed feature-space interpolation.

3 SemanticAdv

3.1 Problem Definition

Let $\mathcal{M}$ be a machine learning model trained on a dataset $\mathcal{D}=\{(\mathbf{x},\mathbf{y})\}$ consisting of image-label pairs, where $\mathbf{x}\in\mathbb{R}^{H\times W\times D_I}$ and $\mathbf{y}\in\mathbb{R}^{D_L}$ denote the image and the ground-truth label, respectively. Here, $H$, $W$, $D_I$, and $D_L$ denote the image height, image width, number of image channels, and label dimensions, respectively. For each image $\mathbf{x}$, our model $\mathcal{M}$ makes a prediction $\hat{\mathbf{y}}=\mathcal{M}(\mathbf{x})\in\mathbb{R}^{D_L}$. Given a target image-label pair $(\mathbf{x}^{\text{tgt}},\mathbf{y}^{\text{tgt}})$ with $\mathbf{y}\neq\mathbf{y}^{\text{tgt}}$, a traditional attacker aims to synthesize adversarial examples $\mathbf{x}^{\text{adv}}$ by adding pixel-wise perturbations to or spatially transforming the original image $\mathbf{x}$ such that $\mathcal{M}(\mathbf{x}^{\text{adv}})=\mathbf{y}^{\text{tgt}}$. In this work, we consider a semantic attacker that generates semantically meaningful perturbations via attribute-conditioned image editing with a conditional generative model $\mathcal{G}$. Compared to the traditional attacker, the proposed attack method generates adversarial examples in a more controllable fashion by editing a single semantic aspect through attribute-conditioned image editing.

3.2 Attribute-conditioned Image Editing

In order to produce semantically meaningful perturbations, we first introduce how to synthesize attribute-conditioned images through interpolation.

Semantic image editing.

For simplicity, we start with the formulation where the input attribute is represented as a compact vector. This formulation can be directly extended to other input attribute formats, including semantic layouts. Let $\mathbf{c}\in\mathbb{R}^{D_C}$ be an attribute representation reflecting the semantic factors (e.g., expression or hair color of a portrait image) of image $\mathbf{x}$, where $D_C$ indicates the attribute dimension and $c_i\in\{0,1\}$ indicates the existence of the $i$-th attribute. We are interested in performing semantic image editing using the attribute-conditioned image generator $\mathcal{G}$. For example, given a portrait image of a girl with black hair and the new attribute blonde hair, our generator is supposed to synthesize a new image that turns the girl's hair color from black to blonde while keeping the rest of the appearance unchanged. The synthesized image is denoted as $\mathbf{x}^{\text{new}}=\mathcal{G}(\mathbf{x},\mathbf{c}^{\text{new}})$, where $\mathbf{c}^{\text{new}}\in\mathbb{R}^{D_C}$ is the new attribute. In the special case when there is no attribute change ($\mathbf{c}=\mathbf{c}^{\text{new}}$), the generator simply reconstructs the input: $\mathbf{x}'=\mathcal{G}(\mathbf{x},\mathbf{c})$ (ideally, we hope $\mathbf{x}'$ equals $\mathbf{x}$). As our attribute representation is disentangled and the change of attribute value is sufficiently small (e.g., we only edit a single semantic attribute), our synthesized image $\mathbf{x}^{\text{new}}$ is expected to be close to the data manifold [4, 57, 55]. In addition, we can generate many similar images by linearly interpolating between the image pair $\mathbf{x}$ and $\mathbf{x}^{\text{new}}$ in the attribute-space or the feature-space of the image-conditioned generator $\mathcal{G}$, which is supported by previous work [75, 55, 3].

Attribute-space interpolation.

Given a pair of attributes $\mathbf{c}$ and $\mathbf{c}^{\text{new}}$, we introduce an interpolation parameter $\alpha\in(0,1)$ to generate the augmented attribute vector $\mathbf{c}^{*}\in\mathbb{R}^{D_C}$ (see Eq. 1). Given the augmented attribute $\mathbf{c}^{*}$ and the original image $\mathbf{x}$, we produce the image $\mathbf{x}^{*}$ with the generator $\mathcal{G}$ through attribute-space interpolation.

$\mathbf{x}^{*} = \mathcal{G}(\mathbf{x}, \mathbf{c}^{*}), \quad \mathbf{c}^{*} = \alpha \cdot \mathbf{c} + (1-\alpha) \cdot \mathbf{c}^{\text{new}}, \text{ where } \alpha \in [0,1] \qquad (1)$
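To make the interpolation concrete, below is a minimal PyTorch-style sketch of attribute-space interpolation as in Eq. 1. The generator `G` and its calling convention are placeholders standing in for an attribute-conditioned editing model such as StarGAN, not the exact interface used in the paper.

```python
import torch

def attribute_space_interpolation(G, x, c, c_new, alpha):
    """Synthesize an image from a convex combination of two attribute vectors (Eq. 1).

    G      : attribute-conditioned generator, assumed callable as G(image, attribute)
    x      : input image tensor of shape (1, 3, H, W)
    c      : original attribute vector of shape (1, D_C), entries in {0, 1}
    c_new  : edited attribute vector of shape (1, D_C)
    alpha  : scalar in [0, 1]; alpha = 1 reproduces c, alpha = 0 reproduces c_new
    """
    c_star = alpha * c + (1.0 - alpha) * c_new   # augmented attribute c*
    return G(x, c_star)                          # x* = G(x, c*)
```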

Feature-map interpolation.

Alternatively, we propose to interpolate using the feature map produced by the generator $\mathcal{G}=\mathcal{G}_{\text{dec}}\circ\mathcal{G}_{\text{enc}}$. Here, $\mathcal{G}_{\text{enc}}$ is the encoder module that takes the image as input and outputs the feature map. Similarly, $\mathcal{G}_{\text{dec}}$ is the decoder module that takes the feature map as input and outputs the synthesized image. Let $\mathbf{f}^{*}=\mathcal{G}_{\text{enc}}(\mathbf{x},\mathbf{c})\in\mathbb{R}^{H_F\times W_F\times C_F}$ be the feature map of an intermediate layer in the generator, where $H_F$, $W_F$ and $C_F$ indicate the height, width, and number of channels in the feature map.

$\mathbf{x}^{*} = \mathcal{G}_{\text{dec}}(\mathbf{f}^{*}), \quad \mathbf{f}^{*} = \boldsymbol{\beta} \odot \mathcal{G}_{\text{enc}}(\mathbf{x}, \mathbf{c}) + (\mathbf{1} - \boldsymbol{\beta}) \odot \mathcal{G}_{\text{enc}}(\mathbf{x}, \mathbf{c}^{\text{new}}) \qquad (2)$

Compared to the attribute-space interpolation, which is parameterized by a scalar $\alpha$, we parameterize feature-map interpolation by a tensor $\boldsymbol{\beta}\in\mathbb{R}^{H_F\times W_F\times C_F}$ ($\beta_{h,w,k}\in[0,1]$, where $1\leq h\leq H_F$, $1\leq w\leq W_F$, and $1\leq k\leq C_F$) with the same shape as the feature map. Compared to linear interpolation over the attribute space, such a design introduces more flexibility for adversarial attacks. Empirical results in Section 4.2 show that this design is critical to maintaining both attack success and good perceptual quality at the same time.
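A minimal sketch of the feature-map interpolation in Eq. 2, assuming the generator exposes separate encoder and decoder modules (`G_enc` and `G_dec` are placeholder names). The interpolation tensor `beta` has the same shape as the feature map and is the quantity later optimized by the attack.

```python
import torch

def feature_map_interpolation(G_enc, G_dec, x, c, c_new, beta):
    """Interpolate two feature maps element-wise and decode the result (Eq. 2).

    G_enc : encoder, maps (image, attribute) -> feature map of shape (1, C_F, H_F, W_F)
    G_dec : decoder, maps a feature map back to an image
    beta  : tensor of shape (1, C_F, H_F, W_F) with entries in [0, 1]
    """
    f_orig = G_enc(x, c)       # features of the original image
    f_new = G_enc(x, c_new)    # features of the attribute-edited image
    f_star = beta * f_orig + (1.0 - beta) * f_new   # element-wise interpolation
    return G_dec(f_star)       # synthesized image x*
```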

3.3 Generating Semantically Meaningful Adversarial Examples

Existing work obtains the adversarial image $\mathbf{x}^{\text{adv}}$ by adding perturbations to or transforming the input image $\mathbf{x}$ directly. In contrast, our semantic attack method requires an additional attribute-conditioned image generator $\mathcal{G}$ during adversarial image generation through interpolation. As shown in Eq. 3, the first term of our objective function is the adversarial metric, the second term is a smoothness constraint that guarantees perceptual quality, and $\lambda$ is used to control the balance between the two terms. The adversarial metric is minimized once the model $\mathcal{M}$ has been successfully attacked towards the target image-label pair $(\mathbf{x}^{\text{tgt}},\mathbf{y}^{\text{tgt}})$. For identity verification, $\mathbf{y}^{\text{tgt}}$ is the identity representation of the target image; for the structured prediction tasks in our paper, $\mathbf{y}^{\text{tgt}}$ either represents certain coordinates (landmark detection) or semantic label maps (semantic segmentation).

$\mathbf{x}^{\text{adv}} = \operatorname*{argmin}_{\mathbf{x}^{*}} \mathcal{L}(\mathbf{x}^{*}), \quad \mathcal{L}(\mathbf{x}^{*}) = \mathcal{L}_{\text{adv}}(\mathbf{x}^{*}; \mathcal{M}, \mathbf{y}^{\text{tgt}}) + \lambda \cdot \mathcal{L}_{\text{smooth}}(\mathbf{x}^{*}) \qquad (3)$
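The attack can be realized as a gradient-based search over the interpolation tensor. The following is a hedged sketch of such an optimization loop under Eq. 3; `adv_loss` and `smooth_loss` stand for the task-specific adversarial metric (Eqs. 4-5) and the smoothness term (Eq. 6), while the optimizer choice, sigmoid re-parameterization, step count, and learning rate are illustrative assumptions rather than the paper's exact settings.

```python
import torch

def semantic_adv_attack(G_enc, G_dec, model, x, c, c_new, y_tgt,
                        adv_loss, smooth_loss, lam=0.1, steps=200, lr=0.01):
    """Optimize the interpolation tensor beta to minimize Eq. 3."""
    f_orig = G_enc(x, c).detach()
    f_new = G_enc(x, c_new).detach()
    # unconstrained parameter; sigmoid keeps beta inside [0, 1]
    beta_raw = torch.zeros_like(f_orig, requires_grad=True)
    opt = torch.optim.Adam([beta_raw], lr=lr)
    for _ in range(steps):
        beta = torch.sigmoid(beta_raw)
        x_star = G_dec(beta * f_orig + (1.0 - beta) * f_new)            # Eq. 2
        loss = adv_loss(model, x_star, y_tgt) + lam * smooth_loss(beta) # Eq. 3
        opt.zero_grad()
        loss.backward()
        opt.step()
    beta = torch.sigmoid(beta_raw)
    return G_dec(beta * f_orig + (1.0 - beta) * f_new).detach()
```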

Identity verification.

In the identity verification task, two images are considered to be of the same identity if the corresponding identity embeddings from the verification model $\mathcal{M}$ are reasonably close.

$\mathcal{L}_{\text{adv}}(\mathbf{x}^{*}; \mathcal{M}, \mathbf{y}^{\text{tgt}}) = \max\{\kappa, \Phi_{\mathcal{M}}^{\text{id}}(\mathbf{x}^{*}, \mathbf{x}^{\text{tgt}})\} \qquad (4)$

As shown in Eq. 4, $\Phi_{\mathcal{M}}^{\text{id}}(\cdot,\cdot)$ measures the distance between two identity embeddings from the model $\mathcal{M}$, where the normalized $L_2$ distance is used in our setting. In addition, we introduce the parameter $\kappa$ representing the constant related to the false positive rate (FPR) threshold computed from the development set.
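A small sketch of the identity-verification adversarial metric in Eq. 4, assuming the verification model returns an identity embedding and that "normalized $L_2$ distance" means the $L_2$ distance between unit-normalized embeddings (an interpretation, not the paper's stated implementation).

```python
import torch
import torch.nn.functional as F

def id_adv_loss(model, x_star, x_tgt, kappa):
    """max{kappa, normalized L2 distance between identity embeddings} (Eq. 4)."""
    emb_star = F.normalize(model(x_star), dim=1)   # unit-norm identity embedding
    emb_tgt = F.normalize(model(x_tgt), dim=1)
    dist = torch.norm(emb_star - emb_tgt, p=2, dim=1)
    return torch.clamp(dist, min=kappa).mean()     # max{kappa, dist}
```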

Structured prediction.

For structured prediction tasks such as landmark detection and semantic segmentation, we use the Houdini objective proposed in [13] as our adversarial metric and select the target landmark (or semantic segmentation) target as $\mathbf{y}^{\text{tgt}}$. As shown in Eq. 5, $\Phi_{\mathcal{M}}(\cdot,\cdot)$ is a scoring function for each image-label pair and $\gamma$ is the threshold. In addition, $l(\mathbf{y}^{*},\mathbf{y}^{\text{tgt}})$ is the task loss decided by the specific adversarial target, where $\mathbf{y}^{*}=\mathcal{M}(\mathbf{x}^{*})$.

$\mathcal{L}_{\text{adv}}(\mathbf{x}^{*}; \mathcal{M}, \mathbf{y}^{\text{tgt}}) = P_{\gamma \sim \mathcal{N}(0,1)}\big[\Phi_{\mathcal{M}}(\mathbf{x}^{*}, \mathbf{y}^{*}) - \Phi_{\mathcal{M}}(\mathbf{x}^{*}, \mathbf{y}^{\text{tgt}}) < \gamma\big] \cdot l(\mathbf{y}^{*}, \mathbf{y}^{\text{tgt}}) \qquad (5)$
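The probability term in Eq. 5 has a closed form, since $P[a < \gamma]$ for $\gamma \sim \mathcal{N}(0,1)$ equals the standard normal CDF evaluated at $-a$. The sketch below uses this fact; `score_fn` and `task_loss` are assumed callables standing in for $\Phi_{\mathcal{M}}(\cdot,\cdot)$ and $l(\cdot,\cdot)$ of the task at hand, not the paper's exact implementation.

```python
import torch
from torch.distributions import Normal

def houdini_adv_loss(score_fn, task_loss, x_star, y_star, y_tgt):
    """Houdini objective (Eq. 5): stochastic margin times task loss."""
    margin = score_fn(x_star, y_star) - score_fn(x_star, y_tgt)
    # P_{gamma ~ N(0,1)}[margin < gamma] = 1 - CDF(margin) = CDF(-margin)
    prob = Normal(0.0, 1.0).cdf(-margin)
    return (prob * task_loss(y_star, y_tgt)).mean()
```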

Interpolation smoothness $\mathcal{L}_{\text{smooth}}$.

As the tensor to be interpolated in the feature-map space has far more parameters than the attribute itself, we propose to enforce a smoothness constraint on the tensor $\boldsymbol{\beta}$ used in feature-map interpolation. As shown in Eq. 6, the smoothness loss encourages the interpolation tensor to consist of spatially piece-wise constant patches, which has been widely used as a pixel-wise de-noising objective for natural image processing [43, 27].

$\mathcal{L}_{\text{smooth}}(\boldsymbol{\beta}) = \sum_{h=1}^{H_F-1} \sum_{w=1}^{W_F} \|\boldsymbol{\beta}_{h+1,w} - \boldsymbol{\beta}_{h,w}\|_2^2 + \sum_{h=1}^{H_F} \sum_{w=1}^{W_F-1} \|\boldsymbol{\beta}_{h,w+1} - \boldsymbol{\beta}_{h,w}\|_2^2 \qquad (6)$
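The smoothness term is a total-variation-style penalty over spatial neighbors of $\boldsymbol{\beta}$. A minimal sketch of Eq. 6, assuming $\boldsymbol{\beta}$ is stored as a tensor of shape (1, C_F, H_F, W_F):

```python
import torch

def smooth_loss(beta):
    """Squared-difference smoothness over spatial neighbors of beta (Eq. 6)."""
    # differences between vertically adjacent positions (h+1, w) and (h, w)
    dh = beta[:, :, 1:, :] - beta[:, :, :-1, :]
    # differences between horizontally adjacent positions (h, w+1) and (h, w)
    dw = beta[:, :, :, 1:] - beta[:, :, :, :-1]
    return (dh ** 2).sum() + (dw ** 2).sum()
```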

4 Experiments

In the experimental section, we mainly focus on analyzing the proposed SemanticAdv in attacking state-of-the-art face recognition systems [63, 59, 80, 67] due to its wide applicability (e.g., identification for mobile payment) in the real world. We attack both face verification and face landmark detection by generating attribute-conditioned adversarial examples using annotations from CelebA dataset [40]. In addition, we extend our attack to urban street scenes with semantic label maps as the condition. We attack the semantic segmentation model DRN-D-22 [77] previously trained on Cityscape [14] by generating adversarial examples with dynamic objects manipulated (e.g., insert a car into the scene).

The experimental section is organized as follows. First, we analyze the quality of generated adversarial examples and qualitatively compare our method with $L_p$ bounded pixel-wise optimization-based methods [10, 16, 73]. Second, we provide both qualitative and quantitative results by controlling a single semantic attribute. In terms of attack transferability, we evaluate our proposed SemanticAdv in various settings and further demonstrate the effectiveness of our method via query-free black-box attacks against online face verification platforms. Third, we compare our method with the baseline methods against different defense methods on the face verification task. Fourth, we demonstrate that our SemanticAdv is a general framework by showing results on other tasks, including face landmark detection and street-view semantic segmentation.

4.1 Experimental Setup

Face identity verification.

We select ResNet-50 and ResNet-101 [23] trained on MS-Celeb-1M [22, 15] as our face verification models. The models are trained using two different objectives, namely, softmax loss [63, 80] and cosine loss [67]. For simplicity, we use the notation "R-N-S" to indicate the model with $N$-layer residual blocks as the backbone trained using softmax loss, while "R-N-C" indicates the same backbone trained using cosine loss. The distance between face features is measured by the normalized $L_2$ distance. For the R-101-S model, we decide the parameter $\kappa$ based on the false positive rate (FPR) for the identity verification task. Four different FPRs have been used: $10^{-3}$ (with $\kappa=1.24$), $3\times10^{-4}$ (with $\kappa=1.05$), $10^{-4}$ (with $\kappa=0.60$), and $<10^{-4}$ (with $\kappa=0.30$). The distance metrics and selected thresholds are commonly used when evaluating the performance of face recognition models [35, 32]. The supplementary material provides more details on the performance of the face recognition models and their corresponding $\kappa$. To distinguish between the FPR we use when generating adversarial examples and the FPR used in evaluation, we introduce two notations, "Generation FPR (G-FPR)" and "Test FPR (T-FPR)". For the experiment with query-free black-box API attacks, we use two online face verification services provided by Face++ [2] and AliYun [1].
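The thresholds $\kappa$ above are tied to the FPR measured on a development set of impostor (different-identity) pairs. Below is a hedged sketch of how such a threshold can be derived from dev-set distances; the pair list and distance computation are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def kappa_from_fpr(impostor_distances, target_fpr):
    """Pick the distance threshold whose false positive rate on impostor pairs
    (pairs of different identities) matches target_fpr. A pair is falsely
    accepted when its embedding distance falls below kappa, so kappa is the
    target_fpr-quantile of the impostor distance distribution."""
    d = np.asarray(impostor_distances, dtype=np.float64)
    return float(np.quantile(d, target_fpr))
```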

Semantic attacks on face images.

In our experiments, we randomly sample 1,280 distinct identities from CelebA [40] and use StarGAN [12] for attribute-conditioned image editing. In particular, we re-train our model on CelebA by aligning the face landmarks and then resizing images to resolution $112\times112$. We select 17 identity-preserving attributes for our analysis, as such attributes mainly reflect variations in facial expression and hair color.

In feature-map interpolation, to reduce the reconstruction error brought by the generator (e.g., $\mathbf{x}\neq\mathcal{G}(\mathbf{x},\mathbf{c})$) in practice, we take one more step to obtain the updated feature map $\mathbf{f}'=\mathcal{G}_{\text{enc}}(\mathbf{x}',\mathbf{c})$, where $\mathbf{x}'=\operatorname*{argmin}_{\mathbf{x}'}\|\mathcal{G}(\mathbf{x}',\mathbf{c})-\mathbf{x}\|$.
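A hedged sketch of this correction step: we search for an input $\mathbf{x}'$ whose reconstruction $\mathcal{G}(\mathbf{x}',\mathbf{c})$ matches the original image $\mathbf{x}$, then re-encode $\mathbf{x}'$ to obtain the corrected feature map. The optimizer, step count, and learning rate are illustrative assumptions.

```python
import torch

def corrected_feature_map(G, G_enc, x, c, steps=100, lr=0.01):
    """Find x' = argmin ||G(x', c) - x|| and return f' = G_enc(x', c)."""
    x_prime = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_prime], lr=lr)
    for _ in range(steps):
        loss = (G(x_prime, c) - x).norm()   # reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G_enc(x_prime.detach(), c)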

For each distinct identity pair $(\mathbf{x},\mathbf{x}^{\text{tgt}})$, we perform SemanticAdv guided by each of the 17 attributes (e.g., we intentionally add or remove one specific attribute while keeping the rest unchanged). In total, for each image $\mathbf{x}$, we generate 17 adversarial images with different augmented attributes. In the experiments, we select a commonly-used pixel-wise adversarial attack method [10] (referred to as CW) as our baseline. Compared to our proposed method, CW does not require visual attributes as part of the system, as it only generates one adversarial example for each instance. We refer to the corresponding attack success rate as the instance-wise success rate, in which the attack success rate is calculated for each instance. For each instance with 17 adversarial images using different augmented attributes, if at least one of the 17 produced images attacks successfully, we count the attack on this instance as a success, and vice versa.
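A small sketch of the instance-wise success rate described above, assuming a boolean success flag is recorded per (instance, attribute) pair: an instance counts as successfully attacked if any of its 17 attribute-conditioned adversarial images succeeds.

```python
import numpy as np

def instance_wise_success_rate(success_matrix):
    """success_matrix: boolean array of shape (num_instances, 17);
    entry [i, j] is True if the j-th attribute-conditioned adversarial image
    of instance i fools the verification model."""
    per_instance = np.asarray(success_matrix).any(axis=1)
    return per_instance.mean()
```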

Face landmark detection.

We select the Face Alignment Network (FAN) [9] trained on 300W-LP [82] and fine-tuned on 300-W [58] for 2D landmark detection. The network is constructed by stacking Hour-Glass networks [47] with hierarchical blocks [8]. Given a face image as input, FAN outputs 2D heatmaps which can subsequently be leveraged to yield 68 2D landmarks.

Semantic attacks on street-view images.

We select DRN-D-22 [77] as our semantic segmentation model and fine-tune the model on image regions with resolution $256\times256$. To synthesize semantic adversarial perturbations, we consider semantic label maps as the input attribute and leverage a generative image manipulation model [24] pre-trained on the Cityscape [14] dataset. Given an input semantic label map at resolution $256\times256$, we select a target object instance (e.g., a pedestrian) to attack. Then, we create a manipulated semantic label map by inserting another object instance (e.g., a car) in the vicinity of the target object. Similar to the experiments in the face domain, for both semantic label maps, we use the image manipulation encoder to extract features (with 1,024 channels at spatial resolution $16\times16$) and conduct feature-space interpolation. We synthesize the final image by feeding the interpolated features to the image manipulation decoder. By searching for the interpolation coefficient that maximizes the attack rate, we are able to fool the segmentation model with the synthesized final image.

4.2 SemanticAdv on Face Identity Verification

Figure 2: Qualitative comparisons between attribute-space and feature-space interpolation. In our visualization, we set the interpolation parameter to 0.0, 0.2, 0.4, 0.6, 0.8, 1.0.
Table 1: Attack success rate (%) on R-101-S when selecting the attribute or different layers' feature-maps for interpolation, using G-FPR = T-FPR = $10^{-3}$. Here, $\mathbf{f}_i$ indicates the feature-map after the $i$-th up-sampling operation. $\mathbf{f}_{-2}$ and $\mathbf{f}_{-1}$ are the first and the second feature-maps after the last down-sampling operation, respectively.

Interpolation / Attack Success (%) | $\mathbf{f}_{-2}$ | $\mathbf{f}_{-1}$ | $\mathbf{f}_{0}$ | $\mathbf{f}_{1}$ | $\mathbf{f}_{2}$ | Attribute
$\mathbf{x}^{\text{adv}}$, G-FPR = $10^{-3}$ | 99.38 | 100.00 | 100.00 | 100.00 | 99.69 | 0.08
$\mathbf{x}^{\text{adv}}$, G-FPR = $10^{-4}$ | 59.53 | 98.44 | 99.45 | 97.58 | 73.52 | 0.00

Attribute-space vs. feature-space interpolation.

First, we qualitatively compare the two interpolation methods and find that both attribute-space and feature-space interpolation can generate reasonable-looking samples through interpolation (see Figure 2; these are not adversarial examples). However, the two interpolation methods perform differently when we optimize using the adversarial objective (Eq. 3). We measure the attack success rate of attribute-space interpolation (with G-FPR = T-FPR = $10^{-3}$): 0.08% on R-101-S, 0.31% on R-101-C, and 0.16% on both R-50-S and R-50-C, which consistently fails to attack the face verification models. Compared to attribute-space interpolation, generating adversarial examples with feature-space interpolation produces much better quantitative results (see Table 1). We conjecture that this is because the high-dimensional feature space provides more manipulation freedom. This also explains one potential reason for the poor samples (e.g., blurry with many noticeable artifacts) generated by the method proposed in [29]. We select $\mathbf{f}_0$, the last conv layer before the up-sampling layers in the generator, for feature-space interpolation due to its good performance.
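To make the difference between the two interpolation schemes concrete, below is a minimal sketch of how the interpolation parameter could be optimized against an adversarial objective. The interfaces (`generator`, `generator.encode`, `generator.decode`, `face_model`, `target_emb`) are hypothetical placeholders standing in for the attribute-conditioned generator and the verification embedding, and `x`, `c`, `c_new` are assumed to be tensors; this is an illustrative sketch under those assumptions, not the exact implementation used in the paper.

```python
import torch

# Hypothetical interfaces (assumed for illustration only):
#   generator(x, c)         -> image synthesized from input x and attribute vector c
#   generator.encode(x, c)  -> intermediate feature-map f_0 (before the up-sampling layers)
#   generator.decode(f, c)  -> image synthesized from an (interpolated) feature-map
#   face_model(x)           -> embedding used by the face verification model
def semantic_attack(generator, face_model, x, c, c_new, target_emb,
                    mode="feature", steps=200, lr=0.01):
    if mode == "attribute":
        # Attribute-space interpolation: a single scalar blends c and c_new.
        alpha = torch.zeros(1, requires_grad=True)
    else:
        # Feature-space interpolation: an element-wise tensor blends the two
        # feature-maps, giving far more degrees of freedom.
        f = generator.encode(x, c).detach()
        f_new = generator.encode(x, c_new).detach()
        alpha = torch.full_like(f, 0.5, requires_grad=True)

    opt = torch.optim.Adam([alpha], lr=lr)
    for _ in range(steps):
        if mode == "attribute":
            c_mix = alpha * c_new + (1 - alpha) * c
            x_adv = generator(x, c_mix)
        else:
            f_mix = alpha * f_new + (1 - alpha) * f
            x_adv = generator.decode(f_mix, c_new)
        # Adversarial objective: pull the embedding of x_adv toward the target
        # identity (a smoothness / interpolation prior could be added here).
        loss = torch.norm(face_model(x_adv) - target_emb)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_adv.detach()
```

The key design difference in this sketch is the number of free parameters: attribute-space interpolation optimizes a single scalar, whereas feature-space interpolation optimizes a tensor the size of $\mathbf{f}_0$, which matches the conjecture above about manipulation freedom.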

Figure 3: Top: Qualitative comparisons between our proposed SemanticAdv and pixel-wise adversarial examples generated by CW [10]. Along with the adversarial examples, we also provide the corresponding perturbations (residuals) on the right. Perturbations generated by our SemanticAdv (G-FPR = $10^{-3}$) are unrestricted and exhibit semantically meaningful patterns. Bottom: Qualitative analysis of the single-attribute adversarial attack (G-FPR = $10^{-3}$). More results are shown in the supplementary material.

Qualitative analysis.

Figure 3 (top) shows the adversarial images and corresponding perturbations generated against R-101-S by SemanticAdv and CW, respectively. The text below each image is the name of the augmented attribute; the sign before the name indicates "adding" (in red) or "removing" (in blue) the corresponding attribute from the original image. Figure 3 (bottom) shows adversarial examples with 17 different augmented semantic attributes; the attribute names are shown at the bottom. The first row contains images generated by $\mathcal{G}(\mathbf{x}, \mathbf{c}^{\text{new}})$ with an augmented attribute $\mathbf{c}^{\text{new}}$, and the second row contains the corresponding adversarial images obtained with feature-space interpolation. The figure shows that our SemanticAdv can generate examples with reasonable-looking appearance guided by the corresponding attribute. In particular, SemanticAdv generates perturbations on the regions correlated with the augmented attribute, while the perturbations of CW have no specific pattern and are evenly distributed across the image.
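As a small illustration of how a single-attribute edit is specified, the snippet below flips one entry of a binary attribute vector before calling the generator. The attribute list is an assumed (partial) CelebA-style ordering, and `generator` / `semantic_attack` are the hypothetical placeholders from the earlier sketch, not the paper's exact code.

```python
# Illustrative only: an assumed (partial) CelebA-style attribute ordering.
ATTRS = ["Pale_Skin", "Eyeglasses", "Mouth_Slightly_Open", "Arched_Eyebrows"]

def augment_attribute(c, name, present=True):
    """Return a copy of the binary attribute vector (a tensor) with one
    attribute 'added' (set to 1) or 'removed' (set to 0)."""
    c_new = c.clone()
    c_new[ATTRS.index(name)] = 1.0 if present else 0.0
    return c_new

# c_new = augment_attribute(c, "Pale_Skin", present=True)   # "+Pale Skin" in Figure 3
# x_aug = generator(x, c_new)                                # first row: attribute-edited image
# x_adv = semantic_attack(generator, face_model, x, c, c_new, target_emb)  # second row
```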

To further measure the perceptual quality of the adversarial images generated by SemanticAdv in the most strict setting (G-FPR $< 10^{-4}$), we conduct a user study using Amazon Mechanical Turk (AMT). In total, we collect 2,620 annotations from 77 participants. In $39.14 \pm 1.96\%$ of trials (close to the random-guess rate of $50\%$), the adversarial images generated by our SemanticAdv are selected as reasonable-looking, while in $30.27 \pm 1.96\%$ of trials those generated by CW are selected as reasonable-looking. This indicates that SemanticAdv generates more perceptually plausible adversarial examples than CW under the most strict setting (G-FPR $< 10^{-4}$). The corresponding images are shown in the supplementary materials.
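For reference, one common way to attach a margin such as $\pm 1.96\%$ to a preference rate is a normal-approximation confidence interval over the collected trials. The exact procedure and the split of annotations between methods are not stated here, so the snippet below is only a hedged sanity check under those assumptions.

```python
import math

def wald_margin(p, n, z=1.96):
    """95% normal-approximation (Wald) margin for a proportion p over n trials."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Assumption: all 2,620 annotations contribute to each estimate; the actual
# split between SemanticAdv and CW trials is not given in this section.
print(wald_margin(0.3914, 2620))  # ~0.019, i.e. a margin of roughly 1.9%
print(wald_margin(0.3027, 2620))  # ~0.018
```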

Figure 4: Quantitative analysis of the attack success rate with different single-attribute attacks. In each figure, we show the results corresponding to a larger FPR (G-FPR = T-FPR = $10^{-3}$) in sky blue and the results corresponding to a smaller FPR (G-FPR = T-FPR = $10^{-4}$) in blue, respectively.

Single attribute analysis.

One of the key advantages of our SemanticAdv is that we can generate adversarial perturbations in a more controllable fashion, guided by the selected semantic attribute. This allows analyzing the robustness of a recognition system against different types of semantic attacks. We group the adversarial examples by augmented attribute in various settings. In Figure 4, we present the attack success rate against two face verification models, namely, R-101-S and R-101-C, using different attributes. We highlight the bars in light blue for G-FPR = $10^{-3}$ and in blue for G-FPR = $10^{-4}$, respectively. As shown in Figure 4, with the larger T-FPR = $10^{-3}$, our SemanticAdv achieves almost 100% attack success rate across different attributes. With the smaller T-FPR = $10^{-4}$, we observe that SemanticAdv guided by some attributes, such as Mouth Slightly Open and Arched Eyebrows, achieves less than 50% attack success rate, while other attributes, such as Pale Skin and Eyeglasses, are relatively less affected. In summary, the above experiments indicate that SemanticAdv guided by attributes describing local shape (e.g., mouth, earrings) achieves a relatively lower attack success rate compared to attributes related to color (e.g., hair color) or the entire face region (e.g., skin). This suggests that the face verification models used