^{1} The Chinese University of Hong Kong, Shenzhen
^{2} University of Michigan, Ann Arbor
^{3} The Chinese University of Hong Kong
^{4} University of Illinois Urbana-Champaign
^{5} Uber ATG, San Francisco

SemanticAdv: Generating Adversarial Examples
via Attribute-conditioned Image Editing

Haonan Qiu^{1,*}   Chaowei Xiao^{2,*}   Lei Yang^{3,*}   Xinchen Yan^{2,5,†}   Honglak Lee^{2}   Bo Li^{4}

^{*} Alphabetical ordering; the first three authors contributed equally.
^{†} Work partially done as a PhD student at University of Michigan.
Abstract

Deep neural networks (DNNs) have achieved great successes in various vision applications due to their strong expressive power. However, recent studies have shown that DNNs are vulnerable to adversarial examples, i.e., manipulated instances crafted to mislead DNNs into making incorrect predictions. Currently, most such adversarial examples try to guarantee "subtle perturbation" by limiting the $L_p$ norm of the perturbation. In this paper, we propose SemanticAdv to generate a new type of semantically realistic adversarial examples via attribute-conditioned image editing. Compared to existing methods, our SemanticAdv enables fine-grained analysis and evaluation of DNNs with input variations in the attribute space. We conduct comprehensive experiments to show that our adversarial examples not only exhibit semantically meaningful appearances but also achieve high targeted attack success rates under both whitebox and blackbox settings. Moreover, we show that existing pixel-based and attribute-based defense methods fail to defend against SemanticAdv. We demonstrate the applicability of SemanticAdv on both face recognition and general street-view images to show its generalization ability. Such non-$L_p$-bounded adversarial examples with controlled attribute manipulation can shed light on further understanding of the vulnerabilities of DNNs as well as on novel defense approaches.

1 Introduction

Deep neural networks (DNNs) have demonstrated great successes in advancing the state-of-the-art performance in various vision tasks [36, 61, 64, 23, 59, 41, 77, 11] and have been widely used in many safety-critical applications such as face verification and autonomous driving [79]. At the same time, several studies [65, 21, 45, 51, 10, 71, 72, 70] have revealed the vulnerability of DNNs against input variations. For example, carefully crafted $L_p$-bounded perturbations added to pristine input images can introduce arbitrary prediction errors at testing time. While being visually imperceptible, $L_p$-bounded adversarial attacks have certain limitations, as they only capture variations in the raw pixel space and cannot guarantee the semantic realism of the generated instances. Recent works [72, 30, 69] have shown the limitations of only measuring and evaluating $L_p$-bounded perturbations (e.g., they cannot handle variations in lighting conditions). Therefore, the failure modes of deep neural networks under semantic perturbations beyond raw pixel variations require further exploration.

In this work, we focus on studying how DNNs respond to semantically meaningful perturbations in the visual attribute space. In the visual recognition literature, visual attributes [19, 37, 52] are human-designated properties observable in images (e.g., black hair and blonde hair). As illustrated in Figure 1 (left), given an input image with known attributes, we would like to craft semantically meaningful (attribute-conditioned) adversarial examples via image editing along a single attribute or a subset of attributes while keeping the rest unchanged. Compared to traditional $L_p$-bounded adversarial perturbations or semantic perturbations of global color and texture [5], such attribute-based image editing enables users to conduct a fine-grained analysis and evaluation of DNN models by removing one or a set of visual aspects or adding one object into the scene. We believe our attribute-conditioned image editing is a natural way of introducing semantic perturbations, and it preserves clear interpretability (e.g., wearing a new pair of glasses or having the hair dyed a different color).

Figure 1: Pipeline of SemanticAdv. Left: each row shows a pair of images that differ in only one semantic aspect. One is sampled from the ground-truth dataset, while the other is created by our conditional image generator, which is adversarial to the recognition model (e.g., a face identification network or a semantic segmentation network). Right: overview of the proposed attribute-conditioned SemanticAdv against the face identity verification model.

To facilitate the generation of semantic adversarial perturbations along a single attribute dimension, we take advantage of the disentangled representation in deep image generative models [55, 31, 6, 75, 12, 3, 76, 28]. Such a disentangled representation allows us to explore the variations of a specific semantic factor while keeping the other factors unchanged. As illustrated in Figure 1 (right), we first leverage an attribute-conditioned image editing model [12] to construct a new instance that is very similar to the source except for one semantic aspect (the source image is given as input). Given such a pair of images, we synthesize the adversarial example by interpolating between them in the feature-map space. As the interpolation is constrained by the image pair, the appearance of the resulting semantic adversarial example resembles both of them.

To validate the effectiveness of our proposed SemanticAdv via attribute-conditioned image editing, we consider two real-world tasks: face verification and landmark detection. We conduct both qualitative and quantitative evaluations on the CelebA dataset [40]. The results show that our SemanticAdv not only achieves a high targeted attack success rate but also preserves the semantic meaning of the corresponding input images. To further demonstrate the applicability of SemanticAdv beyond the face domain, we extend the framework to generate adversarial street-view images. We treat semantic layouts as input attributes and use a layout-conditioned image editing model [24] pre-trained on the Cityscapes dataset [14]. Our results show that a well-trained semantic segmentation model can be successfully attacked into neglecting a pedestrian if we insert another object nearby using our image editing model. In addition, we show that existing adversarial training-based defense methods are less effective against our attack, which motivates further defense strategies against such semantic adversarial examples.

Our contributions are summarized as follows: (1) We propose a novel method, SemanticAdv, to generate semantically meaningful adversarial examples via attribute-conditioned image editing based on feature-space interpolation. Compared to existing adversarial attacks, our method enables fine-grained attribute analysis as well as further evaluation of vulnerabilities of DNN models. Such semantic adversarial examples also provide explainable analysis of different attributes in terms of their robustness and editing flexibility. (2) We conduct extensive experiments and show that the proposed feature-space interpolation strategy can generate high-quality attribute-conditioned adversarial examples more effectively than simple attribute-space interpolation. Additionally, our SemanticAdv exhibits high attack transferability as well as a 67.7% query-free black-box attack success rate on a real-world face verification platform. (3) We empirically show that, in contrast to $L_p$ attacks, existing per-pixel as well as attribute-based defense methods fail to defend against our SemanticAdv, which indicates that such semantic adversarial examples identify a previously unexplored vulnerable landscape of DNNs. (4) To demonstrate the applicability and generalization of SemanticAdv beyond the face recognition domain, we extend the framework to generate adversarial street-view images that effectively fool semantic segmentation models.

2 Related Work

Semantic image editing.

Semantic image synthesis and manipulation are popular research topics in machine learning, graphics, and vision. Thanks to recent advances in deep generative models [34, 20, 50] and the empirical analysis of deep classification networks [36, 61, 64], the past few years have witnessed tremendous breakthroughs towards high-fidelity pure image generation [55, 31, 6], attribute-to-image generation [75, 12], text-to-image generation [44, 56, 49, 48, 78, 28], and image-to-image translation [26, 81, 39, 68, 24].

Adversarial examples.

Generating $L_p$-bounded adversarial perturbations has been extensively studied recently [65, 21, 45, 51, 10, 71]. To further explore diverse adversarial attacks and potentially help inspire defense mechanisms, it is important to generate so-called "unrestricted" adversarial examples, which contain perturbations of unrestricted magnitude while still preserving perceptual realism [7]. Recently, [72, 18] proposed to spatially transform image patches instead of adding pixel-wise perturbations, but such spatial transformations do not consider semantic information. Our proposed SemanticAdv focuses on generating unrestricted perturbations with semantically meaningful patterns guided by visual attributes.

Relevant to our work, [62] proposed to synthesize adversarial examples with an unconditional generative model, and [5] studied semantic transformations in only the color or texture space. Compared to these works, SemanticAdv is able to generate adversarial examples in a controllable fashion using specific visual attributes by performing the manipulation in the feature space. We further analyze the robustness of the recognition system by generating adversarial examples guided by different visual attributes. Concurrent to our work, [29] proposed to generate semantic-based attacks against a restricted binary classifier, while our attack is able to mislead the model towards arbitrary adversarial targets. Moreover, they conduct the manipulation within the attribute space, which is less flexible and effective than our proposed feature-space interpolation.

3 SemanticAdv

3.1 Problem Definition

Let $\mathcal{M}$ be a machine learning model trained on a dataset $\mathcal{D} = \{(\mathbf{x}, \mathbf{y})\}$ consisting of image-label pairs, where $\mathbf{x} \in \mathbb{R}^{H \times W \times D_I}$ and $\mathbf{y} \in \mathbb{R}^{D_L}$ denote the image and the ground-truth label, respectively. Here, $H$, $W$, $D_I$, and $D_L$ denote the image height, image width, number of image channels, and label dimensions, respectively. For each image $\mathbf{x}$, our model $\mathcal{M}$ makes a prediction $\hat{\mathbf{y}} = \mathcal{M}(\mathbf{x}) \in \mathbb{R}^{D_L}$.
Given a target image-label pair $(\mathbf{x}^{\text{tgt}}, \mathbf{y}^{\text{tgt}})$ with $\mathbf{y} \neq \mathbf{y}^{\text{tgt}}$, a traditional attacker aims to synthesize adversarial examples $\mathbf{x}^{\text{adv}}$ by adding pixel-wise perturbations to or spatially transforming the original image $\mathbf{x}$ such that $\mathcal{M}(\mathbf{x}^{\text{adv}}) = \mathbf{y}^{\text{tgt}}$. In this work, we consider a semantic attacker that generates semantically meaningful perturbations via attribute-conditioned image editing with a conditional generative model $\mathcal{G}$. Compared to the traditional attacker, the proposed attack method generates adversarial examples in a more controllable fashion by editing a single semantic aspect through attribute-conditioned image editing.

3.2 Attribute-conditioned Image Editing

In order to produce semantically meaningful perturbations, we first introduce how to synthesize attribute-conditioned images through interpolation.

Semantic image editing.

For simplicity, we start with the formulation where the input attribute is represented as a compact vector. This formulation can be directly extended to other input attribute formats, including semantic layouts. Let $\mathbf{c} \in \mathbb{R}^{D_C}$ be an attribute representation reflecting the semantic factors (e.g., expression or hair color of a portrait image) of image $\mathbf{x}$, where $D_C$ indicates the attribute dimension and $c_i \in \{0, 1\}$ indicates the presence of the $i$-th attribute. We are interested in performing semantic image editing using the attribute-conditioned image generator $\mathcal{G}$. For example, given a portrait image of a girl with black hair and the new attribute blonde hair, our generator is supposed to synthesize a new image that turns the girl's hair color from black to blonde while keeping the rest of the appearance unchanged. The synthesized image is denoted as $\mathbf{x}^{\text{new}} = \mathcal{G}(\mathbf{x}, \mathbf{c}^{\text{new}})$, where $\mathbf{c}^{\text{new}} \in \mathbb{R}^{D_C}$ is the new attribute.
In the special case when there is no attribute change ($\mathbf{c} = \mathbf{c}^{\text{new}}$), the generator simply reconstructs the input: $\mathbf{x}' = \mathcal{G}(\mathbf{x}, \mathbf{c})$ (ideally, we hope $\mathbf{x}'$ equals $\mathbf{x}$). As our attribute representation is disentangled and the change in attribute value is sufficiently small (e.g., we only edit a single semantic attribute), the synthesized image $\mathbf{x}^{\text{new}}$ is expected to lie close to the data manifold [4, 57, 55]. In addition, we can generate many similar images by linearly interpolating between the image pair $\mathbf{x}$ and $\mathbf{x}^{\text{new}}$ in the attribute space or the feature space of the image-conditioned generator $\mathcal{G}$, as supported by previous work [75, 55, 3].
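As a small illustration of the setup above, a single-attribute edit on the binary attribute vector $\mathbf{c}$ can be sketched in plain Python. The learned generator $\mathcal{G}$ is not reproduced here; the helper name `edit_attribute` is hypothetical and only builds the new conditioning vector $\mathbf{c}^{\text{new}}$ that $\mathcal{G}(\mathbf{x}, \mathbf{c}^{\text{new}})$ would receive.

```python
# Sketch: build c_new by changing exactly one entry of the binary attribute
# vector c (each c_i in {0, 1}); the generator G itself is a learned network
# and is omitted here.

def edit_attribute(c, index, value):
    """Return a copy of the attribute vector with one entry changed."""
    c_new = list(c)
    c_new[index] = value
    return c_new

# Example: 5 binary attributes; switch attribute 0 (say, "blonde hair") on.
c = [0, 1, 0, 1, 0]
c_new = edit_attribute(c, 0, 1)
assert sum(a != b for a, b in zip(c, c_new)) == 1  # one semantic aspect differs
```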

Attribute-space interpolation.

Given a pair of attributes $\mathbf{c}$ and $\mathbf{c}^{\text{new}}$, we introduce an interpolation parameter $\alpha \in [0, 1]$ to generate the augmented attribute vector $\mathbf{c}^{*} \in \mathbb{R}^{D_C}$ (see Eq. 1). Given the augmented attribute $\mathbf{c}^{*}$ and the original image $\mathbf{x}$, the generator $\mathcal{G}$ produces the image $\mathbf{x}^{*}$ through attribute-space interpolation.

$\mathbf{x}^{*} = \mathcal{G}(\mathbf{x}, \mathbf{c}^{*})$,
$\mathbf{c}^{*} = \alpha \cdot \mathbf{c} + (1 - \alpha) \cdot \mathbf{c}^{\text{new}}$, where $\alpha \in [0, 1]$.   (1)
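Concretely, the interpolation of Eq. 1 can be sketched in a few lines of plain Python; the generator $\mathcal{G}$ that consumes $\mathbf{c}^{*}$ is a learned network and is omitted here.

```python
# Sketch of Eq. 1: c* = alpha * c + (1 - alpha) * c_new, elementwise.

def attribute_interpolate(c, c_new, alpha):
    """Linear interpolation between two attribute vectors."""
    assert 0.0 <= alpha <= 1.0
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(c, c_new)]

c = [0.0, 1.0, 0.0]      # original attributes
c_new = [1.0, 1.0, 0.0]  # edited: first attribute switched on
c_star = attribute_interpolate(c, c_new, alpha=0.3)
# Only the edited entry takes an intermediate value; the rest are unchanged.
assert all(abs(a - b) < 1e-9 for a, b in zip(c_star, [0.7, 1.0, 0.0]))
```

Note that a single scalar $\alpha$ moves every attribute dimension by the same fraction, which is the limitation the feature-map interpolation below addresses.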

Feature-map interpolation.

Alternatively, we propose to interpolate using the feature map produced by the generator $\mathcal{G} = \mathcal{G}_{\text{dec}} \circ \mathcal{G}_{\text{enc}}$. Here, $\mathcal{G}_{\text{enc}}$ is the encoder module that takes the image as input and outputs the feature map; similarly, $\mathcal{G}_{\text{dec}}$ is the decoder module that takes the feature map as input and outputs the synthesized image. Let $\mathcal{G}_{\text{enc}}(\mathbf{x}, \mathbf{c}) \in \mathbb{R}^{H_F \times W_F \times C_F}$ be the feature map of an intermediate layer in the generator, where $H_F$, $W_F$, and $C_F$ indicate the height, width, and number of channels of the feature map.

$\mathbf{x}^{*} = \mathcal{G}_{\text{dec}}(\mathbf{f}^{*})$,
$\mathbf{f}^{*} = \boldsymbol{\beta} \odot \mathcal{G}_{\text{enc}}(\mathbf{x}, \mathbf{c}) + (\mathbf{1} - \boldsymbol{\beta}) \odot \mathcal{G}_{\text{enc}}(\mathbf{x}, \mathbf{c}^{\text{new}})$.   (2)

Compared to the attribute-space interpolation, which is parameterized by a scalar $\alpha$, we parameterize feature-map interpolation by a tensor $\boldsymbol{\beta} \in \mathbb{R}^{H_F \times W_F \times C_F}$ ($\beta_{h,w,k} \in [0, 1]$, where $1 \leq h \leq H_F$, $1 \leq w \leq W_F$, and $1 \leq k \leq C_F$) with the same shape as the feature map. Compared to linear interpolation over the attribute space, this design introduces more flexibility for adversarial attacks. Empirical results in Section 4.2 show that this design is critical to maintaining both attack success and good perceptual quality at the same time.
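The per-element interpolation of Eq. 2 and its extra flexibility over a scalar $\alpha$ can be sketched as follows. For brevity the feature maps are flattened to 1-D lists of length $H_F \cdot W_F \cdot C_F$; in the paper they are encoder outputs $\mathcal{G}_{\text{enc}}(\mathbf{x}, \mathbf{c})$, which are not reproduced here.

```python
# Sketch of Eq. 2: f* = beta ⊙ f_src + (1 - beta) ⊙ f_new, with one
# interpolation weight per feature-map element.

def feature_interpolate(f_src, f_new, beta):
    """Elementwise interpolation between two (flattened) feature maps."""
    assert len(f_src) == len(f_new) == len(beta)
    assert all(0.0 <= b <= 1.0 for b in beta)
    return [b * s + (1.0 - b) * n for s, n, b in zip(f_src, f_new, beta)]

f_src = [1.0, 2.0, 3.0, 4.0]   # stands in for G_enc(x, c)
f_new = [5.0, 6.0, 7.0, 8.0]   # stands in for G_enc(x, c_new)

# A constant beta recovers the scalar special case ...
f_half = feature_interpolate(f_src, f_new, [0.5, 0.5, 0.5, 0.5])
assert f_half == [3.0, 4.0, 5.0, 6.0]

# ... while a non-constant beta can mix the two feature maps per element,
# which is the extra flexibility the attack optimizes over.
f_star = feature_interpolate(f_src, f_new, [1.0, 0.0, 1.0, 0.0])
assert f_star == [1.0, 6.0, 3.0, 8.0]
```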

3.3 Generating Semantically Meaningful Adversarial Examples

Existing work obtains the adversarial image $\mathbf{x}^{\text{adv}}$ by adding perturbations to or transforming the input image $\mathbf{x}$ directly. In contrast, our semantic attack method requires an additional attribute-conditioned image generator $\mathcal{G}$ during adversarial image generation through interpolation. As shown in Eq. 3, the first term of our objective function is the adversarial metric, the second term is a smoothness constraint that guarantees perceptual quality, and $\lambda$ controls the balance between the two terms. The adversarial metric is minimized once the model $\mathcal{M}$ has been successfully attacked towards the target image-label pair $(\mathbf{x}^{\text{tgt}}, \mathbf{y}^{\text{tgt}})$. For identity verification, $\mathbf{y}^{\text{tgt}}$ is the identity representation of the target image; for the structured prediction tasks in our paper, $\mathbf{y}^{\text{tgt}}$ represents either coordinates (landmark detection) or semantic label maps (semantic segmentation).

$$\mathbf{x}^{\text{adv}} = \operatorname*{argmin}_{\mathbf{x}^{*}} \mathcal{L}(\mathbf{x}^{*}),$$
$$\mathcal{L}(\mathbf{x}^{*}) = \mathcal{L}_{\text{adv}}(\mathbf{x}^{*}; \mathcal{M}, \mathbf{y}^{\text{tgt}}) + \lambda \cdot \mathcal{L}_{\text{smooth}}(\mathbf{x}^{*}) \qquad (3)$$
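The objective in Eq. 3 is optimized over the interpolation tensor, with the result projected back into $[0,1]$. As a toy, self-contained illustration of that loop we use finite-difference gradients on a scalar loss; in the paper the chain (interpolation tensor $\to$ decoder $\to$ model $\mathcal{M}$) would be differentiated with autodiff, so the function names and the finite-difference scheme here are stand-ins, not the authors' implementation.

```python
import numpy as np

def attack_objective(adv_loss, smooth_loss, lam):
    """Scalar form of Eq. 3: L = L_adv + lambda * L_smooth."""
    return adv_loss + lam * smooth_loss

def optimize_beta(beta0, loss_fn, lr=0.1, steps=100, eps=1e-4):
    """Minimize loss_fn(beta) by finite-difference gradient descent,
    projecting beta back into [0, 1] after every step. loss_fn maps the
    interpolation tensor to a scalar total loss."""
    beta = beta0.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(beta)
        for idx in np.ndindex(beta.shape):
            old = beta[idx]
            beta[idx] = old + eps
            up = loss_fn(beta)
            beta[idx] = old - eps
            down = loss_fn(beta)
            beta[idx] = old
            grad[idx] = (up - down) / (2.0 * eps)
        beta = np.clip(beta - lr * grad, 0.0, 1.0)
    return beta
```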

Identity verification.

In the identity verification task, two images are considered to be the same identity if the corresponding identity embeddings from the verification model $\mathcal{M}$ are reasonably close.

$$\mathcal{L}_{\text{adv}}(\mathbf{x}^{*}; \mathcal{M}, \mathbf{y}^{\text{tgt}}) = \max\{\kappa, \Phi_{\mathcal{M}}^{\text{id}}(\mathbf{x}^{*}, \mathbf{x}^{\text{tgt}})\} \qquad (4)$$

As shown in Eq. 4, $\Phi_{\mathcal{M}}^{\text{id}}(\cdot,\cdot)$ measures the distance between two identity embeddings from the model $\mathcal{M}$; the normalized $L_2$ distance is used in our setting. In addition, we introduce the parameter $\kappa$, a constant related to the false positive rate (FPR) threshold computed from the development set.
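Eq. 4 can be written down directly. The sketch below assumes raw embedding vectors as plain Python lists and normalizes them before taking the $L_2$ distance; the function names are illustrative.

```python
import math

def adv_loss_identity(emb_adv, emb_tgt, kappa):
    """L_adv = max{kappa, Phi_id(x*, x_tgt)} from Eq. 4, where Phi_id
    is the L2 distance between unit-normalized identity embeddings.
    Once the distance drops below kappa (the FPR-derived threshold),
    the loss saturates at kappa and the attack has succeeded."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))  # assumes a nonzero vector
        return [x / n for x in v]
    a, t = normalize(emb_adv), normalize(emb_tgt)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, t)))
    return max(kappa, dist)
```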

Structured prediction.

For structured prediction tasks such as landmark detection and semantic segmentation, we use the Houdini objective proposed in [13] as our adversarial metric and select the target landmarks (or target semantic segmentation) as $\mathbf{y}^{\text{tgt}}$. As shown in Eq. 5, $\Phi_{\mathcal{M}}(\cdot,\cdot)$ is a scoring function for each image-label pair and $\gamma$ is a threshold. In addition, $l(\mathbf{y}^{*}, \mathbf{y}^{\text{tgt}})$ is the task loss determined by the specific adversarial target, where $\mathbf{y}^{*} = \mathcal{M}(\mathbf{x}^{*})$.

$$\mathcal{L}_{\text{adv}}(\mathbf{x}^{*}; \mathcal{M}, \mathbf{y}^{\text{tgt}}) = P_{\gamma \sim \mathcal{N}(0,1)}\Big[\Phi_{\mathcal{M}}(\mathbf{x}^{*}, \mathbf{y}^{*}) - \Phi_{\mathcal{M}}(\mathbf{x}^{*}, \mathbf{y}^{\text{tgt}}) < \gamma\Big] \cdot l(\mathbf{y}^{*}, \mathbf{y}^{\text{tgt}}) \qquad (5)$$
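Since $\gamma$ is a standard normal variable, the probability term in Eq. 5 has the closed form $1 - \Phi_{\mathcal{N}}(m)$ for the score margin $m = \Phi_{\mathcal{M}}(\mathbf{x}^{*}, \mathbf{y}^{*}) - \Phi_{\mathcal{M}}(\mathbf{x}^{*}, \mathbf{y}^{\text{tgt}})$, computable with the error function. The sketch below evaluates Eq. 5 as written; it takes precomputed scores and task loss as scalars, and the function name is ours.

```python
import math

def houdini_loss(score_pred, score_tgt, task_loss):
    """Houdini surrogate (Eq. 5): the probability that a standard
    normal sample gamma exceeds the score margin, scaled by the task
    loss. score_pred = Phi_M(x*, y*), score_tgt = Phi_M(x*, y_tgt)."""
    margin = score_pred - score_tgt
    # P_{gamma ~ N(0,1)}[margin < gamma] = 1 - CDF(margin)
    prob = 0.5 * (1.0 - math.erf(margin / math.sqrt(2.0)))
    return prob * task_loss
```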

Interpolation smoothness $\mathcal{L}_{\text{smooth}}$.

As the tensor to be interpolated in the feature-map space has far more parameters than the attribute itself, we propose to enforce a smoothness constraint on the tensor $\boldsymbol{\beta}$ used in feature-map interpolation. As shown in Eq. 6, the smoothness loss encourages the interpolation tensor to consist of spatially piece-wise constant patches, a constraint widely used as a pixel-wise de-noising objective in natural image processing [43, 27].

$$\mathcal{L}_{\text{smooth}}(\boldsymbol{\beta}) = \sum_{h=1}^{H_F-1} \sum_{w=1}^{W_F} \|\boldsymbol{\beta}_{h+1,w} - \boldsymbol{\beta}_{h,w}\|_2^2 + \sum_{h=1}^{H_F} \sum_{w=1}^{W_F-1} \|\boldsymbol{\beta}_{h,w+1} - \boldsymbol{\beta}_{h,w}\|_2^2 \qquad (6)$$
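Eq. 6 is a squared total-variation penalty over the spatial grid of $\boldsymbol{\beta}$. A direct sketch (the function name is ours), assuming $\boldsymbol{\beta}$ is stored as an array of shape $(H_F, W_F, C_F)$:

```python
import numpy as np

def smoothness_loss(beta):
    """Eq. 6: squared-L2 total variation over the two spatial
    dimensions of the interpolation tensor beta (H_F, W_F, C_F).
    Zero exactly when beta is spatially constant."""
    dh = beta[1:, :, :] - beta[:-1, :, :]   # vertical differences
    dw = beta[:, 1:, :] - beta[:, :-1, :]   # horizontal differences
    return float((dh ** 2).sum() + (dw ** 2).sum())
```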

4 Experiments

In the experimental section, we mainly focus on analyzing the proposed SemanticAdv by attacking state-of-the-art face recognition systems [63, 59, 80, 67], due to their wide real-world applicability (e.g., identification for mobile payment). We attack both face verification and face landmark detection by generating attribute-conditioned adversarial examples using annotations from the CelebA dataset [40]. In addition, we extend our attack to urban street scenes with semantic label maps as the condition. We attack the semantic segmentation model DRN-D-22 [77], previously trained on Cityscape [14], by generating adversarial examples with dynamic objects manipulated (e.g., inserting a car into the scene).

The experimental section is organized as follows. First, we analyze the quality of the generated adversarial examples and qualitatively compare our method with $L_p$-bounded pixel-wise optimization-based methods [10, 16, 73]. Second, we provide both qualitative and quantitative results for controlling a single semantic attribute. Regarding attack transferability, we evaluate our proposed SemanticAdv in various settings and further demonstrate its effectiveness via query-free black-box attacks against online face verification platforms. Third, we compare our method with the baseline methods against different defense methods on the face verification task. Fourth, we demonstrate that SemanticAdv is a general framework by showing results on other tasks, including face landmark detection and street-view semantic segmentation.

4.1 Experimental Setup

Face identity verification.

We select ResNet-50 and ResNet-101 [23] trained on MS-Celeb-1M [22, 15] as our face verification models. The models are trained using two different objectives, namely softmax loss [63, 80] and cosine loss [67]. For simplicity, we use the notation "R-N-S" to indicate the model with an $N$-layer residual backbone trained using softmax loss, while "R-N-C" indicates the same backbone trained using cosine loss. The distance between face features is measured by the normalized $L_2$ distance. For the R-101-S model, we set the parameter $\kappa$ based on the false positive rate (FPR) for the identity verification task. Four different FPRs have been used: $10^{-3}$ (with $\kappa = 1.24$), $3 \times 10^{-4}$ (with $\kappa = 1.05$), $10^{-4}$ (with $\kappa = 0.60$), and $<10^{-4}$ (with $\kappa = 0.30$). These distance metrics and thresholds are commonly used when evaluating the performance of face recognition models [35, 32]. The supplementary material provides more details on the performance of the face recognition models and their corresponding $\kappa$. To distinguish the FPR used in generating adversarial examples from the FPR used in evaluation, we introduce the notations "Generation FPR (G-FPR)" and "Test FPR (T-FPR)". For the experiment with query-free black-box API attacks, we use two online face verification services provided by Face++ [2] and AliYun [1].
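The verification decision implied by this setup is a simple threshold test: two faces match when the normalized embedding distance falls below the $\kappa$ associated with the chosen T-FPR. A minimal sketch using three of the (FPR, $\kappa$) pairs quoted above for R-101-S; the helper name is ours.

```python
def same_identity(dist, t_fpr):
    """Decide whether two faces match: the normalized L2 embedding
    distance must fall below the kappa tied to the chosen test FPR.
    Values taken from the R-101-S setup described in the text (the
    strictest setting, FPR < 1e-4 with kappa = 0.30, is omitted
    because it has no single FPR key)."""
    kappa_by_fpr = {1e-3: 1.24, 3e-4: 1.05, 1e-4: 0.60}
    return dist < kappa_by_fpr[t_fpr]
```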

Semantic attacks on face images.

In our experiments, we randomly sample $1{,}280$ distinct identities from CelebA [40] and use StarGAN [12] for attribute-conditioned image editing. In particular, we re-train the model on CelebA by aligning the face landmarks and then resizing images to resolution $112 \times 112$. We select 17 identity-preserving attributes for our analysis, as such attributes mainly reflect variations in facial expression and hair color.

In feature-map interpolation, to reduce the reconstruction error introduced by the generator in practice (e.g., $\mathbf{x} \neq \mathcal{G}(\mathbf{x}, \mathbf{c})$), we take one more step to obtain the updated feature map $\mathbf{f}' = \mathcal{G}_{\text{enc}}(\mathbf{x}', \mathbf{c})$, where $\mathbf{x}' = \operatorname*{argmin}_{\mathbf{x}'} \|\mathcal{G}(\mathbf{x}', \mathbf{c}) - \mathbf{x}\|$.
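This inversion step is itself a small optimization. As a toy illustration we solve it for a linear stand-in generator $G(v) = Wv$ by gradient descent on the squared error; in the paper $\mathcal{G}$ is the attribute-conditioned network and the same minimization would use autodiff, so $W$, the function name, and the analytic gradient are assumptions for the sketch.

```python
import numpy as np

def invert_linear_generator(W, x, steps=200, lr=0.1):
    """Toy version of x' = argmin ||G(x', c) - x|| for a linear
    generator G(v) = W v, solved by gradient descent on the squared
    reconstruction error starting from zero."""
    v = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = 2.0 * W.T @ (W @ v - x)   # d/dv ||Wv - x||^2
        v -= lr * grad
    return v
```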

For each distinct identity pair $(\mathbf{x}, \mathbf{x}^{\text{tgt}})$, we perform SemanticAdv guided by each of the 17 attributes (i.e., we intentionally add or remove one specific attribute while keeping the rest unchanged). In total, for each image $\mathbf{x}$, we generate 17 adversarial images with different augmented attributes. In the experiments, we select a commonly-used pixel-wise adversarial attack method [10] (referred to as CW) as our baseline. Compared to our proposed method, CW does not require visual attributes as part of the system, as it only generates one adversarial example for each instance. We refer to the corresponding attack success rate as the instance-wise success rate, in which the success rate is calculated per instance. For each instance with 17 adversarial images using different augmented attributes, if at least one of the 17 produced images attacks successfully, we count the attack on this instance as a success.
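The instance-wise success rate described above reduces to an any-of-17 aggregation per identity pair. A minimal sketch (the function name and the boolean-list representation are ours):

```python
def instance_wise_success_rate(per_attribute_success):
    """per_attribute_success: one list of 17 booleans per identity
    pair, each flag marking whether that attribute-guided adversarial
    image fooled the model. An instance counts as a success if any of
    its 17 images succeeds."""
    hits = sum(any(flags) for flags in per_attribute_success)
    return hits / len(per_attribute_success)
```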

Face landmark detection.

We select the Face Alignment Network (FAN) [9], trained on 300W-LP [82] and fine-tuned on 300-W [58], for 2D landmark detection. The network is constructed by stacking Hour-Glass networks [47] with hierarchical blocks [8]. Given a face image as input, FAN outputs 2D heatmaps which can be subsequently leveraged to yield 68 2D landmarks.

Semantic attacks on street-view images.

We select DRN-D-22 [77] as our semantic segmentation model and fine-tune it on image regions at resolution $256 \times 256$. To synthesize semantic adversarial perturbations, we consider the semantic label map as the input attribute and leverage a generative image manipulation model [24] pre-trained on the Cityscape [14] dataset. Given an input semantic label map at resolution $256 \times 256$, we select a target object instance (e.g., a pedestrian) to attack. Then, we create a manipulated semantic label map by inserting another object instance (e.g., a car) in the vicinity of the target object. Similar to the experiments in the face domain, for both semantic label maps we use the image manipulation encoder to extract features (with $1{,}024$ channels at spatial resolution $16 \times 16$) and conduct feature-space interpolation. We synthesize the final image by feeding the interpolated features to the image manipulation decoder. By searching for the interpolation coefficient that maximizes the attack rate, we are able to fool the segmentation model with the synthesized final image.

4.2 SemanticAdv on Face Identity Verification

Figure 2: Qualitative comparisons between attribute-space and feature-space interpolation. In our visualization, we set the interpolation parameter to $0.0, 0.2, 0.4, 0.6, 0.8, 1.0$.
Table 1: Attack success rate (%) on R-101-S when selecting the attribute or a different layer's feature map for interpolation, using $\text{G-FPR} = \text{T-FPR} = 10^{-3}$. Here, $\mathbf{f}_i$ indicates the feature map after the $i$-th up-sampling operation; $\mathbf{f}_{-2}$ and $\mathbf{f}_{-1}$ are the first and second feature maps after the last down-sampling operation, respectively.
Interpolation / Attack Success (%)

                                               Feature                                                                                    Attribute
                                               $\mathbf{f}_{-2}$   $\mathbf{f}_{-1}$   $\mathbf{f}_{0}$   $\mathbf{f}_{1}$   $\mathbf{f}_{2}$
$\mathbf{x}^{\text{adv}}$, G-FPR = $10^{-3}$       99.38              100.00              100.00             100.00             99.69          0.08
$\mathbf{x}^{\text{adv}}$, G-FPR = $10^{-4}$       59.53               98.44               99.45              97.58             73.52          0.00

Attribute-space vs. feature-space interpolation.

First, we qualitatively compare the two interpolation methods and find that both attribute-space and feature-space interpolation can generate reasonable-looking samples through interpolation (see Figure 2; these are not adversarial examples). However, the two interpolation methods perform differently when we optimize using the adversarial objective (Eq. 3). We measure the attack success rate of attribute-space interpolation (with G-FPR = T-FPR = $10^{-3}$): $0.08\%$ on R-101-S, $0.31\%$ on R-101-C, and $0.16\%$ on both R-50-S and R-50-C, which consistently fails to attack the face verification models. Compared to attribute-space interpolation, generating adversarial examples with feature-space interpolation produces much better quantitative results (see Table 1). We conjecture that this is because the high-dimensional feature space provides more freedom for manipulation. This also suggests one potential reason for the poor samples (e.g., blurry images with many noticeable artifacts) generated by the method proposed in [29]. We select $\mathbf{f}_0$, the last convolutional layer before the up-sampling layers in the generator, for feature-space interpolation due to its good performance.

Figure 3: Top: Qualitative comparisons between our proposed SemanticAdv and pixel-wise adversarial examples generated by CW [10]. Along with the adversarial examples, we also provide the corresponding perturbations (residuals) on the right. Perturbations generated by our SemanticAdv (G-FPR = $10^{-3}$) are unrestricted and exhibit semantically meaningful patterns. Bottom: Qualitative analysis of single-attribute adversarial attacks (G-FPR = $10^{-3}$). More results are shown in the supplementary material.

Qualitative analysis.

Figure 3 (top) shows the adversarial images and corresponding perturbations generated against R-101-S by SemanticAdv and CW, respectively. The text below each image is the name of the augmented attribute; the sign before the name indicates "adding" (in red) or "removing" (in blue) the corresponding attribute from the original image. Figure 3 (bottom) shows adversarial examples with each of the 17 augmented semantic attributes; the attribute names are shown at the bottom. The first row contains images generated by $\mathcal{G}(\mathbf{x}, \mathbf{c}^{\text{new}})$ with an augmented attribute $\mathbf{c}^{\text{new}}$, and the second row contains the corresponding adversarial images under feature-space interpolation. The figure shows that our SemanticAdv can generate examples with reasonable-looking appearance guided by the corresponding attribute. In particular, SemanticAdv generates perturbations on the regions correlated with the augmented attribute, while the perturbations of CW have no specific pattern and are evenly distributed across the image.

To further measure the perceptual quality of the adversarial images generated by SemanticAdv in the strictest setting (G-FPR $< 10^{-4}$), we conduct a user study on Amazon Mechanical Turk (AMT). In total, we collect $2{,}620$ annotations from $77$ participants. In $39.14 \pm 1.96\%$ of trials (close to the random-guess rate of $50\%$), the adversarial images generated by our SemanticAdv are selected as reasonable-looking images, while in $30.27 \pm 1.96\%$ of trials the images generated by CW are selected as reasonable-looking. This indicates that SemanticAdv generates more perceptually plausible adversarial examples than CW under the strictest setting (G-FPR $< 10^{-4}$). The corresponding images are shown in the supplementary material.

Figure 4: Quantitative analysis of the attack success rate under different single-attribute attacks. In each figure, we show the results corresponding to a larger FPR (G-FPR = T-FPR = $10^{-3}$) in sky blue and the results corresponding to a smaller FPR (G-FPR = T-FPR = $10^{-4}$) in blue, respectively.

Single attribute analysis.

One of the key advantages of SemanticAdv is that we can generate adversarial perturbations in a more controllable fashion, guided by the selected semantic attribute. This allows analyzing the robustness of a recognition system against different types of semantic attacks. We group the adversarial examples by augmented attribute in various settings. In Figure 4, we present the attack success rate against two face verification models, R-101-S and R-101-C, using different attributes. We highlight the bars in light blue for G-FPR = $10^{-3}$ and in blue for G-FPR = $10^{-4}$, respectively. As shown in Figure 4, with the larger T-FPR = $10^{-3}$, our SemanticAdv achieves almost 100% attack success rate across different attributes. With the smaller T-FPR = $10^{-4}$, we observe that SemanticAdv guided by some attributes, such as Mouth Slightly Open and Arched Eyebrows, achieves less than 50% attack success rate, while other attributes, such as Pale Skin and Eyeglasses, are relatively less affected. In summary, the above experiments indicate that SemanticAdv guided by attributes describing local shape (e.g., mouth, earrings) achieves a relatively lower attack success rate than when guided by attributes relevant to color (e.g., hair color) or the entire face region (e.g., skin). This suggests that the face verification models used in our experiments are more robustly trained with respect to local shapes than to colors. In practice, we have the flexibility to select attributes for attacking an image based on the perceptual quality and attack success rate.

Transferability analysis.

To generate adversarial examples under the black-box setting, we analyze the transferability of SemanticAdv in various settings. For each model with different FPRs, we select the successfully attacked adversarial examples from Section 4.1 to construct our evaluation dataset and evaluate these adversarial samples across different models. Table 2(a) illustrates the transferability of SemanticAdv among different models using the same FPRs (G-FPR = T-FPR = $10^{-3}$). Table 2(b) illustrates the results with different FPRs for generation and evaluation (G-FPR = $10^{-4}$ and T-FPR = $10^{-3}$). As shown in Table 2(a), adversarial examples generated against models trained with softmax loss exhibit certain transferability compared to models trained with cosine loss. We conduct the same experiment by generating adversarial examples with CW and find that it has weaker transferability compared to our SemanticAdv (results in brackets of Table 2).

As Table 2(b) illustrates, the adversarial examples generated against the model with the smaller G-FPR $= 10^{-4}$ exhibit a strong attack success rate when evaluated against models with the larger T-FPR $= 10^{-3}$. In particular, we find that the adversarial examples generated against R-101-S have the best attack performance on other models. These findings motivate the analysis of the query-free black-box API attack detailed in the following paragraph.
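The evaluation protocol above (cell $(i,j)$: examples generated against the $j$-th model, evaluated on the $i$-th model) can be sketched as follows. This is a minimal illustration, not the released implementation; the toy models, thresholds, and adversarial sets in the usage example are hypothetical stand-ins for the four verification models, their per-FPR decision thresholds, and the per-model adversarial example sets.

```python
import numpy as np

def transfer_matrix(adv_sets, models, thresholds):
    """Cell (i, j): fraction of adversarial examples generated against
    model j that also fool model i, i.e., whose similarity score under
    model i exceeds model i's verification threshold."""
    n = len(models)
    mat = np.zeros((n, n))
    for j, advs in enumerate(adv_sets):
        for i, (model, thr) in enumerate(zip(models, thresholds)):
            scores = np.array([model(x) for x in advs])
            mat[i, j] = np.mean(scores > thr)
    return mat

# Toy usage: two "models" that score each (stand-in) example differently.
models = [lambda x: x, lambda x: 0.5 * x]
thresholds = [0.5, 0.5]
adv_sets = [[0.6, 0.9], [1.2, 0.4]]  # examples crafted against model 0 and 1
mat = transfer_matrix(adv_sets, models, thresholds)
```

Note that the diagonal of the real table is 1.000 by construction, since only examples that successfully attacked the generating model are kept.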

Table 2: Transferability of SemanticAdv: cell $(i,j)$ shows the attack success rate of adversarial examples generated against the $j$-th model and evaluated on the $i$-th model. Results of CW are listed in brackets. Left: results generated with G-FPR = $10^{-3}$ and T-FPR = $10^{-3}$; Right: results generated with G-FPR = $10^{-4}$ and T-FPR = $10^{-3}$.
(a) G-FPR = T-FPR = $10^{-3}$

$\mathcal{M}_{\text{test}}$ / $\mathcal{M}_{\text{opt}}$   R-50-S          R-101-S         R-50-C          R-101-C
R-50-S      1.000 (1.000)   0.108 (0.032)   0.023 (0.007)   0.018 (0.005)
R-101-S     0.169 (0.029)   1.000 (1.000)   0.030 (0.009)   0.032 (0.011)
R-50-C      0.166 (0.054)   0.202 (0.079)   1.000 (1.000)   0.048 (0.020)
R-101-C     0.120 (0.034)   0.236 (0.080)   0.040 (0.017)   1.000 (1.000)

(b) G-FPR = $10^{-4}$, T-FPR = $10^{-3}$

$\mathcal{M}_{\text{test}}$ / $\mathcal{M}_{\text{opt}}$   R-50-S          R-101-S
R-50-S      1.000 (1.000)   0.862 (0.530)
R-101-S     0.874 (0.422)   1.000 (1.000)
R-50-C      0.693 (0.347)   0.837 (0.579)
R-101-C     0.617 (0.218)   0.888 (0.617)

Query-free black-box API attack.

In this experiment, we generate adversarial examples against R-101-S with G-FPR $= 10^{-3}$ ($\kappa = 1.24$), G-FPR $= 10^{-4}$ ($\kappa = 0.60$), and G-FPR $< 10^{-4}$ ($\kappa = 0.30$), respectively. We evaluate our algorithm on two industry-level face verification APIs, namely, Face++ and AliYun. Since attack transferability has never been explored in concurrent work that generates semantic adversarial examples, we use $L_p$-bounded pixel-wise methods (CW [10], MI-FGSM [16], M-DI$^2$-FGSM [73]) as our baselines. We also introduce a much stronger baseline by first performing attribute-conditioned image editing and then running the CW attack on the edited images, which we refer to as StarGAN+CW. Compared to CW, the latter two devise certain techniques to improve their transferability. We adopt the ensemble version of MI-FGSM [16] following the original paper. As shown in Table 3, our proposed SemanticAdv achieves a much higher attack success rate than the baselines on both APIs under all FPR thresholds (e.g., our adversarial examples generated with G-FPR $< 10^{-4}$ achieve a $67.69\%$ attack success rate on the Face++ platform with T-FPR $= 10^{-3}$). In addition, we found that a lower G-FPR can achieve a higher attack success rate on both APIs within the same T-FPR (see our supplementary material for more details).
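The pairing of G-FPR levels with margins $\kappa$ above suggests a hinged identity objective: push the normalized embedding distance between the adversarial image and the target identity below the verification threshold $\kappa$, with stricter FPRs giving smaller $\kappa$. The sketch below is our reading of that criterion, not the paper's exact objective (see Eq. 3 in the main paper); `identity_loss` and its arguments are illustrative names.

```python
import numpy as np

def identity_loss(emb_adv, emb_target, kappa):
    """Hinge on the normalized L2 distance between unit-norm embeddings:
    zero once the adversarial embedding is within distance kappa of the
    target identity (i.e., the verifier would accept the match)."""
    a = emb_adv / np.linalg.norm(emb_adv)
    t = emb_target / np.linalg.norm(emb_target)
    d = np.linalg.norm(a - t)
    return max(d - kappa, 0.0)

# Orthogonal unit embeddings are sqrt(2) apart; kappa = 0.60 (G-FPR 1e-4)
# leaves a positive hinge, while a perfect match gives zero loss.
loss = identity_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]), kappa=0.6)
```

A smaller $\kappa$ demands a closer match to the target, which is consistent with the observation that examples generated under lower G-FPR transfer better to other models and APIs.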

Table 3: Quantitative analysis on query-free black-box attack. We use ResNet-101 optimized with softmax loss for evaluation and report the attack success rate (%) on two online face verification platforms. Note that for PGD-based attacks, we adopt MI-FGSM ($\epsilon = 8$) in [16] and M-DI$^2$-FGSM ($\epsilon = 8$) in [73], respectively. For CW, StarGAN+CW and SemanticAdv, we generate adversarial samples with G-FPR $< 10^{-4}$.
API name                Face++                                  AliYun
Attacker / Metric       T-FPR = $10^{-3}$   T-FPR = $10^{-4}$   T-FPR = $10^{-3}$   T-FPR = $10^{-4}$
CW [10]                 37.24               20.41               18.00               9.50
StarGAN+CW              47.45               26.02               20.00               8.50
MI-FGSM [16]            53.89               30.57               29.50               17.50
M-DI$^2$-FGSM [73]      56.12               33.67               30.00               18.00
SemanticAdv             67.69               48.21               36.50               19.50
Figure 5: Quantitative analysis on attacking several defense methods including JPEG [17], Blurring [38], and Feature Squeezing [74]

SemanticAdv against defense methods.

We evaluate the strength of the proposed attack by testing against five existing defense methods, namely, Feature squeezing [74], Blurring [38], JPEG [17], AMI [66] and adversarial training [42].

Figure 5 illustrates that SemanticAdv is more robust against the pixel-wise defense methods compared with CW. The same G-FPR and T-FPR are used for evaluation. Both SemanticAdv and CW achieve a high attack success rate when T-FPR = $10^{-3}$, while SemanticAdv marginally outperforms CW when T-FPR goes down to $10^{-4}$. While these defense methods have proven to be effective against CW attacks on classifiers trained on ImageNet [36], our results indicate that they remain vulnerable when applied to face verification systems with small G-FPR.

We further evaluate SemanticAdv against the attribute-based defense method AMI [66] by constructing adversarial examples for the pretrained VGG-Face [53] in a black-box manner. From the adversarial examples generated by R-101-S, we use fc7 as the embedding and select the images whose normalized L2 distance (to the corresponding benign images) is beyond the threshold defined previously. With the benign and adversarial examples, we first extract attribute witnesses with our aligned face images and then leverage them to build an attribute-steered model. When misclassifying $10\%$ of benign inputs as adversarial images, it only correctly identifies $8\%$ of adversarial images from SemanticAdv and $12\%$ from CW.

Moreover, we evaluate SemanticAdv against an existing adversarial-training-based defense (the detailed setting is presented in the supplementary materials). We find that the accuracy of the adversarial-training-based defense method is 10% against the adversarial examples generated by SemanticAdv, while it is 46.7% against the adversarial examples generated by PGD [42]. This indicates that the existing adversarial-training-based defense method is less effective against SemanticAdv, which further demonstrates that our SemanticAdv identifies an unexplored research area beyond previous $L_p$-based ones.

4.3 SemanticAdv on Face Landmark Detection

We evaluate the effectiveness of SemanticAdv on face landmark detection under two attack tasks, namely, “Rotating Eyes” and “Out of Region”. For the “Rotating Eyes” task, we rotate the coordinates of the eyes in the image counter-clockwise by 90°. For the “Out of Region” task, we set a target bounding box and attempt to push all points out of the box. Figure 6 indicates that our method is applicable to attacking landmark detection models.
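Constructing the target landmark configurations for the two tasks can be sketched as below. This is an illustrative reconstruction: the function and index names are ours, the out-of-region rule (snap interior points to the nearest vertical box edge) is one plausible choice, and the rotation uses the standard counter-clockwise convention, whose visual direction flips under image coordinates with y pointing down.

```python
import numpy as np

def rotate_eyes_target(landmarks, eye_idx):
    """'Rotating Eyes' target: rotate the eye landmarks 90 degrees
    counter-clockwise about their centroid; all other points unchanged."""
    target = landmarks.copy()
    eyes = landmarks[eye_idx]
    c = eyes.mean(axis=0)
    shifted = eyes - c
    # 90-degree counter-clockwise rotation: (x, y) -> (-y, x)
    target[eye_idx] = np.stack([-shifted[:, 1], shifted[:, 0]], axis=1) + c
    return target

def out_of_region_target(landmarks, box):
    """'Out of Region' target: snap each landmark that falls inside the
    box (x0, y0, x1, y1) to its nearest vertical edge, so that no target
    point lies in the box interior."""
    x0, y0, x1, y1 = box
    target = landmarks.copy()
    inside = (landmarks[:, 0] > x0) & (landmarks[:, 0] < x1) & \
             (landmarks[:, 1] > y0) & (landmarks[:, 1] < y1)
    left = landmarks[inside, 0] - x0 <= x1 - landmarks[inside, 0]
    target[inside, 0] = np.where(left, x0, x1)
    return target
```

The attack then minimizes the distance between the model's predicted landmarks and these target configurations.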

Figure 6: Qualitative results on attacking face landmark detection model

4.4 SemanticAdv on Street-view Semantic Segmentation

Figure 7: Qualitative results on attacking street-view semantic segmentation model

We further demonstrate the applicability of our SemanticAdv beyond the face domain by generating adversarial perturbations on street-view images. Figure 7 illustrates the adversarial examples on semantic segmentation. In the first example, we select the leftmost pedestrian as the target object instance and insert another car into the scene to attack it. The segmentation model has been successfully attacked to neglect the pedestrian (see last column), while it does exist in the scene (see second-to-last column). In the second example, we insert an adversarial car in the scene by SemanticAdv and the cyclist has been recognized as a pedestrian by the segmentation model.

5 Conclusions

Overall, we presented a novel attack method, SemanticAdv, which is capable of generating semantically meaningful adversarial perturbations guided by a single semantic attribute. Compared to existing methods, SemanticAdv works in a more controllable fashion. Experimental evaluations on face verification and landmark detection demonstrate several unique properties, including attack transferability. We believe this work will open up new research opportunities and challenges in the field of adversarial learning. For instance, how to leverage semantic information to defend against such attacks will lead to potential new discussions.

Acknowledgement This work was supported in part by the National Science Foundation under Grant CNS-1422211, CNS-1616575, IIS-1617767, DARPA under Grant 00009970, and Google PhD Fellowship to X. Yan.

References

  • [1] Alibaba Cloud Computing Co. Ltd. https://help.aliyun.com/knowledge_detail/53535.html
  • [2] Megvii Technology Co. Ltd. https://console.faceplusplus.com/documents/5679308
  • [3] Bau, D., Zhu, J.Y., Strobelt, H., Zhou, B., Tenenbaum, J.B., Freeman, W.T., Torralba, A.: Gan dissection: Visualizing and understanding generative adversarial networks. arXiv preprint arXiv:1811.10597 (2018)
  • [4] Bengio, Y., Mesnil, G., Dauphin, Y., Rifai, S.: Better mixing via deep representations. In: ICML (2013)
  • [5] Bhattad, A., Chong, M.J., Liang, K., Li, B., Forsyth, D.: Unrestricted adversarial examples via semantic manipulation. In: International Conference on Learning Representations (2020)
  • [6] Brock, A., Donahue, J., Simonyan, K.: Large scale gan training for high fidelity natural image synthesis. In: ICLR (2019)
  • [7] Brown, T.B., Carlini, N., Zhang, C., Olsson, C., Christiano, P., Goodfellow, I.: Unrestricted adversarial examples. arXiv preprint arXiv:1809.08352 (2018)
  • [8] Bulat, A., Tzimiropoulos, G.: Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 3706–3714 (2017)
  • [9] Bulat, A., Tzimiropoulos, G.: How far are we from solving the 2d & 3d face alignment problem?(and a dataset of 230,000 3d facial landmarks). In: ICCV (2017)
  • [10] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (S&P). IEEE (2017)
  • [11] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40(4), 834–848 (2017)
  • [12] Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., Choo, J.: Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In: CVPR (2018)
  • [13] Cisse, M., Adi, Y., Neverova, N., Keshet, J.: Houdini: Fooling deep structured prediction models. In: NIPS (2017)
  • [14] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3213–3223 (2016)
  • [15] Deng, J., Guo, J., Xue, N., Zafeiriou, S.: Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4690–4699 (2019)
  • [16] Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., Li, J.: Boosting adversarial attacks with momentum. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 9185–9193 (2018)
  • [17] Dziugaite, G.K., Ghahramani, Z., Roy, D.M.: A study of the effect of jpg compression on adversarial images. arXiv preprint arXiv:1608.00853 (2016)
  • [18] Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: A rotation and a translation suffice: Fooling cnns with simple transformations. arXiv preprint arXiv:1712.02779 (2017)
  • [19] Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: CVPR. IEEE (2009)
  • [20] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: NIPS (2014)
  • [21] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: ICLR (2014)
  • [22] Guo, Y., Zhang, L., Hu, Y., He, X., Gao, J.: Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In: ECCV. Springer (2016)
  • [23] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  • [24] Hong, S., Yan, X., Huang, T.S., Lee, H.: Learning hierarchical semantic image manipulation through structured representations. In: NeurIPS (2018)
  • [25] Huang, G.B., Mattar, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: A database forstudying face recognition in unconstrained environments. In: Workshop on faces in’Real-Life’Images: detection, alignment, and recognition (2008)
  • [26] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR. pp. 1125–1134 (2017)
  • [27] Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: ECCV. Springer (2016)
  • [28] Johnson, J., Gupta, A., Fei-Fei, L.: Image generation from scene graphs. In: CVPR. pp. 1219–1228 (2018)
  • [29] Joshi, A., Mukherjee, A., Sarkar, S., Hegde, C.: Semantic adversarial attacks: Parametric transformations that fool deep classifiers. arXiv preprint arXiv:1904.08489 (2019)
  • [30] Kang, D., Sun, Y., Hendrycks, D., Brown, T., Steinhardt, J.: Testing robustness against unforeseen adversaries. arXiv preprint arXiv:1908.08016 (2019)
  • [31] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. In: ICLR (2018)
  • [32] Kemelmacher-Shlizerman, I., Seitz, S.M., Miller, D., Brossard, E.: The megaface benchmark: 1 million faces for recognition at scale. In: CVPR. pp. 4873–4882 (2016)
  • [33] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)
  • [34] Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: ICLR (2014)
  • [35] Klare, B.F., Klein, B., Taborsky, E., Blanton, A., Cheney, J., Allen, K., Grother, P., Mah, A., Jain, A.K.: Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. In: CVPR (2015)
  • [36] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)
  • [37] Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: ICCV. IEEE (2009)
  • [38] Li, X., Li, F.: Adversarial examples detection in deep networks with convolutional filter statistics. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 5764–5772 (2017)
  • [39] Liu, M.Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation networks. In: NIPS (2017)
  • [40] Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: ICCV (2015)
  • [41] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3431–3440 (2015)
  • [42] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: ICLR (2018)
  • [43] Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR (2015)
  • [44] Mansimov, E., Parisotto, E., Ba, J.L., Salakhutdinov, R.: Generating images from captions with attention. In: ICLR (2015)
  • [45] Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2574–2582 (2016)
  • [46] Moschoglou, S., Papaioannou, A., Sagonas, C., Deng, J., Kotsia, I., Zafeiriou, S.: Agedb: the first manually collected, in-the-wild age database. In: CVPR Workshops. pp. 51–59 (2017)
  • [47] Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision. pp. 483–499. Springer (2016)
  • [48] Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier gans. In: ICML. JMLR (2017)
  • [49] Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al.: Conditional image generation with pixelcnn decoders. In: NIPS (2016)
  • [50] Oord, A.v.d., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In: ICML (2016)
  • [51] Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: Security and Privacy (EuroS&P), 2016 IEEE European Symposium on (2016)
  • [52] Parikh, D., Grauman, K.: Relative attributes. In: ICCV. IEEE (2011)
  • [53] Parkhi, O.M., Vedaldi, A., Zisserman, A., et al.: Deep face recognition. In: bmvc. vol. 1, p. 6 (2015)
  • [54] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop (2017)
  • [55] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2015)
  • [56] Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: ICML (2016)
  • [57] Reed, S., Sohn, K., Zhang, Y., Lee, H.: Learning to disentangle factors of variation with manifold interaction. In: ICML (2014)
  • [58] Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: The first facial landmark localization challenge. In: ICCV Workshop (2013)
  • [59] Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 815–823 (2015)
  • [60] Sengupta, S., Chen, J.C., Castillo, C., Patel, V.M., Chellappa, R., Jacobs, D.W.: Frontal to profile face verification in the wild. In: WACV. pp. 1–9. IEEE (2016)
  • [61] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  • [62] Song, Y., Shu, R., Kushman, N., Ermon, S.: Constructing unrestricted adversarial examples with generative models. In: Advances in Neural Information Processing Systems. pp. 8312–8323 (2018)
  • [63] Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: CVPR (2014)
  • [64] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., et al.: Going deeper with convolutions. In: CVPR (2015)
  • [65] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
  • [66] Tao, G., Ma, S., Liu, Y., Zhang, X.: Attacks meet interpretability: Attribute-steered detection of adversarial samples. In: NeurIPS (2018)
  • [67] Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., Li, Z., Liu, W.: Cosface: Large margin cosine loss for deep face recognition. In: CVPR (2018)
  • [68] Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional gans. In: CVPR (2018)
  • [69] Wong, E., Schmidt, F.R., Kolter, J.Z.: Wasserstein adversarial examples via projected sinkhorn iterations. ICML (2019)
  • [70] Xiao, C., Deng, R., Li, B., Yu, F., Liu, M., Song, D.: Characterizing adversarial examples based on spatial consistency information for semantic segmentation. In: ECCV (2018)
  • [71] Xiao, C., Li, B., Zhu, J.Y., He, W., Liu, M., Song, D.: Generating adversarial examples with adversarial networks. In: IJCAI (2018)
  • [72] Xiao, C., Zhu, J.Y., Li, B., He, W., Liu, M., Song, D.: Spatially transformed adversarial examples. In: ICLR (2018)
  • [73] Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., Yuille, A.L.: Improving transferability of adversarial examples with input diversity. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2730–2739 (2019)
  • [74] Xu, W., Evans, D., Qi, Y.: Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155 (2017)
  • [75] Yan, X., Yang, J., Sohn, K., Lee, H.: Attribute2image: Conditional image generation from visual attributes. In: ECCV. Springer (2016)
  • [76] Yao, S., Hsu, T.M., Zhu, J.Y., Wu, J., Torralba, A., Freeman, B., Tenenbaum, J.: 3d-aware scene manipulation via inverse graphics. In: Advances in neural information processing systems. pp. 1887–1898 (2018)
  • [77] Yu, F., Koltun, V., Funkhouser, T.: Dilated residual networks. In: Computer Vision and Pattern Recognition (CVPR) (2017)
  • [78] Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.N.: Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: ICCV (2017)
  • [79] Zhang, M., Zhang, Y., Zhang, L., Liu, C., Khurshid, S.: Deeproad: Gan-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. pp. 132–142 (2018)
  • [80] Zhang, X., Yang, L., Yan, J., Lin, D.: Accelerated training for massive classification via dynamic class selection. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
  • [81] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: ICCV (2017)
  • [82] Zhu, X., Lei, Z., Liu, X., Shi, H., Li, S.Z.: Face alignment across large poses: A 3d solution. In: CVPR (2016)

Appendix 0.A Implementation details

In this section, we provide implementation details used in our experiments. We implement our SemanticAdv using PyTorch [54]. Our implementation will be available after the final decision.

0.A.1 Face identity verification

We use the Adam optimizer [33] to generate adversarial examples for both our SemanticAdv and the pixel-wise attack method CW [10]. More specifically, we run the optimization for up to 200 steps with a fixed updating rate of 0.05 under G-FPR $< 10^{-4}$. For cases with a slightly higher G-FPR, we run the optimization for up to 500 steps with a fixed updating rate of 0.01. For the pixel-wise attack method CW, we use an additional pixel reconstruction objective with its weight set to 5. Specifically, we run the optimization for up to 1,000 steps with a fixed updating rate of $10^{-3}$.
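The schedule above (fixed step count, fixed updating rate) amounts to plain Adam on the attack variable. A minimal self-contained sketch, with the true SemanticAdv objective and feature-interpolation parameterization replaced by a generic `grad_fn` placeholder:

```python
import numpy as np

def adam_attack(x0, grad_fn, steps=200, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Run `steps` Adam updates at fixed rate `lr` on the attack variable.
    `grad_fn(x)` returns the gradient of the attack objective at x."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy usage with the 500-step / 0.01 setting: minimize (x - 3)^2 from x = 0.
res = adam_attack(np.array([0.0]), lambda x: 2 * (x - 3.0), steps=500, lr=0.01)
```

In the actual attack, the variable being optimized is the interpolation between the original and attribute-edited representations, and `grad_fn` would backpropagate through the generator and the target recognition model.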

Evaluation metrics.

To evaluate the performance of SemanticAdv under different attributes, we consider three metrics as follows:

  • Best: the attack is successful as long as one single attribute among 17 can be successfully attacked;

  • Average: we calculate the average attack success rate among 17 attributes for the same face identity;

  • Worst: the attack is successful only when all of 17 attributes can be successfully attacked;

Please note that we use the Best metric as a fair comparison to the attack success rate reported by existing pixel-wise attack methods, while the ability to generate SemanticAdv with different attributes is one of our advantages. In practice, both our SemanticAdv (Best) and CW achieve a 100% attack success rate. In addition, we report the performance under the Average and Worst metrics, which enables us to analyze the adversarial robustness towards certain semantic attributes.
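Given a per-identity, per-attribute success matrix, the three metrics reduce to row-wise any/mean/all reductions; a small sketch (the function name and matrix layout are ours):

```python
import numpy as np

def attribute_metrics(success):
    """success: (num_identities, num_attributes) boolean matrix where
    entry (i, j) is True if attacking identity i via attribute j
    succeeded. In the paper's setting there are 17 attribute columns."""
    success = np.asarray(success, dtype=bool)
    return {
        "best": success.any(axis=1).mean(),   # some attribute succeeds
        "average": success.mean(),            # mean over all attributes
        "worst": success.all(axis=1).mean(),  # every attribute succeeds
    }
```

By construction, worst ≤ average ≤ best for any success matrix.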

Pixel-wise defense methods.

Feature squeezing [74] is a simple but effective method that reduces the color bit depth to remove adversarial effects; we compress images from 8 bits per channel down to 4 bits per channel to evaluate its effectiveness. For Blurring [38], we use a $3\times 3$ Gaussian kernel with standard deviation 1 to smooth the adversarial perturbations. JPEG [17] leverages compression and decompression to remove adversarial perturbations; we set the compression ratio to 0.75 in our experiment.
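The first two defenses can be sketched in a few lines of NumPy with the settings stated above (8 → 4 bits, $3\times 3$ Gaussian with $\sigma = 1$); the single-channel ramp image is a made-up input, and JPEG is omitted since it requires an image codec:

```python
import numpy as np

def bit_depth_squeeze(img, bits=4):
    """Feature squeezing: quantize 8-bit pixel values to `bits` bits."""
    levels = 2 ** bits - 1
    return np.round(img / 255.0 * levels) / levels * 255.0

def gaussian_blur_3x3(img, sigma=1.0):
    """3x3 Gaussian smoothing of a single-channel image (zero padding)."""
    ax = np.array([-1.0, 0.0, 1.0])
    k1 = np.exp(-ax ** 2 / (2.0 * sigma ** 2))
    kernel = np.outer(k1, k1)
    kernel /= kernel.sum()  # normalize so the kernel sums to 1
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), 1)
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

img = np.arange(16, dtype=np.float64).reshape(4, 4) * 17.0  # 0..255 ramp
sq = bit_depth_squeeze(img)
bl = gaussian_blur_3x3(img)
```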

0.A.2 Face landmark detection

We use the Adam optimizer [33] to generate SemanticAdv against the face landmark detection model. Specifically, we run the optimization for up to 2,000 steps with a fixed updating rate of 0.05, with the balancing factor $\lambda$ set to 0.01 (see Eq. 3 in the main paper).

Evaluation metrics.

We apply different metrics to the two adversarial attack tasks. For the “Rotating Eyes” task, we use the widely adopted Normalized Mean Error (NME) [9] for experimental evaluation:

$$r_{\text{NME}}=\frac{1}{N}\sum_{k=1}^{N}\frac{\lVert \mathbf{p}_{k}-\hat{\mathbf{p}}_{k}\rVert_{2}}{\sqrt{W_{B}\cdot H_{B}}}, \qquad (7)$$

where $\mathbf{p}_{k}$ denotes the $k$-th ground-truth landmark, $\hat{\mathbf{p}}_{k}$ denotes the $k$-th predicted landmark, and $\sqrt{W_{B}\cdot H_{B}}$ is the square-root area of the ground-truth bounding box, with $W_{B}$ and $H_{B}$ the width and height of the box.

For the “Out of Region” task, we consider the attack successful if the landmark predictions fall outside a pre-defined centering region on the portrait image. We introduce a metric that reflects the portion of landmarks outside the pre-defined centering region: $r_{\text{OR}} = N_{\text{out}} / N_{\text{total}}$, where $N_{\text{out}}$ denotes the number of predicted landmarks outside the pre-defined bounding box and $N_{\text{total}}$ denotes the total number of landmarks.
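Both metrics are straightforward to compute from predicted and ground-truth landmark arrays; the coordinates below are hypothetical:

```python
import numpy as np

def nme(pred, gt, box_w, box_h):
    """Normalized Mean Error of Eq. (7): mean per-landmark L2 error
    divided by the square-root area of the ground-truth box."""
    err = np.linalg.norm(pred - gt, axis=1)  # per-landmark L2 distance
    return err.mean() / np.sqrt(box_w * box_h)

def out_of_region_ratio(pred, x0, y0, x1, y1):
    """Fraction of predicted landmarks falling outside the
    pre-defined centering region [x0, x1] x [y0, y1]."""
    inside = ((pred[:, 0] >= x0) & (pred[:, 0] <= x1) &
              (pred[:, 1] >= y0) & (pred[:, 1] <= y1))
    return 1.0 - inside.mean()

gt = np.array([[10.0, 10.0], [20.0, 20.0]])
pred = np.array([[13.0, 14.0], [20.0, 20.0]])
# 100x100 box: per-landmark errors are 5 and 0, so NME = 2.5 / 100.
r_nme = nme(pred, gt, 100, 100)
r_or = out_of_region_ratio(pred, 15, 15, 30, 30)  # first landmark outside
```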

0.A.3 Ablation study: feature-space interpolation

We include an ablation study on feature-space interpolation by analyzing attack success rates using different feature-maps in the main paper. We illustrate the choices of StarGAN feature-maps used in Figure 8; Table 1 in the main paper shows the attack success rate on R-101-S. As shown in Figure 8, we use $\mathbf{f}_{i}$ to denote the feature-map after the $i$-th up-sampling operation, and $\mathbf{f}_{0}$ to denote the feature-map before any up-sampling operation. The result demonstrates that samples generated by interpolating on $\mathbf{f}_{0}$ achieve the highest success rate. Since $\mathbf{f}_{0}$ is the feature-map right before the decoder, it still embeds rich semantic information in the feature space. We therefore adopt $\mathbf{f}_{0}$ for interpolation in our experiments.
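A minimal sketch of the interpolation step, assuming a scalar blending coefficient $\alpha$ (in the actual attack the interpolation parameter is the variable being optimized adversarially, and the blended map is fed to the StarGAN decoder to synthesize the image):

```python
import numpy as np

def interpolate_features(f_orig, f_new, alpha):
    """Linearly blend the encoder feature-maps f0 computed for the
    original attributes c and the edited attributes c_new."""
    assert f_orig.shape == f_new.shape
    return alpha * f_new + (1.0 - alpha) * f_orig

# Toy feature-maps of shape (channels, height, width), made up for
# illustration; real f0 maps come from the StarGAN encoder.
f_a = np.zeros((8, 4, 4))
f_b = np.ones((8, 4, 4))
f_mix = interpolate_features(f_a, f_b, alpha=0.3)
```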

Figure 8: Illustration of the feature-maps used in the StarGAN encoder-decoder architecture.

Appendix 0.B Additional quantitative results

0.B.1 Face identity verification

Benchmark performance.

We provide additional information about the ResNet models used in our experiments. Table 4 reports the performance on multiple face identity verification benchmarks, including the Labeled Faces in the Wild (LFW) dataset [25], the AgeDB-30 dataset [46], and the Celebrities in Frontal-Profile (CFP) dataset [60]. LFW [25] is the de facto standard testing set for face verification under unconstrained conditions and contains 13,233 face images from 5,749 identities. AgeDB [46] contains 12,240 images from 440 identities; AgeDB-30 is its most challenging subset for evaluating face verification models, as the large variations in age make models perform worse on it than on LFW. CFP [60] consists of 500 identities, where each identity has 10 frontal and 4 profile images. Although good performance has been achieved on the Frontal-to-Frontal (CFP-FF) test protocol, the Frontal-to-Profile (CFP-FP) protocol remains challenging, as most face training sets contain very few profile images. Table 4 shows that the face verification models we use achieve state-of-the-art performance on all benchmarks.

Table 4: The verification accuracy (%) of ResNet models on multiple face recognition datasets including LFW, AgeDB-30, and CFP.

$\mathcal{M}$ / benchmark   LFW     AgeDB-30   CFP-FF   CFP-FP
R-50-S                      99.27   94.15      99.26    91.49
R-101-S                     99.42   95.93      99.57    95.07
R-50-C                      99.38   95.08      99.24    90.24
R-101-C                     99.67   95.58      99.57    92.71

Thresholds for identity verification.

To decide whether two portrait images belong to the same identity, we use the normalized $L_{2}$ distance between face features and set the FPR thresholds accordingly, which is a common procedure when evaluating face verification models [35, 32]. Table 5 lists the threshold values used in our experiments for determining whether two portrait images belong to the same identity.
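The verification rule can be sketched as follows; the threshold 0.597 is the R-101-S value at FPR $10^{-4}$ from Table 5, while the toy feature vectors are made up for illustration:

```python
import numpy as np

def normalized_l2(feat_a, feat_b):
    """L2 distance between unit-normalized face features."""
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return np.linalg.norm(a - b)

def same_identity(feat_a, feat_b, threshold):
    """Accept the pair as the same identity if the normalized L2
    distance falls below the FPR-calibrated threshold."""
    return normalized_l2(feat_a, feat_b) < threshold

# Toy embeddings: f2 points in the same direction as f1, so the
# normalized distance is 0 and the pair is accepted.
f1 = np.array([1.0, 2.0, 3.0])
f2 = np.array([2.0, 4.0, 6.0])
accepted = same_identity(f1, f2, threshold=0.597)
```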

Table 5: The threshold values for face identity verification.

FPR / $\mathcal{M}$   R-50-S   R-101-S   R-50-C   R-101-C
$10^{-3}$             1.181    1.244     1.447    1.469
$3\times 10^{-4}$     1.058    1.048     1.293    1.242
$10^{-4}$             0.657    0.597     0.864    0.809

Quantitative analysis.

Combining the results from Table 6 and Figure 4 in the main paper, we find that the face verification models used in our experiments exhibit different levels of robustness across attributes. For example, face verification models are more robust against local shape variations than color variations, e.g., pale skin yields a higher attack success rate than mouth open. We believe these discoveries will help the community further understand the properties of face verification models.

Table 6 shows the overall accuracy of the face verification models and the attack success rates of SemanticAdv and CW. As shown in Table 6, although the face models trained with the cosine objective achieve higher face recognition performance, they are more vulnerable to adversarial attacks than the models trained with the softmax objective. Table 7 shows that the intermediate results of SemanticAdv before adversarial perturbation cannot attack successfully, which indicates that the success of SemanticAdv comes from adding adversarial perturbations through interpolation.

Table 6: Quantitative results of identity verification (%). It shows the accuracy of the face verification models and the attack success rates of SemanticAdv and CW.

G-FPR               Metrics / $\mathcal{M}$    R-50-S   R-101-S   R-50-C   R-101-C
$10^{-3}$           Verification Accuracy      98.36    98.78     98.63    98.84
                    SemanticAdv (Best)         100.00   100.00    100.00   100.00
                    SemanticAdv (Worst)        91.95    93.98     99.53    99.77
                    SemanticAdv (Average)      98.98    99.29     99.97    99.99
                    CW                         100.00   100.00    100.00   100.00
$3\times 10^{-4}$   Verification Accuracy      97.73    97.97     97.91    97.85
                    SemanticAdv (Best)         100.00   100.00    100.00   100.00
                    SemanticAdv (Worst)        83.75    79.06     98.98    96.64
                    SemanticAdv (Average)      97.72    97.35     99.92    99.72
                    CW                         100.00   100.00    100.00   100.00
$10^{-4}$           Verification Accuracy      93.25    92.80     93.43    92.98
                    SemanticAdv (Best)         100.00   100.00    100.00   100.00
                    SemanticAdv (Worst)        33.59    19.84     67.03    48.67
                    SemanticAdv (Average)      83.53    76.64     95.57    91.13
                    CW                         100.00   100.00    100.00   100.00
Table 7: Attack success rate of the intermediate outputs of SemanticAdv (%). $\mathbf{x}'$, $G(\mathbf{x}', \mathbf{c})$, and $G(\mathbf{x}', \mathbf{c}^{\text{new}})$ are the intermediate results of our method before adversarial perturbation.

G-FPR               Metrics / $\mathcal{M}$                          R-50-S   R-101-S   R-50-C   R-101-C
$10^{-3}$           $\mathbf{x}'$                                    0.00     0.00      0.08     0.00
                    $G(\mathbf{x}', \mathbf{c})$                     0.00     0.00      0.00     0.23
                    $G(\mathbf{x}', \mathbf{c}^{\text{new}})$ (Best) 0.16     0.08      0.16     0.31
$3\times 10^{-4}$   $\mathbf{x}'$                                    0.00     0.00      0.00     0.00
                    $G(\mathbf{x}', \mathbf{c})$                     0.00     0.00      0.00     0.00
                    $G(\mathbf{x}', \mathbf{c}^{\text{new}})$ (Best) 0.00     0.00      0.00     0.00
$10^{-4}$           $\mathbf{x}'$                                    0.00     0.00      0.00     0.00
                    $G(\mathbf{x}', \mathbf{c})$                     0.00     0.00      0.00     0.00
                    $G(\mathbf{x}', \mathbf{c}^{\text{new}})$ (Best) 0.00     0.00      0.00     0.00

0.B.2 Face landmark detection

Table 8: Quantitative results on face landmark detection (%). The two rows show the measured ratios (lower is better) for the “Rotating Eyes” and “Out of Region” tasks, respectively. The first column is the pristine input; the remaining columns are the augmented attributes.

Tasks (Metrics)        Pristine   Blond Hair   Young   Eyeglasses   Rosy Cheeks   Smiling   Arched Eyebrows   Bangs   Pale Skin
$r_{\text{NME}}$ ↓     28.04      14.03        17.28   8.58         13.24         19.21     23.42             15.99   10.72
$r_{\text{OR}}$ ↓      45.98      17.42        23.04   7.51         16.65         25.44     33.85             20.03   13.51

We present the quantitative results of SemanticAdv against the face landmark detection model in Table 8 for two adversarial tasks, namely “Rotating Eyes” and “Out of Region”. We observe that our method is effective at attacking landmark detection models; for certain attributes such as “Eyeglasses” and “Pale Skin”, SemanticAdv achieves reasonably good performance.

0.B.3 User study

We conduct a user study on the original images and the adversarial images of SemanticAdv and CW used in the API-attack experiment. The adversarial images are generated with G-FPR $<10^{-4}$ for both methods. We present a pair consisting of an original image and an adversarial image to participants and ask them to rank the two options. The order of the two images is randomized and each image is displayed for 2 seconds during a trial. After the images disappear, participants have unlimited time to select the more reasonable-looking image according to their perception. To maintain the quality of the collected responses, each participant can conduct at most 50 trials, and each adversarial image is shown to 5 different participants. We present the images used for the user study in Figure 9. In total, we collect 2,620 annotations from 77 participants. The adversarial images generated by SemanticAdv are selected as the more reasonable-looking images in $39.14 \pm 1.96\%$ of trials, compared with $30.27 \pm 1.96\%$ for those generated by CW, which indicates that our semantic adversarial examples are perceptually more realistic than those of CW. Additionally, we also conduct the user study with a larger G-FPR $= 10^{-3}$: the adversarial images generated by SemanticAdv are selected as the more reasonable-looking images in $45.42 \pm 1.96\%$ of trials, which is very close to random guessing (50%).

Figure 9: Qualitative comparisons among the ground truth, pixel-wise adversarial examples generated by CW, and our proposed SemanticAdv. Here, we present results with G-FPR $<10^{-4}$ so that the perturbations are visible.

0.B.4 Semantic attack transferability

In Table 9, we present the quantitative results of the attack transferability under the setting with G-FPR $= 10^{-4}$ and T-FPR $= 10^{-4}$. We observe that with a stricter testing criterion (lower T-FPR) for the verification model, the transferability across different models becomes lower.

Table 9: Transferability of SemanticAdv: cell $(i, j)$ shows the attack success rate of adversarial examples generated against the $j$-th model and evaluated on the $i$-th model. Results are generated with G-FPR $= 10^{-4}$ and T-FPR $= 10^{-4}$.

$\mathcal{M}_{\text{test}}$ / $\mathcal{M}_{\text{opt}}$   R-50-S   R-101-S   R-50-C   R-101-C
R-50-S                                                     1.000    0.005     0.000    0.000
R-101-S                                                    0.000    1.000     0.000    0.000
R-50-C                                                     0.000    0.000     1.000    0.000
R-101-C                                                    0.000    0.000     0.000    1.000
Table 10: Transferability of StarGAN+CW: cell $(i, j)$ shows the attack success rate of adversarial examples generated against the $j$-th model and evaluated on the $i$-th model. Results of SemanticAdv are listed in brackets.

(a) G-FPR $= 10^{-3}$, T-FPR $= 10^{-3}$
$\mathcal{M}_{\text{test}}$ / $\mathcal{M}_{\text{opt}}$   R-101-S
R-50-S                                                     0.035 (0.108)
R-101-S                                                    1.000 (1.000)
R-50-C                                                     0.145 (0.202)
R-101-C                                                    0.085 (0.236)

(b) G-FPR $= 10^{-4}$, T-FPR $= 10^{-3}$
$\mathcal{M}_{\text{test}}$ / $\mathcal{M}_{\text{opt}}$   R-101-S
R-50-S                                                     0.615 (0.862)
R-101-S                                                    1.000 (1.000)
R-50-C                                                     0.570 (0.837)
R-101-C                                                    0.695 (0.888)

To further show that our SemanticAdv is non-trivially different from a pixel-wise attack added on top of semantic image editing, we provide an additional baseline called StarGAN+CW and evaluate its attack transferability. This baseline first performs semantic image editing using the StarGAN model (non-adversarial) and then conducts the standard $L_{p}$ CW attack on the generated images. As shown in Table 10, the StarGAN+CW baseline has a noticeable performance gap to our proposed SemanticAdv. This also shows that SemanticAdv produces novel adversarial examples that cannot be obtained by simply combining an attribute-conditioned image editing model with $L_{p}$-bounded perturbations.

0.B.5 Query-free black-box API attack

Table 11: Quantitative analysis on query-free black-box attack. We use ResNet-101 optimized with the softmax loss for evaluation and report the attack success rate (%). Note that the Microsoft Azure API does not provide accept thresholds for different T-FPRs, so we use the provided likelihood 0.5 to determine whether two faces belong to the same person.

                                        Face++ (T-FPR)          AliYun (T-FPR)          Azure (Likelihood)
Attacker / Metric value                 $10^{-3}$   $10^{-4}$   $10^{-3}$   $10^{-4}$   0.5
Original $\mathbf{x}$                   2.04        0.51        0.50        0.00        0.00
Generated $\mathbf{x}^{\text{new}}$     4.21        0.53        0.50        0.00        0.00
CW (G-FPR $= 10^{-3}$)                  9.18        2.04        2.00        0.50        0.00
StarGAN+CW (G-FPR $= 10^{-3}$)          15.9        3.08        3.50        1.00        0.00
SemanticAdv (G-FPR $= 10^{-3}$)         20.00       4.10        4.00        0.50        0.00
CW (G-FPR $= 10^{-4}$)                  28.57       10.17       10.50       2.50        1.04
StarGAN+CW (G-FPR $= 10^{-4}$)          35.38       14.36       12.50       3.50        1.05
SemanticAdv (G-FPR $= 10^{-4}$)         58.25       31.44       24.00       10.50       5.73
CW                                      37.24       20.41       18.00       9.50        3.09
StarGAN+CW                              47.45       26.02       20.00       8.50        5.56
MI-FGSM [16]                            53.89       30.57       29.50       17.50       10.82
M-DI$^{2}$-FGSM [73]                    56.12       33.67       30.00       18.00       12.04
SemanticAdv (G-FPR $< 10^{-4}$)         67.69       48.21       36.5        19.5        15.63
Figure 10: Illustration of our SemanticAdv on a real-world face verification platform (editing on pale skin). Note that the confidence denotes the likelihood that two faces belong to the same person.

In Table 11, we present the results of SemanticAdv performing query-free black-box attacks on three online face verification platforms. SemanticAdv outperforms CW and StarGAN+CW on all APIs under all FPR thresholds. In addition, under the same T-FPR, samples generated with a lower G-FPR achieve a higher attack success rate on the APIs than samples generated with a higher G-FPR. The original $\mathbf{x}$ and generated $\mathbf{x}^{\text{new}}$ serve as reference points for the performance of the online face verification platforms. In Figure 10, we also show several examples of our API attack on the Microsoft Azure face verification system, which further demonstrates the effectiveness of our approach.

0.B.6 SemanticAdv against adversarial training

We evaluate our SemanticAdv against the existing adversarial-training-based defense method [42]. In detail, we randomly sample 10 persons from CelebA [40] and randomly split the sampled dataset into training, validation, and testing sets in a proportion of 80%, 10%, and 10%, respectively. We train a ResNet-50 [23] to identify these face images following the standard face recognition training pipeline [63]. As CelebA [40] does not contain enough images per person, we finetune our model from a model pretrained on MS-Celeb-1M [22, 80]. We train the robust model using the adversarial-training-based method [42] with the same settings as in [42]: we use a 7-step PGD $L_{\infty}$ attack to generate adversarial examples for the inner maximization problem during adversarial training. During testing, we evaluate using adversarial examples generated by 20-step PGD attacks. The perturbation is bounded by 8 pixels (with pixel values in $[0, 255]$) in terms of $L_{\infty}$ distance.
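A minimal NumPy sketch of the $L_{\infty}$ PGD attack described above ($\epsilon = 8$, 7 steps). The step size of 2 and the toy gradient function are assumptions for illustration, not settings taken from [42]:

```python
import numpy as np

def pgd_linf(grad_fn, x, eps=8.0, steps=7, step_size=2.0,
             lo=0.0, hi=255.0):
    """L-infinity PGD (loss ascent): `grad_fn` returns the gradient
    of the classification loss w.r.t. the input image."""
    x_adv = x.astype(np.float64).copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv += step_size * np.sign(g)           # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to L_inf ball
        x_adv = np.clip(x_adv, lo, hi)            # keep valid pixel range
    return x_adv

# Toy gradient (a fixed ascent direction) just to exercise the
# projection; the real gradient comes from the face classifier.
x = np.full((4, 4), 128.0)
g_fn = lambda z: np.ones_like(z)
x_adv = pgd_linf(g_fn, x)
```

With this constant gradient, 7 steps of size 2 attempt to move every pixel by 14, but the projection clips the total perturbation to $\epsilon = 8$.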

Table 12: Accuracy on standard model (without adversarial training) and robust model (with adversarial training).
Training Method / Attack Benign PGD SemanticAdv
Standard 93.3% 0% 0%
Robust [42] 86.7% 46.7% 10%

As shown in Table 12, the robust model achieves 10% accuracy against the adversarial examples generated by SemanticAdv, compared with 46.7% against the adversarial examples generated by PGD [42]. This indicates that the existing adversarial-training-based defense method is less effective against SemanticAdv, and further demonstrates that SemanticAdv identifies an unexplored research area beyond previous $L_{p}$-based attacks.

Appendix 0.C Additional visualizations

Figure 11: Qualitative analysis on single-attribute adversarial attack (G-FPR $= 10^{-3}$).
Figure 12: Qualitative analysis on single-attribute adversarial attack (G-FPR $= 10^{-3}$).
Figure 13: Qualitative analysis on single-attribute adversarial attack (G-FPR $= 10^{-3}$).
Figure 14: Qualitative comparisons between our proposed SemanticAdv (G-FPR $= 10^{-3}$) and pixel-wise adversarial examples generated by CW. Along with the adversarial examples, we also provide the corresponding perturbations (residual) on the right.
Figure 15: Qualitative analysis on single-attribute adversarial attack (SemanticAdv with G-FPR $= 10^{-3}$). Along with the adversarial examples, we also provide the corresponding perturbations (residual) on the right.