How I became a machine learning practitioner
For the first three years of OpenAI, I dreamed of becoming a machine learning expert but made little progress towards that goal. Over the past nine months, I’ve finally made the transition to being a machine learning practitioner. It was hard but not impossible, and I think most people who are good programmers and know (or are willing to learn) the math can do it too. There are many online courses to self-study the technical side, and what turned out to be my biggest blocker was a mental barrier — getting ok with being a beginner again.
Studying machine learning during the 2018 holiday season.
Early days
A founding principle of OpenAI is that we value research and engineering equally: our goal is to build working systems that solve previously impossible tasks, so we need both. (In fact, our team is made up of 25% people primarily using software skills, 25% primarily using machine learning skills, and 50% doing a hybrid of the two.) So from day one of OpenAI, my software skills were always in demand, and I kept procrastinating on picking up the machine learning skills I wanted.
After helping build OpenAI Gym, I was called to work on Universe. And as Universe was winding down, we decided to start working on Dota — and we needed someone to turn the game into a reinforcement learning environment before any machine learning could begin.
Dota
Turning such a complex game into a research environment without source code access was awesome work, and the team’s excitement every time I overcame a new obstacle was deeply validating. I figured out how to break out of the game’s Lua sandbox, LD_PRELOAD a Go gRPC server into the game to control it programmatically, incrementally dump the whole game state into a Protobuf, and build a Python library and abstractions with future compatibility for the many different multiagent configurations we might want to use.
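None of that environment code appears in the post, but the shape of the Python abstraction it describes is worth sketching: a gym-style multi-agent interface that hides the gRPC and Protobuf plumbing behind reset/step calls. Everything below, the class names, the dict-of-agents convention, and the placeholder logic, is a hypothetical illustration rather than the actual OpenAI Dota library.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Observation:
    # Per-agent view of the game state; in the real system this would be
    # decoded from the incrementally dumped Protobuf.
    features: Dict[str, float] = field(default_factory=dict)

class DotaLikeEnv:
    """Hypothetical gym-style wrapper. Agents are keyed by name so the same
    interface can serve 1v1, 5v5, or self-play configurations."""

    def __init__(self, agent_names):
        self.agent_names = list(agent_names)
        self._tick = 0

    def reset(self) -> Dict[str, Observation]:
        # Real version: restart the game via the injected gRPC server.
        self._tick = 0
        return {name: Observation() for name in self.agent_names}

    def step(self, actions: Dict[str, int]) -> Tuple[Dict[str, Observation], Dict[str, float], bool]:
        # Real version: send one action per agent, advance the game,
        # and read back the updated state.
        self._tick += 1
        obs = {name: Observation({"tick": float(self._tick)}) for name in self.agent_names}
        rewards = {name: 0.0 for name in self.agent_names}
        done = self._tick >= 1000
        return obs, rewards, done

# Usage: the caller never touches the game process, only observations and rewards.
env = DotaLikeEnv(["radiant_mid", "dire_mid"])
obs = env.reset()
obs, rewards, done = env.step({name: 0 for name in obs})
```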
But I felt half blind. At Stripe, though I gravitated towards infrastructure solutions, I could make changes anywhere in the stack since I knew the product code intimately. In Dota, I was constrained to looking at all problems through a software lens, which sometimes meant I tried to solve hard problems that could be avoided by just doing the machine learning slightly differently.
I wanted to be like my teammates Jakub Pachocki and Szymon Sidor, who had made the core breakthrough that powered our Dota bot. They had questioned the common wisdom within OpenAI that reinforcement learning algorithms didn’t scale. They wrote a distributed reinforcement learning framework called Rapid and scaled it exponentially every two weeks or so, and we never hit a wall with it. I wanted to be able to make critical contributions like theirs, ones that combined software and machine learning skills.
Szymon on the left; Jakub on the right.
In July 2017, it looked like I might have my chance. The software infrastructure was stable, and I began work on a machine learning project. My goal was to use behavioral cloning to teach a neural network from human training data. But I wasn’t quite prepared for just how much I would feel like a beginner.
I kept being frustrated by small workflow details which made me uncertain if I was making progress, such as not being certain which code a given experiment had used or realizing I needed to compare against a result from last week that I hadn’t properly archived. To make things worse, I kept discovering small bugs that had been corrupting my results the whole time.
I didn’t feel confident in my work, but to make it worse, other people did. People would mention how hard behavioral cloning from human data is. I always made sure to correct them by pointing out that I was a newbie, and that this probably said more about my abilities than about the problem.
It all briefly felt worth it when my code made it into the bot, as Jie Tang used it as the starting point for creep blocking, which he then fine-tuned with reinforcement learning. But soon Jie figured out how to get better results without using my code, and I had nothing to show for my efforts.
I never tried machine learning on the Dota project again.
Time out
After we lost two games at The International in 2018, most observers thought we’d topped out what our approach could do. But we knew from our metrics that we were right on the edge of success and mostly needed more training. This meant the demands on my time had relented, and in November 2018, I felt I had an opening to take a gamble with three months of my time.
Team members in high spirits after losing our first game at The International.
I learn best when I have something specific in mind to build. I decided to try building a chatbot. I started self-studying the curriculum we developed for our Fellows program, selecting only the NLP-relevant modules. For example, I wrote and trained an LSTM language model and then a Transformer-based one. I also read up on topics like information theory and worked through many papers, poring over each line until I fully absorbed it.
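The Fellows curriculum itself isn’t reproduced in the post, but the first exercise mentioned, an LSTM language model, is small enough to sketch. What follows is a generic next-token predictor written in PyTorch with illustrative hyperparameters; it is a stand-in for that kind of exercise, not the actual course code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=256, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)            # (batch, seq, embed_dim)
        out, state = self.lstm(x, state)  # (batch, seq, hidden_dim)
        return self.head(out), state      # logits over the next token

# One training step: predict token t+1 from tokens up to t.
model = LSTMLanguageModel()
tokens = torch.randint(0, 10_000, (8, 64))   # toy batch of token ids
logits, _ = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
loss.backward()
```

Replacing the LSTM with a stack of self-attention layers, trained on the same next-token objective, is roughly how the Transformer-based version of the exercise would differ.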
It was slow going, but this time I expected it. I didn’t experience flow state. I was reminded of how I’d felt when I just started programming, and I kept thinking of how many years it had taken to achieve a feeling of mastery. I honestly wasn’t confident that I would ever become good at machine learning. But I kept pushing because… well, honestly because I didn’t want to be constrained to only understanding one part of my projects. I wanted to see the whole picture clearly.
My personal life was also an important factor in keeping me going. I’d begun a relationship with someone who made me feel it was ok if I failed. I spent our first holiday season together beating my head against the machine learning wall, but she was there with me no matter how many planned activities it meant skipping.
One important conceptual step was clearing a barrier I’d been too timid to attempt on the Dota project: making substantive changes to someone else’s machine learning code. I fine-tuned GPT-1 on chat datasets I’d found, and made a small change to add my own naive sampling code. But it became so painfully slow as I tried to generate longer messages that my frustration overwhelmed my fear, and I implemented GPU caching, a change which touched the entire model.
I had to try a few times, throwing out my changes as they exceeded the complexity I could hold in my head. By the time I got it working a few days later, I realized I’d learned something that I would have previously thought impossible: I now understood how the whole model was put together, down to small stylistic details like how the codebase elegantly handles TensorFlow variable scopes.
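The post doesn’t spell out exactly what “GPU caching” covered, but the standard version of this change in Transformer sampling is to cache each attention layer’s past keys and values, so generating a new token only does the work for that token instead of re-running the whole prefix. Here is a toy single-head sketch of that idea; it is not the GPT-1 codebase, and every name in it is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CachedSelfAttention(nn.Module):
    """Single-head attention that can reuse previously computed keys and values."""

    def __init__(self, dim=64):
        super().__init__()
        self.dim = dim
        self.qkv = nn.Linear(dim, 3 * dim)

    def forward(self, x, cache=None):
        # x: (batch, new_tokens, dim); during sampling new_tokens is 1.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        if cache is not None:
            k = torch.cat([cache["k"], k], dim=1)  # past keys + new key
            v = torch.cat([cache["v"], v], dim=1)  # past values + new value
        attn = F.softmax(q @ k.transpose(1, 2) / self.dim ** 0.5, dim=-1)
        return attn @ v, {"k": k, "v": v}

layer = CachedSelfAttention()
prefix = torch.randn(1, 10, 64)  # embeddings of already-generated tokens

# Naive sampling re-encodes the full prefix at every step, so each new message
# token gets slower; with a cache we feed tokens one at a time and carry the
# computed keys/values forward.
cache = None
for t in range(prefix.size(1)):
    out, cache = layer(prefix[:, t : t + 1], cache)

next_token = torch.randn(1, 1, 64)     # embedding of the newest sampled token
out, cache = layer(next_token, cache)  # constant work per additional token
```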
Retooled
After three months of self-study, I felt ready to work on an actual project. This was also the first point where I felt I could benefit from the many experts we have at OpenAI, and I was delighted when Jakub and my co-founder Ilya Sutskever agreed to advise me.
Ilya singing karaoke at our company offsite.
We started to get very exciting results, and Jakub and Szymon joined the project full-time. I feel proud every time I see a commit from them in the machine learning codebase I’d started.
I’m starting to feel competent, though I haven’t yet achieved mastery. I’m seeing this reflected in the number of hours I can motivate myself to spend focused on doing machine learning work: I’m now at around 75% of the coding hours I’ve historically been able to put in.
But for the first time, I feel that I’m on trajectory. At first, I was overwhelmed by the seemingly endless stream of new machine learning concepts. Within the first six months, I realized that I could make progress without constantly learning entirely new primitives. I still need to get more experience with many skills, such as initializing a network or setting a learning rate schedule, but now the work feels incremental rather than potentially impossible.
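As one concrete example of the kind of incremental skill mentioned above, a learning rate schedule is often just a small function of the training step, such as linear warmup followed by cosine decay. This is a generic sketch with made-up values, not a specific OpenAI recipe.

```python
import math

def lr_schedule(step, base_lr=3e-4, warmup_steps=2_000, total_steps=100_000):
    """Linear warmup to base_lr, then cosine decay toward zero (illustrative values)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

# In a training loop you would set each optimizer param group's "lr" to lr_schedule(step).
```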
From our Fellows and Scholars programs, I’d known that software engineers with solid fundamentals in linear algebra and probability can become machine learning engineers with just a few months of self-study. But somehow I’d convinced myself that I was the exception and couldn’t learn. I was wrong: even embedded in the middle of OpenAI, I couldn’t make the transition because I was unwilling to become a beginner again.
You’re probably not an exception either. If you’d like to become a deep learning practitioner, you can. You need to give yourself the space and time to fail. If you learn from enough failures, you’ll succeed — and it’ll probably take much less time than you expect.
At some point, it does become important to surround yourself with experts. And that is one place where I’m incredibly lucky. If you’re a great software engineer who reaches that point, keep in mind there’s a way you can be surrounded by the same people as I am: apply to OpenAI!