这是用户在 2024-5-29 24:48 为 https://mattweidner.com/2023/04/13/position-strings.html 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?

"Position Strings" for Collaborative Lists and Text
用于协作列表和文本的“位置字符串”

Matthew Weidner | Apr 13th, 2023
马修·韦德纳 | 2023 年 4 月 13 日

Home | RSS Feed
主页 | RSS 订阅

Keywords: position-strings, collaborative text editing, CRDTs, fractional indexing, Fugue
关键词:位置字符串,协作文本编辑,CRDTs,分数索引,Fugue

I recently published a new TypeScript library on npm: position-strings.
我最近在 npm 上发布了一个新的 TypeScript 库:position-strings。

You can read the docs there for full info, but basically, it provides “position strings” for use in collaborative lists or text strings. Each position string points to a specific list element (or text character), and the list order is given by the lexicographic order on position strings.
您可以阅读那里的文档以获取完整信息,但基本上,它提供了用于协作列表或文本字符串中的“位置字符串”。 每个位置字符串指向特定的列表元素(或文本字符),列表顺序由位置字符串上的词典顺序给出。

For example, a collaborative text editor using the library could represent the text "Hello world" as:
例如,使用该库的协作文本编辑器可以将文本 "Hello world" 表示为:

[
  { position: "r95sU223.B", char: 'H' },
  { position: "r95sU223.D", char: 'e' },
  { position: "r95sU223.F", char: 'l' },
  { position: "r95sU223.H", char: 'l' },
  { position: "r95sU223.J", char: 'o' },
  { position: "r95sU223.N", char: ' ' },
  { position: "r95sU223.N,Y2FsVQbt.B", char: 'w' },
  { position: "r95sU223.N,Y2FsVQbt.D", char: 'o' },
  { position: "r95sU223.N,Y2FsVQbt.F", char: 'r' },
  { position: "r95sU223.N,Y2FsVQbt.H", char: 'l' },
  { position: "r95sU223.N,Y2FsVQbt.J", char: 'd' }
]

Here the array is sorted lexicographically by position. (The funny-looking substrings "r95sU223" and "Y2FsVQbt" are random IDs, used to ensure uniqueness when multiple users type concurrently.)
这里的数组按照 position 的字典顺序排序。(看起来有趣的子字符串 "r95sU223""Y2FsVQbt" 是随机 ID,用于确保多个用户同时输入时的唯一性。)

When the user types ',' between "Hello " and "world", the text editor will ask the library for a new position string in between the positions of ' ' and 'w':
当用户在 "Hello ""world" 之间键入 ',' 时,文本编辑器将向库请求一个新的位置字符串,位于 ' ''w' 的位置之间:

// At the start of the app:
import { PositionSource } from "position-strings";
const source = new PositionSource();

// When the user types between ' ' @ "r95sU223.N" and 'w' @ "r95sU223.N,Y2FsVQbt.B":
const newPos = source.createBetween("r95sU223.N", "r95sU223.N,Y2FsVQbt.B");
// newPos is "r95sU223.J,bLBGHJmO.B", which is indeed between:
// "r95sU223.N" < "r95sU223.J,bLBGHJmO.B" < "r95sU223.N,Y2FsVQbt.B".

The text editor then adds { position: newPos, char: ',' } to its array in sorted order. It also broadcasts this object to collaborators, who add it to their own arrays. Likewise, when the user deletes a character, the text editor broadcasts the deleted position and all collaborators delete its list entry.
文本编辑器然后按排序顺序将 { position: newPos, char: ',' } 添加到其数组中。它还将此对象广播给协作者,他们将其添加到自己的数组中。同样,当用户删除一个字符时,文本编辑器会广播已删除的位置,所有协作者都会删除其列表条目。

Position-strings implements the hard part of a list CRDT, with a minimal, lightweight API. Unlike most CRDT implementations, position-strings doesn’t “own” the list state or store extra metadata1; instead, you are free to store the positions and chars (or list values) wherever you like. For example, you could store each { position, char } as a row in a cloud-synced database like Firebase Realtime DB - I provide an example app that implements collaborative text editing this way.
位置字符串实现了列表 CRDT 的难点,具有最小、轻量级的 API。与大多数 CRDT 实现不同,位置字符串不“拥有”列表状态或存储额外的元数据;相反,您可以自由地将位置和字符(或列表值)存储在任何您喜欢的地方。例如,您可以将每个作为一行存储在像 Firebase 实时数据库 这样的云同步数据库中 - 我提供了一个实现协作文本编辑的示例应用程序。

You can also think of position-strings like fractional indexing, but with a few extra features: global uniqueness, non-interleaving, and slower (logarithmic) length growth in sequences.
您还可以将位置字符串视为分数索引,但具有一些额外功能:全局唯一性、非交错和序列长度增长较慢(对数级别)。


The catch is that a list/text CRDT using position-strings will have more storage and network overhead than a dedicated data structure like Y.Array. In my benchmarks, the position strings are typically 10-100 characters long, so you’ll have at least 10-100 bytes of storage overhead per list element/text character. This is fine for short documents and simple apps, but could be a problem for fancier uses.
问题在于,使用位置字符串的列表/文本 CRDT 将比像 Y.Array 这样的专用数据结构具有更多的存储和网络开销。在我的基准测试中,位置字符串通常为 10-100 个字符长,因此每个列表元素/文本字符至少会有 10-100 字节的存储开销。对于短文档和简单应用程序来说,这是可以接受的,但对于更复杂的用途可能会成为问题。

Under the hood, position-strings uses an optimized version of Fugue’s string implementation. You can read about the full algorithm here.
在内部,position-strings 使用了 Fugue 的字符串实现的优化版本。您可以在这里阅读有关完整算法的信息。

  1. Except for a small amount of ephemeral, local state that you don’t need to store or broadcast - it only exists for the lifetime of a PositionSource object. 
    除了一小部分短暂的、本地的状态,您无需存储或广播 - 它仅存在于 PositionSource 对象的生命周期。 ↩

⭐️ I'm on the market for a software engineering job post-PhD.
⭐️ 我正在寻找博士毕业后的软件工程师工作。


Home • Matthew Weidner • PhD student at CMU CSD • mweidner037@gmail.com • @MatthewWeidner3LinkedInGitHub
主页 • Matthew Weidner • 卡内基梅隆大学计算机科学博士生 • mweidner037@gmail.com • @MatthewWeidner3 • 领英 • GitHub