Which is better? Estimating Public Preference for Roads 哪个更好?估计公众对道路的偏好
In today’s era of ubiquitous mobile smart devices, car-pooling, and navigation services have become an integral part of daily traveling. At the heart of these services lies the essential function of route planning - the art of efficiently guiding individuals from their starting point to their intended destination. For providers of route planning services, the challenge goes beyond mere connectivity; it extends to delivering routes that resonate with users, as evidenced by the likelihood of users choosing these recommended paths. Elevated user satisfaction typically correlates with an increased rate of route selections that align with the planned routes, a measure that reflects the efficacy ofthe route planning algorithm. While traditional route planning leverages weighted graphs and shortest-path algorithms (such as Dijkstra’s algorithm). 在当今移动智能设备无处不在的时代,拼车和导航服务已成为日常出行不可或缺的一部分。这些服务的核心是路线规划的基本功能 - 有效地引导个人从起点到预定目的地的艺术。对于路线规划服务提供商来说,挑战不仅仅是连接;它扩展到提供与用户产生共鸣的路由,用户选择这些推荐路径的可能性就证明了这一点。用户满意度的提高通常与与规划路线一致的路线选择率的增加相关,这一衡量标准反映了路线规划算法的有效性。而传统的路线规划利用加权图和最短路径算法(例如 Dijkstra 算法)。
The key reason for the divergence between shortest paths and user preferences lies in the choice of weightings for the edges in the weighted graph. Intuitive choices like distance or travel time often prove inadequate. Figure 1 of Route Planning with Different Goals illustrates a typical case: given the origin A and destination B, Route 1 represents the popular user choice. Route 2 is the fastest (corresponding to the shortest path from A to B when edge weights are average travel times). Route 3 is the shortest (corresponding to the shortest path when the edge weights are defined as road length). Both Route 2 and Route 3 differ from the user’s popular choice, underscoring the fact that real-world route selection involves more intricate considerations that are hidden within historical data and are challenging to quantify using a simple set of criteria. Expert manual optimization is not only inefficient but also yields suboptimal results. 最短路径和用户首选项之间差异的关键原因在于加权图中边的权重选择。事实证明,距离或旅行时间等直观选择往往是不够的。不同目标的路由规划图 1 说明了一个典型案例:给定源 A 和目标 B,路由 1 表示流行的用户选择。路径 2 是最快的(当边权重为平均行驶时间时,对应于从 A 到 B 的最短路径)。路线 3 是最短的(当边权重定义为道路长度时,对应于最短路径)。Route 2 和 Route 3 都与用户的热门选择不同,这凸显了这样一个事实,即实际的路由选择涉及隐藏在历史数据中的更复杂的考虑因素,并且很难使用一组简单的标准进行量化。专家手动优化不仅效率低下,而且会产生次优的结果。
Figure 1 Examples of Route Planning with Different Goals. 图 1 具有不同目标的路由规划示例。
Hence, existing route planning models are typically data-driven, aiming to harness historical data to achieve more realistic results than straightforward shortest-path algorithms. An effective method is to determine the edge weightofthe road network so that most popular paths are the shortest paths between their start and end point on the weighted graph. A car-hailing company hopes that your team can solve the Edge Weight Estimation Problem defined as the following: 因此,现有的路线规划模型通常是数据驱动的,旨在利用历史数据获得比简单的最短路径算法更真实的结果。一种有效的方法是确定道路网络的边权重,以便大多数常用路径是加权图上其起点和终点之间的最短路径。一家网约车公司希望您的团队能够解决定义如下的 Edge Weight Estimation 问题:
Given a weighted, directed graph G=(V,E)G=(V, E) representing the map of a city. VV is the vertex set, each vertex v in Vv \in V represent a crossroad. EE is the edge set, each edge (v_(i),v_(j))in E\left(v_{i}, v_{j}\right) \in E represent the road segment between v_(i)v_{i} and v_(j)v_{j}. Theweightof edge (v_(i),v_(j))in E\left(v_{i}, v_{j}\right) \in E is w(v_(i),v_(j))w\left(v_{i}, v_{j}\right) which need to be determined based on historical paths. 给定一个加权的有向图 G=(V,E)G=(V, E) ,表示城市地图。 VV 是顶点集,每个顶点 v in Vv \in V 代表一个十字路口。 EE 是边集,每条边 (v_(i),v_(j))in E\left(v_{i}, v_{j}\right) \in E 代表 和 之间的 v_(i)v_{i}v_(j)v_{j} 路段。edge (v_(i),v_(j))in E\left(v_{i}, v_{j}\right) \in E 的权重是 w(v_(i),v_(j))w\left(v_{i}, v_{j}\right) 需要根据历史路径来确定的。
For some vertexes v_(s),v_(t)in Vv_{s}, v_{t} \in V, wehave historical paths indicating how people used to go from v_(s)v_{s} to v_(t)v_{t}, denoted as a set P(s,t)={p_(1),p_(2),cdots,p_(|P(s,t)|)}P(s, t)=\left\{p_{1}, p_{2}, \cdots, p_{|P(s, t)|}\right\} where |P(s,t)||P(s, t)| is the number of paths in P(s,t)P(s, t). Each path p_(k)in P(s,t)p_{k} \in P(s, t) from v_(s)v_{s} to v_(t)v_{t} is represent by some edges. The length ofp_(k)p_{k} is the sum oftheweightsof its edges, denoted as L(p_(k))L\left(p_{k}\right). 对于某些顶点 v_(s),v_(t)in Vv_{s}, v_{t} \in V ,我们有历史路径,指示人们过去是如何从 到 v_(s)v_{s}v_(t)v_{t} 的,表示为一个集合 P(s,t)={p_(1),p_(2),cdots,p_(|P(s,t)|)}P(s, t)=\left\{p_{1}, p_{2}, \cdots, p_{|P(s, t)|}\right\} ,其中 |P(s,t)||P(s, t)| 是 中的 P(s,t)P(s, t) 路径数。从 到 的 v_(s)v_{s}v_(t)v_{t} 每条路径 p_(k)in P(s,t)p_{k} \in P(s, t) 都由一些边缘表示。的长度 p_(k)p_{k} 是其边的权重之和,表示为 L(p_(k))L\left(p_{k}\right) 。
When the edge weights are determined, for each (s,t)(s, t) pair, we can find the length ofthe shortest path from ss to tt (Also called the distance from ss to tt ), denoted as D(s,t)D(s, t). We can also calculate L(p_(k))L\left(p_{k}\right) for AAp_(k)in P(s,t)\forall p_{k} \in P(s, t) based on the determined edge weights. For each p_(k)in P(s,t)p_{k} \in P(s, t), define a similarity function f(p_(k))={[1," if "L(p_(k))=D(s","t)],[0," if "L(p_(k)) > D(s","t)]:}f\left(p_{k}\right)=\left\{\begin{array}{ll}1 & \text { if } L\left(p_{k}\right)=D(s, t) \\ 0 & \text { if } L\left(p_{k}\right)>D(s, t)\end{array}\right.. 当确定了边权重时,对于每 (s,t)(s, t) 对,我们可以找到从 ss to tt 的最短路径的长度(也称为从 ss to tt 的距离),表示为 D(s,t)D(s, t) 。我们还可以根据确定的边权重进行计算 L(p_(k))L\left(p_{k}\right)AAp_(k)in P(s,t)\forall p_{k} \in P(s, t) 。对于每个 p_(k)in P(s,t)p_{k} \in P(s, t) ,定义一个相似性函数 f(p_(k))={[1," if "L(p_(k))=D(s","t)],[0," if "L(p_(k)) > D(s","t)]:}f\left(p_{k}\right)=\left\{\begin{array}{ll}1 & \text { if } L\left(p_{k}\right)=D(s, t) \\ 0 & \text { if } L\left(p_{k}\right)>D(s, t)\end{array}\right. 。
Our goal is to maximize the overall similarity SIM=(sum_(s,t in V)sum_(p_(k)in P(s,t))f(p_(k)))/(sum_(s,t in V)|P(s,t)|)\operatorname{SIM}=\frac{\sum_{s, t \in V} \sum_{p_{k} \in P(s, t)} f\left(p_{k}\right)}{\sum_{s, t \in V}|P(s, t)|} by assigning weights to the edges in the graph G. For simplicity, we restrict theweightsofthe edges to be positive integers. Example 1 shows the calculation of SIM for a given graph, historical paths, and already determined the edge weight. 我们的目标是通过为图 G 中的边分配权重来最大化整体相似性 SIM=(sum_(s,t in V)sum_(p_(k)in P(s,t))f(p_(k)))/(sum_(s,t in V)|P(s,t)|)\operatorname{SIM}=\frac{\sum_{s, t \in V} \sum_{p_{k} \in P(s, t)} f\left(p_{k}\right)}{\sum_{s, t \in V}|P(s, t)|} 。为简单起见,我们将 theweightsofthe edges 限制为正整数。示例 1 显示了给定图形、历史路径和已确定的边缘权重的 SIM 计算。
Example 1: Given a graph G=(V,E)G=(V, E), Vertex set V={1,2,3,4,5,6,7,8,9}V=\{1,2,3,4,5,6,7,8,9\}, Edge Set E=E={(1,2),(1,4),(2,3),(2,5),(3,6),(4,5),(4,7),(5,4),(5,6),(5,8),(6,5),(6,9),(7,8),(8,9),(9,8)}\{(1,2),(1,4),(2,3),(2,5),(3,6),(4,5),(4,7),(5,4),(5,6),(5,8),(6,5),(6,9),(7,8),(8,9),(9,8)\} 示例 1:给定一个图形 G=(V,E)G=(V, E) 、 顶点集 V={1,2,3,4,5,6,7,8,9}V=\{1,2,3,4,5,6,7,8,9\} 、 边集 E=E={(1,2),(1,4),(2,3),(2,5),(3,6),(4,5),(4,7),(5,4),(5,6),(5,8),(6,5),(6,9),(7,8),(8,9),(9,8)}\{(1,2),(1,4),(2,3),(2,5),(3,6),(4,5),(4,7),(5,4),(5,6),(5,8),(6,5),(6,9),(7,8),(8,9),(9,8)\}
Historical paths set P(1,7)={p_(1),p_(2)},P(1,9)={p_(3),p_(4)}P(1,7)=\left\{p_{1}, p_{2}\right\}, P(1,9)=\left\{p_{3}, p_{4}\right\}, Where p_(1)=<(1,2),(2,5),(5,4),(4,7) >p_{1}=<(1,2),(2,5),(5,4),(4,7)>, p_(2)=<(1,2),(2,5),(5,8),(8,7) > ,p_(3)=<(1,2),(2,5),(5,6),(6,9) > ,p_(4)=<(1,2),(2,3),(3,6),(6,5)p_{2}=<(1,2),(2,5),(5,8),(8,7)>, p_{3}=<(1,2),(2,5),(5,6),(6,9)>, p_{4}=<(1,2),(2,3),(3,6),(6,5), (5,8),(8,9) >(5,8),(8,9)>. If we determine the edge weights as Table 1. 历史路径集 P(1,7)={p_(1),p_(2)},P(1,9)={p_(3),p_(4)}P(1,7)=\left\{p_{1}, p_{2}\right\}, P(1,9)=\left\{p_{3}, p_{4}\right\} ,其中 p_(1)=<(1,2),(2,5),(5,4),(4,7) >p_{1}=<(1,2),(2,5),(5,4),(4,7)> , p_(2)=<(1,2),(2,5),(5,8),(8,7) > ,p_(3)=<(1,2),(2,5),(5,6),(6,9) > ,p_(4)=<(1,2),(2,3),(3,6),(6,5)p_{2}=<(1,2),(2,5),(5,8),(8,7)>, p_{3}=<(1,2),(2,5),(5,6),(6,9)>, p_{4}=<(1,2),(2,3),(3,6),(6,5) , , (5,8),(8,9) >(5,8),(8,9)> .如果我们确定边缘权重,如 表 1.