{ "cells": [ { "cell_type": "markdown", "id": "ef6d5298", "metadata": {}, "source": [ "# Reward, Cost, Termination, and Step Information\n", "\n", "[![Click and Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/metadriverse/metaurban/blob/main/documentation/source/reward_cost_done.ipynb)\n", "\n", "\n", "\n", "Following the standard Gymnasium (formerly OpenAI Gym) API, each call to `env.step(...)` returns a tuple of five items: `(obs, reward, terminated, truncated, info)`. On this page, we discuss the design of the reward function `reward`, the cost function `info[\"cost\"]`, the termination criterion `terminated` in various settings, the truncation flag `truncated`, and the details of the step information `info`." ] }, { "cell_type": "markdown", "id": "ed8d0bad", "metadata": {}, "source": [ "## Reward Function\n", "\n", "For all environments, the reward function generally consists of a dense driving reward and a sparse terminal reward. The dense reward is the longitudinal movement along the reference line or lane toward the destination. When the episode terminates, for example because the agent arrives at the destination or drives out of the road, a sparse terminal reward is applied. In the implementation printed below, this terminal reward overrides the step reward. In practice, the concrete implementations of the reward function differ slightly across environments.\n", "\n", "The reward functions are implemented as follows." 
] }, { "cell_type": "code", "execution_count": 2, "id": "fb7b7072", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[35mreward_function\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[35mstr\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"\u001b[39;49;00m\n", "\u001b[33m Override this func to get a new reward function\u001b[39;49;00m\n", "\u001b[33m :param vehicle_id: id of BaseVehicle\u001b[39;49;00m\n", "\u001b[33m :return: reward\u001b[39;49;00m\n", "\u001b[33m \"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mvehicle\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30magents\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m = \u001b[35mdict\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# Reward for moving forward in current lane\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mcurrent_lane\u001b[39;49;00m = \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mlane\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mlong_last\u001b[39;49;00m = \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mnavigation\u001b[39;49;00m.\u001b[30mlast_longitude\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mlong_now\u001b[39;49;00m = \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mnavigation\u001b[39;49;00m.\u001b[30mcurrent_longitude\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mlateral_now\u001b[39;49;00m = 
\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mnavigation\u001b[39;49;00m.\u001b[30mcurrent_lateral\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# dense driving reward\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = \u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m += \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mdriving_reward\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m * \u001b[35m(\u001b[39;49;00m\u001b[30mlong_now\u001b[39;49;00m - \u001b[30mlong_last\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# reward for lane keeping, without it vehicle can learn to overtake but fail to keep in lane\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mlateral_factor\u001b[39;49;00m = \u001b[35mabs\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mlateral_now\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m / \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mmax_lateral_dist\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mlateral_penalty\u001b[39;49;00m = -\u001b[30mlateral_factor\u001b[39;49;00m * \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlateral_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m += \u001b[30mlateral_penalty\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# heading diff\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mref_line_heading\u001b[39;49;00m = 
\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mnavigation\u001b[39;49;00m.\u001b[30mcurrent_heading_theta_at_long\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mheading_diff\u001b[39;49;00m = \u001b[35mabs\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mwrap_to_pi\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mheading_theta\u001b[39;49;00m - \u001b[30mref_line_heading\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m / \u001b[30mnp\u001b[39;49;00m.\u001b[30mpi\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mheading_penalty\u001b[39;49;00m = -\u001b[30mheading_diff\u001b[39;49;00m * \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mheading_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m += \u001b[30mheading_penalty\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# steering_range\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering\u001b[39;49;00m = \u001b[35mabs\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcurrent_action\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[34m0\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mallowed_steering\u001b[39;49;00m = \u001b[35m(\u001b[39;49;00m\u001b[34m1\u001b[39;49;00m / \u001b[35mmax\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mspeed\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[32m1e-2\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30moverflowed_steering\u001b[39;49;00m = \u001b[35mmin\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mallowed_steering\u001b[39;49;00m - 
\u001b[30msteering\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[34m0\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering_range_penalty\u001b[39;49;00m = \u001b[30moverflowed_steering\u001b[39;49;00m * \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33msteering_range_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m += \u001b[30msteering_range_penalty\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# steering smoothness\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering_reward\u001b[39;49;00m = \u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mvehicle_id\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m \u001b[35min\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mprevious_agent_actions\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m \u001b[33m\"\u001b[39;49;00m\u001b[33msteering_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m \u001b[35min\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33msteering_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m == \u001b[34m0\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering_reward\u001b[39;49;00m = \u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering\u001b[39;49;00m = 
\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcurrent_action\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[34m0\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mprev_steering\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30mprevious_agent_actions\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[34m0\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering_diff\u001b[39;49;00m = \u001b[35mabs\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30msteering\u001b[39;49;00m - \u001b[30mprev_steering\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering_reward\u001b[39;49;00m = -\u001b[30msteering_diff\u001b[39;49;00m * \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33msteering_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[32m# 0.25 is to make the reward more spiky\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30msteering_reward\u001b[39;49;00m = \u001b[30msteering_reward\u001b[39;49;00m * \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mspeed\u001b[39;49;00m / \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mmax_speed_km_h\u001b[39;49;00m \u001b[32m# when the vehicle is faster, the penalty is more significant\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m += \u001b[30msteering_reward\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[33m'\u001b[39;49;00m\u001b[33mspeed_reward\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35min\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mpositive_road\u001b[39;49;00m = \u001b[34m1\u001b[39;49;00m 
\u001b[34mif\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30m_is_out_of_road\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m \u001b[34melse\u001b[39;49;00m -\u001b[34m1\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m += \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mspeed_reward\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m * \u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mspeed_km_h\u001b[39;49;00m / \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mmax_speed_km_h\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m * \u001b[30mpositive_road\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mno_negative_reward\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = \u001b[35mmax\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mreward\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[34m0\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# crash penalty\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_vehicle\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = -\u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_vehicle_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m 
\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_object\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = -\u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_object_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_human\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = -\u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_human_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_building\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = -\u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_building_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mstep_reward\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30mreward\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# termination reward\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30m_is_arrive_destination\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m 
\u001b[35mself\u001b[39;49;00m.\u001b[30m_is_out_of_road\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33msuccess_reward\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30m_is_out_of_road\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mreward\u001b[39;49;00m = -\u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mout_of_road_penalty\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# TODO LQY: all a callback to process these keys\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mtrack_length\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mnavigation\u001b[39;49;00m.\u001b[30mreference_trajectory\u001b[39;49;00m.\u001b[30mlength\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcarsize\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[35m[\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m.\u001b[30mWIDTH\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mLENGTH\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# add some new 
and informative keys\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mroute_completion\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mnavigation\u001b[39;49;00m.\u001b[30mroute_completion\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcurriculum_level\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30mengine\u001b[39;49;00m.\u001b[30mcurrent_level\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mscenario_index\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30mengine\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlateral_dist\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30mlateral_now\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mstep_reward_lateral\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30mlateral_penalty\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mstep_reward_heading\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30mheading_penalty\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 
\u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mstep_reward_action_smooth\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30msteering_range_penalty\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33msteering_reward\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[30msteering_reward\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mrecord_previous_agent_state\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m \u001b[35mfloat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mreward\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[30mstep_info\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n" ] } ], "source": [ "from metaurban.envs import SidewalkStaticMetaUrbanEnv\n", "from metaurban.utils import print_source\n", "print_source(SidewalkStaticMetaUrbanEnv.reward_function)" ] }, { "cell_type": "markdown", "id": "cc81c42e", "metadata": {}, "source": [ "This reward function is composed of five parts as follows:\n", "\n", "$R = R_{term} + c_1 R_{disp} + c_2 R_{lateral} + c_3 R_{steering} + c_4 R_{crash}$\n", "\n", "- **Terminal reward** $R_{term}$: a sparse reward set to $+5$ if the vehicle reaches the destination, and $-5$ if it drives out of the road. If $R_{term}\\neq 0$ at any time step $t$, the episode terminates immediately at $t$.\n", "- **Displacement reward** $R_{disp}$: a dense reward defined as $R_{disp}=d_t-d_{t-1}$, where $d_t$ and $d_{t-1}$ denote the longitudinal position of the ego agent in the Frenet coordinates of the current lane at time $t$ and $t-1$, respectively. 
We set the weight of $R_{disp}$ as $c_1=0.5$.\n", "\n", "- **Lateral reward** $R_{lateral}$: a dense reward defined as $R_{lateral}=-||l_t||$, where $l_t$ denotes the lateral offset of the ego agent in the Frenet coordinates of the current lane at time $t$. It is designed to keep the agent from driving on non-walkable areas. We set the weight of $R_{lateral}$ as $c_2=1.0$.\n", "\n", "- **Steering smoothness reward** $R_{steering}$: a dense reward defined as $R_{steering}=-||s_t-s_{t-1}||\\cdot v_t$, where $s_t$ and $s_{t-1}$ denote the steering of the agent at $t$ and $t-1$, respectively, and $v_t$ denotes the speed of the agent at time $t$. This term acts as a regularizer that discourages the agent from changing the steering too frequently. We set the weight of $R_{steering}$ as $c_3=0.1$.\n", "\n", "- **Crash reward** $R_{crash}$: a dense negative reward defined as $R_{crash}=-1(c_{t})$, where $c_{t}$ denotes a collision between the agent and any other object at time $t$ and $1(\\cdot)$ is the indicator function. Note that we do not use the termination strategy for collisions as in MetaDrive (Li et al., 2022). We set the weight of $R_{crash}$ as $c_4=1.0$." ] }, { "cell_type": "markdown", "id": "e11c8234", "metadata": {}, "source": [ "## Cost Function\n", "\n", "Similar to the reward function, we also provide a default cost function to measure driving safety. 
The cost is placed in the returned information dict as `info[\"cost\"]` after each call to `env.step`.\n", "\n", "- `crash_vehicle_cost = 1.0`: cost incurred when crashing into other vehicles.\n", "- `crash_human_cost = 1.0`: cost incurred when crashing into humans.\n", "- `crash_object_cost = 1.0`: cost incurred when crashing into objects, such as cones and triangles.\n", "\n", "The implementation of the cost function is simple:" ] }, { "cell_type": "code", "execution_count": 3, "id": "227b6983", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[35mcost_function\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[35mstr\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mvehicle\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30magents\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m = \u001b[35mdict\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcost\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30m_is_out_of_road\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcost\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = 
\u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mout_of_road_cost\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_vehicle\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcost\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_vehicle_cost\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_object\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcost\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_object_cost\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m \u001b[30mstep_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33mcost\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[30mstep_info\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n" ] } ], "source": [ "from metaurban.utils import print_source\n", "from metaurban.envs import SidewalkStaticMetaUrbanEnv\n", "print_source(SidewalkStaticMetaUrbanEnv.cost_function)" ] }, { "cell_type": "markdown", "id": "6c6d4d7c", "metadata": {}, "source": [ "You 
can modify this function to add more information to the `step_info` dict. For example, you can log which kind of object incurred the cost. You can then count how many vehicles the ego vehicle collides with in an episode by summing the vehicle crashes recorded at each step." ] }, { "cell_type": "markdown", "id": "bc0f70d9", "metadata": {}, "source": [ "## Termination and Truncation\n", "\n", "MetaUrban will terminate an episode of a vehicle if:\n", "\n", "1. the target vehicle arrives at its destination,\n", "2. the vehicle drives out of the road,\n", "3. the vehicle crashes into other agents (vehicles),\n", "4. the vehicle crashes into obstacles,\n", "5. the vehicle crashes into a human,\n", "6. the episode reaches the maximum step (horizon) limit.\n", "\n", "The termination conditions above are implemented as:" ] }, { "cell_type": "code", "execution_count": 4, "id": "96d6424d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[35mdone_function\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[35mstr\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mvehicle\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30magents\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mmax_step\u001b[39;49;00m = \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mhorizon\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[35mis\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m \u001b[34mNone\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m 
\u001b[35mself\u001b[39;49;00m.\u001b[30mepisode_lengths\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mvehicle_id\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m >= \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mhorizon\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone_info\u001b[39;49;00m = \u001b[35m{\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_VEHICLE\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_vehicle\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_OBJECT\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_object\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_BUILDING\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_building\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_HUMAN\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_human\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_SIDEWALK\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[30mvehicle\u001b[39;49;00m.\u001b[30mcrash_sidewalk\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mOUT_OF_ROAD\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m 
\u001b[35mself\u001b[39;49;00m.\u001b[30m_is_out_of_road\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mSUCCESS\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30m_is_arrive_destination\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30m_is_out_of_road\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[30mvehicle\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mMAX_STEP\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[30mmax_step\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mENV_SEED\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# TerminationState.CURRENT_BLOCK: self.agent.navigation.current_road.block_ID(),\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# crash_vehicle=False, crash_object=False, crash_building=False, out_of_road=False, arrive_dest=False,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# for compatibility\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# crash almost equals to crashing with vehicles\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m = 
\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_VEHICLE\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_OBJECT\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mor\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_BUILDING\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_SIDEWALK\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mor\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_HUMAN\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# determine env return\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mSUCCESS\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mlogger\u001b[39;49;00m.\u001b[30minfo\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mEpisode ended! 
Scenario Index: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m Reason: arrive_dest.\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.\u001b[30mformat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mextra\u001b[39;49;00m=\u001b[35m{\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlog_once\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[34mTrue\u001b[39;49;00m\u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mOUT_OF_ROAD\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mlogger\u001b[39;49;00m.\u001b[30minfo\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mEpisode ended! 
Scenario Index: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m Reason: out_of_road.\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.\u001b[30mformat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mextra\u001b[39;49;00m=\u001b[35m{\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlog_once\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[34mTrue\u001b[39;49;00m\u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_VEHICLE\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_vehicle_done\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mlogger\u001b[39;49;00m.\u001b[30minfo\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mEpisode ended! 
Scenario Index: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m Reason: crash vehicle \u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.\u001b[30mformat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mextra\u001b[39;49;00m=\u001b[35m{\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlog_once\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[34mTrue\u001b[39;49;00m\u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_OBJECT\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_object_done\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mlogger\u001b[39;49;00m.\u001b[30minfo\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mEpisode ended! 
Scenario Index: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m Reason: crash object \u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.\u001b[30mformat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mextra\u001b[39;49;00m=\u001b[35m{\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlog_once\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[34mTrue\u001b[39;49;00m\u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_BUILDING\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_building_done\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mlogger\u001b[39;49;00m.\u001b[30minfo\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mEpisode ended! 
Scenario Index: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m Reason: crash building \u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.\u001b[30mformat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mextra\u001b[39;49;00m=\u001b[35m{\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlog_once\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[34mTrue\u001b[39;49;00m\u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mCRASH_HUMAN\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcrash_human_done\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mlogger\u001b[39;49;00m.\u001b[30minfo\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mEpisode ended! 
Scenario Index: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m Reason: crash human\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.\u001b[30mformat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mextra\u001b[39;49;00m=\u001b[35m{\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlog_once\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[34mTrue\u001b[39;49;00m\u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[30mTerminationState\u001b[39;49;00m.\u001b[30mMAX_STEP\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[32m# single agent horizon has the same meaning as max_step_per_agent\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m \u001b[35mself\u001b[39;49;00m.\u001b[30mconfig\u001b[39;49;00m\u001b[35m[\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mtruncate_as_terminate\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m]\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mdone\u001b[39;49;00m = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35mself\u001b[39;49;00m.\u001b[30mlogger\u001b[39;49;00m.\u001b[30minfo\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mEpisode ended! 
Scenario Index: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m Reason: max step \u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.\u001b[30mformat\u001b[39;49;00m\u001b[35m(\u001b[39;49;00m\u001b[35mself\u001b[39;49;00m.\u001b[30mcurrent_seed\u001b[39;49;00m\u001b[35m)\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[30mextra\u001b[39;49;00m=\u001b[35m{\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mlog_once\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[35m:\u001b[39;49;00m \u001b[34mTrue\u001b[39;49;00m\u001b[35m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[35m)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m \u001b[30mdone\u001b[39;49;00m\u001b[35m,\u001b[39;49;00m \u001b[30mdone_info\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n" ] } ], "source": [ "print_source(SidewalkStaticMetaUrbanEnv.done_function)" ] }, { "cell_type": "markdown", "id": "42e7c8a7", "metadata": {}, "source": [ "## Step Information\n", "\n", "The step information dict `info` contains rich information about current state of the environment and the target vehicle. 
\n", "The step info is collected from various sources, such as the engine, reward function, termination function, traffic manager, and agent manager.\n", "We summarize the dict as follows:\n", "```\n", "{\n", "    # Number of vehicles overtaken by the ego vehicle in this episode\n", "    'overtake_vehicle_num': 0,\n", "\n", "    # Current velocity in km/h\n", "    'velocity': 0.0,\n", "\n", "    # The current normalized steering signal in [-1, 1]\n", "    'steering': -0.06901532411575317,\n", "\n", "    # The current normalized acceleration signal in [-1, 1]\n", "    'acceleration': -0.2931942343711853,\n", "\n", "    # The normalized action, after clipping, that is applied to the ego vehicle\n", "    'raw_action': (-0.06901532411575317, -0.2931942343711853),\n", "\n", "    # Whether the ego vehicle crashed into a vehicle / object / building\n", "    'crash_vehicle': False,\n", "    'crash_object': False,\n", "    'crash_building': False,\n", "    'crash': False,  # Whether any kind of crash happened\n", "\n", "    # Whether the ego vehicle drove out of the road / arrived at the destination\n", "    # / exceeded the maximal episode length\n", "    'out_of_road': False,\n", "    'arrive_dest': False,\n", "    'max_step': False,\n", "\n", "    # The reward in this time step / the whole episode so far\n", "    'step_reward': 0.0,\n", "    'episode_reward': 0.0,\n", "\n", "    # The cost in this time step\n", "    'cost': 0,\n", "\n", "    # The length of the current episode\n", "    'episode_length': 1\n", "}\n", "```\n", "\n", "The contents of this dict are updated frequently, so the listing above may be out of date.\n", "We encourage users to write customized data to this dict, so that more status can be exposed to monitor the simulation even without visualization. " ] }, { "cell_type": "markdown", "id": "0a339830", "metadata": {}, "source": [ "## Customization\n", "To compose your own reward, cost, and termination functions, create a new environment that overrides the `reward_function`, `cost_function`, and `done_function` of the base environment class. 
You can also record more information in the `step_info` dict returned by these functions; it is passed out of the simulator through `info`." ] }, { "cell_type": "code", "execution_count": 1, "id": "b2a81768", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[38;20m[INFO] Environment: MyEnv\u001b[0m\n", "\u001b[38;20m[INFO] MetaUrban version: 0.0.1\u001b[0m\n", "\u001b[38;20m[INFO] Sensors: [lidar: Lidar(), side_detector: SideDetector(), lane_line_detector: LaneLineDetector()]\u001b[0m\n", "\u001b[38;20m[INFO] Render Mode: none\u001b[0m\n", "\u001b[38;20m[INFO] Horizon (Max steps per agent): None\u001b[0m\n", "\u001b[38;20m[INFO] Assets version: 0.0.1\u001b[0m\n", "\u001b[38;20m[INFO] Known Pipes: glxGraphicsPipe\u001b[0m\n", "\u001b[38;20m[INFO] Start Scenario Index: 0, Num Scenarios : 1\u001b[0m\n", "\u001b[33;20m[WARNING] Not set var:walk_on_all_regions, so that agents can walk on all regions (orca_navigation.py:561)\u001b[0m\n", "\u001b[38;20m[INFO] Agents can walk on all regions\u001b[0m\n", "\u001b[38;20m[INFO] Agents can walk on all regions\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "reward: -10, `is_customized` in info: True\n" ] } ], "source": [ "from metaurban.envs import SidewalkStaticMetaUrbanEnv\n", "\n", "class MyEnv(SidewalkStaticMetaUrbanEnv):\n", "\n", "    # Override the reward: return a constant reward plus a custom step_info dict\n", "    def reward_function(*args, **kwargs):\n", "        return -10, {\"is_customized\": True}\n", "\n", "env = MyEnv({'object_density': 0.1})\n", "env.reset()\n", "_, r, _, _, info = env.step([0, 0])\n", "# The custom step_info entry shows up in the info dict returned by env.step\n", "assert r == -10 and info[\"is_customized\"]\n", "print(\"reward: {}, `is_customized` in info: {}\".format(r, info[\"is_customized\"]))\n", "env.close()" ] } ], "metadata": { "kernelspec": { "display_name": "metaurban", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.10.16" }, "mystnb": { "execution_mode": "off" } }, "nbformat": 4, "nbformat_minor": 5 }