Mountain Car is a classic control environment. The player's goal is to drive the car up the hill and reach the flag. The only available actions are to push left, push right, or apply no force. The game ends when the car reaches the flag (position >= 0.5) or when the episode step limit is exceeded.
Video results:
The solution is based on the deep Q-learning (DQN) algorithm. The model has the following structure:
from tensorflow.keras.models import Model
import tensorflow.keras.layers as layers


def build_model(states, actions):
    # Input shape is (1, states) because the DQN agent feeds observations
    # with a window length of 1; Flatten removes that extra dimension.
    inputs = layers.Input(shape=(1, states))
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Flatten()(x)
    # One linear output per action: the estimated Q-value of that action.
    outputs = layers.Dense(actions, activation="linear")(x)
    return Model(inputs, outputs, name="mountain_car_player")
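As a quick sanity check, the model can be instantiated directly from the environment's spaces. In MountainCar-v0 the observation has two components (position and velocity) and there are three discrete actions (push left, no push, push right); the snippet below is only a usage example:

import gym

env = gym.make("MountainCar-v0")
states = env.observation_space.shape[0]  # 2: car position and velocity
actions = env.action_space.n             # 3: push left, no push, push right
model = build_model(states, actions)
model.summary()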
The model was trained for 100,000 steps with the Adam optimizer (learning rate 1e-3) and a BoltzmannQPolicy, using a checkpoint callback to save the model that achieves the best reward (a minimal sketch of the training setup is given at the end of this section). The main problem of the game is the reward function: it gives -1 for every step until the player reaches the goal, but it provides no useful signal about how to get there. Using theoretical material from a paper and a Medium post, I performed reward shaping as follows:
from gym.envs.classic_control.mountain_car import MountainCarEnv


class MountainCarModifiedReward(MountainCarEnv):
    def step(self, action: int):
        previous_state = self.state
        new_state, reward, done, info = super().step(action)
        # Potential-based shaping: reward an increase in absolute velocity,
        # using potential Phi(s) = 300 * |velocity| and discount 0.95.
        modified_reward = reward + 300 * (0.95 * abs(new_state[1]) - abs(previous_state[1]))
        # Bonus for actually reaching the flag (position >= 0.5).
        if new_state[0] >= 0.5:
            modified_reward += 100
        return new_state, modified_reward, done, info
The main idea is that to climb the hill the car needs high velocity, so I used potential-based reward shaping to encourage building up speed: the potential function is Phi(s) = 300 * |velocity|, and the shaping term added to the reward is 0.95 * Phi(s') - Phi(s). I also added a +100 reward for actually winning the game, so the agent is pushed to reach the flag rather than to accumulate score from the velocity bonus alone.
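The training code itself is not reproduced above; below is a minimal sketch of the setup described, assuming the keras-rl2 library (DQNAgent, BoltzmannQPolicy, SequentialMemory). The replay-memory size, warmup steps, target-model update rate, and weights file name are illustrative assumptions, and the best-reward checkpoint callback is omitted for brevity.

from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy

env = MountainCarModifiedReward()
states = env.observation_space.shape[0]
actions = env.action_space.n

# DQN agent with the Boltzmann exploration policy described above.
agent = DQNAgent(
    model=build_model(states, actions),
    nb_actions=actions,
    policy=BoltzmannQPolicy(),
    memory=SequentialMemory(limit=50000, window_length=1),  # assumed buffer size
    nb_steps_warmup=1000,        # assumed warmup before learning starts
    target_model_update=1e-2,    # assumed soft target-network update rate
)
agent.compile(Adam(learning_rate=1e-3), metrics=["mae"])

# Train for 100,000 steps on the reward-shaped environment.
agent.fit(env, nb_steps=100000, visualize=False, verbose=1)
agent.save_weights("dqn_mountain_car_weights.h5f", overwrite=True)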