CartPole is a classic control environment. The player's goal is to keep the pole balanced on the cart. The only available actions are moving the cart left or right, so the action space is discrete. The game ends when:
Video results:
The solution is based on the deep Q-learning algorithm. The model has the following structure:
from tensorflow.keras.models import Sequential
import tensorflow.keras.layers as layers

def build_model(states, actions):
    model = Sequential()
    model.add(layers.Flatten(input_shape=(1, states)))
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(32, activation="relu"))
    model.add(layers.Dense(actions, activation="linear"))
    return model
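The network above outputs one linear value per action, which deep Q-learning treats as the estimated Q-value of that action. As a rough illustration of what those outputs are trained toward, here is a minimal sketch of the one-step Q-learning target. The function name `td_target`, the discount factor `gamma`, and the sample numbers are assumptions made for this example and are not taken from the project.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target for a single transition.

    gamma is an assumed discount factor, not a value from this project.
    """
    if done:
        # Terminal transition: there is no next state to bootstrap from.
        return reward
    # Bootstrap from the best Q-value predicted for the next state.
    return reward + gamma * max(next_q_values)


# Hypothetical transition: reward 1.0 and two predicted Q-values
# (one per action) for the next state.
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9)
print(target)  # computed as 1.0 + 0.9 * 2.0
```

During training, the Q-value predicted for the action that was actually taken is pushed toward this target, while the outputs for the other actions are left unchanged.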
The model was trained for 100,000 steps with the Adam optimizer (lr=1e-3) and BoltzmannQPolicy. A checkpoint callback was used to save the model that achieves the best reward. Callbacks were taken from here.
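To give an idea of what a Boltzmann policy does with the network's Q-values, here is a minimal standard-library sketch: it turns the Q-values into softmax probabilities and samples an action from them. This is not the library's `BoltzmannQPolicy` implementation. The function names and the temperature parameter `tau` are assumptions made for this illustration.

```python
import math
import random


def boltzmann_probs(q_values, tau=1.0):
    """Softmax over Q-values with temperature tau (assumed parameter)."""
    # Subtracting the maximum keeps exp() numerically stable and does
    # not change the resulting probabilities.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, tau=1.0, rng=random):
    """Sample an action index in proportion to its Boltzmann probability."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for the two CartPole actions (left, right).
probs = boltzmann_probs([1.0, 2.0])
action = boltzmann_action([1.0, 2.0], rng=random.Random(0))
```

Actions with higher Q-values are chosen more often, but every action keeps a non-zero probability, which gives the agent some exploration. A larger `tau` flattens the distribution toward uniform, and a smaller one makes the choice closer to greedy.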