Simulate a new episode starting from initial_state, in which every agent in
agents has to choose an action. How the agents choose their actions depends on
the implementation. When a final state is reached, learnEpisode shall be called for all the agents.
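
A minimal sketch of the loop described above, written in Python for illustration only. Everything except the names initial_state, agents and learnEpisode is an assumption: the chooseAction method, the step and is_final callbacks, the max_steps guard, and the argument-free learnEpisode signature are hypothetical and may differ from the real interface.

```python
import random


class RandomAgent:
    """Hypothetical agent: picks a random action and counts learnEpisode calls."""

    def __init__(self, actions, seed=0):
        self.actions = actions
        self.rng = random.Random(seed)
        self.learned = 0

    def chooseAction(self, state):
        # Assumed method name; the source only says the choice is implementation-defined.
        return self.rng.choice(self.actions)

    def learnEpisode(self):
        # Assumed to take no arguments; the real signature is not given in the source.
        self.learned += 1


def simulate_episode(initial_state, agents, step, is_final, max_steps=1000):
    """Run one episode and return the last state reached.

    step(state, actions) -> next state and is_final(state) -> bool are
    hypothetical callbacks standing in for the environment.
    """
    state = initial_state
    for _ in range(max_steps):
        if is_final(state):
            break
        # Every agent chooses an action for the current state.
        actions = [agent.chooseAction(state) for agent in agents]
        state = step(state, actions)

    # learnEpisode is called for all agents only once a final state is reached.
    if is_final(state):
        for agent in agents:
            agent.learnEpisode()
    return state
```

With a toy environment where the state is a counter, each step adds the sum of the chosen actions, and the episode ends once the counter reaches 5, a call could look like simulate_episode(0, agents, lambda s, acts: s + sum(acts), lambda s: s >= 5).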
@param initial_state the state from which the episode starts
@param agents the agents that choose actions during the episode