A model mimics the behavior of the environment: given a state and an action, it predicts the resulting next state and next reward. Models are used for planning, that is, deciding on a course of action by considering possible future situations before they are actually experienced. Incorporating models and planning into RL is a relatively recent development; classical RL was purely trial-and-error learning, which can be viewed as almost the opposite of planning. It has since become clear that RL methods are closely related to dynamic programming methods, which do use a model. RL algorithms can therefore be placed on a continuum between trial-and-error learning and deliberative planning.
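To make the idea of a model concrete, here is a minimal sketch in Python. It assumes a small, deterministic environment with integer states and string actions. The names `TabularModel`, `plan_one_step`, the discount `gamma`, and the `values` table of state-value estimates are all hypothetical, introduced only for illustration. The model stores the last observed outcome of each state-action pair, and the planner uses those predictions to choose an action without acting in the real environment.

```python
from typing import Dict, List, Optional, Tuple

State = int
Action = str


class TabularModel:
    """Hypothetical deterministic model: remembers the last observed
    (next_state, reward) for each (state, action) pair."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[State, Action], Tuple[State, float]] = {}

    def update(self, state: State, action: Action,
               next_state: State, reward: float) -> None:
        # Learn from one real transition (the trial-and-error side).
        self._table[(state, action)] = (next_state, reward)

    def predict(self, state: State, action: Action) -> Tuple[State, float]:
        # Given a state and an action, predict next state and next reward.
        return self._table[(state, action)]

    def known_actions(self, state: State) -> List[Action]:
        # Actions that have been tried at least once in this state.
        return [a for (s, a) in self._table if s == state]


def plan_one_step(model: TabularModel, state: State,
                  values: Dict[State, float],
                  gamma: float = 0.9) -> Tuple[Optional[Action], float]:
    """One-step lookahead (the planning side): pick the action whose
    predicted reward plus discounted next-state value is largest."""
    best_action: Optional[Action] = None
    best_return = float("-inf")
    for action in model.known_actions(state):
        next_state, reward = model.predict(state, action)
        ret = reward + gamma * values.get(next_state, 0.0)
        if ret > best_return:
            best_action, best_return = action, ret
    return best_action, best_return


# Usage: two transitions experienced from state 0, then planning from state 0.
model = TabularModel()
model.update(0, "left", 1, 0.0)
model.update(0, "right", 2, 1.0)
values = {1: 5.0, 2: 0.0}  # assumed value estimates for the next states
action, ret = plan_one_step(model, 0, values)
```

In this example "right" gives the larger immediate reward, but the lookahead prefers "left" because the model predicts it leads to a more valuable state. A purely trial-and-error learner would have to discover this through repeated experience, whereas the planner infers it from the model.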