It was invented in 1994 by Leo Breiman, a retired Berkeley professor. The idea is to repeatedly pick random subsets of the training set and train a model on each of them, storing the trained models for later use. Having accumulated some number of trained models, we run them all on our data and take the mean of their predictions. The process of creating each subset of the dataset to train on is called bootstrapping (in this sense, bagging is bootstrap aggregation).
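Here is a minimal sketch of that process. It assumes scikit-learn's `DecisionTreeRegressor` as the base model and NumPy arrays as inputs; these are illustrative choices, not part of the original description, and any model that can be fit on a subset of the data would work the same way.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bag_predict(X_train, y_train, X_test, n_models=20, seed=0):
    """Train n_models trees on bootstrap samples and average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        # Bootstrapping: draw n rows with replacement from the training set
        idx = rng.integers(0, n, size=n)
        model = DecisionTreeRegressor()
        model.fit(X_train[idx], y_train[idx])
        # Store this model's predictions for later aggregation
        preds.append(model.predict(X_test))
    # Aggregation: the bagged prediction is the mean over all models
    return np.mean(preds, axis=0)
```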
The reason this works so well is that, although the individual models are not trained on the full dataset and are therefore not as accurate as they could be, the errors they make are uncorrelated. That means that when we average the models, the errors tend to cancel out (their average approaches zero), and we obtain a result that is better than the prediction of any single model.
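To see that intuition numerically, here is a toy simulation (not from the original text): each "model" predicts the true value plus its own independent noise, and averaging the predictions shrinks the error compared with relying on any one of them.

```python
import numpy as np

rng = np.random.default_rng(42)
truth = 10.0
# Each "model" predicts the truth plus an independent (uncorrelated) error
predictions = truth + rng.normal(0, 2.0, size=100)

single_error = abs(predictions[0] - truth)        # error of one model
ensemble_error = abs(predictions.mean() - truth)  # error of the averaged ensemble
print(single_error, ensemble_error)               # the ensemble error is typically far smaller
```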