Usually what people mean by mostly sorted in CS is that there is some small K such that each element in the input is no more than K places from the position it would be in if the input was sorted.
Well you could extend the definition to allow a small number of items which are entirely out of place. The point is that the right sort algorithm depends a lot on tthe distribution of input data and how much you care about worst-case vs average case trade offs.