HPLT 2.0 cleaned datasets by language family and number of documents