Group-name-version-lang1-lang2
(JavaScript-flavored) regular expressions are welcome!
Examples:
.*crawl
matches both commoncrawl and paracrawl
-eng$|-eng-
matches all English datasets without a country code
-eng(_US)?$|-eng(_US)?-
matches all English datasets with the US country code or with no country code
-eng(_[A-Z]{2})?$|-eng(_[A-Z]{2})?-
matches all English datasets, regardless of country code
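
The patterns above can be tried out with a short sketch like the one below. The dataset names and the filterDatasets helper are hypothetical, made up only to follow the Group-name-version-lang1-lang2 shape. The sketch also assumes the filter does an unanchored search (RegExp.prototype.test), which is what the .*crawl example suggests but is not confirmed here.

```javascript
// Hypothetical dataset names following Group-name-version-lang1-lang2.
const datasets = [
  "web-commoncrawl-v1-eng-fra",
  "web-paracrawl-v9-eng_US-deu",
  "news-opus-v2-deu-eng",
  "news-opus-v2-fra-eng_GB",
  "news-opus-v2-fra-deu",
];

// Hypothetical helper: keep the names in which the pattern matches anywhere.
// Assumption: the real filter is an unanchored search, like RegExp.test.
function filterDatasets(pattern, names) {
  const re = new RegExp(pattern);
  return names.filter((name) => re.test(name));
}

// The example patterns from the help text.
const anyCrawl = ".*crawl";
const engNoCountry = "-eng$|-eng-";
const engUsOrNoCountry = "-eng(_US)?$|-eng(_US)?-";
const engAnyCountry = "-eng(_[A-Z]{2})?$|-eng(_[A-Z]{2})?-";

console.log(filterDatasets(anyCrawl, datasets));
console.log(filterDatasets(engNoCountry, datasets));
console.log(filterDatasets(engUsOrNoCountry, datasets));
console.log(filterDatasets(engAnyCountry, datasets));
```

Each alternative covers one position of the language code: the `$` branch matches it as lang2 at the end of the name, and the trailing `-` branch matches it as lang1 in the middle.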