Prioritizing
The more frequently a morph occurs in a language, the more useful it is to learn. This is the fundamental principle behind AnkiMorphs: learn a language in the order that will be the most useful.
AnkiMorphs is a general-purpose language learning tool, so it has to be told which morphs occur most often. You can do this in two ways: either have AnkiMorphs calculate the morph frequencies found in your cards (Collection frequency), or specify a custom .csv file that contains that information.
Any .csv file located in the folder [anki profile folder]/priority-files/ is available for selection in note filters: morph priority.
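As a rough illustration of the frequency principle, the sketch below counts morphs and orders them from most to least frequent. It is not AnkiMorphs' actual implementation; the function name `rank_by_frequency` and the sample morph list are hypothetical, and the input is assumed to be text that has already been split into morphs.

```python
from collections import Counter


def rank_by_frequency(morphs):
    """Return the distinct morphs ordered from most to least frequent."""
    counts = Counter(morphs)
    # most_common() sorts by count; ties keep first-seen order.
    return [morph for morph, _count in counts.most_common()]


# Hypothetical morphs taken from already-morphemized cards.
card_morphs = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
ranking = rank_by_frequency(card_morphs)
print(ranking)
```

Morphs near the front of the list would be the ones worth learning first.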
Before we outline the custom priority files, we need to discuss morph lemmas and inflections.
Lemmas or Inflections?
There are scenarios where you might not want to give each individual inflection a separate priority:
- Chinese technically does not have inflections, so any inflection data is artificial and leads to a wasteful use of resources.
- Korean has an extreme number of inflections, leading to an explosion of priorities, which creates disproportionate penalties.
- You might feel that you have a good enough grasp of the grammar of inflections, making it unnecessary to prioritize one over another.
If you never want to give separate priorities to inflections, then you should choose a lemma only priority file. If you do care about inflection priorities, or if you might want to switch to lemma priorities on the fly, then choose an inflection priority file.
Custom Priority Files
You can use the Priority File Generator or the Study Plan Generator to create your own custom priority file, or you can download some pre-made ones at the bottom of this page.
Note:
- The Occurrences column is optional
- Any lines after 1 million will be ignored by AnkiMorphs
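The row limit in the note above could be mirrored when reading a priority file yourself. The following is only a sketch: the function `read_priority_rows`, the sample header names, and the choice to exclude the header row from the limit are assumptions, not documented AnkiMorphs behavior.

```python
import csv
import io
from itertools import islice

MAX_ROWS = 1_000_000  # limit taken from the note; header handling is assumed


def read_priority_rows(handle, max_rows=MAX_ROWS):
    """Read the header plus at most max_rows data rows; later rows are ignored."""
    reader = csv.reader(handle)
    header = next(reader, None)
    rows = list(islice(reader, max_rows))
    return header, rows


# Hypothetical file contents; a tiny limit is used to make the cutoff visible.
sample = io.StringIO("Morph-Lemma,Occurrences\nbe,120\nhave,80\ngo,45\n")
header, rows = read_priority_rows(sample, max_rows=2)
print(header, rows)
```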
Custom Lemma Priority Files
The lemma only priority files follow this format:
- The first row contains column headers.
- The second row and down contain morph lemmas in descending order of frequency.
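A minimal sketch of producing a file in this shape is shown below. The header names `Morph-Lemma` and `Occurrences`, the function `write_lemma_priority_csv`, and the sample counts are assumptions for illustration; check a file produced by the Priority File Generator for the exact headers AnkiMorphs expects.

```python
import csv
import io
from collections import Counter


def write_lemma_priority_csv(lemma_counts, out, include_occurrences=True):
    """Write a header row, then lemmas in descending order of frequency."""
    writer = csv.writer(out, lineterminator="\n")
    # Header names are assumed, not taken from the AnkiMorphs documentation.
    header = ["Morph-Lemma"] + (["Occurrences"] if include_occurrences else [])
    writer.writerow(header)
    for lemma, count in lemma_counts.most_common():
        writer.writerow([lemma] + ([count] if include_occurrences else []))


# Hypothetical lemma counts.
counts = Counter({"have": 80, "be": 120, "go": 45})
buffer = io.StringIO()
write_lemma_priority_csv(counts, buffer)
lemma_csv = buffer.getvalue()
print(lemma_csv)
```

Writing to a `StringIO` buffer keeps the example self-contained; a real script would open a file inside the priority-files folder instead.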
Custom Inflection Priority Files
The inflection priority files follow this format:
- The first row contains column headers.
- The second row and down contain morphs in descending order of inflection frequency.
- The first column contains morph-lemmas, the second column contains morph-inflections (this is done to prevent morph collisions).
- The third column contains lemma priorities, the fourth column contains inflection priorities (this is done so you can switch morph evaluation on the fly).
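The four-column layout described above can be sketched as follows. This is an illustration under stated assumptions, not the generator's real code: priorities are assumed to be 1-based ranks (1 = most frequent), a lemma's frequency is assumed to be the sum of its inflections' counts, and the header names and `build_inflection_rows` are hypothetical.

```python
import csv
import io
from collections import Counter


def build_inflection_rows(pair_counts):
    """pair_counts maps (lemma, inflection) -> occurrences."""
    # Assumption: a lemma's frequency is the sum over all of its inflections.
    lemma_totals = Counter()
    for (lemma, _inflection), count in pair_counts.items():
        lemma_totals[lemma] += count
    # Assumption: a priority is a 1-based rank, where 1 is the most frequent.
    lemma_rank = {
        lemma: rank
        for rank, (lemma, _count) in enumerate(lemma_totals.most_common(), start=1)
    }
    rows = []
    # Rows are emitted in descending order of inflection frequency.
    for rank, ((lemma, inflection), _count) in enumerate(
        pair_counts.most_common(), start=1
    ):
        rows.append([lemma, inflection, lemma_rank[lemma], rank])
    return rows


# Hypothetical counts for (lemma, inflection) pairs.
pair_counts = Counter(
    {("go", "went"): 50, ("be", "is"): 40, ("be", "was"): 30, ("go", "goes"): 5}
)
rows = build_inflection_rows(pair_counts)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# Header names are assumed for illustration.
writer.writerow(["Morph-Lemma", "Morph-Inflection", "Lemma-Priority", "Inflection-Priority"])
writer.writerows(rows)
inflection_csv = buffer.getvalue()
print(inflection_csv)
```

In this sample, "went" is the most frequent inflection, yet its lemma "go" ranks below "be" once inflections are summed. Storing both priorities in each row is what makes it possible to switch between the two evaluations without regenerating the file.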
Morph Collision
Inflected morphs can be identical even if they are derived from different lemmas (base), e.g.:
Lemma : Inflection
有る : ある
或る : ある
To prevent misinterpretation of the inflected morphs, we also store the lemmas.
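The collision can be demonstrated with two lookup tables, one keyed by the inflection alone and one keyed by the (lemma, inflection) pair. The priority values below are hypothetical and serve only to make the overwrite visible.

```python
# (lemma, inflection, priority); the priorities are made up for illustration.
rows = [("有る", "ある", 1), ("或る", "ある", 2)]

by_inflection = {}
by_pair = {}
for lemma, inflection, priority in rows:
    # Keyed by inflection only: the second row overwrites the first.
    by_inflection[inflection] = priority
    # Keyed by (lemma, inflection): both rows survive.
    by_pair[(lemma, inflection)] = priority

print(len(by_inflection), len(by_pair))
```

With the inflection-only key, one of the two morphs silently loses its priority, which is why the lemma is stored alongside it.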
Downloadable Priority Files
Unless otherwise stated, these are inflection priority files, generated using a 90% comprehension cutoff.
Cantonese
Note: This is a lemma only priority file that was not generated using AnkiMorphs, so it might not work very well (or at all).
- zhh-freq.csv
- Source: existingwordcount.csv found on words.hk - analysis
Catalan
- ca-news-priority.csv
- Source: cat_news_2022_300K-sentences.txt found on wortschatz - catalan corpora
- Morphemizer: spaCy: ca-core-news-sm
Chinese
Note: this is a lemma only priority file.
- zh-news-lemma-priority.csv
- Source: zho_news_2020_300K-sentences.txt found on wortschatz - chinese corpora
- Morphemizer: AnkiMorphs: Chinese
Croatian
- hr-news-priority.csv
- Source: hrv_news_2020_300K-sentences.txt found on wortschatz - croatian corpora
- Morphemizer: spaCy: hr-core-news-sm
Danish
- da-news-priority.csv
- Source: dan_news_2022_300K-sentences.txt found on wortschatz - danish corpora
- Morphemizer: spaCy: da-core-news-sm
Dutch
- nl-news-priority.csv
- Source: nld_news_2022_300K-sentences.txt found on wortschatz - dutch corpora
- Morphemizer: spaCy: nl-core-news-sm
English
- en-wiki-priority.csv
- Source: eng_wikipedia_2016_300K-sentences.txt found on wortschatz - english corpora
- Morphemizer: spaCy: en-core-web-sm
Finnish
- fi-news-priority.csv
- Source: fin_news_2022_300K-sentences.txt found on wortschatz - finnish corpora
- Morphemizer: spaCy: fi-core-news-sm
French
- fr-news-priority.csv
- Source: fra_news_2022_300K-sentences.txt found on wortschatz - french corpora
- Morphemizer: spaCy: fr-core-news-sm
German
- de-news-priority.csv
- Source: deu_news_2022_300K-sentences.txt found on wortschatz - german corpora
- Morphemizer: spaCy: de-core-news-md
Greek (Modern)
- el-news-priority.csv
- Source: ell_news_2022_300K-sentences.txt found on wortschatz - modern greek corpora
- Morphemizer: spaCy: el-core-news-sm
Italian
- it-news-priority.csv
- Source: ita_news_2022_300K-sentences.txt found on wortschatz - italian corpora
- Morphemizer: spaCy: it-core-news-sm
Japanese
- ja-news-priority.csv
- Source: jpn_news_2011_300K-sentences.txt found on wortschatz - japanese corpora
- Morphemizer: AnkiMorphs: Japanese
- ja-anime-priority.csv
- Source: NanakoRaws
- Morphemizer: AnkiMorphs: Japanese
Korean
Note: this is a lemma only priority file.
- ko-news-lemma-priority.csv
- Source: kor_news_2022_300K-sentences.txt found on wortschatz - korean corpora
- Morphemizer: spaCy: ko-core-news-sm
Lithuanian
- lt-news-priority.csv
- Source: lit_news_2020_300K-sentences.txt found on wortschatz - lithuanian corpora
- Morphemizer: spaCy: lt-core-news-sm
Macedonian
- mk-news-priority.csv
- Source: mkd_newscrawl_2011_300K-sentences.txt found on wortschatz - macedonian corpora
- Morphemizer: spaCy: mk-core-news-sm
Norwegian (Bokmål)
- nb-news-priority.csv
- Source: nob_news_2013_300K-sentences.txt found on wortschatz - norwegian corpora
- Morphemizer: spaCy: nb-core-news-sm
Polish
- pl-news-priority.csv
- Source: pol_news_2022_300K-sentences.txt found on wortschatz - polish corpora
- Morphemizer: spaCy: pl-core-news-sm
Portuguese
- pt-news-priority.csv
- Source: por_news_2022_300K-sentences.txt found on wortschatz - portuguese corpora
- Morphemizer: spaCy: pt-core-news-sm
Romanian
- ro-news-priority.csv
- Source: ron_news_2022_300K-sentences.txt found on wortschatz - romanian corpora
- Morphemizer: spaCy: ro-core-news-sm
Russian
- ru-web-priority.csv
- Source: rus-ru_web-public_2019_300K-sentences.txt found on wortschatz - russian corpora
- Morphemizer: spaCy: ru-core-news-sm
Slovenian
- sl-news-priority.csv
- Source: slv_news_2020_300K-sentences.txt found on wortschatz - slovenian corpora
- Morphemizer: spaCy: sl-core-news-sm
Spanish
- es-news-priority.csv
- Source: spa_news_2022_300K-sentences.txt found on wortschatz - spanish corpora
- Morphemizer: spaCy: es-core-news-sm
Swedish
- sv-news-priority.csv
- Source: swe_news_2022_300K-sentences.txt found on wortschatz - swedish corpora
- Morphemizer: spaCy: sv-core-news-sm
Ukrainian
- uk-news-priority.csv
- Source: ukr_news_2022_300K-sentences.txt found on wortschatz - ukrainian corpora
- Morphemizer: spaCy: uk-core-news-sm