Prioritizing
The more frequently a morph occurs in a language, the more useful it is to learn. This is the fundamental principle behind AnkiMorphs: learn a language in the order that will be the most useful.
AnkiMorphs is a general-purpose language learning tool, so it has to be told which morphs occur most often. You can do this in two ways: either have AnkiMorphs calculate the morph frequencies found in your cards (Collection frequency), or specify a custom .csv file that contains that information.
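To illustrate the first option, here is a minimal sketch of counting how often each morph appears across a set of cards. It is not AnkiMorphs's actual implementation; the `cards` structure and its contents are hypothetical, and the morphs are assumed to have already been extracted as (lemma, inflection) pairs.

```python
from collections import Counter

# Hypothetical input: each card is a list of (lemma, inflection) pairs
# that a morphemizer has already extracted from the card's text.
cards = [
    [("walk", "walked"), ("the", "the"), ("dog", "dog")],
    [("the", "the"), ("dog", "dogs")],
    [("the", "the"), ("walk", "walked")],
]

# Count every occurrence of every morph across all cards.
counts = Counter(morph for card in cards for morph in card)

# Rank morphs from most to least frequent.
ranked = [morph for morph, _ in counts.most_common()]
```

Here `ranked` starts with `("the", "the")`, the morph seen on the most cards, which is the morph that would be most useful to learn first.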
Custom Frequency Files
Custom .csv files follow this format:
- The first row contains column headers (AnkiMorphs skips reading this line).
- The remaining rows contain morphs in descending order of frequency.
- The first column contains morph-lemmas, the second column contains morph-inflections (this is done to prevent morph collisions).
- All other columns are optional and are not read by AnkiMorphs.
Keep the files to 50K rows or fewer; any rows after that are ignored. The scoring algorithm needs an upper limit on morph priorities to be practical, hence the 50K limit.
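The format rules above can be sketched as a small reader. This is only an illustration, not AnkiMorphs's own parsing code: the function name `read_priorities`, the sample header names, and the choice to skip rows with fewer than two columns are all assumptions.

```python
import csv
import io

MAX_ROWS = 50_000  # the row limit described above


def read_priorities(csv_file, max_rows=MAX_ROWS):
    """Map (lemma, inflection) -> priority, where 0 is the most frequent morph."""
    reader = csv.reader(csv_file)
    next(reader, None)  # the first row holds column headers, so skip it
    priorities = {}
    for index, row in enumerate(reader):
        if index >= max_rows:
            break  # rows past the limit are ignored
        if len(row) < 2:
            continue  # assumption: skip rows missing the two required columns
        # Only the first two columns are read; any others are optional.
        key = (row[0], row[1])
        priorities.setdefault(key, index)  # keep the first (highest) ranking
    return priorities


# Sample data; the header names are placeholders since that row is skipped.
sample = io.StringIO(
    "Lemma,Inflection,Occurrence\n"
    "有る,ある,120\n"
    "或る,ある,45\n"
)
priorities = read_priorities(sample)
```

Because the rows are already sorted by frequency, the row position itself serves as the priority, so the optional third column is never needed.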
Any .csv file located in the folder [anki profile folder]/frequency-files/ is available for selection in note filters: morph priority.
Morph Collision
Inflected morphs can be identical even if they are derived from different lemmas (base forms), e.g.:
Lemma : Inflection
有る : ある
或る : ある
To prevent misinterpretation of the inflected morphs, we also store the lemmas.
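A short sketch of why the lemma is stored alongside the inflection: if priorities were keyed by inflection alone, the two entries above would collapse into one. The dictionary names here are illustrative only and do not reflect AnkiMorphs's internal data structures.

```python
# The two rows from the example above, in frequency order.
rows = [("有る", "ある"), ("或る", "ある")]

by_inflection = {}
by_pair = {}
for priority, (lemma, inflection) in enumerate(rows):
    # Keyed by inflection only: the second row overwrites the first.
    by_inflection[inflection] = priority
    # Keyed by (lemma, inflection): both rows keep their own priority.
    by_pair[(lemma, inflection)] = priority

print(len(by_inflection))  # 1
print(len(by_pair))        # 2
```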
Creating Your Own frequency.csv
You can use the Frequency File Generator or the Study Plan Generator to generate your own custom frequency file.
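If you already have morph counts from another source, a file in the format described earlier can also be written by hand. The following is a rough sketch under that assumption, not how the generators themselves work; `write_frequency_csv` is a hypothetical helper, and the header names are placeholders since AnkiMorphs skips the header row.

```python
import csv
import io
from collections import Counter


def write_frequency_csv(counts, out_file, max_rows=50_000):
    """Write morph counts as rows of lemma, inflection, occurrence."""
    writer = csv.writer(out_file)
    # Header row: skipped on reading, so the names are placeholders.
    writer.writerow(["Lemma", "Inflection", "Occurrence"])
    # most_common() yields morphs in descending order of frequency,
    # and its argument caps the output at the row limit.
    for (lemma, inflection), occurrence in counts.most_common(max_rows):
        writer.writerow([lemma, inflection, occurrence])


# Hypothetical counts keyed by (lemma, inflection).
counts = Counter({("有る", "ある"): 120, ("或る", "ある"): 45})
buffer = io.StringIO()
write_frequency_csv(counts, buffer)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`.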
Downloadable Frequency Files
Note: Not all of these files contain the optional Occurrence column, but they still work just fine.
Cantonese
- zhh-freq.csv
  - Source: existingwordcount.csv found on words.hk - analysis
  - Morphemizer: AnkiMorphs: Chinese
Catalan
- ca-freq.csv
  - Source: cat_news_2022_300K-sentences.txt found on wortschatz - catalan corpora
  - Morphemizer: spaCy: ca-core-news-sm
Chinese
- zh-simplified-freq.csv
  - Source: zho_news_2020_300K-sentences.txt found on wortschatz - chinese corpora
  - Morphemizer: spaCy: zh-core-web-sm
Croatian
- hr-freq.csv
  - Source: hrv_news_2020_300K-sentences.txt found on wortschatz - croatian corpora
  - Morphemizer: spaCy: hr-core-news-sm
Danish
- da-freq.csv
  - Source: dan_news_2022_300K-sentences.txt found on wortschatz - danish corpora
  - Morphemizer: spaCy: da-core-news-sm
Dutch
- nl-freq.csv
  - Source: nld_news_2022_300K-sentences.txt found on wortschatz - dutch corpora
  - Morphemizer: spaCy: nl-core-news-sm
English
- en-wiki-freq.csv
  - Source: eng_wikipedia_2016_300K-sentences.txt found on wortschatz - english corpora
  - Morphemizer: spaCy: en-core-web-sm
Finnish
- fi-freq.csv
  - Source: fin_news_2022_300K-sentences.txt found on wortschatz - finnish corpora
  - Morphemizer: spaCy: fi-core-news-sm
French
- fr-freq.csv
  - Source: fra_news_2022_300K-sentences.txt found on wortschatz - french corpora
  - Morphemizer: spaCy: fr-core-news-sm
German
- de-freq.csv
  - Source: deu_news_2022_300K-sentences.txt found on wortschatz - german corpora
  - Morphemizer: spaCy: de-core-news-md
Greek (Modern)
- el-freq.csv
  - Source: ell_news_2022_300K-sentences.txt found on wortschatz - modern greek corpora
  - Morphemizer: spaCy: el-core-news-sm
Italian
- it-freq.csv
  - Source: ita_news_2022_300K-sentences.txt found on wortschatz - italian corpora
  - Morphemizer: spaCy: it-core-news-sm
Japanese
- Source: jpn_news_2011_300K-sentences.txt found on wortschatz - japanese corpora; Morphemizer: AnkiMorphs: Japanese
- Source: jpn_news_2011_300K-sentences.txt found on wortschatz - japanese corpora; Morphemizer: spaCy: ja_core_news_sm
- Source: NanakoRaws; Morphemizer: AnkiMorphs: Japanese
- Source: NanakoRaws; Morphemizer: spaCy: ja_core_news_sm
Korean
- ko-freq.csv
  - Source: kor_news_2022_300K-sentences.txt found on wortschatz - korean corpora
  - Morphemizer: spaCy: ko-core-news-sm
Lithuanian
- lt-freq.csv
  - Source: lit_news_2020_300K-sentences.txt found on wortschatz - lithuanian corpora
  - Morphemizer: spaCy: lt-core-news-sm
Macedonian
- mk-freq.csv
  - Source: mkd_newscrawl_2011_300K-sentences.txt found on wortschatz - macedonian corpora
  - Morphemizer: spaCy: mk-core-news-sm
Norwegian (Bokmål)
- nb-freq.csv
  - Source: nob_news_2013_300K-sentences.txt found on wortschatz - norwegian corpora
  - Morphemizer: spaCy: nb-core-news-sm
Polish
- pl-freq.csv
  - Source: pol_news_2022_300K-sentences.txt found on wortschatz - polish corpora
  - Morphemizer: spaCy: pl-core-news-sm
Portuguese
- pt-freq.csv
  - Source: por_news_2022_300K-sentences.txt found on wortschatz - portuguese corpora
  - Morphemizer: spaCy: pt-core-news-sm
Romanian
- ro-freq.csv
  - Source: ron_news_2022_300K-sentences.txt found on wortschatz - romanian corpora
  - Morphemizer: spaCy: ro-core-news-sm
Russian
- ru-freq.csv
  - Source: rus-ru_web-public_2019_300K-sentences.txt found on wortschatz - russian corpora
  - Morphemizer: spaCy: ru-core-news-sm
Slovenian
- sl-freq.csv
  - Source: slv_news_2020_300K-sentences.txt found on wortschatz - slovenian corpora
  - Morphemizer: spaCy: sl-core-news-sm
Spanish
- es-freq.csv
  - Source: spa_news_2022_300K-sentences.txt found on wortschatz - spanish corpora
  - Morphemizer: spaCy: es-core-news-sm
Swedish
- sv-freq.csv
  - Source: swe_news_2022_300K-sentences.txt found on wortschatz - swedish corpora
  - Morphemizer: spaCy: sv-core-news-sm
Ukrainian
- uk-freq.csv
  - Source: ukr_news_2022_300K-sentences.txt found on wortschatz - ukrainian corpora
  - Morphemizer: spaCy: uk-core-news-sm