Hacking Mandarin

I have taken my Mandarin study more seriously since arriving in Taiwan. My first milestone is to build enough of a foundation that I can grow my vocabulary and grammar by conversing and consuming media, which is a much more engaging way to learn.

Spaced repetition is widely considered one of the most effective techniques for memorization. Excited to learn, I found an Anki deck of the 3000 most frequently used traditional characters.

I thought, “Okay, I’ll just study this deck and go from there. I earned a bachelor of memorization about a decade ago. I’ll have this down before I’m out of quarantine.” I couldn’t have been more wrong.

By my fourth day of study, this idea began to fall apart. I wasn’t just memorizing one-to-one: for each character displayed, I needed to recall its definitions, its romanized transcription (pinyin), and its correct pronunciation. New cards were being introduced faster than I could keep up. There were already a few characters with duplicate sounds within the 50 most frequent. Nailing down all three at once became insurmountable.

I revisited my priorities: listening, speaking, and reading, in that order. I’m okay with being limited to digital writing. I often hear that mastering tones has the greatest impact on fluency. There are only 1612 syllables, and being able to hear and speak them clearly is imperative. There are websites that list all possible sounds with audio samples. If I can tune my ear to hear the differences and make the sounds myself, I believe I can move on to a more engaging learning style.

I found a few pinyin charts and settled on the one with the highest quality audio samples. I clicked through fewer than ten sounds before thinking, “This is too cumbersome.” I want to grind.

So I did what any self-respecting person would do. I cracked open Firefox Developer Tools and switched to the Network tab. Ah! Just as I had hoped, each sound was in its own file, conveniently named with its pinyin.

I switched over to the Inspector tab to see whether the filenames were embedded in the HTML or in some JavaScript. Beauty! They were embedded in the HTML. I could work with this. I copied the inner HTML of the chart to a file on my machine. If I could tease the filenames out of the markup, I could download and process them in a way that suits my self-education.

I reached for sed because the regular expression for matching a relative filename was simple enough: [a-z0-9]+\.mp3. Then I realized I wanted the opposite: to remove everything but the match. I’d still like to know how to do that in one pass, but I have some Mandarin to learn, so instead I inserted a newline before each filename and cleaned up the remnants to get a list of filenames, one per line.

$ sed 's/\([a-z0-9]\+\.mp3\)/\n\1/g' pinyin.html | \
    sed 's/mp3.*$/mp3/g' > files.txt
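
One possible answer to the one-pass question is grep rather than sed: its -o option prints only the text that matched, one match per line. The sketch below runs against a made-up two-cell sample (sample.html and sample-files.txt are placeholder names, and the real chart markup may look different), so treat it as an illustration rather than the exact command for the real page.

```shell
# Hypothetical two-cell sample standing in for the copied chart markup.
printf '<td><a href="a1.mp3">a1</a></td><td><a href="a2.mp3">a2</a></td>\n' > sample.html

# -o prints only the matching text, one match per line; -E enables "+".
grep -oE '[a-z0-9]+\.mp3' sample.html > sample-files.txt
cat sample-files.txt
```

Because only the matches are printed, there is no leftover markup to clean up. If the markup mentions a filename more than once, piping the result through sort -u would remove the duplicates.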

There was one line of junk remaining at the top of my imperfect solution, so I went ahead and removed it manually. Now, these are relative filenames. If I’m going to download them, I need fully qualified URLs.

$ sed 's|^|https://example.com/sounds/|' files.txt > urls.txt

Great! They look good — let’s download them all.

$ xargs -n 1 curl -O < urls.txt

My goal was to train my ears and mouth. I wanted Mandarin bootcamp! I played through the directory of sounds and realized that I needed a one-second gap after each sound, giving me enough time to repeat what I heard.

I learned that ffmpeg can concatenate audio files by operating on a list that looks similar to the files.txt I created earlier. I used my text editor to sort and wrap them with a gap.mp3 inserted between each.

$ cat ffmpeg.txt
file 'a1.mp3'
file 'gap.mp3'
file 'a2.mp3'
file 'gap.mp3'
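
The sort-and-wrap step could also be scripted instead of done in a text editor. This is a minimal sketch that assumes a list with one filename per line; sample-files.txt and sample-ffmpeg.txt are placeholder names so that nothing real gets overwritten.

```shell
# Hypothetical stand-in for the real files.txt (one filename per line).
printf '%s\n' a2.mp3 a1.mp3 > sample-files.txt

# Sort the names, then emit each one as a concat entry followed by the gap.
sort sample-files.txt | while IFS= read -r name; do
  printf "file '%s'\nfile 'gap.mp3'\n" "$name"
done > sample-ffmpeg.txt
cat sample-ffmpeg.txt
```

The output has the same shape as the listing above: every sound followed by a gap.mp3 entry.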

I created an audio file with one second of silence and stitched it all together.

$ ffmpeg -f lavfi -i anullsrc=channel_layout=5.1:sample_rate=44100 -t 1 gap.mp3
$ ffmpeg -f concat -i ffmpeg.txt -c copy pinyin.mp3

I completed my first run-through in just under an hour. These are shapes and sounds I don’t usually make. My mouth was sore. It wasn’t pain, but the feeling of knowing you gave it your all when you finish a tough workout. I noticed immediate improvement. I had been struggling with the second tone, but after 1612 tries I was bound to improve. This approach also brought awareness to other areas I need to focus on.

I wasn’t fully satisfied with the audio. The original clips range from approximately 0.8 to 1.3 seconds in duration, and my audio player was reporting errors about the varying sample rates. Let’s fix this up.

$ mkdir -p out
$ for f in *.mp3; do ffmpeg -i "$f" -ar 44100 -af apad=pad_dur=1 -t 1 "out/$f"; done
$ mv out/* .
$ rm -rf out
$ ffmpeg -y -f concat -i ffmpeg.txt -c copy pinyin.mp3

Consistent sample durations allow me to get into a rhythmic meditation. I follow along with a script on my $PATH named pinyin that outputs the pinyin of all sounds in the order they are played back. I extended the script to support playing individual sounds.

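#!/usr/bin/env sh

# Assumed location of the audio; the original path is not shown, so adjust as needed.
dir="${PINYIN_DIR:-$HOME/pinyin}"

# No arguments: list every available sound, one per line.
if [ $# -eq 0 ]; then
  basename --multiple "$dir/sounds/"* | sed 's/\.mp3$//' | less
  exit 0
fi

# "play": play the full concatenated recording.
if [ $# -eq 1 ] && [ "$1" = 'play' ]; then
  mpv --no-video "$dir/pinyin.mp3"
  exit 0
fi

# Otherwise, play each named sound in turn.
for arg in "$@"; do
  file="$dir/sounds/$arg.mp3"
  if [ ! -f "$file" ]; then
    echo "$file not found" >&2
    exit 1
  fi
  mpv --no-terminal --no-video "$file"
done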
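
If the listing ever needs to follow the playback order exactly, one option is to derive it from the concat list rather than from the directory. The sketch below assumes the list uses the file '...' format shown earlier; sample-ffmpeg.txt and sample-order.txt are placeholder names.

```shell
# Hypothetical concat list standing in for the real ffmpeg.txt.
printf "file '%s'\n" a1.mp3 gap.mp3 a2.mp3 gap.mp3 > sample-ffmpeg.txt

# Keep only the name inside the quotes, strip ".mp3", and drop the gap entries.
sed -n "s/^file '\(.*\)\.mp3'/\1/p" sample-ffmpeg.txt | grep -vx 'gap' > sample-order.txt
cat sample-order.txt
```

This keeps the on-screen list and the audio in step even if the concat list is later reordered by hand.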

I think adding a video track with the pinyin in a large font would help, but I’m satisfied with this solution for now.