One of the great things about Sphinx, is that it natively provides internationalization (i18n) mechanisms to facilitate translation using the common gettext method. The gettext files can be translated a number of different ways, but I have had great results using Zanata, a separate open source project that facilitates community driven translation.
Note: RedHat has chosen to step away from Zanata and the last time I checked no one has stepped forward to carry the project forward. Regardless of the tool used, these concepts are still applicable.
Regardless of how the generated files are translated though, the process should end up being pretty similar:
The sphinx translation process using gettext
and zanata-cli
.
Preparation
A fairly common convention is to create a separate git repository where the internationalization process takes place. This usually means creating a submodule for the main documentation source repository, the docs directory in this case, or tying in the original sources repository using another method.
For the purposes of the article, the structure of the internationalization repository is shown below. The configuration section will cover what each of the files and folders are for.
.
├── build.sh
├── Makefile
├── zanata.xml
├── templates
├── locales
│ └── LC_MESSAGES
│ ├── de_DE
│ └── es
└── docs (submodule or similar)
└── source
├── conf.py
├── _static
└── index.rst
To prepare the Zanata side of things, you will need to sign up
for an account on their site, create a new project,
and project version. Most use master
or latest
for the main project version, but any
convention works. It’s also possible to have multiple versions to match the
different versions of your docs.
Note: I won’t cover it in this article, but you can manage the available languages for translation, translations, translators, reviewers, and much more for each project through the Zanata web interface.
Finally, in order to easily sync translations with Zanata using the command line, install the Zanata CLI client.
Configuration
The sphinx conf.py value that holds the path of the directory containing the .po files might need to be changed from it’s default depending on your repo’s folder structure. In order to keep the translation files separate from source, we’ll use:
locale_dirs = ['../../locales'] # relative to source directory
Next, configure authentication for zanata-cli
by creating zanata.ini
and also configure access to the project version by creating zanata.xml.
An example zanata.xml, once configured, would look something like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<config xmlns="http://zanata.org/namespace/config/">
<url>https://translate.zanata.org/</url>
<project>my-docs-project</project>
<project-version>latest</project-version>
<project-type>gettext</project-type>
<src-dir>templates</src-dir>
<trans-dir>locales</trans-dir>
<rules>
<rule pattern="**/*.pot">{path}/{locale_with_underscore}/LC_MESSAGES/{filename}.po</rule>
</rules>
</config>
Building the docs
As shown in the initial diagram, the first step is to generate the .pot files
by invoking the gettext builder using make gettext
in the Makefile:
gettext:
$(SPHINXBUILD) -b gettext -t i18n ./docs/source/ ./templates/
@echo
@echo "Build finished. The message catalogs are in ../templates."
Next is to push the .pot files to Zanata:
zanata-cli push --disable-ssl-cert
After the translators have done sufficient work on certain languages, pull the .po files for those languages from Zanata by using the language flag. For example, pulling the German translations:
zanata-cli pull -l de-DE --disable-ssl-cert
Tip: Only building languages that reach a certain percentage of translation completion, say 70%, allows the document to be more easily read in that language.
Now it is possible to build with the latest translations for a language:
sphinx-build -b html -D language=de_DE ./docs/source/ ./build/de_DE/latest/
To simplify this process you can combine those steps in a shell script,
build.sh. You can even run zanata-cli
in non-interactive mode so that it
is more portable for your continuous deployment pipeline:
#!/bin/bash
# build .pot files
sphinx-build -b gettext -t i18n ./docs/source/ ./templates/
# push .pot files to Zanata
zanata-cli -B push --url https://translate.zanata.org/ --username YOUR_USERNAME --key YOUR_KEY --disable-ssl-cert
# pull latest .po files from Zanata for each translated language
for locale in de-DE es; do
zanata-cli -B pull --url https://translate.zanata.org/ --username YOUR_USERNAME --key YOUR_KEY -l $locale --disable-ssl-cert
done
# build translated docs for each language
for lang in de_DE es; do
sphinx-build -b html -D language=$lang ./docs/source/ ./build/$lang/latest
done
Optionally, in order to keep search engines from choosing a semi-translated language for English search results, a fairly common problem which may puzzle your users, you can use my sphinx-sitemap extension to auto-generate a multi-lingual sitemap for your documentation.
Note: Shout out to Frank Kloeker for helping me get through this the first time with his i18n docs.