Command line interface (CLI) reference

This page contains the full reference documentation for each command in the CLI. See also Command line interface (CLI) user guide for guidelines on using the CLI.

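The ReadAlongs CLI has four key commands:

- readalongs align: align a text file with an audio file and create the output files
- readalongs prepare: prepare an XML file for ‘readalongs align’ from a plain text file
- readalongs tokenize: tokenize the XML file produced by ‘readalongs prepare’
- readalongs g2p: apply g2p mappings to a tokenized XML file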

Each command can be run with -h or --help to display its usage manual, e.g., readalongs -h, readalongs align --help.

readalongs align

Align TEXTFILE and AUDIOFILE and create output files as OUTPUT_BASE.* in directory OUTPUT_BASE/.

TEXTFILE: Input text file path (in XML, or plain text with -i)

With -i, TEXTFILE should be plain text:
- The text in TEXTFILE should be plain UTF-8 text without any markup.
- Paragraph breaks are indicated by inserting one blank line.
- Page breaks are indicated by inserting two blank lines.

Without -i, TEXTFILE can be in one of three XML formats:
- the output of ‘readalongs prepare’,
- the output of ‘readalongs tokenize’, or
- the output of ‘readalongs g2p’.
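
As a minimal sketch of the plain-text layout described above, the following snippet writes a small input file containing one paragraph break and one page break, then aligns it if the CLI is available. The file names story.txt, story.wav and story_out are placeholders chosen for this example, and the recording is assumed to exist already; the language code eng is taken from the list of supported codes.

```shell
# Write a small plain-text input: one blank line separates paragraphs,
# two blank lines separate pages.
printf '%s\n' \
  'This is the first paragraph of page one.' \
  '' \
  'This is the second paragraph of page one.' \
  '' \
  '' \
  'This is the first paragraph of page two.' \
  > story.txt

# Align only when the CLI and the (hypothetical) recording are both available.
if command -v readalongs >/dev/null 2>&1 && [ -f story.wav ]; then
  readalongs align -i -l eng story.txt story.wav story_out
else
  echo "Skipping alignment: readalongs or story.wav not found."
fi
```

Without -i, the same command would instead take an XML file as TEXTFILE and would not need -l.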

For words (<w> elements) that g2p does not handle correctly, one can supply the known ARPABET pronunciation via the ARPABET attribute in the output of ‘readalongs tokenize’ or ‘readalongs g2p’.

One can add anchor elements in the XML, e.g., ‘<anchor time="2.345s"/>’, to mark known anchor points between the audio and text streams.
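
To make the two manual edits above concrete, the snippet below writes a small illustrative XML fragment. This is only a sketch: the surrounding <s> element, the file name fragment.xml and the pronunciation value are assumptions made for this example, and a real file should keep whatever structure ‘readalongs tokenize’ or ‘readalongs g2p’ produced.

```shell
# Write a hypothetical fragment showing where the manual edits go.
# Only the ARPABET attribute and the <anchor> element come from the text above;
# the <s> wrapper and the words are illustrative.
cat > fragment.xml <<'EOF'
<s>
  <w ARPABET="HH AH L OW">hello</w>
  <anchor time="2.345s"/>
  <w>world</w>
</s>
EOF

# Show the edited lines.
grep -n -e 'ARPABET=' -e '<anchor ' fragment.xml
```

Note the straight double quotes around attribute values; typographic quotes would make the XML invalid.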

AUDIOFILE: Input audio file path, in any format supported by ffmpeg

OUTPUT_BASE: Output files will be saved as OUTPUT_BASE/OUTPUT_BASE.*

readalongs align [OPTIONS] TEXTFILE AUDIOFILE OUTPUT_BASE

Options

-b, --bare

Bare alignments do not split silences between words

-c, --config <config>

Use ReadAlong-Studio configuration file (in JSON format)

-C, --closed-captioning

Export sentences to WebVTT and SRT files

-d, --debug

Add debugging messages to logger

-f, --force-overwrite

Force overwrite output files

-i, --text-input

Input is plain text (otherwise it’s assumed to be XML)

-l, --language <language>

The language code for text in TEXTFILE (use only with -i, i.e., with plain text input)

Options: alq | atj | ckt | crg-dv | crg-tmd | crj | crk | crl | crm | csw | ctp | dan | eng | fra | git | gla | gwi | haa | ikt | iku | iku-sro | kkz | kwk-boas | kwk-napa | kwk-umista | lml | mic | moh | moh-festival | oji | oji-syl | see | srs | str | tau | tce | tgx | tli | ttm | und | und-ascii | win

-u, --unit <unit>

Unit (w = word, m = morpheme) to align to

Options: w | m

-s, --save-temps

Save intermediate stages of processing and temporary files (dictionary, FSG, tokenization, etc.)

-t, --text-grid

Export to Praat TextGrid and ELAN .eaf files

-H, --html

Export to a single-file HTML format

-x, --output-xhtml

Output simple XHTML instead of XML

--g2p-fallback <g2p_fallback>

Colon-separated list of fallback langs for g2p; enables the g2p cascade

--g2p-verbose

Display verbose g2p error messages

Arguments

TEXTFILE

Required argument

AUDIOFILE

Required argument

OUTPUT_BASE

Required argument
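
The options above can be combined in a single call. The following sketch requests closed captions, TextGrid/ELAN output and single-file HTML, and enables the g2p cascade. The file names story.xml, story.wav and story_out are placeholders, and the choice of fra, eng and und as fallback languages is only an example drawn from the list of supported codes.

```shell
# Build the colon-separated fallback list from individual language codes.
fallback=$(printf '%s:' fra eng und)
fallback=${fallback%:}   # drop the trailing colon

if command -v readalongs >/dev/null 2>&1 && [ -f story.xml ] && [ -f story.wav ]; then
  # -f overwrites earlier output, -C adds WebVTT/SRT, -t adds TextGrid/ELAN, -H adds HTML.
  readalongs align -f -C -t -H --g2p-fallback "$fallback" story.xml story.wav story_out
else
  echo "Would run: readalongs align -f -C -t -H --g2p-fallback $fallback story.xml story.wav story_out"
fi
```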

readalongs prepare

Prepare XMLFILE for ‘readalongs align’ from PLAINTEXTFILE. PLAINTEXTFILE must be plain text encoded in UTF-8, with one sentence per line, paragraph breaks marked by a blank line, and page breaks marked by two blank lines.

PLAINTEXTFILE: Path to the plain text input file, or - for stdin

XMLFILE: Path to the XML output file, or - for stdout [default: PLAINTEXTFILE.xml]

readalongs prepare [OPTIONS] PLAINTEXTFILE [XMLFILE]

Options

-d, --debug

Add debugging messages to logger

-f, --force-overwrite

Force overwrite output files

-l, --language <language>

Required. The language code for text in PLAINTEXTFILE

Options: alq | atj | ckt | crg-dv | crg-tmd | crj | crk | crl | crm | csw | ctp | dan | eng | fra | git | gla | gwi | haa | ikt | iku | iku-sro | kkz | kwk-boas | kwk-napa | kwk-umista | lml | mic | moh | moh-festival | oji | oji-syl | see | srs | str | tau | tce | tgx | tli | ttm | und | und-ascii | win

Arguments

PLAINTEXTFILE

Required argument

XMLFILE

Optional argument
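
A short sketch of ‘readalongs prepare’ in use, first with explicit file paths and then with - for stdin and stdout. The file names sentences.txt and sentences.xml are placeholders chosen for this example; XMLFILE is passed explicitly so that the example does not depend on the default output name.

```shell
# One sentence per line; a blank line starts a new paragraph.
printf '%s\n' \
  'Hello world.' \
  'This is a test.' \
  '' \
  'A new paragraph starts here.' \
  > sentences.txt

if command -v readalongs >/dev/null 2>&1; then
  # Explicit output path, overwriting any earlier result.
  readalongs prepare -f -l eng sentences.txt sentences.xml
  # Same input read from stdin, with the XML written to stdout.
  readalongs prepare -l eng - - < sentences.txt
else
  echo "readalongs not found; skipping prepare."
fi
```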

readalongs tokenize

Tokenize XMLFILE for ‘readalongs align’ into TOKFILE. XMLFILE should have been produced by ‘readalongs prepare’. TOKFILE can then be augmented with word-specific language codes. ‘readalongs align’ can be called with either XMLFILE or TOKFILE as XML input.

XMLFILE: Path to the XML file to tokenize, or - for stdin

TOKFILE: Output path for the tokenized XML, or - for stdout [default: XMLFILE.tokenized.xml]

readalongs tokenize [OPTIONS] XMLFILE [TOKFILE]

Options

-d, --debug

Add debugging messages to logger

-f, --force-overwrite

Force overwrite output files

Arguments

XMLFILE

Required argument

TOKFILE

Optional argument
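
As a sketch of where ‘readalongs tokenize’ fits in a workflow, the snippet below tokenizes a prepared XML file into an explicitly named output. The input name sentences.xml is a placeholder assumed to come from ‘readalongs prepare’, and the output name is derived here for the example rather than taken from the default.

```shell
xmlfile=sentences.xml
# Output name chosen for this example and passed explicitly to the command.
tokfile="${xmlfile%.xml}.tokenized.xml"

if command -v readalongs >/dev/null 2>&1 && [ -f "$xmlfile" ]; then
  readalongs tokenize -f "$xmlfile" "$tokfile"
  echo "Edit $tokfile as needed, then pass it to 'readalongs align' as TEXTFILE."
else
  echo "Would run: readalongs tokenize -f $xmlfile $tokfile"
fi
```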

readalongs g2p

Apply g2p mappings to TOKFILE into G2PFILE. TOKFILE should have been produced by ‘readalongs tokenize’. G2PFILE can then be modified to adjust the phonetic representation as needed. ‘readalongs align’ can be called with G2PFILE instead of TOKFILE as XML input.

WARNING: the output is not yet compatible with align and cannot be used as input to align.

TOKFILE: Path to the input tokenized XML file, or - for stdin

G2PFILE: Output path for the XML with g2p mappings applied, or - for stdout [default: TOKFILE with .g2p. inserted]

readalongs g2p [OPTIONS] TOKFILE [G2PFILE]

Options

--g2p-fallback <g2p_fallback>

Colon-separated list of fallback langs for g2p; enables the g2p cascade

-f, --force-overwrite

Force overwrite output files

--g2p-verbose

Display verbose messages about g2p errors.

-d, --debug

Add debugging messages to logger

Arguments

TOKFILE

Required argument

G2PFILE

Optional argument
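
Because both commands accept - for stdin and stdout, tokenization and g2p can be chained without an intermediate file. This is a sketch under assumptions: sentences.xml is a placeholder for a file produced by ‘readalongs prepare’, the fallback list eng:und is only an example, and log messages are assumed not to be written to stdout.

```shell
xmlfile=sentences.xml
# Output name chosen for this example and passed explicitly to the command.
g2pfile="${xmlfile%.xml}.g2p.xml"

if command -v readalongs >/dev/null 2>&1 && [ -f "$xmlfile" ]; then
  # The first "-" sends the tokenized XML to stdout; the second reads it from stdin.
  readalongs tokenize "$xmlfile" - |
    readalongs g2p -f --g2p-fallback eng:und - "$g2pfile"
else
  echo "Would run: readalongs tokenize $xmlfile - | readalongs g2p -f --g2p-fallback eng:und - $g2pfile"
fi
```

Given the warning above, the resulting file is best treated as something to inspect rather than as input to ‘readalongs align’.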