This page contains the full reference documentation for each command in the CLI. See also the Command line interface (CLI) user guide for guidelines on using the CLI.
The ReadAlongs CLI has four key commands:
readalongs align: full alignment pipeline, from plain text or XML to a viewable readalong
readalongs prepare: convert a plain text file into XML, for align
readalongs tokenize: tokenize a prepared XML file
readalongs g2p: g2p a tokenized XML file
Each command can be run with -h or --help to display its usage manual,
e.g., readalongs -h, readalongs align --help.
Align TEXTFILE and AUDIOFILE and create output files as OUTPUT_BASE.* in directory OUTPUT_BASE/.
TEXTFILE: Input text file path (in XML, or plain text with -i)
One can add the known ARPABET phonetics in the XML for words (<w> elements) that are not correctly handled by g2p in the output of ‘readalongs tokenize’ or ‘readalongs g2p’, via the ARPABET attribute.
One can add anchor elements in the XML, e.g., ‘<anchor time="2.345s"/>’, to mark known anchor points between the audio and text stream.
AUDIOFILE: Input audio file path, in any format supported by ffmpeg
OUTPUT_BASE: Output files will be saved as OUTPUT_BASE/OUTPUT_BASE.*
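As an illustration of the two manual XML edits described for TEXTFILE, the following is a minimal sketch that sets an ARPABET attribute on a word and inserts an anchor element after another word. The sample fragment, the id values, the helper name annotate, and the ARPABET string are all hypothetical; real ReadAlongs XML has more structure, and this sketch only handles words that are direct children of the parsed root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical tokenized fragment, for illustration only.
SAMPLE = '<s><w id="w1">hello</w> <w id="w2">Nunavut</w></s>'


def annotate(xml_text, arpabet_by_id, anchor_after_id, anchor_time):
    """Set ARPABET attributes and insert one <anchor> after a given word.

    Only direct children of the root element are considered when
    placing the anchor (an assumption made to keep the sketch short).
    """
    root = ET.fromstring(xml_text)

    # Add the ARPABET attribute to every <w> whose id is listed.
    for word in root.iter("w"):
        word_id = word.get("id")
        if word_id in arpabet_by_id:
            word.set("ARPABET", arpabet_by_id[word_id])

    # Insert <anchor time="..."/> right after the chosen word,
    # moving the word's trailing whitespace after the anchor.
    for index, child in enumerate(list(root)):
        if child.get("id") == anchor_after_id:
            anchor = ET.Element("anchor", {"time": anchor_time})
            anchor.tail = child.tail
            child.tail = None
            root.insert(index + 1, anchor)
            break

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The pronunciation string below is made up for the example.
    print(annotate(SAMPLE, {"w2": "N UW N AH V UH T"}, "w1", "2.345s"))
```

Editing the file by hand in a text editor achieves the same result; a script like this is mainly useful when many words or anchors need to be added.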
readalongs align [OPTIONS] TEXTFILE AUDIOFILE OUTPUT_BASE
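The usage line above can be assembled programmatically. The sketch below builds the argument list for XML input (no options) and the glob pattern where output files would be expected, following the OUTPUT_BASE/OUTPUT_BASE.* rule stated above. The helper name align_invocation is hypothetical, and using only the last path component of OUTPUT_BASE for the file stem is an assumption, not something this page confirms.

```python
from pathlib import PurePosixPath


def align_invocation(textfile, audiofile, output_base):
    """Return (argv, output_pattern) for a 'readalongs align' call.

    argv follows: readalongs align [OPTIONS] TEXTFILE AUDIOFILE OUTPUT_BASE
    with no options, so TEXTFILE is assumed to be XML.
    """
    argv = ["readalongs", "align", str(textfile), str(audiofile), str(output_base)]

    # Assumption: the file stem inside the output directory is the
    # last component of OUTPUT_BASE.
    base = PurePosixPath(output_base)
    output_pattern = (base / f"{base.name}.*").as_posix()
    return argv, output_pattern


if __name__ == "__main__":
    argv, pattern = align_invocation("story.xml", "story.mp3", "story")
    print(argv)
    print(pattern)
```

The returned list can be passed to subprocess.run(argv, check=True) on a machine where the readalongs command is installed.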
Options
Bare alignments do not split silences between words
Use ReadAlong-Studio configuration file (in JSON format)
Export sentences to WebVTT and SRT files
Add debugging messages to logger
Force overwrite output files
Input is plain text (otherwise it’s assumed to be XML)
The language code for text in TEXTFILE (use only with -i, i.e., with plain text input)
alq | atj | ckt | crg-dv | crg-tmd | crj | crk | crl | crm | csw | ctp | dan | eng | fra | git | gla | gwi | haa | ikt | iku | iku-sro | kkz | kwk-boas | kwk-napa | kwk-umista | lml | mic | moh | moh-festival | oji | oji-syl | see | srs | str | tau | tce | tgx | tli | ttm | und | und-ascii | win
Unit (w = word, m = morpheme) to align to
w | m
Save intermediate stages of processing and temporary files (dictionary, FSG, tokenization, etc)
Export to Praat TextGrid and ELAN eaf files
Export to a single-file HTML format
Output simple XHTML instead of XML
Colon-separated list of fallback langs for g2p; enables the g2p cascade
Display verbose g2p error messages
Arguments
TEXTFILE: Required argument
AUDIOFILE: Required argument
OUTPUT_BASE: Required argument
Prepare XMLFILE for ‘readalongs align’ from PLAINTEXTFILE. PLAINTEXTFILE must be plain text encoded in UTF-8, with one sentence per line, paragraph breaks marked by a blank line, and page breaks marked by two blank lines.
PLAINTEXTFILE: Path to the plain text input file, or - for stdin
XMLFILE: Path to the XML output file, or - for stdout [default: PLAINTEXTFILE.xml]
readalongs prepare [OPTIONS] PLAINTEXTFILE [XMLFILE]
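The PLAINTEXTFILE layout described above (one sentence per line, one blank line between paragraphs, two blank lines between pages) can be generated from structured text. This is a minimal sketch; the function name format_plaintext and the nested-list input shape are assumptions made for illustration.

```python
def format_plaintext(pages):
    """Render pages -> paragraphs -> sentences in the layout expected
    by 'readalongs prepare'.

    pages is a list of pages; each page is a list of paragraphs;
    each paragraph is a list of sentence strings.
    """
    page_texts = []
    for paragraphs in pages:
        # One sentence per line within a paragraph.
        paragraph_texts = ["\n".join(sentences) for sentences in paragraphs]
        # A single blank line marks a paragraph break.
        page_texts.append("\n\n".join(paragraph_texts))
    # Two blank lines mark a page break.
    return "\n\n\n".join(page_texts) + "\n"


if __name__ == "__main__":
    pages = [
        [["First sentence.", "Second sentence."], ["New paragraph."]],
        [["First sentence of page two."]],
    ]
    print(format_plaintext(pages), end="")
```

When saving the result, write it as UTF-8, for example with pathlib.Path("story.txt").write_text(text, encoding="utf-8"), since the input must be UTF-8 encoded.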
Options
Add debugging messages to logger
Force overwrite output files
Required. The language code for text in PLAINTEXTFILE
alq | atj | ckt | crg-dv | crg-tmd | crj | crk | crl | crm | csw | ctp | dan | eng | fra | git | gla | gwi | haa | ikt | iku | iku-sro | kkz | kwk-boas | kwk-napa | kwk-umista | lml | mic | moh | moh-festival | oji | oji-syl | see | srs | str | tau | tce | tgx | tli | ttm | und | und-ascii | win
Arguments
PLAINTEXTFILE: Required argument
XMLFILE: Optional argument
Tokenize XMLFILE for ‘readalongs align’ into TOKFILE. XMLFILE should have been produced by ‘readalongs prepare’. TOKFILE can then be augmented with word-specific language codes. ‘readalongs align’ can be called with either XMLFILE or TOKFILE as XML input.
XMLFILE: Path to the XML file to tokenize, or - for stdin
TOKFILE: Output path for the tok’d XML, or - for stdout [default: XMLFILE.tokenized.xml]
readalongs tokenize [OPTIONS] XMLFILE [TOKFILE]
Options
Add debugging messages to logger
Force overwrite output files
Arguments
XMLFILE: Required argument
TOKFILE: Optional argument
Apply g2p mappings to TOKFILE into G2PFILE. TOKFILE should have been produced by ‘readalongs tokenize’. G2PFILE can then be modified to adjust the phonetic representation as needed. ‘readalongs align’ can be called with G2PFILE instead of TOKFILE as XML input.
WARNING: the output is not yet compatible with align and cannot be used as input to align.
TOKFILE: Path to the input tokenized XML file, or - for stdin
G2PFILE: Output path for the g2p’d XML, or - for stdout [default: TOKFILE with .g2p. inserted]
readalongs g2p [OPTIONS] TOKFILE [G2PFILE]
Options
Colon-separated list of fallback langs for g2p; enables the g2p cascade
Force overwrite output files
Display verbose messages about g2p errors.
Add debugging messages to logger
Arguments
TOKFILE: Required argument
G2PFILE: Optional argument
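The g2p fallback option described above takes a colon-separated list of language codes. The sketch below builds such a value from a Python list. The helper name fallback_value is hypothetical, and checking the codes against the language list shown on this page is an assumption about what the CLI accepts.

```python
# Language codes copied from the choices listed on this page.
KNOWN_CODES = (
    "alq | atj | ckt | crg-dv | crg-tmd | crj | crk | crl | crm | csw | ctp | "
    "dan | eng | fra | git | gla | gwi | haa | ikt | iku | iku-sro | kkz | "
    "kwk-boas | kwk-napa | kwk-umista | lml | mic | moh | moh-festival | oji | "
    "oji-syl | see | srs | str | tau | tce | tgx | tli | ttm | und | und-ascii | win"
).split(" | ")


def fallback_value(codes):
    """Join language codes with colons, in the order given.

    Raises ValueError for codes not in KNOWN_CODES (an assumed check).
    """
    unknown = [code for code in codes if code not in KNOWN_CODES]
    if unknown:
        raise ValueError(f"unknown language code(s): {', '.join(unknown)}")
    return ":".join(codes)


if __name__ == "__main__":
    print(fallback_value(["fra", "eng", "und"]))
```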