MINPAIR
- Description
- Details
- Environment
- Downloads
- Localization
- Bugs
- Change Log
minpair generates a complete list of minimal pairs (words differing in exactly
one segment) from a list of words. The input should consist of one entry per line
in UTF-8 Unicode. By default, each entry consists of two parts, separated by a tab.
The first field is the word. The second field is an identifier. Typically this
will be a gloss or record number.
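The default input format described above can be read with a few lines of code. The following is a minimal sketch in Python, not minpair's own C code. The function name read_entries and its single_column parameter are hypothetical, and skipping blank lines is an assumption of the sketch.

```python
import io

def read_entries(stream, single_column=False):
    """Yield (word, identifier) tuples from a word list, one entry per line."""
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        if single_column:
            # Earlier single-column format: the whole line is the word.
            yield line, ""
        else:
            # Default format: word, a tab, then the identifier.
            word, _sep, identifier = line.partition("\t")
            yield word, identifier

# In practice the stream would be a file opened with encoding="utf-8".
sample = io.StringIO("pat\tpat (v.)\nbat\t17\nmat\n")
entries = list(read_entries(sample))
print(entries)
```

An entry with no tab simply gets an empty identifier, which matches the statement that the identifier is optional.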
Each line of output lists the two segments that contrast in the
minimal pair, then the two words, each followed by its identifier if one
was supplied, and finally the context of the difference, in which a
marker (by default an underscore) shows the site of the difference.
The contrasting segments are listed in
a fixed order (that of the character codes) so that all tokens of the
same pair will sort together.
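The record layout just described can be illustrated as follows. This is a sketch only: the function format_record is hypothetical, and the use of tabs between fields and the choice to list the words in the same order as their segments are assumptions, since the exact layout of minpair's output is not specified here.

```python
def format_record(word_a, id_a, word_b, id_b, site, marker="_"):
    """Build one output record for two words that differ only at index `site`."""
    first = (word_a[site], word_a, id_a)
    second = (word_b[site], word_b, id_b)
    # Python compares single-character strings by code point, so this puts
    # the contrasting segments in character-code order.
    if first[0] > second[0]:
        first, second = second, first
    # The context is the shared material with the marker at the difference site.
    context = word_a[:site] + marker + word_a[site + 1:]
    fields = [first[0], second[0], first[1], first[2], second[1], second[2], context]
    return "\t".join(fields)

record = format_record("pat", "1", "bat", "2", 0)
print(record)
```

Because the segment pair always appears in the same order, sorting the records line by line groups every token of the b/p contrast together, whichever word came first in the input.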
By default minpair
searches only for pairs of words of the same
length differing in exactly one segment. Command line options allow the
addition of single insertions/deletions and single transpositions.
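The three kinds of difference mentioned above can be told apart with a simple comparison. The sketch below is illustrative Python, not the algorithm minpair itself uses. The function name classify is hypothetical, and it assumes that a "single transposition" means the exchange of two adjacent segments.

```python
def classify(a, b):
    """Return 'substitution', 'indel', 'transposition', or None."""
    if a == b:
        return None
    if len(a) == len(b):
        sites = [i for i in range(len(a)) if a[i] != b[i]]
        if len(sites) == 1:
            return "substitution"
        # Assumption: a transposition swaps two adjacent segments.
        if (len(sites) == 2 and sites[1] == sites[0] + 1
                and a[sites[0]] == b[sites[1]]
                and a[sites[1]] == b[sites[0]]):
            return "transposition"
        return None
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    if len(longer) - len(shorter) == 1:
        # An indel: deleting one segment of the longer word yields the shorter.
        for i in range(len(longer)):
            if longer[:i] + longer[i + 1:] == shorter:
                return "indel"
    return None

print(classify("pat", "bat"), classify("at", "pat"), classify("ask", "aks"))
```

Each string position is treated as one segment here, which is why the one-character-per-segment requirement discussed next matters.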
In order to find all minimal pairs it is normally necessary for the input notation
to use one character for each segment. Even in IPA transcription, this is
often not the case. minpair
provides for this situation by accepting definitions of multigraphs.
For instance, if you put the sequences p', t', and k',
representing glottalized /p/, /t/, and /k/, in the multigraph definition file,
minpair will treat them as single segments.
The multigraph definition file should consist of the character sequences
that are to be treated as single segments, one per line.
Like all other input, this file should be encoded in UTF-8 Unicode.
Sequences declared as multigraphs are compressed to a single UTF-32 codepoint so that
they will compare as single segments, then decompressed on output.
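The compression step can be sketched as follows. This is an illustration in Python rather than minpair's C implementation. The use of Private Use Area code points starting at U+E000 and the longest-match-first rule are assumptions of the sketch, and all function names are hypothetical.

```python
import re

PUA_START = 0xE000  # assumption: stand-in code points come from the Private Use Area

def build_tables(multigraphs):
    """Map each multigraph to a single stand-in character and back."""
    encode = {m: chr(PUA_START + i) for i, m in enumerate(multigraphs)}
    decode = {c: m for m, c in encode.items()}
    # Try longer multigraphs first so that a longer sequence wins over its prefix.
    ordered = sorted(encode, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered))
    return encode, decode, pattern

def compress(word, encode, pattern):
    """Replace every declared multigraph with its one-character stand-in."""
    if not encode:
        return word
    return pattern.sub(lambda match: encode[match.group(0)], word)

def decompress(word, decode):
    """Restore the original multigraphs for output."""
    return "".join(decode.get(ch, ch) for ch in word)

encode, decode, pattern = build_tables(["p'", "t'", "k'"])
packed = compress("t'ap'", encode, pattern)
print(len(packed), decompress(packed, decode))
```

After compression, t'a and ta have the same length and differ in exactly one position, so they are found as a minimal pair contrasting t' with t.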
The basic program has a command-line interface. mpg provides an optional
graphical interface. mpg will also arrange for the output of minpair
to be sorted if a suitable sort utility is available. Standard sort utilities
like Unix sort will do, but if the data contains multigraphs, the
best results will be obtained using
msort since it can
read and use the same multigraph definitions as does minpair.
It is also possible to use mpg without minpair.
mpg can find minimal pairs involving substitutions but currently
cannot handle indels and transpositions. mpg is much slower than minpair.
On a list of 10,000 words, for example, minpair took 4 seconds while mpg took 321
seconds. The difference is much less significant for shorter word lists.
mpg is also able to find pairs of words that differ in two positions,
which minpair does not know how to do. This is useful when looking
for phonological rules. The maximum distance between the two positions
may be specified.
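The two-position search can be pictured with the following sketch. It is written in Python for illustration and is not mpg's Tcl code. The function name is hypothetical, and measuring distance as the difference between the two position indices is an assumption.

```python
def two_site_difference(a, b, max_distance=None):
    """Return the two differing positions, or None if there are not exactly two."""
    if len(a) != len(b):
        return None
    sites = [i for i in range(len(a)) if a[i] != b[i]]
    if len(sites) != 2:
        return None
    # Assumption: distance is the difference between the two indices.
    if max_distance is not None and sites[1] - sites[0] > max_distance:
        return None
    return tuple(sites)

print(two_site_difference("pata", "bati"))
print(two_site_difference("pata", "bati", max_distance=2))
```

A small maximum distance restricts the results to differences that are close together, which is the typical case when one difference conditions the other.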
Language | C (main program), Tcl/Tk (GUI) |
Dependencies (weak) | dillo [used by some help functions by default - other browsers may be used instead] |
| msort [Recommended if sorted output is desired.] |
Environment | POSIX |
License | GNU General Public License, Version 3 |
Current version | 5.2 |
Last modified | 2009-11-14 |
minpair should compile and run without difficulty
on any POSIX-conformant system.
It is known to compile and run without modification under
GNU/Linux, FreeBSD, and SunOS. If the internationalization and localization
libraries used by minpair are not available, as under Mac OS X, the
autoconfiguration system will omit them.
mpg should run anywhere
that Tcl/Tk is available, including GNU/Linux, FreeBSD, Mac OS X, and Microsoft Windows.
However, a few features may not work on non-Unix systems. In particular,
the Abort Minpair command depends on the existence of a Unix-style kill
program that can be used to send a signal to another process.
mpg will run properly under Mac OS X if you have installed X11 and use Tk-X11.
(As of Mac OS X 10.4 "Tiger", X11 is an optional install provided on the distribution
CD.) mpg now adapts itself to Tk-Aqua well enough to be usable but
necessarily behaves somewhat differently.
Microsoft Windows users who do not need minpair (the C program) and do not know
how to install Tcl/Tk may download
mpg.exe,
which is a single-file executable containing mpg and the Tcl/Tk interpreter.
(This executable will not work on 64-bit processors.)
If you would like to be notified of new releases,
subscribe to minpair
at Freshmeat.
Both minpair and mpg are internationalized using the GNU gettext
system. A French message catalog is provided for minpair.
No bugs are known.
5.2 - 2009-11-14
- Multigraph definition files may now contain more than one multigraph per line,
separated by whitespace. This allows msort sort order definition files to be used
for their collating sequence definitions.
5.1
The changes affect only mpg.
- Fixes a bug arising from the interaction of glosses and multigraphs.
- Improves handling of glosses in other ways.
- Eliminates the dependency on the iwidgets package, simplifying installation.
- File names are now minimized.
- I/O channels are now explicitly configured for UTF-8.
5.0
- GNU autoconfiguration is now available.
- It is now possible to run mpg without minpair. mpg can find
minimal pairs involving substitutions but not indels or transpositions. mpg is slower
than minpair but fast enough to be tolerable for lists of several thousand words.
- mpg can find pairs of words differing in exactly two positions. This is useful
in looking for phonological rules.
- Improved codepoint validation in the popup for entering characters by Unicode codepoint.
The message window is now cleared at the beginning of each attempt to insert a character.
The popup now also has a title.
- Corrected error in accented letter chart in mpg that had
an erroneous value for i with double grave.
- Updated font control panel to new version that provides color control.
- Scrollbars now scroll by a large increment if the right mouse button is used in mpg.
- Added list of Tcl commands available in init file to the help menu of mpg.
- Added Save Configuration command to mpg.
- Made a number of changes in the system for defining custom character
insertion charts. There are now two commands available
in init files: ReadCharacterChart, which takes a filename
as its argument and reads the chart from that file, and DefineCharacterChart,
which takes an in-place Tcl list as its argument.
Character chart specifications now require \u immediately
preceding the hex code points.
4.4
- The GUI now sorts the output by default so that all tokens
of the same minimal pair will be grouped together. The sort
is actually performed by a sort utility run as a child process.
msort will be used if it is available since
it understands multigraphs. If msort is not available,
a utility named sort will be used if available.
- The GUI has been renamed mpg. MinpairG was too
long and funny looking.
- The GUI has been reorganized and beautified.
- It is now possible to configure fonts interactively.
- The balloon help toggle was moved from the Help menu to the newly created
Configure menu.
- Many aspects of the GUI can now be configured by means of an initialization file.
These include features of the GUI itself such as fonts and colors and default settings
for minpair parameters. It is also possible to define custom character
insertion charts in the initialization file.
- Those familiar with the International Phonetic Alphabet may reduce the size
of the IPA consonant and vowel charts by suppressing the row labels, the column
labels, or both, either interactively or via the initialization file.
- Command line flags were added to report the program version, list the command line
flags, prevent the reading of the initialization file, and set the debug flag.
- Some further adaptations to Tk-Aqua were made.
4.3
- The maximum number of multigraphs permitted was doubled.
- System identification by the GUI has been improved.
- The GUI now detects that it is running under Tk-Aqua on Macs and
adapts itself.
- A bug in the GUI was fixed that triggered an error when the
vowel chart was deiconified.
- Instead of just trying to use the default browser, the GUI now
works through a list of browsers, trying each in turn until
it finds one that is available.
4.2
- A progress bar and abort button were added to the GUI.
4.1
- An optional graphical user interface, MinpairG, has been added.
4.0
- By default the input now consists of two columns per line rather than one,
separated by a tab. The second column is intended for an identifier, typically
a gloss or item number. A new command line option allows the use of the
earlier single-column input.
3.1 - 2005-02-19
- The program has been internationalized. A French translation is now available.
- The deprecated S and C printf conversion specifications have been replaced.
3.0 - 2005-02-10
- The principal change in version 3.0 is the addition of the ability to compress multigraphs.
- Output is now written on the standard output (since one almost invariably wants to sort
it before using it) and input is now read from standard input unless a file name is
specified on the command line.
2.0 - 2003-12-13
- All input and output is now UTF-8 Unicode.
- DOS support has been dropped.
1.0 - 1993-06-17
Back to Bill Poser's software page.