Jim Henry jimhenry1973 gmail.com via listserv.brown.edu
Jul 2
to CONLANG
On Mon, Jul 1, 2013 at 1:11 PM, Mustafa Umut Sarac wrote:
I am interested in Syldavian, and I want to ask whether there is any software that can analyze the known dictionary from the Tintin books and generate new words. I think there are many programs on the web, but I don't know the general state of artificial intelligence or which programs are capable of doing this job.
I have several times used custom Perl or Awk scripts to analyze the
recurrent letter/phoneme sequences in the vocabulary of
under-documented conlangs and describe their phonotactics and the
frequencies of various phonemes in various positions. However, I
don't know of any off-the-shelf software that will do this for any
arbitrary language.
For example, below is an Awk script I used to analyze the phonotactic
patterns in the lexicon of Rex May's Ceqli. If you aren't familiar
with Awk, the things you need to know to re-implement a similar
program for a particular conlang in another language are these:
1. The main block in { } is run once for every line of the input file.
2. The $1 variable is the first (space-delimited) field of the current
line of the input file - i.e. the first word on the line.
3. The gsub function is a global search and replace, and the
two-argument version defaults to operating on the current input line.
4. You'll also need to change all the lines which have a specific list
of letters as a regular expression character class, to match the
particular language you're analyzing; digraphs will need special
handling, especially if they can be ambiguous. Here I'm replacing any
character of a given class with a higher-level symbol such as V for
vowel, S for semivowel, F for fricative and so on. I'd then put the
output through further analysis, with other scripts, to see how often
each overall pattern occurred.
{
    # Save the original first word before the substitutions change it.
    orig = $1;

    # First fix diphthongs: rewrite the glide as y or w.
    gsub( /ai/, "ay" );
    gsub( /ia/, "ya" );
    gsub( /au/, "aw" );
    gsub( /ua/, "wa" );
    gsub( /ue/, "we" );
    gsub( /oi/, "oy" );
    gsub( /io/, "yo" );
    gsub( /ei/, "ey" );
    gsub( /ie/, "ye" );
    gsub( /ui/, "wi" );
    gsub( /iu/, "yu" );

    # Now replace various letters with type-symbols.
    gsub( /[aeiou]/, "V" );     # vowels
    gsub( /[yw]/, "S" );        # semivowels
    gsub( /[lr]/, "L" );        # liquids
    gsub( /[nmq]/, "N" );       # nasals
    gsub( /[sxhfvz]/, "F" );    # fricatives
    gsub( /[pbtdkg]/, "P" );    # plosives
    gsub( /[cj]/, "A" );        # affricates

    # Print the fine-grained pattern next to the original word.
    print $1 "\t" orig;
    # print $1;

    # Collapse every consonant class into C for a coarser pattern.
    pattern = $1;
    gsub( /[PFANL]/, "C", pattern );
    print pattern "\t" $1 "\t" orig;
}
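For anyone who would rather not use Awk, here is a rough sketch of the same approach in Python. It is an illustration, not a tested port: the names classify and pattern_counts are made up for this example, the letter classes are the Ceqli-specific ones from the script above, and the sample words are arbitrary strings rather than real Ceqli vocabulary. The counting step stands in for the "further analysis" of how often each overall pattern occurs.

```python
import re
from collections import Counter

# Diphthong rewrites, applied in the same order as in the Awk script.
DIPHTHONGS = [
    ("ai", "ay"), ("ia", "ya"), ("au", "aw"), ("ua", "wa"),
    ("ue", "we"), ("oi", "oy"), ("io", "yo"), ("ei", "ey"),
    ("ie", "ye"), ("ui", "wi"), ("iu", "yu"),
]

# Letter classes and their type-symbols (specific to Ceqli spelling;
# change these for any other language).
CLASSES = [
    ("[aeiou]", "V"),
    ("[yw]", "S"),
    ("[lr]", "L"),
    ("[nmq]", "N"),
    ("[sxhfvz]", "F"),
    ("[pbtdkg]", "P"),
    ("[cj]", "A"),
]


def classify(word):
    """Return (coarse, fine) phonotactic patterns for one word."""
    fine = word
    for old, new in DIPHTHONGS:
        fine = fine.replace(old, new)
    for regex, symbol in CLASSES:
        fine = re.sub(regex, symbol, fine)
    # Collapse every consonant class into a single C symbol.
    coarse = re.sub("[PFANL]", "C", fine)
    return coarse, fine


def pattern_counts(words):
    """Count how often each coarse pattern occurs in a word list."""
    return Counter(classify(w)[0] for w in words)


# Arbitrary sample strings, not real Ceqli words.
sample = ["bai", "dau", "tiusa"]
for w in sample:
    coarse, fine = classify(w)
    print(coarse + "\t" + fine + "\t" + w)
for pattern, count in pattern_counts(sample).most_common():
    print(pattern + "\t" + str(count))
```

As with the Awk version, the substitutions run in sequence, so the order of the diphthong rules matters, and digraphs in other orthographies would need their own rules before the single-letter classes are applied.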
Jim Henry jimhenry1973 gmail.com via listserv.brown.edu
Jul 2
to CONLANG
On Mon, Jul 1, 2013 at 5:21 PM, Jim Henry <jimhenry1973 gmail.com> wrote:
I talked in my last message about how to analyze a conlang's corpus or
lexicon. Once you've got your analysis done, there are a variety of
tools available for generating new vocabulary; that's the easy part.
See here:
http://www.frathwiki.com/Software_tools_for_conlanging
Alex Fink 000024@gmail.com via listserv.brown.edu
Jul 2
to CONLANG
On Tue, 2 Jul 2013 01:05:45 +0300, Mustafa Umut Sarac
Essentially, it seems to me, you want to reverse-engineer the sound and spelling changes and whatnot that are being used to convert known Germanic (and Slavic, etc.) words into Syldavian.
There might, one day, be programs capable of this kind of thing, but that day remains on the distant horizon. The nearest work I know of is <http://www.pnas.org/content/early/2013/02/05/1204678110.abstract>, which had the advantage of data sets much larger and better suited to the task to begin with: many more languages, many more words, and (more crucially than either of those!) cognate sets that were aligned in advance, so the program didn't have to know anything about semantics. In fact, it didn't know anything about phonology either, and got by; a program running on the meager data set we have here probably couldn't afford that. The paper cites earlier work which tried to proceed more deterministically and on a smaller scale, but that ran into trouble as well.
My advice to you would be to attack this problem by hand and not by computer! Indeed, the lexicon you linked was compiled by Mark Rosenfelder, who has already tried his hand at this and made up a few Syldavian words of his own: for instance, the word _löwn_ "love" is 100% a Rosenfelder invention, with no attestation in Hergé. Have you tried writing to Mark and asking him, or posting on his fora <http://www.incatena.org/>? He's a good guy; I imagine he'd be glad to give you pointers.
Jim Henry jimhenry1973 gmail.com via listserv.brown.edu
Jul 2
to CONLANG
The method I talked about would work for an a priori conlang. But it
wouldn't give satisfactory results for an a posteriori conlang like
Syldavian. If you still want me to, I'll try to find time to hack up
a script to figure out what phoneme sequences occur and how frequent
they are, but I think Alex's suggestion is more useful: to match the
existing Syldavian material you'll need to borrow words from French,
German, and Slavic languages, and adapt them to the spirit of the
language, not randomly generate vocabulary based merely on statistical
properties of the Syldavian lexicon.
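For reference, a script of the kind described, one that figures out which sequences occur and how frequent they are, could look roughly like the Python sketch below. It is only an illustration under simple assumptions: it counts letter sequences rather than true phonemes, the name ngram_counts is made up for this example, and the word list is a placeholder, not actual Syldavian vocabulary. As noted above, statistics like these would not be enough on their own for an a posteriori language.

```python
from collections import Counter


def ngram_counts(words, n=2):
    """Count letter n-grams, using '#' to mark word boundaries."""
    counts = Counter()
    for word in words:
        padded = "#" + word.lower() + "#"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


# Placeholder strings; substitute the real lexicon, one word per entry.
words = ["kaba", "bak"]
for gram, count in ngram_counts(words).most_common():
    print(gram + "\t" + str(count))
```

The boundary marker makes word-initial and word-final sequences show up separately from medial ones. Digraphs would need to be mapped to single symbols first if the counts are meant to reflect phonemes.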