Hyphenation (end-of-line division) of “Germany” and some other common words
I am currently trying to build a database of English words and their hyphenations (end-of-line divisions; en-US, if it matters), and have come across some words for which I have found contradictory hyphenations. If those words were exotic, I would not be wondering about it, but some of them are frequently used. For example:
Germany: Merriam-Webster gives Ger-ma-ny; Hunspell (which is by far the most dominant spell checker and hyphenator in the open source scene, driving applications like LibreOffice, OpenOffice, Firefox, Thunderbird and the like) gives Ger-many
freely: Merriam-Webster gives free-ly; Hunspell gives freely
rapid: Merriam-Webster gives rap-id; Hunspell gives rapid
I have read a lot of articles (most of them on this site) about hyphenation. The general consensus seems to be that we should look up the respective word and its hyphenation in authoritative sources. But what if those sources contradict each other?
Another piece of advice that was often given is that we should simply hyphenate between syllables. Since I am not a native English speaker, this is extremely difficult for me. While I would have gotten Germany and freely right, I would never have gotten rapid right (in my world, it would have been ra-pid).
I have always considered the Oxford English Dictionary to be the most authoritative English dictionary. Imagine my surprise when I saw that it shows neither hyphenation nor syllabication. Wiktionary does show hyphenation, but only for some words; the examples mentioned above, despite being very common words, are not among them, so it is of no use in this respect.
Could somebody please give me a hint what I should do if two important sources which both can (somehow) be considered authoritative show contradicting hyphenations, and even more important, could somebody please tell me if there is a reliable method to identify words which are suspect in this respect in the first place?
To explain the latter: I am currently using the hunspell data to build my database semi-automatically; otherwise, I couldn't handle it. The hunspell data is the only one I have found to be usable to get the hyphenation of a word quite easily.
As a second step, I would like to be able to identify and separate suspect words, which I then could look up manually in different sources (hoping that only about 5% of the words are suspect).
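One possible way to separate suspect words automatically is to compare the break positions reported by two sources and flag every word where they differ. The following is only a minimal sketch: the function names `break_positions` and `find_suspects` are made up for illustration, and the two dictionaries contain just the divisions quoted in this question, written with hyphens as separators.

```python
def break_positions(hyphenated, sep="-"):
    """Return the offsets (within the undivided word) at which breaks are allowed."""
    positions = set()
    offset = 0
    # Every part except the last one ends at a break point.
    for part in hyphenated.split(sep)[:-1]:
        offset += len(part)
        positions.add(offset)
    return frozenset(positions)


def find_suspects(source_a, source_b):
    """Return the words listed in both sources whose break positions differ."""
    suspects = {}
    for word in sorted(source_a.keys() & source_b.keys()):
        if break_positions(source_a[word]) != break_positions(source_b[word]):
            suspects[word] = (source_a[word], source_b[word])
    return suspects


# Divisions quoted in the question (lowercased for use as dictionary keys).
merriam_webster = {"germany": "ger-ma-ny", "freely": "free-ly",
                   "rapid": "rap-id", "client": "cli-ent"}
hunspell = {"germany": "ger-many", "freely": "freely",
            "rapid": "rapid", "client": "client"}

suspects = find_suspects(merriam_webster, hunspell)
for word, (first, second) in suspects.items():
    print(f"{word}: {first} vs {second}")
```

Comparing positions rather than strings keeps the check independent of which separator character each source uses. It only works for words that contain no literal hyphen of their own.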
EDIT 1
As a reaction to one of the comments, I have now found a word where at least 3 characters are left on each side after hyphenation, but which different "authorities" hyphenate differently:
Microsoft Word 2010 hyphenates inconceivable as in-con-ceiv-a-ble, whereas Merriam-Webster has in-con-ceiv-able.
Another one: Merriam-Webster says cli-ent, whereas Hunspell says client, i.e. it does not hyphenate that word at all.
EDIT 2
@Hot Licks has pointed out that the dictionaries are showing syllable boundaries, not hyphenation points (if any). However, at least in case of Merriam-Webster, this is the same. From their dictionary API documentation:
<hw>...</hw> (text = boldface)
HEADWORD
- This is the first bold word in an entry
- contains "syllable" break points (that is,
end-of-line hyphenation points) here indicated
by asterisks, which will translate to raised dot,
{point} in Merriam-Webster font.
- may contain superscript homograph numbers
{h,1}, {h,2}, etc., in the same font (bold)
- single word space after <hw> field
Please note the text following the second hyphen. IMHO, that means that each syllable boundary is a hyphenation point, and vice versa.
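If the headword field really marks break points with asterisks as the quoted documentation describes, extracting them could look roughly like the sketch below. The raw value `in*con*ceiv*able` is an assumption based on that description and on the division shown earlier, not an actual API response, and `parse_headword` is a hypothetical helper name.

```python
import re


def parse_headword(hw_text):
    """Split an asterisk-marked headword into the plain word and its break offsets."""
    # The documentation says homograph markers such as {h,1} may appear; drop them.
    # (Where exactly they appear in real data is not verified here.)
    cleaned = re.sub(r"\{h,\d+\}", "", hw_text).strip()
    parts = cleaned.split("*")
    offsets = []
    position = 0
    # Every part except the last one ends at a break point.
    for part in parts[:-1]:
        position += len(part)
        offsets.append(position)
    return "".join(parts), offsets


word, offsets = parse_headword("in*con*ceiv*able")
print(word, offsets)
```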
EDIT 3
I have found more precise information. From Merriam-Webster's guide to pronunciation:
Hyphens are used to separate syllables in pronunciation
transcriptions. [...]
The centered dots in boldface entry words indicate potential
end-of-line division points and not syllabication. [...] As a
result, the hyphens indicating syllable breaks and the centered
dots indicating end-of-line division often do not fall in the same
places.
hyphenation dictionaries contradiction
1
Generally speaking, you should never hyphenate a word and leave fewer than 3 characters on either side.
– Hot Licks
Aug 4 at 11:38
2
What the dictionaries show is not the hyphenation points but the syllable boundaries.
– Hot Licks
Aug 4 at 12:00
1
Please correct me if I am wrong, but Merriam-Webster seems to show the hyphenations. For example, consider merriam-webster.com/dictionary/calculation. Directly under the title (in giant letters), there are three entries. The left denotes the type of the word (in this case, noun), the second is what I have considered to be the hyphenation, and the third is the pronunciation, and I always thought the syllable boundaries are part of the pronunciation. Please correct me if I am wrong (which may very well be the case).
– Binarus
Aug 4 at 12:11
1
MW is showing the syllable boundaries. Generally, hyphenation occurs on syllable boundaries, but there are limits as to which boundaries can be used.
– Hot Licks
Aug 4 at 12:29
2
@Hot Licks Please forgive me, but it seems you are wrong regarding what the dictionaries show, at least in the case of MW. Please take a look at my EDIT 3.
– Binarus
Aug 4 at 14:08
edited Aug 4 at 14:39
asked Aug 4 at 10:58
Binarus
1 Answer
If you search for "hunspell hyphenation", you should find an end-of-line hyphenation library (imported from TeX) that should suit your needs. The minimum left and right lengths are variables.
I don't know whether it can distinguish parts of speech, such as (verb) pro-ject vs. (noun) proj-ect.
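To illustrate what minimum left and right lengths do, here is a small sketch that drops break points leaving too few characters on either side. The function name `allowed_breaks` and the default limits of 2 and 3 are illustrative assumptions, not confirmed settings of the library.

```python
def allowed_breaks(hyphenated, left_min=2, right_min=3, sep="-"):
    """Keep only the breaks that leave at least left_min characters before
    the break and at least right_min characters after it."""
    parts = hyphenated.split(sep)
    word_length = sum(len(part) for part in parts)
    result = parts[0]
    offset = len(parts[0])
    for part in parts[1:]:
        # offset = characters before this break; the rest come after it.
        if offset >= left_min and word_length - offset >= right_min:
            result += sep
        result += part
        offset += len(part)
    return result


for division in ("Ger-ma-ny", "free-ly", "rap-id", "cli-ent"):
    print(division, "->", allowed_breaks(division))
```

With these illustrative limits, the Merriam-Webster divisions of Germany, freely, and rapid quoted in the question reduce to Ger-many, freely, and rapid, which matches the Hunspell results reported there. The division cli-ent is unaffected, so length limits alone would not account for Hunspell leaving client undivided.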
I don't believe this answers the question, since Hunspell is already mentioned in the question.
– Laurel
Nov 2 at 3:10
Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
– AmI
Nov 2 at 3:20
@AmI In fact, this is what I currently use for the automatic part. The problem is that Hunspell's hyphenation differs a lot from Merriam-Webster's, for example. Hence, whenever I feel that Hunspell may have missed a hyphenation point, I manually look the word up in other dictionaries. This is still quite painful, but better than nothing. The most difficult part (for me) is to determine if Hunspell might have missed something, so I sometimes end up unnecessarily looking up 10 words in a row manually just because I can't trust Hunspell completely ... What a pity that MW does not offer an API.
– Binarus
Nov 20 at 8:30
I'm sorry -- I didn't realize that you were already using hyphen.tex. Because it is rule-based rather than a full dictionary, it can't reliably handle breaks leaving fewer than 3 letters. It does have an exception list at the end to which you can add entries, but change the file name if you customize it. You could also build hyph_en_US.dic and customize that.
– AmI
Nov 20 at 18:34
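The exception-list idea can also be applied on the database side: manually verified divisions take precedence, and the rule-based result is used only as a fallback. This is a sketch under assumptions: `hyphenate_with_exceptions` and `no_breaks` are hypothetical names, and `no_breaks` merely stands in for whatever rule-based hyphenator is actually called.

```python
def hyphenate_with_exceptions(word, exceptions, fallback):
    """Return the manually verified division if one exists,
    otherwise whatever the rule-based fallback produces."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return fallback(word)


def no_breaks(word):
    """Placeholder for a real rule-based hyphenator; it never divides a word."""
    return word


# Manually verified entries, here taken from the divisions quoted in the question.
verified = {"rapid": "rap-id", "client": "cli-ent"}

print(hyphenate_with_exceptions("rapid", verified, no_breaks))
print(hyphenate_with_exceptions("freely", verified, no_breaks))
```

Note that a verified entry is returned in the form it was stored, so the capitalization of the input is not preserved in this sketch.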
answered Nov 2 at 3:08
AmI
3,2041517
3,2041517
I don't believe this answers the question, since Hunspell is already mentioned in the question.
– Laurel
Nov 2 at 3:10
Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
– AmI
Nov 2 at 3:20
@Aml In fact, this is what I currently use for the automatic part. The problem is that Hunspell's hyphenation differs a lot from Merriam-Webster's, for example. Hence, whenever I feel that Hunspell may have missed a hyphenation point, I manually look the word up in other dictionaries. This is still quite painful, but better than nothing. The most difficult part (for me) is to determine if Hunspell might have missed something, so I sometimes end up unnecessarily looking up 10 words in a row manually just because I can't trust Hunspell completely ... What a pity that MW does not offer an API.
– Binarus
Nov 20 at 8:30
I'm sorry -- I didn't realize that you were already using hyphen.tex. Because it is rule based rather than a full dictionary, it can't reliably handle breaks leaving less than 3 letters. It does have an exception list at the end where you can add on, but change the file name if you customize it. You could also build hyph_en_US.dic and customize that.
– AmI
Nov 20 at 18:34
add a comment |
I don't believe this answers the question, since Hunspell is already mentioned in the question.
– Laurel
Nov 2 at 3:10
Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
– AmI
Nov 2 at 3:20
@Aml In fact, this is what I currently use for the automatic part. The problem is that Hunspell's hyphenation differs a lot from Merriam-Webster's, for example. Hence, whenever I feel that Hunspell may have missed a hyphenation point, I manually look the word up in other dictionaries. This is still quite painful, but better than nothing. The most difficult part (for me) is to determine if Hunspell might have missed something, so I sometimes end up unnecessarily looking up 10 words in a row manually just because I can't trust Hunspell completely ... What a pity that MW does not offer an API.
– Binarus
Nov 20 at 8:30
I'm sorry -- I didn't realize that you were already using hyphen.tex. Because it is rule based rather than a full dictionary, it can't reliably handle breaks leaving less than 3 letters. It does have an exception list at the end where you can add on, but change the file name if you customize it. You could also build hyph_en_US.dic and customize that.
– AmI
Nov 20 at 18:34
I don't believe this answers the question, since Hunspell is already mentioned in the question.
– Laurel
Nov 2 at 3:10
I don't believe this answers the question, since Hunspell is already mentioned in the question.
– Laurel
Nov 2 at 3:10
Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
– AmI
Nov 2 at 3:20
Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
– AmI
Nov 2 at 3:20
@Aml In fact, this is what I currently use for the automatic part. The problem is that Hunspell's hyphenation differs a lot from Merriam-Webster's, for example. Hence, whenever I feel that Hunspell may have missed a hyphenation point, I manually look the word up in other dictionaries. This is still quite painful, but better than nothing. The most difficult part (for me) is to determine if Hunspell might have missed something, so I sometimes end up unnecessarily looking up 10 words in a row manually just because I can't trust Hunspell completely ... What a pity that MW does not offer an API.
– Binarus
Nov 20 at 8:30
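For the words that have already been looked up by hand, the disagreement between the two sources can at least be recorded mechanically. The following is only a minimal sketch, not part of any Hunspell or Merriam-Webster tooling: the function names `break_positions` and `missed_breaks` are made up for illustration, and it assumes that a `-` in the input always marks a break point (so words containing a real hyphen would need special handling). The sample data are the three words from the question.

```python
def break_positions(hyphenated: str) -> set[int]:
    """Return the offsets (in the unhyphenated word) at which a break is marked."""
    positions = set()
    offset = 0
    for ch in hyphenated:
        if ch == "-":
            positions.add(offset)
        else:
            offset += 1
    return positions


def missed_breaks(reference: str, candidate: str) -> set[int]:
    """Offsets marked in `reference` but absent from `candidate`."""
    if reference.replace("-", "") != candidate.replace("-", ""):
        raise ValueError("both arguments must spell the same word")
    return break_positions(reference) - break_positions(candidate)


# (dictionary form, Hunspell form), taken from the question
samples = {
    "Germany": ("Ger-ma-ny", "Ger-many"),
    "freely": ("free-ly", "freely"),
    "rapid": ("rap-id", "rapid"),
}

for word, (reference, candidate) in samples.items():
    print(word, sorted(missed_breaks(reference, candidate)))
# Germany [5]
# freely [4]
# rapid [3]
```

Collecting these offsets over a few hundred hand-checked words may show whether the disagreements follow a pattern (for example, always close to the end of the word), which would narrow down which words are worth checking manually.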
I'm sorry -- I didn't realize that you were already using hyphen.tex. Because it is rule-based rather than a full dictionary, it can't reliably handle breaks that leave fewer than 3 letters. It does have an exception list at the end to which you can add entries, but change the file name if you customize it. You could also build hyph_en_US.dic and customize that.
– AmI
Nov 20 at 18:34
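The same idea (manual exceptions take precedence over the rule-based result) can also be applied at the application level, without touching the pattern files. This is a hedged sketch only: `hyphenate_with_exceptions` and the `fallback` callable are hypothetical names, and the fallback stands in for whatever automatic hyphenator is actually in use. It does not reproduce the exception-list syntax of hyphen.tex or hyph_en_US.dic.

```python
from typing import Callable, Mapping


def hyphenate_with_exceptions(
    word: str,
    exceptions: Mapping[str, str],
    fallback: Callable[[str], str],
) -> str:
    """Return the manual hyphenation if one exists, else the fallback's result.

    `exceptions` maps a lowercase word to its lowercase hyphenated form.
    """
    override = exceptions.get(word.lower())
    if override is None:
        return fallback(word)
    if override.replace("-", "") != word.lower():
        raise ValueError(f"exception entry does not spell {word!r}")
    # Re-apply the original capitalisation letter by letter.
    letters = iter(word)
    return "".join("-" if ch == "-" else next(letters) for ch in override)


# Manually verified entries (here: the dictionary forms quoted in the question).
exceptions = {"germany": "ger-ma-ny", "rapid": "rap-id"}


def no_breaks(word: str) -> str:
    """Placeholder fallback that never breaks a word."""
    return word


print(hyphenate_with_exceptions("Germany", exceptions, no_breaks))  # Ger-ma-ny
print(hyphenate_with_exceptions("table", exceptions, no_breaks))    # table
```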
1
Generally speaking, you should never hyphenate a word and leave fewer than 3 characters on either side.
– Hot Licks
Aug 4 at 11:38
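To illustrate this rule of thumb, here is a small sketch that starts from dictionary-style syllable marks and keeps only the breaks that leave at least 3 characters on each side. The function name `usable_breaks` and its parameters are assumptions made for this example; the minimum of 3 is simply the figure from the comment above, not a setting confirmed for any particular hyphenator.

```python
def usable_breaks(syllabified: str, min_left: int = 3, min_right: int = 3) -> str:
    """Drop syllable marks that would leave too few characters on either side."""
    word_length = len(syllabified.replace("-", ""))
    out = []
    offset = 0  # number of letters emitted so far
    for ch in syllabified:
        if ch != "-":
            out.append(ch)
            offset += 1
        elif offset >= min_left and word_length - offset >= min_right:
            out.append("-")
    return "".join(out)


for entry in ("Ger-ma-ny", "free-ly", "rap-id"):
    print(entry, "->", usable_breaks(entry))
# Ger-ma-ny -> Ger-many
# free-ly -> freely
# rap-id -> rapid
```

For the three words from the question, the filtered result happens to coincide with the Hunspell forms quoted there, so those differences are at least consistent with a minimum-length rule rather than with a disagreement about the syllables themselves.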
2
What the dictionaries show is not the hyphenation points but the syllable boundaries.
– Hot Licks
Aug 4 at 12:00
1
Please correct me if I am wrong, but Merriam-Webster seems to show the hyphenations. For example, consider merriam-webster.com/dictionary/calculation. Directly under the title (in giant letters), there are three entries. The left denotes the type of the word (in this case, noun), the second is what I have considered to be the hyphenation, and the third is the pronunciation, and I always thought the syllable boundaries are part of the pronunciation. Please correct me if I am wrong (which may very well be the case).
– Binarus
Aug 4 at 12:11
1
MW is showing the syllable boundaries. Generally, hyphenation occurs on syllable boundaries, but there are limits as to which boundaries can be used.
– Hot Licks
Aug 4 at 12:29
2
@Hot Licks Please forgive me, but it seems you are wrong regarding what the dictionaries show, at least in the case of MW. Please take a look at my EDIT 3.
– Binarus
Aug 4 at 14:08