Hyphenation (end-of-line division) of “Germany” and some other common words

















I am currently building a database of English words and their hyphenations (end-of-line divisions; en-US, if it matters), and I have come across some words for which I have found contradictory hyphenations. If those words were exotic, I would not be surprised, but some of them are frequently used. For example:




  • Germany: Merriam-Webster - Ger-ma-ny; Hunspell (which is by far the dominant spell checker and hyphenator in the open-source scene, driving applications like LibreOffice, OpenOffice, Firefox, Thunderbird and the like) - Ger-many


  • freely: Merriam-Webster - free-ly; Hunspell - freely


  • rapid: Merriam-Webster - rap-id; Hunspell - rapid



I have read a lot of articles (most of them on this site) about hyphenation. The general consensus seems to be that we should look up the respective word and its hyphenation in authoritative sources. But what if those sources contradict each other?



Another piece of advice that is often given is that we should simply hyphenate between syllables. Since I am not a native English speaker, this is extremely difficult for me. While I would have got Germany and freely right, I would never have got rapid right (in my world, it would have been ra-pid).



I have always considered the Oxford English Dictionary to be the most authoritative English dictionary. Imagine my surprise when I saw that it shows neither hyphenation nor syllabication. Wiktionary does show hyphenation, but only for some words; the examples mentioned above, although very common words, are not among them, so it is of no use in this respect.



Could somebody please give me a hint as to what I should do if two important sources, both of which can (somehow) be considered authoritative, show contradictory hyphenations? And, even more importantly, could somebody please tell me whether there is a reliable method to identify words that are suspect in this respect in the first place?



To explain the latter: I am currently using the Hunspell data to build my database semi-automatically; otherwise, I couldn't handle it. The Hunspell data is the only source I have found from which I can get the hyphenation of a word quite easily.



As a second step, I would like to be able to identify and separate suspect words, which I could then look up manually in different sources (hoping that only about 5% of the words are suspect).
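One way to approach this second step is to load the hyphenations from two sources and flag every word on which they disagree. The following is only a minimal sketch under stated assumptions: the names `break_positions`, `find_suspects`, `MERRIAM_WEBSTER` and `HUNSPELL` are invented for illustration, the sample entries are copied by hand from the examples in this question, and how the data is actually obtained from each source is left open.

```python
def break_positions(hyphenated: str) -> set[int]:
    """Return the offsets (in the unhyphenated word) at which a break is allowed."""
    positions = set()
    offset = 0
    for ch in hyphenated:
        if ch == "-":
            positions.add(offset)
        else:
            offset += 1
    return positions


def find_suspects(source_a: dict[str, str], source_b: dict[str, str]) -> list[str]:
    """List the words present in both sources whose break points differ."""
    suspects = []
    for word in sorted(source_a.keys() & source_b.keys()):
        if break_positions(source_a[word]) != break_positions(source_b[word]):
            suspects.append(word)
    return suspects


# Hand-copied sample entries from this question (hypothetical variable names).
MERRIAM_WEBSTER = {"Germany": "Ger-ma-ny", "freely": "free-ly",
                   "rapid": "rap-id", "client": "cli-ent"}
HUNSPELL = {"Germany": "Ger-many", "freely": "freely",
            "rapid": "rapid", "client": "client"}

if __name__ == "__main__":
    for word in find_suspects(MERRIAM_WEBSTER, HUNSPELL):
        print(word, MERRIAM_WEBSTER[word], HUNSPELL[word])
```

Comparing sets of offsets rather than the hyphenated strings keeps the check independent of how each source happens to mark its break points.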



EDIT 1



In response to one of the comments, I have now found a word where at least 3 characters are left on each side after hyphenation, but where different "authorities" hyphenate differently:



Microsoft Word 2010 hyphenates inconceivable as in-con-ceiv-a-ble, whereas Merriam-Webster has in-con-ceiv-able.



Another one: Merriam-Webster says cli-ent, whereas Hunspell says client, i.e. it does not hyphenate that word at all.



EDIT 2



@Hot Licks has pointed out that the dictionaries are showing syllable boundaries, not hyphenation points (if any). However, at least in the case of Merriam-Webster, the two appear to be the same. From their dictionary API documentation:



<hw>...</hw>    (text = boldface)
HEADWORD
- This is the first bold word in an entry
- contains "syllable" break points (that is,
end-of-line hyphenation points) here indicated
by asterisks, which will translate to raised dot,
{point} in Merriam-Webster font.
- may contain superscript homograph numbers
{h,1}, {h,2}, etc., in the same font (bold)
- single word space after <hw> field


Please note the text following the second hyphen. IMHO, that means that each syllable boundary is a hyphenation point, and vice versa.
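For building a database, the asterisk notation described in that excerpt could be turned into break offsets roughly as follows. This is only a sketch: `parse_headword` is a name I made up, the input string `"Ger*ma*ny"` is constructed by hand from the description above rather than taken from a real API response, and the superscript homograph markers such as `{h,1}` are not handled.

```python
def parse_headword(raw: str, marker: str = "*") -> tuple[str, list[int]]:
    """Split a marked headword into the plain word and its break offsets."""
    letters = []
    breaks = []
    for ch in raw:
        if ch == marker:
            # A marker sits between letters; record how many letters precede it.
            breaks.append(len(letters))
        else:
            letters.append(ch)
    return "".join(letters), breaks


if __name__ == "__main__":
    word, breaks = parse_headword("Ger*ma*ny")  # hand-built sample input
    print(word, breaks)
```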



EDIT 3



I have found more precise information. From Merriam-Webster's guide to pronunciation:




Hyphens are used to separate syllables in pronunciation
transcriptions. [...]



The centered dots in boldface entry words indicate potential
end-of-line division points and not syllabication. [...] As a
result, the hyphens indicating syllable breaks and the centered
dots indicating end-of-line division often do not fall in the same
places.






















  • 1




    Generally speaking, you should never hyphenate a word and leave fewer than 3 characters on either side.
    – Hot Licks
    Aug 4 at 11:38






  • 2




    What the dictionaries show is not the hyphenation points but the syllable boundaries.
    – Hot Licks
    Aug 4 at 12:00






  • 1




    Please correct me if I am wrong, but Merriam-Webster seems to show the hyphenations. For example, consider merriam-webster.com/dictionary/calculation. Directly under the title (in giant letters), there are three entries. The left denotes the type of the word (in this case, noun), the second is what I have considered to be the hyphenation, and the third is the pronunciation, and I always thought the syllable boundaries are part of the pronunciation. Please correct me if I am wrong (which may very well be the case).
    – Binarus
    Aug 4 at 12:11








  • 1




    MW is showing the syllable boundaries. Generally, hyphenation occurs on syllable boundaries, but there are limits as to which boundaries can be used
    – Hot Licks
    Aug 4 at 12:29






  • 2




    @Hot Licks Please forgive me, but it seems you are wrong regarding what the dictionaries show, at least in the case of MW. Please take a look at my EDIT 3.
    – Binarus
    Aug 4 at 14:08

















hyphenation dictionaries contradiction














edited Aug 4 at 14:39

























asked Aug 4 at 10:58









Binarus











1 Answer













If you search for "hunspell hyphenation", you should find an end-of-line hyphenation library (imported from TeX) that should suit your needs. The minimum left and right lengths are adjustable parameters.



I don't know if this can detect part-of-speech such as (verb) pro-ject vs (noun) proj-ect.
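To illustrate how the minimum left and right lengths can explain some of the differences between sources, here is a rough sketch. The function name `apply_min_lengths` is made up, and the default values of 2 and 3 are assumptions for illustration only; check which settings your hyphenation data actually uses.

```python
def apply_min_lengths(hyphenated: str, left_min: int = 2, right_min: int = 3) -> str:
    """Drop break points that would leave fewer than left_min letters before
    the break or fewer than right_min letters after it."""
    parts = hyphenated.split("-")
    word_len = sum(len(p) for p in parts)
    result = parts[0]
    offset = len(parts[0])  # number of letters before the next candidate break
    for part in parts[1:]:
        keep = offset >= left_min and word_len - offset >= right_min
        result += ("-" if keep else "") + part
        offset += len(part)
    return result


if __name__ == "__main__":
    # Dictionary-style divisions quoted in the question.
    for entry in ["Ger-ma-ny", "free-ly", "rap-id", "cli-ent"]:
        print(entry, "->", apply_min_lengths(entry))
```

With these assumed limits, the dictionary divisions for Germany, freely and rapid reduce to the forms reported for Hunspell in the question, while cli-ent is left unchanged. A word like that would still need a manual look-up.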



























  • I don't believe this answers the question, since Hunspell is already mentioned in the question.
    – Laurel
    Nov 2 at 3:10










  • Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
    – AmI
    Nov 2 at 3:20










  • @AmI In fact, this is what I currently use for the automatic part. The problem is that Hunspell's hyphenation differs a lot from Merriam-Webster's, for example. Hence, whenever I feel that Hunspell may have missed a hyphenation point, I manually look the word up in other dictionaries. This is still quite painful, but better than nothing. The most difficult part (for me) is to determine whether Hunspell might have missed something, so I sometimes end up unnecessarily looking up 10 words in a row manually just because I can't trust Hunspell completely ... What a pity that MW does not offer an API.
    – Binarus
    Nov 20 at 8:30












  • I'm sorry -- I didn't realize that you were already using hyphen.tex. Because it is rule-based rather than a full dictionary, it can't reliably handle breaks leaving fewer than 3 letters. It does have an exception list at the end to which you can add entries, but change the file name if you customize it. You could also build hyph_en_US.dic and customize that.
    – AmI
    Nov 20 at 18:34




































1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
0
down vote













If you search hunspell hyphenation you should find an end-of-line hyphenation library (import from TeX) that should suit your needs. The min right and left lengths are variables.



I don't know if this can detect part-of-speech such as (verb) pro-ject vs (noun) proj-ect.






share|improve this answer





















  • I don't believe this answers the question, since Hunspell is already mentioned in the question.
    – Laurel
    Nov 2 at 3:10










  • Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
    – AmI
    Nov 2 at 3:20










  • @Aml In fact, this is what I currently use for the automatic part. The problem is that Hunspell's hyphenation differs a lot from Merriam-Webster's, for example. Hence, whenever I feel that Hunspell may have missed a hyphenation point, I manually look the word up in other dictionaries. This is still quite painful, but better than nothing. The most difficult part (for me) is to determine if Hunspell might have missed something, so I sometimes end up unnecessarily looking up 10 words in a row manually just because I can't trust Hunspell completely ... What a pity that MW does not offer an API.
    – Binarus
    Nov 20 at 8:30












  • I'm sorry -- I didn't realize that you were already using hyphen.tex. Because it is rule based rather than a full dictionary, it can't reliably handle breaks leaving less than 3 letters. It does have an exception list at the end where you can add on, but change the file name if you customize it. You could also build hyph_en_US.dic and customize that.
    – AmI
    Nov 20 at 18:34















up vote
0
down vote













If you search hunspell hyphenation you should find an end-of-line hyphenation library (import from TeX) that should suit your needs. The min right and left lengths are variables.



I don't know if this can detect part-of-speech such as (verb) pro-ject vs (noun) proj-ect.






share|improve this answer





















  • I don't believe this answers the question, since Hunspell is already mentioned in the question.
    – Laurel
    Nov 2 at 3:10










  • Hunspell was being used with partial success and I'm suggesting an add-on that should complete the task, rather than re-doing everything from scratch.
    – AmI
    Nov 2 at 3:20










  • @Aml In fact, this is what I currently use for the automatic part. The problem is that Hunspell's hyphenation differs a lot from Merriam-Webster's, for example. Hence, whenever I feel that Hunspell may have missed a hyphenation point, I manually look the word up in other dictionaries. This is still quite painful, but better than nothing. The most difficult part (for me) is to determine if Hunspell might have missed something, so I sometimes end up unnecessarily looking up 10 words in a row manually just because I can't trust Hunspell completely ... What a pity that MW does not offer an API.
    – Binarus
    Nov 20 at 8:30












  • I'm sorry -- I didn't realize that you were already using hyphen.tex. Because it is rule based rather than a full dictionary, it can't reliably handle breaks leaving less than 3 letters. It does have an exception list at the end where you can add on, but change the file name if you customize it. You could also build hyph_en_US.dic and customize that.
    – AmI
    Nov 20 at 18:34













answered Nov 2 at 3:08









AmI

