What difference does it make matching a word with/without a trailing whitespace?













I am learning shell scripting, and for that I am using HackerRank. There is a question related to sed on that site, 'Sed' command #1:

For each line in a given input file, transform the first occurrence of the word 'the' with 'this'. The search and transformation should be strictly case sensitive.

First I tried

sed 's/the/this/'

but the sample test case failed with that. Then I tried

sed 's/the /this /'

and it worked. So what difference did the trailing whitespace make? Am I missing something here?







sed whitespace






edited 2 days ago by Kusalananda
asked Mar 31 at 20:33 by JHA













  • I assume the first version also "worked", but not as you expected. It should have replaced the first occurrence of the letter sequence "the", but you probably looked at the first occurrence of the word " the ".

    – Dubu
    2 days ago











  • Well, in thiseory, yes, in practice, no.

    – Rolf
    yesterday
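
Dubu's point can be checked with a short sketch. The input line below is made up for illustration and is not HackerRank's actual test data; it just places the letter sequence "the" inside another word before the word "the" appears.

```sh
# Hypothetical input line (an assumption, not the real test case):
# "Other" contains the letters "the" before the word "the" occurs.
line='Other than that, the cat sat.'

# Pattern without the space: the first "the" on the line is inside "Other".
plain=$(printf '%s\n' "$line" | sed 's/the/this/')

# Pattern with the trailing space: in "Other" the letters are followed by "r",
# so the first match is now the word "the" before "cat".
spaced=$(printf '%s\n' "$line" | sed 's/the /this /')

printf '%s\n' "$plain" "$spaced"
# prints:
# Othisr than that, the cat sat.
# Other than that, this cat sat.
```

Both commands "work" in the sense that they replace the first match; they just disagree about what the first match is.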






















3 Answers


















The difference is whether there is a space after "the" in the input text.

For instance:

With input that has no space after "the", there is no replacement:

$ echo 'theman' | sed 's/the /this /'
theman

With input that has a space, it works as expected:

$ echo 'the man' | sed 's/the /this /'
this man

With input that has another whitespace character (here a tab), no replacement will occur:

$ echo -e 'the\tman' | sed 's/the /this /'
the     man
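
If the tab case above matters, one possible variation (a sketch, not part of the original answer) is to match any whitespace character with the POSIX class [[:space:]] and put the matched character back with \1. The helper name the_any_space is made up for this example.

```sh
# Sketch: [[:space:]] is the POSIX character class for whitespace
# (space, tab, and so on). The captured character is re-inserted with \1,
# so a tab stays a tab. the_any_space is a hypothetical helper name.
the_any_space() {
    sed 's/the\([[:space:]]\)/this\1/'
}

printf 'the\tman\n' | the_any_space   # tab after "the": now replaced
printf 'the man\n'  | the_any_space   # ordinary space: still replaced
printf 'theman\n'   | the_any_space   # no whitespace after "the": unchanged
```

This still does not handle "the" at the end of a line or before punctuation, so it is only a small step beyond the literal space.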






edited Mar 31 at 21:31 by G-Man
answered Mar 31 at 20:44 by BDR













  • I missed that. I had to take "the" as a string. Not a substring.

    – JHA
    Mar 31 at 20:53

  • @JHA: It also matters at the end of a line. E.g., the word "the" could appear at the end of a line as part of a file with line wrapping, but still be in the middle of a paragraph and thus still be a normal word in an English sentence. the( |$) might be closer to working, if that extended regex works. Anyway, IDK what you mean by "as a string" vs. substring. In both cases it's a substring of the whole line, and your test cases are insufficient to detect the cases where "the " fails. Kusalananda's answer is significantly better; I'd recommend accepting it.

    – Peter Cordes
    2 days ago
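
The the( |$) idea from the comment above can be tried out as follows. This is only a sketch and assumes a sed that accepts -E for extended regular expressions (GNU sed does); the helper name the_space_or_eol is made up for illustration.

```sh
# Sketch of the pattern from the comment, assuming sed supports -E.
# \1 puts back whatever the group matched: a space, or nothing at end of line.
# the_space_or_eol is a hypothetical helper name.
the_space_or_eol() {
    printf '%s\n' "$1" | sed -E 's/the( |$)/this\1/'
}

the_space_or_eol 'look at the'      # prints: look at this
the_space_or_eol 'the man'          # prints: this man
the_space_or_eol 'bathe in water'   # prints: bathis in water (still a false match)
```

The last call shows why this is only "closer to working": nothing stops the match from starting in the middle of a word.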





















It's a cheap and error-prone way of doing word matching.

Note that "the" with a space after it does not match the word "thereby", so matching with a space after "the" avoids matching that string at the start of words. However, it still does match "bathe" (if followed by a space), and it does not match "the" at the end of a line.

To match the word "the" properly (or any other word), you should not use spaces around the word, as that would prevent you from matching it at the start or end of lines or if it's flanked by any other non-word character, such as any punctuation or tab character, for example.

Instead, use a zero-width word boundary pattern:

sed 's/\<the\>/this/'

The \< and \> match the boundaries before and after the word, i.e. the zero-width position between a word character and a non-word character. A word character is generally any character matching [[:alnum:]_] (or [A-Za-z0-9_] in the POSIX locale).

With GNU sed, you could also use \b in place of \< and \>:

sed 's/\bthe\b/this/'
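
A quick way to convince yourself of the word-boundary behaviour is to run the \<the\> form (written with backslashes) against the problem cases mentioned above. This is a sketch that assumes GNU sed; the helper name the_to_this and the sample lines are made up for illustration.

```sh
# Sketch assuming GNU sed, where \< and \> match the start and end of a word.
# the_to_this is a hypothetical helper name used only for this example.
the_to_this() {
    printf '%s\n' "$1" | sed 's/\<the\>/this/'
}

the_to_this 'thereby the end'     # prints: thereby this end
the_to_this 'bathe in the sea'    # prints: bathe in this sea
the_to_this 'look at the'         # prints: look at this
the_to_this 'the, then'           # prints: this, then
```

Each line exercises one case: "the" at the start of a longer word, at the end of a longer word, at the end of the line, and directly before punctuation.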






edited 2 days ago
answered Mar 31 at 20:53 by Kusalananda























sed works with regular expressions. Using sed 's/the /this /' you just make the space after "the" part of the matched pattern.

Using sed 's/the/this/' you replace "the" with "this" no matter whether a space exists after it (only the first occurrence on each line, unless the g flag is added).

In the HackerRank exercise, the result is as expected because replacing "the" with "this" is logical: you replace just an article, which by default is followed by a space (grammar rules).

You can see the difference if you try, for example, to capitalize "the" in the phrase "the theater":

echo 'the theater' | sed 's/the /THE /g'
THE theater
# "theater" is left alone since its "the" is not followed by a space

echo 'the theater' | sed 's/the/THE/g'
THE THEater
# both occurrences of "the" are capitalized






edited Mar 31 at 20:57 by JHA
answered Mar 31 at 20:54 by George Vasiliou













  • Thank you for the answer. Appreciated :)

    – JHA
    Mar 31 at 21:02

  • "you replace all occurrences" To be clear: Without the g after the replacement text, you replace only the first occurrence.

    – Dubu
    2 days ago
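
Dubu's remark about the g flag can be seen with a two-line input. This is a minimal sketch; the sample text is made up for illustration.

```sh
# Without the g flag, s replaces only the first match on each line.
first_only=$(printf 'the theater\nthe other\n' | sed 's/the/THE/')

# With the g flag, every match on each line is replaced.
every_match=$(printf 'the theater\nthe other\n' | sed 's/the/THE/g')

printf '%s\n' "$first_only" "$every_match"
# prints:
# THE theater
# THE other
# THE THEater
# THE oTHEr
```

Note that "first occurrence" is counted per line, since sed applies the command to each input line separately.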


















