BASH Script hangs after some processing on Ubuntu



























I have been running the script below on a Red Hat server, where it works fine and finishes the job. The file I feed it contains about half a million lines (approximately 500000), which is why, to finish faster, I added an '&' at the end of the while loop block.



But now I have set up a desktop with 8 GB of RAM running Ubuntu 18.04, and the same code only gets through a few thousand lines and then hangs. I read a bit about it and increased the stack limit to unlimited, but it still hung after 80000 lines or so. Any suggestions on how I can optimize the code or tune my PC parameters so that the job always finishes?



while read -r CID60
do
    {
        OLT=$(echo "$CID60" | cut -d"|" -f5)
        ONID=${OLT}:$(echo "$CID60" | cut -d, -f2 | sed 's/ //g ; s/).*|//')
        echo $ONID,$(echo "$CID60" | cut -d"|" -f3) >> $localpath/CID_$logfile.csv
    } &
done < $localpath/$CID7360


Input:



202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN46| Unlocked|12-654-0330|Down|202-00_MSRFKH00OL6|P282017856.C881 ( local, R1.S1.LT7.PON8.ONT81.C1.P1 )|

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN52| Unlocked|12-664-1186|Up|202-00_MSRFKH00OL6|P282012623.C2028 ( network, R1.S1.LT7.PON8.ONT75.SERV1 )|


Output:



202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186


The output I am interested in is the 5th column (separated by pipe |) concatenated with part of the last column, followed by the third column.






























  • That's an awful lot of processes to fire off at, more or less, the same time. You might want to wait after some number of lines, or investigate other strategies to parallelize the job (such as GNU parallel).

    – glenn jackman
    2 days ago











  • @PerlDuck I have added the input and output of the script. Of course it won't run as is, since some of the variables are defined outside this code. I am also thinking of trying sed or awk for this job; it might be a lot quicker, but I need to learn how to write such an expression.

    – Ibraheem
    yesterday











  • @glennjackman I have been reading about parallel. Can you suggest how I can use it in a loop like the one above?

    – Ibraheem
    yesterday











  • Your code seems amenable to a single sed instruction operating on the input file that would run thousands of times faster. awk would also be a solution.

    – xenoid
    yesterday











  • @xenoid can you please suggest a sed expression?

    – Ibraheem
    yesterday
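The "wait after some number of lines" idea from the first comment can be sketched roughly as below. This is only an illustration, not anyone's posted solution: the names convert_line and convert_throttled are made up, the batch size of 8 is an arbitrary assumption, and the parsing assumes every record looks like the sample input. It caps the number of background jobs and replaces the echo/cut/sed pipelines with shell builtins, so each record costs one subshell instead of several processes. Output order is still not guaranteed.

```shell
#!/usr/bin/env bash
# Sketch only: convert_line and convert_throttled are hypothetical names.

# Convert one record with shell builtins only (no cut or sed processes).
convert_line() {
    local f1 f2 f3 f4 f5 f6 rest ont
    IFS='|' read -r f1 f2 f3 f4 f5 f6 rest <<EOF
$1
EOF
    ont=${f6##*, }    # keep what follows the last ", "
    ont=${ont%% *}    # cut at the first space, dropping " )"
    printf '%s:%s,%s\n' "$f5" "$ont" "$f3"
}

# Read records from stdin, running at most max_jobs conversions at a time.
convert_throttled() {
    local max_jobs="${1:-8}" running=0 line
    while IFS= read -r line; do
        [ -n "$line" ] || continue      # skip blank lines
        convert_line "$line" &
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait                        # let the current batch finish
            running=0
        fi
    done
    wait                                # collect the last partial batch
}
```

With the variables from the question it could be called as convert_throttled 8 < "$localpath/$CID7360" >> "$localpath/CID_$logfile.csv". Forking once per record is still likely to be far slower than a single sed, awk or perl process reading the whole file, so this mainly shows how to keep the number of simultaneous jobs bounded.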
















bash text-processing background-process






edited yesterday by GAD3R

asked 2 days ago by Ibraheem








4 Answers

Perl solution



This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.



#!/usr/bin/env perl

use strict;
use warnings;

# Capture field 3 ($1), field 5 ($2) and the last token inside the parentheses ($3).
while ( <> ) {
    if ( /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/ ) {
        print "$2:$3,$1\n";
    }
}


I copied your sample data until I got 1,572,864 lines and then ran it as follows:



me@ubuntu:~> time ./filter.pl < input.txt > output.txt
real 0m3,603s
user 0m3,487s
sys 0m0,100s

me@ubuntu:~> tail -3 output.txt
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186


If you prefer one-liners, do:



perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < input.txt > output.txt
























  • Indeed, this perl solution has been the fastest; it took less than a second to process 300K lines. I am preparing some other lookup-like scripts and will be looking forward to further help. Thanks everyone, all the answers were helpful, but @perlduck's solution was the fastest. And since my original while loop wasn't producing results in order, the order won't matter for me anyway.

    – Ibraheem
    yesterday











  • @Ibraheem, Yes, this perl solution is very good, and probably fast enough for your purpose by a wide margin. But try my tr and cut solution, which is actually faster on my computer (and I think easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think could be the fastest of them all.

    – sudodus
    yesterday











  • @sudodus I tried your solution and it was really fast (took about 0.205 seconds), but the columns are not in the order I want and there is a pipe in the middle.

    – Ibraheem
    yesterday











  • @Ibraheem, Is it important to have the format that you want (order of columns and separators between the column)? The reason why my solution is fast is that it does as little as possible, still showing what you need (but in a different order). If you prefer another separator, it is possible, space ' ' would cost no extra time, another separator would cost some extra time for a tr or tr -s command, but not very much.

    – sudodus
    yesterday






  • I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster in my computer), maybe easier to understand and edit, if you would need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Any of the two solutions should be good for you.

    – sudodus
    yesterday

































A pure sed solution:



sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <in.dat >out.dat































  • +1: Nice with a pure sed solution :-) But my cut and sed solution is faster ;-)

    – sudodus
    yesterday













  • Yes I know. But mine produces the result in the requested order 🤨🤨

    – xenoid
    yesterday













  • That's right, we will see how important it is to get exactly what the OP prescribes. By the way, I think you drop one character, MSRFKH00OL6 --> MSRFKH00OL in your output. I think you can fix that with a minor edit.

    – sudodus
    yesterday






  • @sudodus Yes, transcription error. Fixed :)

    – xenoid
    yesterday











  • I timed your new one-liner and it works well, actually slightly faster than before. I don't know if there was something else happening in my computer, anyway, I edited my answer to show the new result :-)

    – sudodus
    yesterday

































Oneliner



If the order of the items and the separators can be different from what you specify in the question, I thought the following one-liner would do it,



< input tr ' ' '|' | cut -d '|' -f 4,6,10 > output


but in a comment you wrote that you need exactly the specified format.



I added a solution with 'awk', which is approximately on par with PerlDuck's solution with perl. See the end of this answer.



< input awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > output


Test



The test was done in my computer with Lubuntu 18.04.1 LTS, 2*2 processors and 4 GiB RAM.



I made a huge infile by 'doubling 20 times' from your demo input (1572864 lines), so there is some margin over your 500000 lines.
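For anyone who wants to build a similar test file, the doubling can be sketched as follows. The function name grow_by_doubling is made up for illustration and is not what was actually used for the timings in this answer; each pass concatenates the file with itself, so n passes multiply the line count by 2^n.

```shell
#!/usr/bin/env bash
# Sketch only: grow_by_doubling is a hypothetical helper name.
# Doubles the contents of file "$1" in place, "$2" times.
grow_by_doubling() {
    local file="$1" times="$2" tmp i=0
    tmp=$(mktemp) || return 1
    while [ "$i" -lt "$times" ]; do
        # Write two copies to a scratch file, then copy the result back.
        cat "$file" "$file" > "$tmp" && cp "$tmp" "$file" || break
        i=$((i + 1))
    done
    rm -f "$tmp"
}
```

For example, grow_by_doubling infile 10 would turn a small sample into a file with 1024 times as many lines.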



Oneliner with cut and sed:



$ < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile
$ wc -l infile
1572864 infile
$ wc -l outfile
1572864 outfile
$ tail outfile
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1


Timing



We might expect that a pure sed solution would be faster, but I think that reordering the data slows it down, so the cut and sed solution is faster. Both solutions work without any problem on my computer.



Oneliner with cut and sed:



$ time < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile

real 0m8,132s
user 0m8,633s
sys 0m0,617s


A pure sed oneliner by xenoid:



$ time sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <infile > outfile-sed

real 1m8,686s
user 1m8,259s
sys 0m0,344s


A perl oneliner by PerlDuck is faster than the previous oneliners:



$ time perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < infile > outfile.perl

real 0m5,929s
user 0m5,339s
sys 0m0,256s


Oneliner with tr and cut with a tr -s command:



I used tr to convert the spaces in the input file to pipeline characters and then cut could do it all without sed. As you can see, tr is much faster than sed. The tr -s command removes double pipes in the input, which is a good idea, particularly if there can be repeated spaces or pipes in the input file. It does not cost much.



$ time < infile tr ' ' '|' | tr -s '|' '|' | cut -d '|' -f 3,5,9 > outfile-tr-cut

real 0m1,277s
user 0m1,781s
sys 0m0,925s


Oneliner with tr and cut without the tr -s command, fastest so far:



$ time < infile tr ' ' '|' | cut -d '|' -f 4,6,10 > outfile-tr-cut

real 0m1,199s
user 0m1,020s
sys 0m0,618s


$ tail outfile-tr-cut
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1


Oneliner with awk, fast but not the fastest,



< input awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > output

$ time < infile awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > outfile.awk

real 0m5,091s
user 0m4,724s
sys 0m0,365s


Speed summary: the 'real' time according to time rounded to 1 decimal



1m 8.7s - sed
8.1s - cut & sed
5.9s - perl
5.1s - awk
1.2s - tr & cut


Finally, I note that the oneliners with sed, perl and awk create an output file with the prescribed format.



$ tail outfile.awk
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
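The awk oneliner above turns every pipe into a space and then counts whitespace-separated fields, so the positions $5 and $9 would shift if some column ever contained an extra space. A variant that keeps | as the field separator might look like the sketch below. The wrapper name convert_awk is made up, the variant was not part of the timings above, and it assumes the sixth column always ends in ", <value> )" as in the sample.

```shell
#!/usr/bin/env bash
# Sketch only: convert_awk is a hypothetical wrapper name.
# Reads records on stdin and prints "field5:last-item-of-field6,field3".
convert_awk() {
    awk -F'|' '
        NF >= 6 {
            n = split($6, part, ", ")      # part[n] looks like "R1...SERV1 )"
            sub(/ *\).*$/, "", part[n])    # strip the closing " )"
            print $5 ":" part[n] "," $3
        }'
}
```

Usage would be convert_awk < infile > outfile.awk. Lines with fewer than six pipe-separated fields, such as blank lines, are skipped.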


























  • Nice :-) Try perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < input.txt > output.txt. I also repeated the input until I got 1,572,864 lines and it runs in ~3,5 seconds on my machine.

    – PerlDuck
    yesterday








  • Note that the desired output is not | separated but uses : and ,.

    – PerlDuck
    yesterday






  • @PerlDuck, 1. Yes, I will time your perl expression :-) 2. I know (and wrote about it in the beginning of my answer), that it is not exactly what the OP wants, but similar enough to be useful, and, I think, faster than if rearranged to the exact specification.

    – sudodus
    yesterday











  • Sorry, I missed your introductory sentence about the different separators. Btw., this is an interesting approach using GNU parallel ;-)

    – PerlDuck
    yesterday











  • @PerlDuck, If you make an answer with your fast perl oneliner, I will upvote it :-)

    – sudodus
    yesterday

































Python



import sys,re

pattern=re.compile(r'^.+\|.+\|(.+)\|.+\|(.+)\|.+, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))


(works with both Python2 and Python3)



Using a regex with non-greedy matches is 4x faster (avoids backtracking?) and puts python on par with the cut/sed method (python2 being a bit faster than python3)



import sys,re

pattern=re.compile(r'^[^|]+?\|[^|]+?\|([^|]+?)\|[^|]+?\|([^|]+?)\|[^,]+?, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))































  • This one also works as expected, but it is a bit slower than the perl one.

    – Ibraheem
    14 hours ago











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "89"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1114510%2fbash-script-hangs-after-some-processing-on-ubuntu%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























4 Answers
4






active

oldest

votes








4 Answers
4






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














Perl solution



This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.



#!/usr/bin/env perl

use strict;
use warnings;

while( <> ) {
if ( /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/ ) {
print "$2:$3,$1n";
}
}


I copied your sample data until I got 1,572,864 lines and then ran it as follows:



me@ubuntu:~> time ./filter.pl < input.txt > output.txt
real 0m3,603s
user 0m3,487s
sys 0m0,100s

me@ubuntu:~> tail -3 output.txt
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186


If you prefer one-liners, do:



perl -lne 'print "$2:$3,$1" if /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/;' < input.txt > output.txt





share|improve this answer



















  • 1





    Indeed this perl solution has been fastest, took about less than a second to process 300K lines, I am preparing some other lookups like scripts, I will be looking forward to further help, thanks everyone, all were helpful, but @perlduck's solution was fastest, and as my original while loop wasn't producing results in order, so the order won't matter for me anyway

    – Ibraheem
    yesterday











  • @Ibraheem, Yes this perl solution is very good, probably with a great margin fast enough for your purpose. -- But try my tr and cut solution, which is actually faster in my computer (and I think easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think can be the fastest of them all.

    – sudodus
    yesterday











  • @sudodus I tried your solution, it was really fast (took about 0.205 seconds), but the columns are not coming as I want them and it has a pipe in the middle,

    – Ibraheem
    yesterday











  • @Ibraheem, Is it important to have the format that you want (order of columns and separators between the column)? The reason why my solution is fast is that it does as little as possible, still showing what you need (but in a different order). If you prefer another separator, it is possible, space ' ' would cost no extra time, another separator would cost some extra time for a tr or tr -s command, but not very much.

    – sudodus
    yesterday






  • 1





    I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster in my computer), maybe easier to understand and edit, if you would need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Any of the two solutions should be good for you.

    – sudodus
    yesterday
















3














Perl solution



This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.



#!/usr/bin/env perl

use strict;
use warnings;

while( <> ) {
if ( /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/ ) {
print "$2:$3,$1n";
}
}


I copied your sample data until I got 1,572,864 lines and then ran it as follows:



me@ubuntu:~> time ./filter.pl < input.txt > output.txt
real 0m3,603s
user 0m3,487s
sys 0m0,100s

me@ubuntu:~> tail -3 output.txt
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186


If you prefer one-liners, do:



perl -lne 'print "$2:$3,$1" if /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/;' < input.txt > output.txt





share|improve this answer



















  • 1





    Indeed this perl solution has been fastest, took about less than a second to process 300K lines, I am preparing some other lookups like scripts, I will be looking forward to further help, thanks everyone, all were helpful, but @perlduck's solution was fastest, and as my original while loop wasn't producing results in order, so the order won't matter for me anyway

    – Ibraheem
    yesterday











  • @Ibraheem, Yes this perl solution is very good, probably with a great margin fast enough for your purpose. -- But try my tr and cut solution, which is actually faster in my computer (and I think easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think can be the fastest of them all.

    – sudodus
    yesterday











  • @sudodus I tried your solution, it was really fast (took about 0.205 seconds), but the columns are not coming as I want them and it has a pipe in the middle,

    – Ibraheem
    yesterday











  • @Ibraheem, Is it important to have the format that you want (order of columns and separators between the column)? The reason why my solution is fast is that it does as little as possible, still showing what you need (but in a different order). If you prefer another separator, it is possible, space ' ' would cost no extra time, another separator would cost some extra time for a tr or tr -s command, but not very much.

    – sudodus
    yesterday






  • 1





    I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster in my computer), maybe easier to understand and edit, if you would need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Any of the two solutions should be good for you.

    – sudodus
    yesterday














3












3








3







Perl solution



This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.



#!/usr/bin/env perl

use strict;
use warnings;

while( <> ) {
if ( /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/ ) {
print "$2:$3,$1n";
}
}


I copied your sample data until I got 1,572,864 lines and then ran it as follows:



me@ubuntu:~> time ./filter.pl < input.txt > output.txt
real 0m3,603s
user 0m3,487s
sys 0m0,100s

me@ubuntu:~> tail -3 output.txt
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186


If you prefer one-liners, do:



perl -lne 'print "$2:$3,$1" if /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/;' < input.txt > output.txt





answered yesterday

PerlDuck








  • Indeed, this perl solution has been the fastest: it took less than a second to process 300K lines. I am preparing some other lookup-like scripts and will be looking forward to further help. Thanks everyone, all answers were helpful, but @perlduck's solution was the fastest. My original while loop wasn't producing results in order, so the order doesn't matter for me anyway.

    – Ibraheem
    yesterday











  • @Ibraheem, Yes, this perl solution is very good, and probably fast enough for your purpose by a wide margin. But try my tr and cut solution, which is actually faster on my computer (and, I think, easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think could be the fastest of them all.

    – sudodus
    yesterday











  • @sudodus I tried your solution and it was really fast (about 0.205 seconds), but the columns are not in the order I want, and there is a pipe in the middle.

    – Ibraheem
    yesterday











  • @Ibraheem, Is it important to have exactly the format that you want (order of columns and separators between the columns)? The reason my solution is fast is that it does as little as possible while still showing what you need (but in a different order). If you prefer another separator, that is possible: a space ' ' would cost no extra time; another separator would cost some extra time for a tr or tr -s command, but not very much.

    – sudodus
    yesterday






  • I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster on my computer) and maybe easier to understand and edit, should you need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Either of the two solutions should be good for you.

    – sudodus
    yesterday














A pure sed solution:



sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <in.dat >out.dat





edited yesterday

answered yesterday

xenoid













  • +1: Nice with a pure sed solution :-) But my cut and sed solution is faster ;-)

    – sudodus
    yesterday













  • Yes I know. But mine produces the result in the requested order 🤨🤨

    – xenoid
    yesterday













  • That's right, we will see how important it is to get exactly what the OP prescribes. By the way, I think you drop one character, MSRFKH00OL6 --> MSRFKH00OL in your output. I think you can fix that with a minor edit.

    – sudodus
    yesterday






  • @sudodus Yes, transcription error. Fixed :)

    – xenoid
    yesterday











  • I timed your new one-liner and it works well, actually slightly faster than before. I don't know whether something else was happening on my computer at the time; anyway, I edited my answer to show the new result :-)

    – sudodus
    yesterday



















Oneliner



If the order of the items and the separators can be different from what you specify in the question, I thought the following one-liner would do it,



< input tr ' ' '|' | cut -d '|' -f 4,6,10 > output


but in a comment you wrote that you need exactly the specified format.



I added a solution with 'awk', which is approximately on par with PerlDuck's solution with perl. See the end of this answer.



< input awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > output


Test



The test was done in my computer with Lubuntu 18.04.1 LTS, 2*2 processors and 4 GiB RAM.



I made a huge infile by 'doubling 20 times' from your demo input (1572864 lines), which leaves some margin over your 500000 lines.
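A test file of this kind can be built by repeatedly concatenating a file with itself. The following is a minimal sketch, not the exact commands used for the timings in this answer; the temporary directory, the file names sample.txt and infile, and the small number of doublings are assumptions for illustration. Starting from the three demo lines, each pass doubles the line count (for comparison, 3 × 2^19 = 1,572,864).

```shell
#!/bin/sh
# Sketch: grow a small sample into a large test file by repeated doubling.
# File names and the number of doublings are illustrative assumptions.
workdir=$(mktemp -d)

# The three demo lines from the question.
cat > "$workdir/sample.txt" <<'EOF'
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN46| Unlocked|12-654-0330|Down|202-00_MSRFKH00OL6|P282017856.C881 ( local, R1.S1.LT7.PON8.ONT81.C1.P1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN52| Unlocked|12-664-1186|Up|202-00_MSRFKH00OL6|P282012623.C2028 ( network, R1.S1.LT7.PON8.ONT75.SERV1 )|
EOF

cp "$workdir/sample.txt" "$workdir/infile"

doublings=4   # kept small here; raise it for a benchmark-sized file
i=0
while [ "$i" -lt "$doublings" ]; do
    # Concatenate the file with itself, then replace the original.
    cat "$workdir/infile" "$workdir/infile" > "$workdir/infile.tmp"
    mv "$workdir/infile.tmp" "$workdir/infile"
    i=$((i + 1))
done

# Arithmetic expansion strips any padding that wc may add.
lines=$(( $(wc -l < "$workdir/infile") ))
echo "infile now has $lines lines"
```

With doublings=4 the three lines become 3 × 2^4 = 48 lines; the same loop with more iterations produces files in the million-line range.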



Oneliner with cut and sed:



$ < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile
$ wc -l infile
1572864 infile
$ wc -l outfile
1572864 outfile
$ tail outfile
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
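To see what each stage of the cut and sed pipeline contributes, the stages can be run one at a time on a single line. This is only an illustrative sketch; the variable names are made up here, and the input is the first demo line from the question.

```shell
#!/bin/sh
# Sketch: run the cut & sed oneliner stage by stage on one demo line.
line='202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|'

# Stage 1: keep only the pipe-separated fields 3, 5 and 6.
step1=$(printf '%s\n' "$line" | cut -d '|' -f 3,5,6)

# Stage 2: drop everything from the pipe before the capital letter
# up to and including the last ", " (the pipe itself is put back).
step2=$(printf '%s\n' "$step1" | sed -e 's/|[A-Z].*, /|/')

# Stage 3: remove the trailing " )".
step3=$(printf '%s\n' "$step2" | sed -e 's/ )$//')

printf '%s\n' "$step1" "$step2" "$step3"
```

The second expression relies on the third kept field starting with a capital letter while the second one starts with a digit, which holds for the demo data but is an assumption about the real file.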


Timing



We might expect that a pure sed solution would be faster, but I think that reordering the data slows it down, so the cut and sed solution is faster. Both solutions work without any problem on my computer.



Oneliner with cut and sed:



$ time < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile

real 0m8,132s
user 0m8,633s
sys 0m0,617s


A pure sed oneliner by xenoid:



$ time sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <infile > outfile-sed

real 1m8,686s
user 1m8,259s
sys 0m0,344s


A perl oneliner by PerlDuck is faster than the previous oneliners:



$ time perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < infile > outfile.perl

real 0m5,929s
user 0m5,339s
sys 0m0,256s


Oneliner with tr and cut with a tr -s command:



I used tr to convert the spaces in the input file to pipe characters, so that cut can do the rest without sed. As you can see, tr is much faster than sed. The tr -s command squeezes repeated pipes into one, which is a good idea, particularly if there can be repeated spaces or pipes in the input file. It does not cost much.



$ time < infile tr ' ' '|' | tr -s '|' '|' | cut -d '|' -f 3,5,9 > outfile-tr-cut

real 0m1,277s
user 0m1,781s
sys 0m0,925s
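The field numbers differ between the two tr variants because the space after each pipe in the input becomes a second, adjacent pipe, that is, an empty field. The sketch below runs both variants on the first demo line to show that they select the same data; the variable names are assumptions for illustration, and `tr -s '|'` is used as an equivalent single-set form of the squeeze step.

```shell
#!/bin/sh
# Sketch: compare field numbering with and without squeezing repeated pipes.
line='202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|'

# Spaces become pipes; "| " turns into "||", which tr -s collapses to "|".
with_squeeze=$(printf '%s\n' "$line" | tr ' ' '|' | tr -s '|' | cut -d '|' -f 3,5,9)

# Without squeezing, the empty field after the first pipe shifts every
# later field number up by one.
without_squeeze=$(printf '%s\n' "$line" | tr ' ' '|' | cut -d '|' -f 4,6,10)

echo "$with_squeeze"
echo "$without_squeeze"
```

The unsqueezed variant depends on the input having exactly one space after the first pipe on every line; the squeezed variant tolerates runs of spaces, which is the robustness argument made above.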


Oneliner with tr and cut without the tr -s command, fastest so far:



$ time < infile tr ' ' '|' | cut -d '|' -f 4,6,10 > outfile-tr-cut

real 0m1,199s
user 0m1,020s
sys 0m0,618s


$ tail outfile-tr-cut
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
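If the layout requested in the question is needed after all, the pipe-separated output of the tr and cut oneliner could be rearranged in one more step. The sketch below uses awk with `-F '|'` for that; the function name `reorder` is hypothetical, and this step was not part of the timings in this answer, so the extra process will add some run time.

```shell
#!/bin/sh
# Sketch: turn "third|fifth|port" into "fifth:port,third".
# reorder is a hypothetical helper name, not part of the measured oneliners.
reorder() {
    awk -F '|' '{ print $2 ":" $3 "," $1 }'
}

reordered=$(printf '%s\n' '12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1' | reorder)
echo "$reordered"
```

In a full pipeline the function would simply be appended after cut, before the redirection to the output file.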


Oneliner with awk, fast but not the fastest:



$ < input awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > output

$ time < infile awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > outfile.awk

real 0m5,091s
user 0m4,724s
sys 0m0,365s


Speed summary: the 'real' time according to time rounded to 1 decimal



1m 8.7s - sed
8.1s - cut & sed
5.9s - perl
5.1s - awk
1.2s - tr & cut


Finally, I note that the oneliners with sed, perl and awk create an output file with the prescribed format.



$ tail outfile.awk
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
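That the oneliners agree can be checked mechanically with cmp. The following is a small sketch under assumptions: it uses the three demo lines rather than the large test file, a temporary directory, and `sed -E` as the portable spelling of `sed -r`; only the awk and sed oneliners are compared.

```shell
#!/bin/sh
# Sketch: confirm that the awk and sed oneliners produce identical output.
tmp=$(mktemp -d)

cat > "$tmp/in" <<'EOF'
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN46| Unlocked|12-654-0330|Down|202-00_MSRFKH00OL6|P282017856.C881 ( local, R1.S1.LT7.PON8.ONT81.C1.P1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN52| Unlocked|12-664-1186|Up|202-00_MSRFKH00OL6|P282012623.C2028 ( network, R1.S1.LT7.PON8.ONT75.SERV1 )|
EOF

# awk: replace pipes by spaces, then pick whitespace-separated fields.
awk '{ gsub("\\|", " "); print $5 ":" $9 "," $3 }' < "$tmp/in" > "$tmp/out.awk"

# sed: capture field 3, field 5 and the token after ", ", then reorder.
sed -E 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' \
    < "$tmp/in" > "$tmp/out.sed"

# cmp -s is silent and reports the result through its exit status.
if cmp -s "$tmp/out.awk" "$tmp/out.sed"; then
    result=identical
else
    result=different
fi
echo "$result"
```

The same comparison can be pointed at the large outfile.awk and outfile-sed files from the timing runs.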





  • Nice :-) Try perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < input.txt > output.txt. I also repeated the input until I got 1,572,864 lines and it runs in ~3,5 seconds on my machine.

    – PerlDuck
    yesterday








  • Note that the desired output is not | separated but uses : and ,.

    – PerlDuck
    yesterday






  • @PerlDuck, 1. Yes, I will time your perl expression :-) 2. I know (and wrote about it in the beginning of my answer), that it is not exactly what the OP wants, but similar enough to be useful, and, I think, faster than if rearranged to the exact specification.

    – sudodus
    yesterday











  • Sorry, I missed your introductory sentence about the different separators. Btw., this is an interesting approach using GNU parallel ;-)

    – PerlDuck
    yesterday











  • @PerlDuck, If you make an answer with your fast perl oneliner, I will upvote it :-)

    – sudodus
    yesterday
















Oneliner



If the order of the items and the separators can be different from what you specify in the question, I thought the following one-liner would do it,



< input tr ' ' '|' | cut -d '|' -f 4,6,10 > output


but in a comment you wrote that you need exactly the specified format.



I added a solution with 'awk', which is approximately on par with PerlDuck's solution with perl. See the end of this answer.



< input awk '{gsub("\|"," "); print $5 ":" $9 "," $3}' > output


Test



The test was done in my computer with Lubuntu 18.04.1 LTS, 2*2 processors and 4 GiB RAM.



I made a huge infile by 'doubling 20 times' from your demo input (1572864 lines), so some margin to your 500000 lines,



Oneliner with cut and sed:



$ < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile
$ wc -l infile
1572864 infile
$ wc -l outfile
1572864 outfile
$ tail outfile
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1


Timing



We might expect, that a pure sed solution would be faster, but I think that reordering of the data slows it down, so that the cut and sed solution is faster. Both solutions work without any problem in my computer.



Oneliner with cut and sed:



$ time < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile

real 0m8,132s
user 0m8,633s
sys 0m0,617s


A pure sed oneliner by xenoid:



$ time sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <infile > outfile-sed

real 1m8,686s
user 1m8,259s
sys 0m0,344s


A perl oneliner by PerlDuck is faster than the previous oneliners:



$ time perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < infile > outfile.perl

real 0m5,929s
user 0m5,339s
sys 0m0,256s


Oneliner with tr and cut with a tr -s command:



I used tr to convert the spaces in the input file to pipe characters, and then cut could do it all without sed. As you can see, tr is much faster than sed. The tr -s command squeezes repeated pipes into a single pipe, which is a good idea, particularly if there can be repeated spaces or pipes in the input file. It does not cost much.



$ time < infile tr ' ' '|' | tr -s '|' '|' | cut -d '|' -f 3,5,9 > outfile-tr-cut

real 0m1,277s
user 0m1,781s
sys 0m0,925s
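To see why the field numbers are 3,5,9 with tr -s but 4,6,10 without it, here is a minimal Python sketch that imitates both pipelines on one demo line. The helper name tr_cut is hypothetical, and it only approximates tr and cut for input shaped like the demo data.

```python
import re

# One demo line from the question.
LINE = ("202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|"
        "202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|")


def tr_cut(line, fields, squeeze):
    """Imitate: tr ' ' '|' [| tr -s '|' '|'] | cut -d '|' -f fields."""
    piped = line.replace(" ", "|")          # tr ' ' '|'
    if squeeze:
        piped = re.sub(r"\|+", "|", piped)  # tr -s '|' '|'
    parts = piped.split("|")
    # cut counts fields from 1
    return "|".join(parts[i - 1] for i in fields)


if __name__ == "__main__":
    # "| Unlocked" becomes "||Unlocked": the empty field shifts every later field by one.
    print(tr_cut(LINE, (4, 6, 10), squeeze=False))
    print(tr_cut(LINE, (3, 5, 9), squeeze=True))
```

Both calls select the same three values; squeezing only removes the empty field that the space after the first pipe creates.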


Oneliner with tr and cut without the tr -s command, fastest so far:



$ time < infile tr ' ' '|' | cut -d '|' -f 4,6,10 > outfile-tr-cut

real 0m1,199s
user 0m1,020s
sys 0m0,618s


$ tail outfile-tr-cut
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1


Oneliner with awk, fast but not the fastest,



< input awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > output

$ time < infile awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > outfile.awk

real 0m5,091s
user 0m4,724s
sys 0m0,365s
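The logic of the awk one-liner (turn pipes into spaces, split on whitespace, then print fields 5, 9 and 3) can be sketched in Python as follows. This is only an illustration of the same idea, not a timed alternative; the function name reformat_line is an assumption.

```python
import sys


def reformat_line(line):
    """Replace pipes with spaces, split on whitespace, and emit field5:field9,field3."""
    fields = line.replace("|", " ").split()
    # awk fields are 1-based, Python lists are 0-based
    return fields[4] + ":" + fields[8] + "," + fields[2]


if __name__ == "__main__":
    for raw in sys.stdin:
        if raw.strip():  # skip empty lines
            print(reformat_line(raw))
```

Like the awk version, this depends on every line having the same number of whitespace-separated tokens; an extra space inside a field would shift the indices.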


Speed summary: the 'real' time according to time rounded to 1 decimal



1m 8.7s - sed
8.1s - cut & sed
5.9s - perl
5.1s - awk
1.2s - tr & cut


Finally, I note that the oneliners with sed, perl and awk create an output file with the prescribed format.



$ tail outfile.awk
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186














edited yesterday

























answered yesterday









sudodus








  • 2





    Nice :-) Try perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < input.txt > output.txt. I also repeated the input until I got 1,572,864 lines and it runs in ~3,5 seconds on my machine.

    – PerlDuck
    yesterday








  • 1





    Note that the desired output is not | separated but uses : and ,.

    – PerlDuck
    yesterday






  • 1





    @PerlDuck, 1. Yes, I will time your perl expression :-) 2. I know (and wrote about it in the beginning of my answer), that it is not exactly what the OP wants, but similar enough to be useful, and, I think, faster than if rearranged to the exact specification.

    – sudodus
    yesterday











  • Sorry, I missed your introductory sentence about the different separators. Btw., this is an interesting approach using GNU parallel ;-)

    – PerlDuck
    yesterday











  • @PerlDuck, If you make an answer with your fast perl oneliner, I will upvote it :-)

    – sudodus
    yesterday

























2














Python



import sys,re

# Groups: 1 = third column, 2 = fifth column, 3 = last item inside the parentheses
pattern=re.compile(r'^.+\|.+\|(.+)\|.+\|(.+)\|.+, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))


(works with both Python2 and Python3)



Using a regex with negated character classes and non-greedy matches is about 4x faster (possibly because it avoids backtracking) and puts Python on par with the cut/sed method (Python 2 being a bit faster than Python 3)



import sys,re

# Same groups as above, but no field pattern can run across a pipe
pattern=re.compile(r'^[^|]+?\|[^|]+?\|([^|]+?)\|[^|]+?\|([^|]+?)\|[^,]+?, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))
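To compare the two patterns yourself, a small harness along these lines may help. It is a sketch under assumptions: the names GREEDY, RESTRICTED and convert are made up here, the test line is the first demo line from the question, and the timings it prints depend on the machine and Python version, so no figures are given.

```python
import re
import timeit

# The two patterns from this answer.
GREEDY = re.compile(r'^.+\|.+\|(.+)\|.+\|(.+)\|.+, (.+) \)\|$')
RESTRICTED = re.compile(
    r'^[^|]+?\|[^|]+?\|([^|]+?)\|[^|]+?\|([^|]+?)\|[^,]+?, (.+) \)\|$')

LINE = ("202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|"
        "202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|")


def convert(pattern, line):
    """Return 'col5:item,col3' for a matching line, or None if it does not match."""
    match = pattern.match(line)
    if match is None:
        return None
    return match.group(2) + ':' + match.group(3) + ',' + match.group(1)


if __name__ == "__main__":
    for name, pat in (("greedy", GREEDY), ("restricted", RESTRICTED)):
        seconds = timeit.timeit(lambda p=pat: convert(p, LINE), number=100000)
        print(name, round(seconds, 3), "s for 100000 matches")
```

Both patterns give the same result on the demo lines; only the amount of work the regex engine does to get there differs.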





edited yesterday

























answered yesterday









xenoid













  • This one also works fine, as expected, but is a bit slower than the perl one.

    – Ibraheem
    14 hours ago


















