Get first value that matches condition (loop too slow)












I've got a lot of matrices similar to this one, but with thousands of rows:



r <- 10
c <- 2
set.seed(333)

m1 <- matrix(runif(r*c)+1, r, c)

> m1
          [,1]     [,2]
 [1,] 1.467001 1.393902
 [2,] 1.084598 1.474218
 [3,] 1.973485 1.891222
 [4,] 1.571306 1.665011
 [5,] 1.020119 1.736832
 [6,] 1.723557 1.911469
 [7,] 1.609394 1.637850
 [8,] 1.306719 1.864651
 [9,] 1.063510 1.287575
[10,] 1.305353 1.129959


I've got a loop that tells me, for each value of the first column, the index of the first value in the second column that is at least 10% higher, like so:



result <- 1:nrow(m1)

for (i in 1:nrow(m1)) {
  result[i] <- which(m1[, 2] > (1.1 * m1[, 1][i]))[1]
}
> result
 [1]  3  1 NA  3  1  6  3  2  1  2


I've got so many matrices that it's taking hours, and after profiling my code, the biggest time-consuming task by far is this loop. What, in your view, is the fastest way to do it?



For example, with r = 30000:



start_time <- Sys.time()

for (i in 1:nrow(m1)) {
  result[i] <- which(m1[, 2] > (1.1 * m1[, 1][i]))[1]
}

end_time <- Sys.time()
a <- end_time - start_time

> a
Time difference of 11.25815 secs
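
(As an aside, base R's system.time() captures the same measurement in a single call; a minimal equivalent of the timing above:)

system.time(
  for (i in 1:nrow(m1)) {
    result[i] <- which(m1[, 2] > (1.1 * m1[, 1][i]))[1]
  }
)["elapsed"]  # wall-clock seconds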


Thanks for your help!

– Boog
  • Is this for repeated application? If so, maybe you could study the relevant algorithms and create a function with Rcpp or similar. – sindri_baldur

  • Also, as general advice, don't forget to consider the big picture. With profiling, it's easy to find that your code spends 99% of its time doing X and think "I need to find a faster way to do X," when often the question you should really be asking is "How can I avoid doing X so much?" (Of course, without seeing the rest of your code or knowing what its purpose is, we can't really help you with that here. But it's worth keeping in mind.) – Ilmari Karonen
5 Answers

Answer by JAD (6 votes)

There are some shortcuts you can take here. You are looking for the first value in column 2 that is higher than some other value. This means that it is never worth looking at values that are lower than what we have previously seen in column 2.



In your example with 10 rows, that would be as follows:



> cummax(m1[, 2])
[1] 1.393902 1.474218 1.891222 1.891222 1.891222 1.911469 1.911469 1.911469 1.911469 1.911469
> which(cummax(m1[, 2]) == m1[, 2])
[1] 1 2 3 6


And as you can see, these are the only values in your result vector.
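
For what it's worth, this observation can be taken straight to a vectorized solution. A minimal sketch (my own, not part of the original answer; assumes m1 as defined in the question), using the fact that the record values are increasing and can therefore be binary-searched with findInterval():

cand <- which(cummax(m1[, 2]) == m1[, 2])             # rows where column 2 sets a new record
idx  <- findInterval(1.1 * m1[, 1], m1[cand, 2]) + 1  # position of first record strictly above
result <- cand[idx]                                   # indexing past the end gives NA, as wanted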



A second optimisation is to order the first column. If you start with its lowest value and work your way up, you don't have to rescan the second column for every row: you only move on to the next row of the second column once there are no more matches for the current one.



This does bear the cost of sorting the matrix, but afterwards the result can be found using a single pass through both columns.



dostuff <- function(m1) {
  # Process column-1 values from smallest to largest.
  orderColumn1 <- order(m1[, 1])

  plus.10 <- m1[, 1] * 1.1

  results <- rep(NA, length(plus.10))

  IndexColumn1 <- 1
  IndexColumn2 <- 1
  row2CurrentMax <- 0
  while (IndexColumn2 <= nrow(m1)) {
    row2Current <- m1[IndexColumn2, 2]
    if (row2Current > row2CurrentMax) {
      row2CurrentMax <- row2Current
      # Assign this row as the answer for every still-unmatched column-1
      # value it exceeds; the bounds check keeps the loop from running
      # past the end once every value has been matched.
      while (IndexColumn1 <= nrow(m1)) {
        row1Current <- plus.10[orderColumn1[IndexColumn1]]
        if (row1Current <= row2CurrentMax) {
          results[orderColumn1[IndexColumn1]] <- IndexColumn2
          IndexColumn1 <- IndexColumn1 + 1
        } else {
          break
        }
      }
    }
    IndexColumn2 <- IndexColumn2 + 1
  }
  results
}


With 30000 rows:



> start_time <- Sys.time()
> result <- dostuff(m1)
> end_time <- Sys.time()
> a <- end_time - start_time
> a
Time difference of 0.0600059 secs





  • This could probably be improved on using Rcpp, but I am not familiar enough with that. – JAD
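
Following up on that comment, a minimal Rcpp sketch of the task (an illustration under assumptions, not code from the thread; the function name first_above is made up):

library(Rcpp)

cppFunction('
IntegerVector first_above(NumericVector x1, NumericVector x2) {
  int n = x2.size();
  IntegerVector out(x1.size(), NA_INTEGER);           // NA where no match exists
  for (int i = 0; i < x1.size(); ++i) {
    double target = 1.1 * x1[i];
    for (int j = 0; j < n; ++j) {
      if (x2[j] > target) { out[i] = j + 1; break; }  // 1-based index, like R
    }
  }
  return out;
}')

# result <- first_above(m1[, 1], m1[, 2])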



















Answer by H 1 (7 votes)

I don't imagine this is the fastest way, but it will be somewhat faster than the current for-loop approach.



plus.10 <- m1[, 1] * 1.1
m2 <- m1[, 2]
result <- sapply(plus.10, function(x) which.min(m2 < x))
result[plus.10 > max(m2)] <- NA

result
 [1]  3  1 NA  3  1  6  3  2  1  2


Edit: As requested by Ronak, microbenchmark results of the solutions proposed so far on 10000 rows:



Unit: milliseconds
   expr        min        lq       mean      median          uq         max neval cld
     h1 335.342689 337.35915 361.320461  341.804840  347.856556  516.230972    25   b
 sindri 672.587291 688.78673 758.445467  713.240778  811.298608 1049.109844    25   d
     op 865.567412 884.99514 993.066179 1006.694036 1026.434344 1424.755409    25   e
   loco 675.809092 682.98591 731.256313  693.672064  807.007358  821.893865    25   d
 dmitry 420.869493 427.56492 454.439806  433.656519  438.367480  607.030825    25   c
    jad   4.369628   4.41044   4.735393    4.503657    4.556527    7.488471    25   a
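
For reference, a sketch of how such a table can be produced; the per-entry expressions here are my reconstruction from the answers in this thread, not the exact benchmark code:

library(microbenchmark)

plus.10 <- m1[, 1] * 1.1
m2 <- m1[, 2]

microbenchmark(
  h1     = { res <- sapply(plus.10, function(x) which.min(m2 < x))
             res[plus.10 > max(m2)] <- NA
             res },
  sindri = sapply(plus.10, function(x) match(TRUE, m2 > x)),
  op     = sapply(seq_len(nrow(m1)), function(i) which(m2 > plus.10[i])[1]),
  jad    = dostuff(m1),
  times  = 25
)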





  • And if you first store m2 = m1[, 2] you can be faster. – LocoGris

  • @RonakShah - Yes, I didn't mean generally, I meant specifically the solution proposed by the OP. – H 1



















Answer by sindri_baldur (2 votes)

Here is an attempt using match(), which stops at the first TRUE instead of collecting every matching index the way which() does. It reduces the time compared to the r = 30000 example in the original post by about 25%.



sapply(m1[, 1] * 1.1, function(x) match(TRUE, m1[, 2] > x))

 [1]  3  1 NA  3  1  6  3  2  1  2
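
(A quick illustration of the idiom: match(TRUE, x) returns the position of the first TRUE in x, or NA when there is none.)

match(TRUE, c(FALSE, FALSE, TRUE, TRUE))  # 3
match(TRUE, c(FALSE, FALSE))              # NA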





Answer by LocoGris (0 votes)

    I propose these:



r <- 30000
c <- 2
set.seed(333)

m1 <- matrix(runif(r*c)+1, r, c)
x2 <- m1[, 2]


start_time <- Sys.time()

result <- lapply(m1[, 1], function(x) {
  min(which(m1[, 2] > (1.1 * x)))
})
end_time <- Sys.time()
a <- end_time - start_time
a


start_time <- Sys.time()

# Note: min(which(...)) returns Inf (with a warning) when there is no match,
# unlike which(...)[1] in the question, which returns NA.
result <- lapply(m1[, 1], function(x) {
  min(which(x2 > (1.1 * x)))
})
end_time <- Sys.time()
a <- end_time - start_time
a


First one: 8.6 s
Second one: 6.4 s






Answer by Dmitriy (0 votes)

The best way to optimize your code is to use the data.table package.

This code gives you a >2x speed-up.



      library(data.table);

      setDTthreads(0);

      r <- 30000;
      c <- 2;
      set.seed(333);

      m1 <- matrix(runif(r*c)+1, r, c);
      result1 <- rep(NA, nrow(m1));

      start_time <- Sys.time();

for (i in 1:nrow(m1))
{
  result1[i] <- which(m1[,2]>(1.1*m1[,1][i]))[1];
}

      #result1

      end_time <- Sys.time()
      a <- end_time - start_time
      a


      start_time <- Sys.time()

      tstDT <- data.table(m1);
      #result2 <- tstDT[, sapply(V1, function(elem) { which(V2 > 1.1*elem)[1] })]
      result2 <- tstDT[, sapply(V1, function(x) match(TRUE, V2 > 1.1*x) )]

      #result2

      end_time <- Sys.time()
      a <- end_time - start_time
      a


A small comment: I use data.table compiled by gcc with -march=native and -O3. With -O2 and the default -march (as in the standard package installation) the speed-up will be smaller, but...



      Result:



      > library(data.table);
      >
      > setDTthreads(0);
      >
      > r <- 30000;
      > c <- 2;
      > set.seed(333);
      >
      > m1 <- matrix(runif(r*c)+1, r, c);
      > result1 <- rep(NA, nrow(m1));
      >
      > start_time <- Sys.time();
      >
      > for (i in 1:nrow(m1))
      + {
      + result1[i] <- which(m1[,2]>(1.1*m1[,1][i]))[1];
      + }
      >
      > #result1
      >
      > end_time <- Sys.time()
      > a <- end_time - start_time
      > a
      Time difference of 8.738938 secs
      >
      >
      > start_time <- Sys.time()
      >
      > tstDT <- data.table(m1);
      > #result2 <- tstDT[, sapply(V1, function(elem) { which(V2 > 1.1*elem)[1] })]
      > result2 <- tstDT[, sapply(V1, function(x) match(TRUE, V2 > 1.1*x) )]
      >
      > #result2
      >
      > end_time <- Sys.time()
      > a <- end_time - start_time
      > a
      Time difference of 3.582921 secs
      >
      >
      >
      >





  • This is definitely not the best way, as the other answers demonstrate much better approaches with bigger speedups. – Konrad Rudolph










