Python Scrapy Without Splash

I'm trying to scrape a website that uses an AJAX request to display its content.

I tried to simulate the AJAX call, but it uses a different token in its cookie every time it sends the request, so I get an HTTP 500 error and cannot access the server.

So I went for the second option (i.e. scraping the website using Splash). I installed it with Docker and I'm running it on port 8050.

At http://localhost:8050 I have a test render.html console. I enter the site address, and it is supposed to show all the content (including the parts produced by AJAX), but it does not!
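For reference, here is a minimal sketch of the same render.html call made from Python, assuming Splash's HTTP API is reachable at `localhost:8050` as described above. The helper names and the 3-second `wait` value are illustrative assumptions: `wait` tells Splash to pause after the page loads so that AJAX requests have time to finish before it snapshots the DOM, and Splash requires it to be smaller than `timeout`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Splash instance from the question (Docker, port 8050); an assumption if yours differs.
SPLASH = "http://localhost:8050"


def render_html_url(target, wait=3.0, timeout=30):
    """Build a Splash render.html request URL for `target`.

    `wait` (seconds) gives the page's AJAX calls time to complete before
    Splash returns the rendered HTML; it must be lower than `timeout`.
    """
    if wait >= timeout:
        raise ValueError("wait must be smaller than timeout")
    query = urlencode({"url": target, "wait": wait, "timeout": timeout})
    return f"{SPLASH}/render.html?{query}"


def fetch_rendered(target, wait=3.0, timeout=30):
    """Perform the HTTP call; this only works while Splash is running."""
    url = render_html_url(target, wait, timeout)
    with urlopen(url, timeout=timeout + 5) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be `html = fetch_rendered("http://lastsecond.ir/tours/")`. In a scrapy-splash project, the equivalent is passing the argument through the request, e.g. `SplashRequest(url, self.parse, args={"wait": 3})`.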

I also tried it in my project with the code and middleware set up, and everything looks right, but it's not working!

Any help would be appreciated.

By the way, the address I'm trying to scrape is: http://lastsecond.ir/tours/

edited Dec 3 '17 at 9:39 by Videonauth

asked Dec 3 '17 at 9:32 by Amirition

1 Answer
If you want to stay in Python, you can use PyGObject (PyGI) or PyQt with a full WebKit browser engine. Then inject arbitrary JavaScript into the page or parse the DOM however you prefer. It's a full browser, so it is heavier than some frameworks, but it just works, unless you're trying to parse DOM rewrites on something that uses a shadow DOM.
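A minimal sketch of that approach, assuming PyQt5 with the QtWebEngine module installed. Note that it swaps in QtWebEngine (Chromium-based), which replaced QtWebKit from Qt 5.6 on, rather than the WebKit engine named above. The function names `render_with_qt` and `extract_links` and the 3-second `settle_ms` delay are hypothetical: `loadFinished` fires when the main document has loaded, not when later AJAX calls finish, so the delay is a crude way to let them complete. The page is never shown in a window, but Qt may still need a display (or a virtual one) to start.

```python
import sys
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags from rendered HTML (stdlib only)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: parse the rendered DOM however you prefer."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def render_with_qt(url, settle_ms=3000):
    """Load `url` in a QtWebEngine page, wait `settle_ms` for AJAX content,
    then return the live DOM obtained by injecting JavaScript."""
    # Imported lazily so extract_links works without PyQt5 installed.
    # QtWebEngineWidgets must be imported before QApplication is created.
    from PyQt5.QtCore import QTimer, QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()
    result = {}

    def grab_dom():
        def done(html):
            result["html"] = html
            app.quit()

        # Arbitrary JS runs in the page; this script serializes the current DOM.
        page.runJavaScript("document.documentElement.outerHTML", done)

    def on_load_finished(ok):
        if not ok:
            app.quit()
            return
        # Give the page's AJAX requests time to finish before reading the DOM.
        QTimer.singleShot(settle_ms, grab_dom)

    page.loadFinished.connect(on_load_finished)
    page.load(QUrl(url))
    app.exec_()
    return result.get("html", "")
```

Usage would be `links = extract_links(render_with_qt("http://lastsecond.ir/tours/"))`. A fixed delay is the simplest option; polling with `runJavaScript` until a known element appears would be more robust.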

answered Dec 9 '17 at 21:47 by RobotHumans
