Python Scrapy Without Splash



























I'm trying to scrape a website that loads its content through an AJAX request.

I tried to simulate the AJAX call, but the site uses a different token in its cookie every time it sends the request, so I get a "500" error and cannot access the server.

So I went for the second option, scraping the website with Splash. I installed it with Docker and it is running on port 8050.

At http://localhost:8050 I have the test render.html console. When I enter the site address, it is supposed to show all the content (including the parts produced by AJAX), but it does not!

I also tried it in my project, with the code and the middleware set up correctly, but it still does not work.

Any help would be appreciated.

By the way, the address I'm trying to scrape is: http://lastsecond.ir/tours/

















      python














asked Dec 3 '17 at 9:32 by Amirition

edited Dec 3 '17 at 9:39 by Videonauth






















          1 Answer














If you want to stay in Python, you can use pygi or PyQt with a full WebKit browser engine. You can then inject arbitrary JavaScript into the page or parse the DOM however you prefer. Because it is a full browser, it is heavier than some frameworks, but it does just work, unless you are trying to parse DOM rewrites on a page that uses a shadow DOM.
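
Before reaching for a full browser, it may be worth checking whether the changing token is simply a cookie that the server sets on the first page load. This is not the approach described above, only a lighter alternative under that assumption: if the token is generated by JavaScript in the page, it will not help. The sketch below uses only the standard library and runs against a small local stand-in server, so `DemoHandler`, the `/ajax` path, and the `token` cookie name are all hypothetical and not taken from the real site. The point is `fetch_ajax`, which loads the page first and then reuses the same cookie jar for the AJAX request.

```python
import json
import secrets
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: '/' issues a fresh token cookie,
    '/ajax' answers 500 unless a previously issued token comes back."""

    issued = set()

    def do_GET(self):
        if self.path == "/":
            token = secrets.token_hex(8)
            DemoHandler.issued.add(token)
            body = b"<html>shell page</html>"
            self.send_response(200)
            self.send_header("Set-Cookie", f"token={token}; Path=/")
        elif self.path == "/ajax":
            cookie = SimpleCookie(self.headers.get("Cookie", ""))
            ok = "token" in cookie and cookie["token"].value in DemoHandler.issued
            body = json.dumps({"tours": ["demo tour"]}).encode() if ok else b"missing token"
            self.send_response(200 if ok else 500)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_ajax(page_url, ajax_url):
    """Load the page first so the server can set its cookie, then call the
    AJAX endpoint through the same opener so the cookie is sent back."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.open(page_url, timeout=10).read()
    with opener.open(ajax_url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo():
    """Run fetch_ajax against the local stand-in server on a free port."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return fetch_ajax(base + "/", base + "/ajax")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Scrapy keeps cookies between requests by default, so the same idea there would be to request the page first and issue the AJAX request from its callback. The real endpoint and any extra headers it expects would have to be read from the browser's network tab.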
















answered Dec 9 '17 at 21:47 by RobotHumans





























