What webcrawlers have you made that you use on a regular basis?

One that checks every Chaturbate room for a pair of tits, then takes a screenshot, compresses it down to a 32x32 gif, and sends it to my phone.

why

ur my hero m8

EVERYONE GIMME YOUR WEBCRAWLER IDEAS

One that logs into my bank account and pulls any cash movements.

A crawler to download the posted/favorited videos of a user from xHamster.

How do I make one? Where can I get examples? What do I need? Is this like iMacros in Firefox? I'm really interested.

I used an HTML parser module for Python called BeautifulSoup; you won't need more than this 99% of the time. If more interaction with the website is needed, Selenium is my go-to module.
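A minimal sketch of what that looks like; the URL and the tags pulled below are placeholders, not anything from this thread:

# Minimal BeautifulSoup sketch -- URL and selector are placeholders.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/some-listing")
soup = BeautifulSoup(resp.text, "html.parser")

# Grab every link on the page; swap this for whatever you actually need.
for a in soup.find_all("a", href=True):
    print(a["href"], a.get_text(strip=True))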

Is there any method of doing this without resorting to the cucks of programming languages?

is it easy to do with C or Ada?

I hope this is a bait post.
You are constrained mostly by your network speed, so the C/Ada program running faster makes absolutely no difference.
And no, it isn't easy either.
Just use Python/Ruby/any scripting language with a library that has bindings to a native HTML parser.

Mostly I don't know those and would like to use something I'm comfortable with.

Just came up with the idea for a crawler that looks for credit card info or pics of cards (e.g. on Twitter) and makes donations to something worthwhile, like cloning Harambe.

>I used an HTML parser module for Python called BeautifulSoup
BeautifulSoup is nice, but it's slow as shit. Unless you REALLY need the tolerance, use lxml.
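Roughly the same fetch with lxml, as a sketch; again the URL and the XPath are placeholders:

# Same idea with lxml -- noticeably faster on big pages.
import requests
from lxml import html

resp = requests.get("https://example.com/some-listing")
tree = html.fromstring(resp.content)

# XPath instead of find_all; grabs every href on the page.
for href in tree.xpath("//a/@href"):
    print(href)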

Why the crap would you want to tackle a problem like this using C or Ada?

>cucks of programming languages
Are you 12?

>Mostly i dont know those
Then learn one. If you know C well then I can't imagine you'll have trouble picking up Python.

>Automatic image recognition against random images on Twitter
Good luck.

Training a neural net to recognize credit cards would be stupidly easy. Throw in some OCR magic and hook it up to Twitter's API and there you go. Something similar was already done; there used to be an account that would retweet photos of credit cards.

Tempted to make an Interpals crawler that searches girls' profiles for keywords and sends a message built around those keywords.

>Training a neural net to recognize credit cards would be stupidly easy.
Maybe?
I suspect the broad range of patterns and images on credit cards would make identifying them tricky, but I don't have real experience in that area.

Did one to log into Mergent Online and search for the top performers for the day in, well, some market like the NYSE. It downloads financial statements, competitors, etc.


pretty useless thing to do

Parse university canteen websites

and offer the data in machine-readable form.

any guides on building one?

How hard would it be to do it in C, though? Just for fun, to learn and become more familiar with C.
Or is it just too bad of an idea?

WEBCRAWLING IN MY SKIN

I was fucking listening to Crawling too.

I work at an industrial equipment distributor.
Made some scripts to gather and process data from manufacturers' pages in order to use it on our company page.
Does this count as crawling? I haven't really looked into the definition of crawling; I just made my stuff do what I needed it to.

The Scrapy framework for Python is very good and uses concurrent requests.
A lot of options are also available.
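For reference, a bare-bones spider sketch; the domain and the CSS selectors are made up:

# Minimal Scrapy spider -- run with: scrapy runspider example_spider.py -o out.json
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com/page/1"]

    def parse(self, response):
        # One item per listing row; Scrapy schedules the requests concurrently.
        for row in response.css("div.item"):
            yield {"title": row.css("h2::text").get()}

        # Follow pagination if there is a next page.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)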

Made some Python/Scrapy cronjobs to automatically like the FB/Twitter posts of my gf every hour or so.

'Cause you know, I'm a vagina slave developer with no time for childishness like social networks.

Sometimes I like to use nmap to scan millions of random IPs on port 80 and then see if a web page resolves. It's usually just boring shit like Chinese sites and stuff. I found someone's home videos once.
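A rough sketch of the same idea without nmap, assuming you just want to see which random IPs answer with something HTTP-ish on port 80 (the probe count and timeout are arbitrary):

# Probe random IPs on port 80 and print whatever answers like a web server.
import random
import socket

def random_ip():
    # Naive: can land in reserved/private ranges; fine for a toy.
    return ".".join(str(random.randint(1, 254)) for _ in range(4))

for _ in range(100):
    ip = random_ip()
    try:
        with socket.create_connection((ip, 80), timeout=1) as s:
            s.sendall(b"HEAD / HTTP/1.0\r\nHost: " + ip.encode() + b"\r\n\r\n")
            banner = s.recv(256)
            if banner.startswith(b"HTTP/"):
                print(ip, banner.splitlines()[0].decode(errors="replace"))
    except OSError:
        pass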

One for "subscribing" to youtube channels, without having an account, navigating through a laggy GUI or getting distracted from my work by recommendations.
After the scan, the videos open in a vlc media stream.
Quite comfy on low-end computers.

Nicely done.

>One for "subscribing" to youtube channels, without having an account
I do that too.

>After the scan, the videos open in a VLC media stream.
Huh, okay. Mine returns an Atom feed that gets read by my feed reader.

Are you doing that via the YouTube API? I did a similar API-to-RSS kind of thing for search results a while ago, but it hit some arbitrary limits in API v2 or whatever it was at the time.

Y'know, every channel does have an RSS feed. Y'can just use that.

Uuhh uh whaat

>Are you doing that via the YouTube API?
God no. The YouTube API actually requires you to authenticate with an account.

I'm just scraping the HTML of the Uploads page (or Playlist page) and the individual Video pages. To save re-scraping the same pages over and over, I store the info on the Video pages in a SQLite DB between scrapes.

If Google doesn't like me doing that, then they're free to bring back channel RSS feeds.
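For what it's worth, the SQLite-between-scrapes part looks something like this; the actual scraping of the Uploads/Video pages is the brittle part and is only stubbed out here:

# Cache per-video info in SQLite so pages aren't re-scraped every run.
import sqlite3

db = sqlite3.connect("videos.db")
db.execute("""CREATE TABLE IF NOT EXISTS videos
              (video_id TEXT PRIMARY KEY, title TEXT, published TEXT)""")

def cached(video_id):
    # Returns (title, published) if we already scraped this video, else None.
    return db.execute("SELECT title, published FROM videos WHERE video_id = ?",
                      (video_id,)).fetchone()

def store(video_id, title, published):
    db.execute("INSERT OR REPLACE INTO videos VALUES (?, ?, ?)",
               (video_id, title, published))
    db.commit()

# for video_id in scrape_uploads_page(channel_url):      # placeholder scraper
#     if cached(video_id) is None:
#         store(video_id, *scrape_video_page(video_id))  # placeholder scraper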

>every channel does have an RSS feed
That's been gone for years.

Yeah. There was some specific URL you paste the channel's ID after, but sometimes even just viewing source and looking for 'RSS' works. I'll see if I have the URL saved somewhere.
All you need is RSS for that, though, yeah.

>That's been gone for years.
It's not. It's still there. Just not obviously available.

>It's not. It's still there. Just not obviously available.
Shit, really? I did a bunch of searching for stuff like that before I wrote the scraper, but I found nothing that still worked.
Do you have any information you could post / link to?

I have some automated betting process going on with a few sports betting sites.

youtube.com/feeds/videos.xml?channel_id=[HEX-ID]

[HEX-ID] => search for the tag "channel-external-id" in the channel HTML
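A quick way to check that feed with nothing but the standard library; the channel ID below is just the one posted further down the thread:

# Fetch a channel's Atom feed and list its latest uploads.
import urllib.request
import xml.etree.ElementTree as ET

CHANNEL_ID = "UCxr2d4As312LulcajAkKJYw"
url = f"https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}"

ns = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(urllib.request.urlopen(url).read())

for entry in root.findall("atom:entry", ns):
    title = entry.find("atom:title", ns).text
    link = entry.find("atom:link", ns).attrib["href"]
    print(title, link)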

This seems cool... is it checking the betting lines on those games?

However, on some channels you can just view source and Ctrl+F 'rss'. For example, from the channel of a random video in my recommended videos:
youtube.com/channel/UCxr2d4As312LulcajAkKJYw
youtube.com/feeds/videos.xml?channel_id=UCxr2d4As312LulcajAkKJYw

Otherwise you'll have to do it that way, though.
That said, I'm having trouble finding one it's not working for right now (still looking for one), even though it didn't work for a lot of the channels in my RSS. So it's pretty helpful still, apparently.

If there are existing libs in C for scraping like BeautifulSoup, then it's easy.

Otherwise, starting from scratch would be an intermediate task for a new C programmer.

>channel RSS feeds are gone
It's still there, using it right now...

I suggest going for libcurl and libtidy.
libtidy comes with a buffer type that can be passed to curl_easy_setopt as CURLOPT_WRITEDATA and filled from the write callback.
But listen to

>youtube.com/feeds/videos.xml?channel_id=UCqbkm47qBxDj-P3lI9voIAw
Alright, I don't know if that was added since I wrote this thing, or if I missed it somehow.
Still, thanks!

>>I used an HTML parser module for Python called BeautifulSoup
>BeautifulSoup is nice, but it's slow as shit. Unless you REALLY need the tolerance, use lxml.
What the fuck is this and how do I use it?
I'm manually parsing HTML with C right now...
>protip: it just werks