Content Extractor/Data Scraper
Using Regex patterns, XPath or CSS/jQuery selectors, you will be able to configure Sitebulb to scrape text or HTML on each URL it audits. This data will then be presented within its own section in the audit report as a URL List + data, and exportable to Excel for further analysis.
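To illustrate what a per-URL custom extraction produces, here is a minimal Python sketch. It is not Sitebulb's implementation. The sample pages, the `COMMENT_COUNT` pattern and the `extract`/`to_csv` helpers are hypothetical. It uses only a regex, because XPath or CSS selectors on real-world HTML would need a third-party parser. Each audited URL gets one extracted value (empty when nothing matches), and the resulting URL list + data is written as CSV, which Excel can open.

```python
import csv
import io
import re

# Hypothetical sample pages: in a real crawl these HTML strings would come
# from fetching each audited URL.
PAGES = {
    "https://example.com/product/1": '<div class="comments">12 comments</div>',
    "https://example.com/product/2": '<div class="comments">3 comments</div>',
    "https://example.com/about": "<p>No comment widget here.</p>",
}

# Hypothetical extraction rule: capture the comment count in a regex group.
COMMENT_COUNT = re.compile(r'<div class="comments">(\d+) comments</div>')


def extract(pages, pattern):
    """Return (url, value) rows; value is '' when the pattern does not match."""
    rows = []
    for url, html in pages.items():
        match = pattern.search(html)
        rows.append((url, match.group(1) if match else ""))
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "extracted"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract(PAGES, COMMENT_COUNT)
print(to_csv(rows))
```

A regex is the simplest of the three rule types to sketch. XPath or CSS selectors would target the same element structurally instead of by its literal markup.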
Comments: 10
-
03 Oct, '17
Tom W: Highly interesting feature (if it's accurate). N.B. Scraping text cleanly is not trivial. (I believe Diffbot is the market leader and charges about 0.1c per page.)
-
03 Oct, '17
Zdenek Dvorak [linki.cz]: I really miss this feature from Screaming Frog. With it I can tell what kind of URL I have on an e-commerce site (by scraping unique text or a content grouping), whether content is popular (by scraping the number of comments), assign authors, and much more.
-
06 Oct, '17
Michael Field: import.io is a pretty cool service for this. Whilst I don't anticipate the point-and-click features, some of its elements could be pulled through on a crawl.
-
03 Nov, '17
Simon Cox: Good idea, but I believe this distracts from the original purpose of Sitebulb. There are many other things time could be better spent on before tackling this.
-
29 Nov, '17
Tim Wolfe: This should be a must-do feature. I use custom extractions in DeepCrawl and Screaming Frog for everything from schema markup to list counts and page copy.
-
05 May, '18
Walid: For me this shouldn't be optional; it's essential. Crawling a website helps to get data, and then insights. To produce valuable insights, the data must be segmented, for example by page type on e-commerce websites: we have product pages, category pages, informational, manufacturer and institutional pages, and so on. A custom extractor helps flag every URL so I can understand where it fits. Once that's done, the data is much more insightful, and it can then help me with log analysis, keyword analysis, technical website analysis and more.
-
25 May, '18
gareth: Having to decide on one crawler, SF or SB, this is a pretty hard feature not to have. My heart wants SB, but my brain is making me take SF this budget cycle.
-
04 Jul, '18
Laurie Turnbull (Merged): Identify pages based on a footprint (e.g. video embed code).
-
09 Jul, '18
Admin: "Custom extraction based on Footprint" (suggested by Laurie Turnbull on 2018-07-04), including upvotes (1) and comments (0), was merged into this suggestion.
-
31 Oct, '18
Eric: This could fit under the "crawl map advanced features" request, but...
It's something that could help paint a picture of keyword density. I was thinking that something as rudimentary as a word cloud could be an extremely useful starting point in many use cases.