Gutenberg Crawler
Sometimes you need some real files with decent content for testing or whatever. Project Gutenberg is a great source of public domain books. Their robot page shows a number of ways to automate downloading their files; however, a quick and dirty (or effective, depending on your point of view) approach is to build the path to the text document itself. Exercise restraint: hammering their servers is rude, and there are guidelines on the page linked earlier.
require 'net/http'
require 'uri'

(10000..10010).each do |i|
  # Every digit of the id except the last becomes a directory, e.g. 10000 -> /1/0/0/0
  path = i.to_s.chars.to_a[0...-1].inject("") { |memo, value| "#{memo}/#{value}" }
  path = "/dirs" + path + "/#{i}/#{i}.txt"
  puts "Get: http://www.gutenberg.org" + path
  # Replace "yourproxyserver" and 8080 with your own proxy details
  Net::HTTP::Proxy("yourproxyserver", 8080).start("www.gutenberg.org", 80) do |http|
    http.request_get(path) do |response|
      # Only write a file when the request succeeded
      File.open("#{i}.txt", 'w') do |file|
        response.read_body { |body_string| file.write body_string }
      end if response.is_a? Net::HTTPSuccess
    end
  end
  sleep 2 # pause between requests to go easy on their servers
end
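
The path-building step is the only non-obvious part of the script, so here it is pulled out on its own as a rough sketch. The method name `gutenberg_path` is made up for illustration, and the rule is only assumed to hold for multi-digit ids like the ones in the range above.

```ruby
# Hypothetical helper (not part of the original script): builds the path for
# a numeric book id using the same digit-splitting rule as the crawler.
def gutenberg_path(id)
  digits = id.to_s.chars
  # Every digit except the last becomes one directory level.
  prefix = digits[0...-1].map { |d| "/#{d}" }.join
  "/dirs#{prefix}/#{id}/#{id}.txt"
end

puts gutenberg_path(10000) # prints /dirs/1/0/0/0/10000/10000.txt
puts gutenberg_path(10010) # prints /dirs/1/0/0/1/10010/10010.txt
```

Keeping the path logic in one small method makes it easy to check a few ids by eye before pointing the crawler at the real server.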
SVN Stripper
Git is friendly: it only adds one folder to your project. Subversion (at least before version 1.7) is invasive, adding one to every single folder. Sometimes clients want the source, for whatever reason. This script strips out the .svn or _svn folders so you can unversion some Subversioned source without folder hunting.
require 'fileutils'

# Point this at the root of the working copy you want to clean
Dir.chdir('Directory that you want to clear svn junk folders out of.')
# Matches .svn and _svn folders at any depth
Dir.glob('**/{.,_}svn').each do |svn_dir|
  FileUtils.rm_rf(File.join(Dir.pwd, svn_dir))
end
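
Since `rm_rf` is unforgiving, it can be worth listing what the glob matches before deleting anything. The following is a minimal sketch of such a dry run; the method name `svn_dirs` is made up for illustration, and the `base:` option of `Dir.glob` assumes Ruby 2.5 or later.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: returns the .svn/_svn folders under root as paths
# relative to root, without deleting anything.
def svn_dirs(root)
  Dir.glob('**/{.,_}svn', base: root)
     .select { |rel| File.directory?(File.join(root, rel)) }
     .sort
end

# Try it out in a throwaway directory rather than a real working copy.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, '.svn'))
  FileUtils.mkdir_p(File.join(root, 'lib', '_svn'))
  FileUtils.mkdir_p(File.join(root, 'lib', 'src'))

  puts svn_dirs(root) # lists the folders that would be removed

  # Once the list looks right, remove them for real.
  svn_dirs(root).each { |rel| FileUtils.rm_rf(File.join(root, rel)) }
  puts svn_dirs(root).empty?
end
```

Passing the root in explicitly, rather than calling `Dir.chdir`, also means the script never depends on whichever directory it happens to be run from.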