All patches and comments are welcome. Please squash your changes into logical
commits before using git-format-patch and git-send-email to send them to
patches@git.madduck.net.
If you'd read over the Git project's submission guidelines and adhered to them,
I'd be especially grateful.

webcheckout - check out repositories referenced on a web page

B<webcheckout> [options] url [destdir]

B<webcheckout> downloads a URL and parses it, looking for version control
repositories referenced by the page. It checks out each repository into
a subdirectory of the current directory, using whatever VCS program is
appropriate for that repository (git, svn, etc).

The information about the repositories is embedded in the web page using
the rel=vcs-* microformat, which is documented at
<http://kitenet.net/~joey/rfc/rel-vcs/>.
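
As a rough illustration of how a rel=vcs-* value can be turned into a VCS type name, here is a minimal sketch. It mirrors the C<rel> match used later in this script, but the helper name C<vcs_type> and the sample attribute sets are assumptions made for the example, not part of webcheckout.

```perl
use strict;
use warnings;

# Hypothetical helper: map a rel attribute value to a VCS type name.
# Returns undef when the value does not start with the vcs- prefix.
sub vcs_type {
    my ($rel) = @_;
    return unless defined $rel && $rel =~ /^vcs-(.+)/i;
    return lc $1;
}

# Sample attribute sets, shaped like what an HTML parser might report
# for <link> and <a> tags. The URLs are placeholders.
my @links = (
    { rel => "vcs-git",    href => "git://example.com/foo.git", title => "foo" },
    { rel => "VCS-Svn",    href => "http://example.com/svn/bar" },
    { rel => "stylesheet", href => "http://example.com/style.css" },
);

foreach my $attr (@links) {
    my $type = vcs_type($attr->{rel});
    next unless defined $type;    # not a repository link
    print "$type $attr->{href}\n";
}
```

The match is case-insensitive and the type is lowercased, so C<VCS-Svn> and C<vcs-svn> would both select the same checkout handler.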

If the optional destdir parameter is specified, VCS programs will be asked
to check out repositories into that directory. If there are multiple
repositories to check out, each will be checked out into a separate
subdirectory of the destdir.

Prefer authenticated repositories. By default, webcheckout will use
anonymous repositories when possible. If you have an account that
allows you to use authenticated repositories, you might want to use this
option.

Do not actually check anything out, just print out the commands that would
be run to check out the repositories.

Quiet mode. Do not print out the commands being run. (The VCS commands
may still be noisy however.)

To use this program you will need lots of VCS programs installed,
obviously. It also depends on the Perl LWP and HTML::Parser modules.

If the Perl URI module is installed, webcheckout can heuristically guess
what you mean by partial URLs, such as "kitenet.net/~joey".
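
The optional dependency described above can be sketched as follows. This is a minimal example, assuming that an unmodified URL is an acceptable fallback when the URI module is missing; the variable name C<$url> and the guarded C<require> are choices made for the illustration.

```perl
use strict;
use warnings;

my $url = "kitenet.net/~joey";

# Load URI::Heuristic only if it is installed; without it the URL is
# used exactly as the user typed it.
if (eval { require URI::Heuristic; 1 }) {
    $url = URI::Heuristic::uf_uristr($url);
}

print "$url\n";
```

Guarding the load this way keeps the URI module a soft dependency: the program still runs without it, it just cannot expand partial URLs.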

Copyright 2009 Joey Hess <joey@kitenet.net>

Licensed under the GNU GPL version 2 or higher.

This program is included in myrepos <http://myrepos.branchable.com/>

# Controls whether to print what is being done.

# Controls whether to actually check anything out.

# Controls whether to prefer repos that use authentication.

# Controls where to check out to. If not set, the VCS is allowed to

# how to perform checkouts
git => sub { doit("git", "clone", shift, $destdir) },
svn => sub { doit("svn", "checkout", shift, $destdir) },
bzr => sub { doit("bzr", "branch", shift, $destdir) },

# Regexps matching URLs that are used for anonymous
# repository checkouts. The order is significant:
# URLs matching earlier in the list are preferred over
# those matching later.
qr/^http:\/\//i, # generally the worst transport

Getopt::Long::Configure("bundling", "no_permute");
my $result=GetOptions(
"q|quiet" => \$quiet,
"n|noact" => \$noact,
"a|auth" => \$want_auth,
if (! $result || @ARGV < 1) {
die "usage: webcheckout [options] url [destdir]\n";
$destdir=shift @ARGV;

eval q{use URI::Heuristic};
$url=URI::Heuristic::uf_uristr($url);

my @args=grep { defined } @_;
print join(" ", @args)."\n" unless $quiet;
return system(@args);
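
The command runner above can be sketched as a complete sub. This is a minimal version built from the three lines shown, assuming that a dry run returns 0 (success) without executing anything; that early return and the default values are assumptions, as only the argument filtering, the printing and the C<system> call appear in the text.

```perl
use strict;
use warnings;

my $quiet = 0;   # assumed default: print each command
my $noact = 1;   # dry run for this demonstration

# Run a command given as a list, skipping undefined arguments.
sub doit {
    my @args = grep { defined } @_;
    print join(" ", @args), "\n" unless $quiet;
    return 0 if $noact;        # assumption: a dry run counts as success
    return system(@args);      # list form avoids the shell; 0 means success
}

my $destdir;     # left undefined, so it is dropped from the command
my $status = doit("git", "clone", "git://example.com/foo.git", $destdir);
```

Filtering out undefined values is what lets the checkout handlers always pass C<$destdir>: when no destination was given, the VCS program simply receives one argument fewer.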

# Is repo a better than repo b?
foreach my $r (@anon_urls) {
if ($a->{href} =~ /$r/) {
elsif ($b->{href} =~ /$r/) {

# Whichever is authenticated is better.
return 1 if ! @anon || ! grep { $_ eq $a } @anon;
return 0 if ! grep { $_ eq $b } @anon;
# Neither is authenticated, so the better anon method wins.
return $anon[0] == $a;

# Better anon method wins.
return @anon && $anon[0] == $a;
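
To make the comparison above concrete, here is a self-contained sketch of the preference logic. The contents of C<@anon_urls> beyond the http entry, the C<push> statements that fill C<@anon>, and the parameter names C<$x> and C<$y> (used in place of C<$a> and C<$b>, which Perl reserves for sort) are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical preference list, best first. Only the http entry
# appears in the text above.
my @anon_urls = (
    qr/^git:\/\//i,
    qr/^http:\/\//i,
);
my $want_auth = 0;

# Returns true if repo $x should be preferred over repo $y.
sub better {
    my ($x, $y) = @_;

    # Collect the repos that match an anonymous pattern, in
    # preference order.
    my @anon;
    foreach my $r (@anon_urls) {
        if ($x->{href} =~ /$r/) {
            push @anon, $x;
        }
        elsif ($y->{href} =~ /$r/) {
            push @anon, $y;
        }
    }

    if ($want_auth) {
        # A repo that matched no anonymous pattern counts as authenticated.
        return 1 if ! @anon || ! grep { $_ == $x } @anon;
        return 0 if ! grep { $_ == $y } @anon;
        return $anon[0] == $x ? 1 : 0;
    }
    return (@anon && $anon[0] == $x) ? 1 : 0;
}

my $git  = { href => "git://example.com/foo.git" };
my $http = { href => "http://example.com/foo.git" };
my $ssh  = { href => "ssh://example.com/foo.git" };

print "git beats http\n" if better($git, $http);
```

With C<$want_auth> set, a repo whose URL matches none of the anonymous patterns is treated as the authenticated one and wins, which is the behaviour the comments above describe.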

# Eliminate duplicate repositories from list.
# Duplicate repositories have the same title, or the same href.
foreach my $repo (@_) {
if (exists $repo->{title} &&
length $repo->{title}) {
if (exists $bytitle{$repo->{title}}) {
my $other=$bytitle{$repo->{title}};
next unless better($repo, $other);
delete $bytitle{$other->{title}}
if (! $seenhref{$repo->{href}}++) {
$bytitle{$repo->{title}}=$repo;

return values %bytitle, @others;
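
The deduplication rule above can be sketched end to end as follows. This is a minimal version under stated assumptions: the handling of untitled repositories (the C<@others> branch) is inferred from the return statement, and the C<better> used here is a simplified stand-in that only prefers git:// URLs, not the full comparison.

```perl
use strict;
use warnings;

# Simplified stand-in for the real comparison: prefer git:// URLs.
sub better {
    my ($x, $y) = @_;
    return ($x->{href} =~ m{^git://}i && $y->{href} !~ m{^git://}i) ? 1 : 0;
}

# Keep one repo per title (the better one) and one repo per href.
sub dedup {
    my (%seenhref, %bytitle, @others);
    foreach my $repo (@_) {
        if (exists $repo->{title} && length $repo->{title}) {
            if (exists $bytitle{$repo->{title}}) {
                my $other = $bytitle{$repo->{title}};
                next unless better($repo, $other);
                delete $bytitle{$other->{title}};
            }
            if (! $seenhref{$repo->{href}}++) {
                $bytitle{$repo->{title}} = $repo;
            }
        }
        else {
            # Assumed handling: untitled repos are kept unless their
            # href was already seen.
            push @others, $repo unless $seenhref{$repo->{href}}++;
        }
    }
    return values(%bytitle), @others;
}

my @repos = dedup(
    { title => "foo", href => "http://example.com/foo.git" },
    { title => "foo", href => "git://example.com/foo.git" },
    { href => "git://example.com/foo.git" },
);
print scalar(@repos), " repo(s) kept: $repos[0]{href}\n";
```

In this sample the two entries titled "foo" collapse to the git:// one, and the untitled entry is dropped because its href has already been seen.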

my $parser=HTML::Parser->new(api_version => 3);
$parser->handler(start => sub {
return if ! exists $attr->{href} || ! length $attr->{href};
return if ! exists $attr->{rel} || $attr->{rel} !~ /^vcs-(.+)/i;
$attr->{type}=lc($1);
# need to collect the body of the <a> tag if there is no title
if ($tagname eq "a" && ! exists $attr->{title}) {
$parser->handler(text => sub {
$abody.=join(" ", @_);
$parser->handler(end => sub {
if ($tagname eq "a" && defined $aref) {
$aref->{title}=$abody;
$parser->report_tags(qw{link a});
$parser->parse($page);

if (! defined $page) {
die "failed to download $url\n";
my @repos=dedup(parse($page));
die "no repositories found on $url\n";
#print Dumper(\@repos);
if (defined $destdir && @repos > 1) {
# create subdirs of $destdir for the multiple repos
chdir($destdir) || die "failed to chdir to $destdir: $!";

foreach my $repo (@repos) {
my $handler=$handlers{$repo->{type}};
if ($handler->($repo->{href}) != 0) {
print STDERR "failed to checkout ".$repo->{href}."\n";
print STDERR "unknown repository type ".$repo->{type}.
" for ".$repo->{href}."\n";