Commit 1f7ce189 authored by Rob Riepel

Initial commit

package LBCD;
use constant PROTO_PORTNUM => 4330;
use constant PROTO_MAXMESG => 2048;
use constant PROTO_VERSION => 2;
use constant OP_LB_INFO => 1; # load balance info, request and reply
use constant STATUS_REQUEST => 0; # /* a request packet */
use constant STATUS_OK => 1; # /* successful reply */
use constant STATUS_ERROR => 2; # /* generic error */
use constant STATUS_PROTO_VERSION => 3; # /* protocol version error */
use constant STATUS_PROTO_ERROR => 4; # /* generic protocol error */
use constant STATUS_UNKNOWN_OP => 5; # /* unknown operation requested */
# typedef struct {
# u_short version; /* protocol version */
# u_short id; /* requestor's uniq request id */
# u_short op; /* operation requested */
# u_short status; /* set on reply */
# } P_HEADER,*P_HEADER_PTR;
use constant P_HEADER => 'nnnn';
# typedef struct {
# P_HEADER h;
# u_int boot_time;
# u_int current_time;
# u_int user_mtime; /* time user information last changed */
# u_short l1; /* (int) (load*100) */
# u_short l5;
# u_short l15;
# u_short tot_users; /* total number of users logged in */
# u_short uniq_users; /* total number of uniq users */
# u_char on_console; /* true if someone on console */
# u_char reserved; /* future use, padding... */
# } P_LB_RESPONSE, *P_LB_RESPONSE_PTR;
use constant P_LB_RESPONSE => P_HEADER . 'NNNnnnnnCC';
1;
NAME=power-lbbe
FILES=poller.config LBCD.pm lbbe poller
ALL_FILES=Makefile README.md $(FILES) slbcd
VERSION=1.0.0
RELEASE=$(NAME)-$(VERSION)

check: $(ALL_FILES)
	@echo " The files are ready. Now what? Targets: install/update, dist/release, clean"

install update: /etc/poller.config /usr/sbin/lbbe /usr/sbin/poller /usr/share/perl5/LBCD.pm /var/lib/poller

/etc/poller.config: poller.config
	install -D -m 644 $< $@

/usr/sbin/lbbe: lbbe
	install -D -m 755 $< $@

/usr/sbin/poller: poller
	install -D -m 755 $< $@

/usr/share/perl5/LBCD.pm: LBCD.pm
	install -D -m 644 $< $@

/var/lib/poller:
	mkdir -p $@

dist release: $(RELEASE).tar.gz

$(RELEASE).tar.gz: $(RELEASE).tar
	gzip -f $^

$(RELEASE).tar: $(ALL_FILES)
	ln -s . $(RELEASE)
	tar cf $@ $(addprefix $(RELEASE)/,$^)
	rm -f $(RELEASE)

clean:
	rm -f $(NAME)-*.tar.gz poller.config.*
Contents
========
* Introduction
* Collecting and Summarizing Host Metrics
* Answering DNS Queries
* TTLs and MXes
* Configuring the poller
* The Load-Balanced Client Daemon
* Site-Specific Configuration
* Installation
* Conclusion
Introduction
============
lbbe is a PowerDNS pipe back end providing DNS load balancing. lbbe allows
you to create dynamic groups of hosts that have one name in the DNS. A host
may be in multiple groups at the same time. For example, the name

    www.best.stanford.edu

represents a dynamic group of 5 web servers named:

    www{1,2,3,4,5}.stanford.edu

When someone tries to connect to www.best.stanford.edu, a DNS query is
performed. That query eventually gets sent to the PowerDNS server using lbbe
which responds with the name of the least loaded of those five web servers.
Of course advertising a web service in a subdomain like "best.stanford.edu"
is less than desirable. Fortunately, that can be avoided with the simple
alias:

    www.stanford.edu -> www.best.stanford.edu

Now when someone tries to connect to www.stanford.edu and their DNS resolver
queries for the IP address of www.stanford.edu it gets the following answer:

    www.stanford.edu is an alias for www.best.stanford.edu    (1)
    www.best.stanford.edu is an alias for www3.stanford.edu   (2)
    www3.stanford.edu has address 171.64.10.89                (3)

In this "alias chain" type answer, the middle link, (2), is provided by lbbe;
the others are provided by the normal name service serving the "stanford.edu"
zone.
Collecting and Summarizing Host Metrics
=======================================
The poller script queries the load-balanced client daemon, lbcd, on hosts
participating in load-balanced name groups. It collects the system load and
other data from the lbcd running on each host (more on lbcd later). The
poller uses the collected information to calculate an overall "weight" for
each host. The host names and weights are written to a file that lbbe uses
to determine which host names to pass out in response to queries.
The poller calculates the weights using the following algorithm:

    // constants
    WEIGHT_PER_USER      = 10
    WEIGHT_PER_LOAD_UNIT = 3

    // data retrieved from lbcd by the poller
    l1   = current host load
    tot  = total number of users logged into the host
    uniq = number of unique users logged into the host

    // data provided in the lbbe configuration file
    sf   = this host's "server factor" in the range 0 to 10

    weight = WEIGHT_PER_USER * (0.2 * tot + 0.8 * uniq) * (10 - sf)
           + WEIGHT_PER_LOAD_UNIT * l1 * sf

As you can see, the "server factor" controls the relative importance of
interactive user sessions. A value of zero represents a purely interactive,
user-oriented host; a value of 10 represents a server with no logins.
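
The weight calculation can be sketched in a few lines of Perl. This is only an
illustration of the formula as written in this README: `host_weight` and the
sample metrics are made up for the example and are not taken from the poller,
and `l1` is used exactly as the formula gives it, without any rescaling.

```perl
use strict;
use warnings;

# Constants as given in the algorithm above.
my $WEIGHT_PER_USER      = 10;
my $WEIGHT_PER_LOAD_UNIT = 3;

# host_weight() is a hypothetical helper, not a function from the poller.
# Arguments: l1 (load), tot (total users), uniq (unique users), sf (0..10).
sub host_weight {
    my (%m) = @_;
    die "server factor must be between 0 and 10\n"
        if $m{sf} < 0 || $m{sf} > 10;
    return $WEIGHT_PER_USER * (0.2 * $m{tot} + 0.8 * $m{uniq}) * (10 - $m{sf})
         + $WEIGHT_PER_LOAD_UNIT * $m{l1} * $m{sf};
}

# The same (made-up) measurements weigh differently at the two extremes.
my %metrics = (l1 => 4, tot => 5, uniq => 5);
printf "interactive host (sf=0):  %d\n", host_weight(%metrics, sf => 0);
printf "pure server      (sf=10): %d\n", host_weight(%metrics, sf => 10);
```

With a server factor of 0 only the user terms contribute; with a server factor
of 10 only the load term does.
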
Answering DNS Queries
=====================
The lbbe script uses the weights calculated by the poller and the host
"participation factors" to determine which host's name to pass out in
response to a DNS query. The participation factor is a configuration
parameter that allows for unequal load sharing between hosts in a group.
lbbe sorts a group of hosts by (weight / participation factor) and passes
out the name of the host with the lowest value. A host with a participation
factor of 0.10 would have to have a weight ten times that of a host with a
participation factor of 1.0 before they would sort equally. A very small
participation factor, say 0.001, can be used to make one host a backup for
a set of other hosts with the default participation factor of 1.0.
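
As a rough sketch of that ordering, the snippet below ranks a group by
(weight / participation factor). The host names, weights, and the
`rank_hosts` helper are assumptions made for illustration; lbbe itself keeps
this data in its own global tables.

```perl
use strict;
use warnings;

# Hypothetical sample data: current weight and participation factor per host.
my %weight = (www1 => 300, www2 => 120,  backup => 1);
my %pf     = (www1 => 1.0, www2 => 0.10, backup => 0.001);

# rank_hosts() is an illustrative helper: lowest weight/pf sorts first.
sub rank_hosts {
    my ($weight, $pf) = @_;
    return sort { $weight->{$a} / $pf->{$a} <=> $weight->{$b} / $pf->{$b} }
           keys %$weight;
}

my @ranked = rank_hosts(\%weight, \%pf);
printf "%-7s %8.1f\n", $_, $weight{$_} / $pf{$_} for @ranked;
print "answer: $ranked[0]\n";
```

Even with a much lower raw weight, the host with the tiny participation factor
sorts behind a normally loaded host, which is what makes it behave like a
backup.
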
Once lbbe has determined which host's name to use in the DNS response,
it sends the response and increments the host's weight. The increment is
calculated using the following formula:

    // constants
    WEIGHT_PER_USER      = 10
    WEIGHT_PER_LOAD_UNIT = 3

    // data provided in the lbbe configuration file
    sf = this host's "server factor" in the range 0 to 10

    increment = WEIGHT_PER_USER * (10 - sf)
              + WEIGHT_PER_LOAD_UNIT * sf

Note that eventually all the hosts in a group will have the same weight and
their names will be passed out in round-robin fashion from then on.
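
The effect of the increment can be seen in a small simulation. It is a sketch
under simplifying assumptions: every host has the same server factor and a
participation factor of 1.0, no poller update arrives during the run, and the
starting weights and the `simulate` helper are invented for the example.

```perl
use strict;
use warnings;

my $WEIGHT_PER_USER      = 10;
my $WEIGHT_PER_LOAD_UNIT = 3;

# Increment applied to a host each time its name is handed out.
sub increment {
    my ($sf) = @_;
    return $WEIGHT_PER_USER * (10 - $sf) + $WEIGHT_PER_LOAD_UNIT * $sf;
}

# simulate() is a hypothetical helper: answer $count queries for one group
# whose hosts all share server factor $sf and participation factor 1.0.
sub simulate {
    my ($weights, $sf, $count) = @_;
    my %weight = %$weights;          # work on a copy
    my @answers;
    for (1 .. $count) {
        # Lowest weight wins; ties are broken by name so the run is repeatable.
        my ($best) = sort { $weight{$a} <=> $weight{$b} or $a cmp $b } keys %weight;
        push @answers, $best;
        $weight{$best} += increment($sf);
    }
    return @answers;
}

# Made-up starting weights, as if left over from the last poll.
my @answers = simulate({ www1 => 0, www2 => 130, www3 => 195 }, 5, 10);
print "@answers\n";
```

The least loaded host receives the first answers until its weight catches up
with the others; from then on the names rotate.
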
TTLs and MXes
=============
By default, lbbe uses a TTL of 0 in its DNS responses. That tells resolvers
not to cache the response and to make a new query every time the name is
looked up. This gives the most accurate load sharing across the hosts in a
group. But it may generate too many DNS queries. If that's the case, you
can provide a non-zero TTL value for the group in the configuration file.
It's still best to keep the value small to promote a truly balanced load.
DNS queries for MX records must be handled in lbbe for groups not using
the alias chain response type. For those using the alias chain, the MX
response is "inherited" from the host's real name. For example, an MX
query on www.stanford.edu might get the following alias chain answer:

    www.stanford.edu is an alias for www.best.stanford.edu             (1)
    www.best.stanford.edu is an alias for www3.stanford.edu            (2)
    www3.stanford.edu mail is handled by 10 leland.stanford.edu        (3)

As mentioned before, only link (2) comes from lbbe, so it doesn't have to
handle the MX data. But without the alias chain, the answer would be:

    www.stanford.edu is an alias for www.best.stanford.edu             (1)
    www.best.stanford.edu mail is handled by 10 leland.stanford.edu    (2)

Once again part (2) is provided by the load-balanced name server, but this
time it's the MX record, so lbbe must have the data to respond properly.
In order to support this, the lbbe configuration file includes a section
where you may provide MX information.
Configuring the poller
======================
The poller is configured using a configuration file. The first section of the
file lists the names of the hosts and the names of the groups in which they
participate (remember that a group is a load-balanced name). Each host also
has a server factor and a participation factor for each group it's in. The
file may have an optional second section for listing load-balanced name TTLs
and MXes. Here's a short sample configuration file:

    # SF = server factor; default participation factor = 1.0;

    host                  SF  group(participation factor)
    ####################  ##  #########################################
    foo.stanford.edu       2  quux
    bar.stanford.edu      10  www
    baz.stanford.edu       5  quux www(.01)

    # default TTL = 0 seconds; top slice - see the poller POD; XXX
    # default MX  = none;

    group         TTL    top slice  MX
    ############  #####  #########  ##################
    www               6          0  mail.stanford.edu
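
To make the host-line layout concrete, here is a minimal parsing sketch. It is
not the poller's parser: `parse_host_line` is a hypothetical helper, it only
understands lines from the first section (host, server factor, groups), and it
assumes that `#` starts a comment.

```perl
use strict;
use warnings;

# parse_host_line() is a hypothetical helper, not code from the poller.
# It splits "host SF group(pf) group ..." into a small hash.
sub parse_host_line {
    my ($line) = @_;
    $line =~ s/#.*//;                 # assumption: '#' starts a comment
    my ($host, $sf, @groups) = split ' ', $line;
    return unless defined $sf && $sf =~ /^\d+$/;   # skip blanks and headers

    my %participation;
    for my $entry (@groups) {
        # "www(.01)" -> group "www", factor .01; bare "quux" -> factor 1.0
        my ($group, $pf) = $entry =~ /^([^()]+)(?:\(([^()]+)\))?$/
            or next;
        $participation{$group} = defined $pf ? $pf + 0 : 1.0;
    }
    return { host => $host, sf => $sf + 0, groups => \%participation };
}

my $entry = parse_host_line('baz.stanford.edu 5 quux www(.01)');
printf "%s sf=%d\n", $entry->{host}, $entry->{sf};
printf "  %-5s %s\n", $_, $entry->{groups}{$_} for sort keys %{ $entry->{groups} };
```

A group listed without parentheses gets the default participation factor of
1.0, matching the comment at the top of the sample file.
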
The Load-Balanced Client Daemon
===============================
The load-balanced client daemon, lbcd, runs on the hosts participating in
load-balanced name groups. The poller queries lbcd to get the host load
and other information. lbcd is a mildly complex beast because it has to
understand how to get all that information from a plethora of Unix
variants. Because of that complexity, lbcd is distributed and maintained
separately from lbbe.
If lbcd doesn't support your flavor of Unix, or round-robin satisfies your
load-balancing needs, or you just want to play with lbbe without downloading
lbcd, this package includes slbcd, a static/simple load-balanced client
daemon. slbcd is written in Perl. It provides a complete lbcd with
hard-coded values for the load and other data. When you run it on the hosts
participating in a load-balanced name group, they will always have the same
weight and therefore their names will always be passed out in round-robin
fashion.
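
The packet layout the poller and lbcd exchange is described by the pack
templates in LBCD.pm. The sketch below builds a request header and decodes a
reply using those templates. The helper names and the fake reply are
assumptions for illustration, and the network exchange itself (LBCD.pm gives
port 4330) is left out so the example runs without a live lbcd.

```perl
use strict;
use warnings;

# Values and pack templates as defined in LBCD.pm.
my $PROTO_VERSION  = 2;
my $OP_LB_INFO     = 1;
my $STATUS_REQUEST = 0;
my $P_HEADER       = 'nnnn';
my $P_LB_RESPONSE  = $P_HEADER . 'NNNnnnnnCC';

# Build a request: version, request id, operation, status.
sub build_request {
    my ($id) = @_;
    return pack($P_HEADER, $PROTO_VERSION, $id, $OP_LB_INFO, $STATUS_REQUEST);
}

# Decode a reply into named fields; l1/l5/l15 are (load * 100) on the wire.
sub parse_response {
    my ($packet) = @_;
    my @names = qw(version id op status boot_time current_time user_mtime
                   l1 l5 l15 tot_users uniq_users on_console reserved);
    my %reply;
    @reply{@names} = unpack($P_LB_RESPONSE, $packet);
    return \%reply;
}

# Round trip with a made-up reply instead of a live lbcd.
my $fake = pack($P_LB_RESPONSE, 2, 42, 1, 1, 1000, 2000, 1500,
                125, 110, 95, 7, 3, 0, 0);
my $reply = parse_response($fake);
printf "request is %d bytes\n", length(build_request(42));
printf "load1=%.2f users=%d/%d\n",
    $reply->{l1} / 100, $reply->{tot_users}, $reply->{uniq_users};
```

All fields are packed in network byte order, so the same templates decode a
reply regardless of the architecture of the host that produced it.
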
Site-Specific Configuration
===========================
The lbbe and poller.config files in this package contain Stanford-specific
configuration data. Unless you're going to run a load-balanced name service
for Stanford, you'll want to edit those files replacing that data with values
appropriate for your DNS domain.
Installation
============
Use make(1) with the included Makefile to install the lbbe and poller programs
along with their dependencies.
Conclusion
==========
Hopefully this has been enough of an introduction to lbbe that you can
make it work for you. Be sure to check out the POD documentation for both
the lbbe and the poller, and the sample configuration file, poller.config.
Share and enjoy.
#!/usr/bin/perl
############################################################
#
# lbbe - load balancing name server back end
#
############################################################
use Getopt::Std;
############################################################
#
# Configuration / Initialization
#
$WEIGHT_PER_USER = 10; # should be consistent with poller
$WEIGHT_PER_LOAD_UNIT = 3; # should be consistent with poller
$poller_sleep = 120;
$poller_results = "/var/lib/poller/lb";
$next_poller_time = 0;
$default_ttl = 0;
$my_domain = "best.stanford.edu";
$hostmaster = "action.stanford.edu";
@servers = qw(lbdns1.stanford.edu lbdns2.stanford.edu lbdns3.stanford.edu);
@SOA = ($servers[0], $hostmaster, time(), 3600, 1800, 86400, 0);
$|=1; # don't buffer output, that would be silly
############################################################
#
# Command-line argument processing
#
($myname = $0) =~ s|.*/||;
getopts("dlp:r:s:") or die <<EOF;
Usage: $myname [options] [domain (def: $my_domain)]
    -d              debug
    -l              turn on logging
    -p path         poller results (def: $poller_results)
    -r rname        domain admin email address encoded as a name
                    (def: $hostmaster)
    -s server,...   authoritative name server names
                    (def: @servers)
EOF
$debug = $opt_d;
$logging = $opt_l;
$poller_results = $opt_p if $opt_p;
@servers = split(/\s*,\s*/,$opt_s) if $opt_s;
$SOA[0] = $servers[0] if $opt_s;
$SOA[1] = $opt_r if $opt_r;
$my_domain = shift @ARGV if @ARGV;
do_reload(); # load initial poller data
############################################################
#
# Accept input from PowerDNS
#
chomp($line=<>);
unless($line eq "HELO\t1") {
    write_log("bad HELO message '$line'");
    print "FAIL\n";
    exit;
}
print "OK\tlbbe firing up\n";
debug("HELO accepted, on with the show!"); # this is it

# pipe back end messages are tab-separated, so fields are joined with \t
while (<>) {
    chomp();
    ($query = $_) =~ s/\t/ /g;
    write_log("received query: $query");
    # my ($type,$qname,$qclass,$qtype,$id,$ip)=split(/\s+/); # easier for testing
    my ($type,$qname,$qclass,$qtype,$id,$ip)=split(/\t/); # production
    if ($type eq "AXFR") { print "END\n"; next; }
    unless ($type eq "Q") {
        print "LOG\tUnsupported message '$_'\n";
        print "FAIL\n";
        next;
    }
    # domain SOA and NS queries
    if ($qname eq $my_domain) {
        if ($qtype eq "SOA" || $qtype eq "ANY") {
            write_log("sending SOA record");
            print "DATA\t$qname\t$qclass\tSOA\t3600\t-1\t" . join("\t",@SOA) . "\n";
        }
        if ($qtype eq "NS" || $qtype eq "ANY") {
            write_log("sending NS records");
            foreach (@servers) { print "DATA\t$qname\t$qclass\tNS\t3600\t-1\t$_\n"; }
        }
    }
    # dynamic queries: names under $my_domain (anchored, with the dots
    # in the domain matched literally)
    elsif ($qname =~ /^.+\.\Q$my_domain\E$/) {
        write_log("sending dynamic response for $qname/$qtype");
        print handle_lb_request(split(/\./,$qname,2),$qtype,$qclass);
    }
    print "END\n";
} continue { do_reload() } # check for new poller data
############################################################
#
# Dynamic domain handler
#
# sort hosts by (weight / participation factor); $$a{$qname} looks up the
# participation factor in the hash named after the host
sub by_weight { $weight{$a} / $$a{$qname} <=> $weight{$b} / $$b{$qname}; }
sub min { my @values = sort {$a <=> $b} @_; return shift @values; }

sub handle_lb_request {
    my($residual,$domain,$qtype,$qclass) = @_;
    my($the_host,$the_ip,$answer,$group);
    local($qname);
    # if the name is "show", return status of load-balanced name in $domain
    my $show = "";
    if ($residual eq "show" and $qtype eq "ANY") {
        ($residual,$domain) = split(/\./,$domain,2);
        $qtype = "TXT";
        $show = "show.";
    }
    $qname = $residual;
    return unless ($group = $lb_groups{$qname});
    if ($qtype eq "A" || $qtype eq "MX" || $qtype eq "ANY" || $qtype eq "CNAME") {
        @$group = sort by_weight @$group;
        # pick at random among the N least loaded hosts (N = $rnd{$qname},
        # capped at the group size), or the least loaded host when N is 0
        $the_host = $rnd{$qname}
            ? $$group[int(rand(min($rnd{$qname},scalar(@$group))))]
            : $$group[0];
        $weight{$the_host} += $WEIGHT_PER_USER * (10 - $server_factor{$the_host})
                            + $WEIGHT_PER_LOAD_UNIT * $server_factor{$the_host};
        $answer = "DATA\t$qname.$domain\t$qclass\tCNAME\t$ttl{$qname}\t-1\t$the_host\n";
    } elsif ($qtype eq "TXT") {
        for $the_host (sort by_weight @$group) {
            $answer .= "DATA\t$show$qname.$domain\t$qclass\tTXT\t$ttl{$qname}\t-1\t"
                     . sprintf("%7d/%-5.3f %s",$weight{$the_host}, $$the_host{$qname}, $the_host)
                     . "\n";
        }
    }
    return $answer;
}
############################################################
#
# Poller interaction
#
sub do_reload {
    # check for new poller results and load them
    debug("do_reload");
    if (time() > $next_poller_time) {
        $mtime = (stat($poller_results))[9];
        if ($mtime > $next_poller_time) {
            debug("reloading config");
            load_config($poller_results);
            $next_poller_time = $mtime + $poller_sleep;
            $SOA[2] = $mtime; # set SOA serial number
        }
    }
}
sub load_config {
    local($file) = @_;
    my($host,$bg,$a,$b,$c,$d,$ipaddr,$weight,$ip,$groups,$message);
    my($entry,$play);
    $message = "load_config:";
    # open first so the previous data survives if the file can't be read
    open(CONFIG,"<",$file) or do {
        print "LOG\tCan't open config file: $file: $!\n";
        return;
    };
    %ttl=();
    %rnd=();
    %mx=();
    %hits=();
    %group_hits=();
    %weight=();
    %server_factor=();
    %lb_groups=();
    %ip_host=();
    while(<CONFIG>) {
        s/^\s+//;
        s/\s+$//;
        next if /^#/ || /^$/;
        ($weight,$server_factor,$host,$ip,$groups) = split(/\s+/,$_,5);
        $message .= "\n loading $_";
        # lines that don't start with a number are per-group settings
        if ($weight !~ /^[0-9]+/ ) {
            $ttl{$weight} = $server_factor; # ttl
            $rnd{$weight} = $host if $host; # random
            $mx{$weight} = $ip if $ip; # mx
            debug("N for $weight is $host") if $host;
            debug("MX for $weight is $ip") if $ip;
            next;
        }
        $_ = $ip;
        ($a,$b,$c,$d) = /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
        $ipaddr = ($a<<24)|($b<<16)|($c<<8)|$d;
        $ip_host{$host} = $ipaddr;
        $weight{$host} = $weight;
        $oweight{$host} = $weight;
        $server_factor{$host} = $server_factor;
        foreach $entry (split(/\s+/,$groups)) {
            ($group,$play) = split(/\(/,$entry);
            chop($play);
            $$host{$group} = $play ? $play : 1;
            $lb_groups{$group} = [] unless defined $lb_groups{$group};
            $bg = $lb_groups{$group};
            debug("$host participation in $group is $$host{$group}");
            push(@$bg,$host);
        }
    }
    debug($message);
    close(CONFIG);
    # assure ttl and random for each group
    foreach $key (keys(%lb_groups)) {
        $ttl{$key} = $default_ttl unless $ttl{$key};
        $rnd{$key} = 0 unless $rnd{$key} >= 2;
    }
}
sub write_log {
    return unless $logging;
    my $message = shift;
    print STDERR "$$ lbbe $message\n";
}

sub debug {
    return unless $debug;
    my $message = shift;
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    my $date = sprintf("%02d/%02d %02d:%02d",$mon+1,$mday,$hour,$min);
    print STDERR "$date $$ lbbe $message\n";
}

=head1 NAME

lbbe - Load-balancing DNS server back end

=head1 SYNOPSIS

B<lbbe> [B<-d>] [B<-l>] [B<-p> I<path>] [B<-r> I<rname>] [B<-s> I<server,...>] [I<domain>]

=head1 DESCRIPTION

B<lbbe> is a load-balancing DNS server back end. It depends on a separate
program, B<poller>, to collect data about the load and state of servers.
Then, based on that data, it responds to DNS queries with the least loaded
system in a pool of systems answering to the name in the query. Each time
it hands out a particular system, it increments that system's load so that
the load will be balanced between multiple systems even between updates of
the system loads.

Configuration options are specified at the beginning of the script and can
be overridden with options and arguments. In the default configuration,
B<lbbe> serves out the I<best.stanford.edu> domain with nameservers
I<lbdns1.stanford.edu>, I<lbdns2.stanford.edu>, and
I<lbdns3.stanford.edu>. The best domain returns CNAMEs to the real system
names. The TTL of these records can be configured in the B<poller>
configuration file. The domain can be changed on the command line, the
nameservers can be specified using the B<-s> option, and the B<SOA> I<rname>
can be specified using the B<-r> option.

A TXT query for a I<show> plus the load-balanced pool name will return a
textual record listing all of the participating servers and their current
loads and participation factors. The server load is divided by the
participation factor for the purposes of sorting by load. B<lbbe> can also
be configured to return a randomly chosen host from the N least loaded
rather than always returning the least loaded system. See the poller
documentation for details.

=head1 OPTIONS

=over 4

=item B<-d>

Enable debugging, which causes B<lbbe> to print out more information to
standard error.

=item B<-l>

Enable logging, which causes B<lbbe> to print information to standard error.

=item B<-p> I<path>

B<lbbe> looks for the poller results in F</var/lib/poller/lb>. This option
overrides that and directs B<lbbe> to find the results elsewhere.

=item B<-r> I<rname>

The B<SOA> I<rname> field specifies the email address of a person or role
account responsible for the load-balanced zone with the "@" converted to a
".". The default B<lbbe> value is I<action.stanford.edu>, the role account
responsible for the I<best.stanford.edu> domain. Use this option to override
that and provide the appropriate encoded email address.

=item B<-s> I<server,...>

The default set of nameservers is I<lbdns1.stanford.edu>, I<lbdns2.stanford.edu>,
and I<lbdns3.stanford.edu>. This option overrides that with the specified servers.

=back

=head1 FILES

=over 4

=item F</etc/poller.config>

The default location for the configuration file as described above.

=item F</var/lib/poller/lb>

poller data used to calculate query answers. May be in another location
if the B<-p> option has been given.

=item F</usr/sbin/poller>

The poller program. This program does the work of querying remote
systems and writing out the results for B<lbbe> to read. For more
information, see the poller(8) man page.

=back

=head1 SEE ALSO