Before anyone can access the database, you must start the database server. The database server program is called postgres. The postgres program must know where to find the data it is supposed to use. This is done with the -D option. Thus, the simplest way to start the server is:
$ postgres -D /usr/local/pgsql/data
which will leave the server running in the foreground. This must be done while logged into the PostgreSQL user account. Without -D, the server will try to use the data directory named by the environment variable PGDATA. If that variable is not provided either, it will fail.
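For example, the -D option above can be replaced by setting PGDATA in the shell before starting the server (using the example data directory shown above):
$ export PGDATA=/usr/local/pgsql/data
$ postgres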
Normally it is better to start postgres in the background. For this, use the usual Unix shell syntax:
$ postgres -D /usr/local/pgsql/data >logfile 2>&1 &
It is important to store the server's stdout and stderr output somewhere, as shown above. It will help for auditing purposes and to diagnose problems. (See Section 23.3 for a more thorough discussion of log file handling.)
The postgres program also takes a number of other command-line options. For more information, see the postgres reference page and Chapter 18 below.
This shell syntax can get tedious quickly. Therefore the wrapper program pg_ctl is provided to simplify some tasks. For example:
$ pg_ctl start -l logfile
will start the server in the background and put the output into the named log file. The -D option has the same meaning here as for postgres. pg_ctl is also capable of stopping the server.
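For example, assuming the same data directory as above, a fast shutdown can be requested with:
$ pg_ctl stop -D /usr/local/pgsql/data -m fast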
Normally, you will want to start the database server when the computer boots. Autostart scripts are operating-system-specific. There are a few distributed with PostgreSQL in the contrib/start-scripts directory. Installing one will require root privileges.
Different systems have different conventions for starting up daemons at boot time. Many systems have a file /etc/rc.local or /etc/rc.d/rc.local. Others use init.d or rc.d directories. Whatever you do, the server must be run by the PostgreSQL user account and not by root or any other user. Therefore you probably should form your commands using su postgres -c '...'. For example:
su postgres -c 'pg_ctl start -D /usr/local/pgsql/data -l serverlog'
Here are a few more operating-system-specific suggestions. (In each case be sure to use the proper installation directory and user name where we show generic values.)
For FreeBSD, look at the file contrib/start-scripts/freebsd in the PostgreSQL source distribution.
On OpenBSD, add the following lines to the file /etc/rc.local:
if [ -x /usr/local/pgsql/bin/pg_ctl -a -x /usr/local/pgsql/bin/postgres ]; then
    su -l postgres -c '/usr/local/pgsql/bin/pg_ctl start -s -l /var/postgresql/log -D /usr/local/pgsql/data'
    echo -n ' postgresql'
fi
On Linux systems either add
/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data
to /etc/rc.d/rc.local or /etc/rc.local or look at the file contrib/start-scripts/linux in the PostgreSQL source distribution.
On NetBSD, use either the FreeBSD or Linux start scripts, depending on preference.
On Solaris, create a file called /etc/init.d/postgresql that contains the following line:
su - postgres -c "/usr/local/pgsql/bin/pg_ctl start -l logfile -D /usr/local/pgsql/data"
Then, create a symbolic link to it in /etc/rc3.d as S99postgresql.
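Assuming the file locations shown above, the link can be created with:
ln -s /etc/init.d/postgresql /etc/rc3.d/S99postgresql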
While the server is running, its PID is stored in the file postmaster.pid in the data directory. This is used to prevent multiple server instances from running in the same data directory and can also be used for shutting down the server.
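For example, the first line of postmaster.pid contains the server's PID, so a fast shutdown can be requested by sending SIGINT to that process (using the data directory from the examples above):
$ kill -INT `head -1 /usr/local/pgsql/data/postmaster.pid`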
As described in the previous chapter, Postgres-XL consists of several components. The minimum set of components is a GTM, a GTM-Proxy, a Coordinator and a Datanode. You must configure and start each of them. The following sections describe how to do so. pgxc_clean and GTM-Standby are described in the high-availability sections.
You should initialize each database that composes the Postgres-XL database cluster. Both the Coordinator and the Datanode have their own database, and you must initialize each of them. The Coordinator holds only the database catalog and temporary data; the Datanode holds most of your data. First of all, determine how many Coordinators and Datanodes to run and where they should run. It is a good convention to run a Coordinator wherever you run a Datanode, and to run a GTM-Proxy on the same server as well. This simplifies the Postgres-XL configuration and helps keep the workload of the servers even.
Both the Coordinator and the Datanode have their own databases, which are essentially PostgreSQL databases. They are separate, and you must initialize them separately.
The GTM provides the global transaction management feature to all the other components of a Postgres-XL database cluster. Because the GTM handles transaction requirements from all the Coordinators and Datanodes, it is highly advisable to run it on a separate server.
Before you start the GTM, you should decide the following:
Because the GTM receives all requests to begin and end transactions and to obtain sequence values, you should run the GTM on a separate server. If you run the GTM on the same server as a Datanode or Coordinator, it becomes harder to keep the workload reasonably balanced.
Then, determine the GTM's working directory. Create this directory before you run the GTM.
Next, determine the GTM's listen address and port. The listen address can be either an IP address or a host name at which the GTM receives requests from the other components, typically GTM-Proxies.
You can run more than one GTM in a Postgres-XL cluster. For example, if you need a backup GTM in a high-availability environment, you need to run two GTMs. Give each such GTM a unique GTM id; GTM id values begin at one.
When this is determined, you can initialize the GTM with the command initgtm, for example:
$ initgtm -Z gtm -D /usr/local/pgsql/data_gtm
All parameters related to the GTM can be modified in gtm.conf, located in the data directory initialized by initgtm.
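As a sketch, a minimal gtm.conf might set the node name, listen address and port chosen above (the values are examples only, and the parameter names assume the defaults written by initgtm):
nodename = 'gtm'
listen_addresses = '*'
port = 6666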
Then you can start the GTM as follows:
$ gtm -D /usr/local/pgsql/data_gtm
where the -D option specifies the GTM's working directory.
Alternatively, the GTM can be started using gtm_ctl, for example:
$ gtm_ctl -Z gtm start -D /usr/local/pgsql/data_gtm
A GTM-Proxy is not a mandatory component of a Postgres-XL cluster, but it can be used to group messages between the GTM and cluster nodes, reducing the GTM's workload and the number of packets exchanged over the network.
As described in the previous section, a GTM-Proxy needs its own listen address, port, working directory and GTM-Proxy ID, which must be unique and begins at one. In addition, you should determine how many worker threads to run. You also need the GTM's address and port to start the GTM-Proxy.
Then, you need first to initialize a GTM-Proxy with initgtm, for example:
$ initgtm -Z gtm_proxy -D /usr/local/pgsql/data_gtm_proxy
All parameters related to a GTM-Proxy can be modified in gtm_proxy.conf, located in the data directory initialized by initgtm.
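As a sketch, a gtm_proxy.conf might look like the following, where gtm_host and gtm_port point at the GTM started above (the host name gtm-server and all values are illustrative, and the parameter names assume the defaults written by initgtm):
nodename = 'gtm_proxy1'
listen_addresses = '*'
port = 6667
gtm_host = 'gtm-server'
gtm_port = 6666
worker_threads = 1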
Then, you can start a GTM-Proxy like:
$ gtm_proxy -D /usr/local/pgsql/data_gtm_proxy
where -D specifies the GTM-Proxy's working directory.
Alternatively, you can start a GTM-Proxy using gtm_ctl as follows:
$ gtm_ctl start -Z gtm_proxy -D /usr/local/pgsql/data_gtm_proxy
Before starting a Coordinator or Datanode, you must configure it. You configure a Coordinator or Datanode by editing the postgresql.conf file located in its working directory, as specified with the -D option of the initdb command.
A Datanode is almost a native PostgreSQL server with some extensions. The additional options in postgresql.conf for the Datanode are as follows:
This value is not just the number of connections you expect from each Coordinator. Each Coordinator backend may connect to every Datanode, so you should specify the total number of connections that all Coordinators together may accept. For example, if you have five Coordinators and each of them may accept forty connections, you should specify 200 as this parameter's value.
Even though your application does not intend to issue PREPARE TRANSACTION, a Coordinator may issue it internally when more than one Datanode is involved. You should set this parameter to the same value as max_connections.
The GTM needs to identify each Datanode, as specified by this parameter. The value should be unique and start with one.
Because both a Coordinator and a Datanode may run on the same server, you may want to assign a separate port number to the Datanode.
Specify the port number of the GTM-Proxy, as given with the -p option of gtm_proxy or gtm_ctl.
Specify the host name or IP address of the GTM-Proxy, as given with the -h option of gtm_proxy or gtm_ctl.
For some joins that occur in queries, data from one Datanode may need to be joined with data from another Datanode. Postgres-XL uses shared queues for this purpose. During execution each Datanode knows if it needs to produce or consume tuples, or both.
Note that multiple shared queues may be used even for a single query, so this value should be set taking into account the number of connections the Datanode can accept and the expected number of such joins occurring simultaneously.
This parameter sets the size of each allocated shared queue.
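Putting these together, a Datanode's postgresql.conf might contain entries along the following lines. This is only a sketch: the parameter names shown (pgxc_node_name, gtm_host, gtm_port, shared_queues, shared_queue_size, and so on) are the usual Postgres-XL settings corresponding to the descriptions above, and the values are examples to adapt to your cluster:
max_connections = 200              # total connections from all Coordinators
max_prepared_transactions = 200    # same value as max_connections
pgxc_node_name = 'datanode1'       # unique identifier of this Datanode
port = 15432                       # distinct from a co-located Coordinator's port
gtm_host = 'localhost'             # host of the local GTM-Proxy
gtm_port = 6667                    # port of the local GTM-Proxy
shared_queues = 64                 # queues for Datanode-to-Datanode joins
shared_queue_size = 256kB          # size of each shared queue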
Although Coordinators and Datanodes share the same binary, their configuration differs slightly because of their different roles. The additional options in postgresql.conf for the Coordinator are as follows:
You don't have to take other Coordinators or Datanodes into account. Just specify the number of connections the Coordinator accepts from applications.
Specify at least the total number of Coordinators in the cluster.
The GTM needs to identify each Coordinator, as specified by this parameter.
Because both a Coordinator and a Datanode may run on the same server, you may want to assign a separate port number to the Coordinator. It may be convenient to use the default PostgreSQL listen port for it.
Specify the port number of the GTM-Proxy, as given with the -p option of gtm_proxy or gtm_ctl.
Specify the host name or IP address of the GTM-Proxy, as given with the -h option of gtm_proxy or gtm_ctl.
Specify the port number that the pooler should use. This must not conflict with any other server ports used on this host.
A Coordinator maintains connections to Datanodes as a pool. This parameter specifies the maximum number of connections the Coordinator maintains. Specify the max_connections value of the remote nodes as this parameter's value.
This is the minimum number of Coordinator-to-remote-node connections maintained by the pooler. Typically, specify 1.
This parameter specifies how long to keep a connection alive. If a connection is older than this amount, the pooler discards it. This parameter is useful in multi-tenant environments where many connections to many different databases may be used, so that idle connections can be cleaned up. It is also useful for automatically closing connections occasionally in case there is some unknown memory leak, so that this memory can be freed.
This parameter specifies how long to wait until pooler maintenance is performed. During such maintenance, old idle connections are discarded. This parameter is useful in multi-tenant environments where many connections to many different databases may be used, so that idle connections can be cleaned up.
This parameter specifies the cost overhead of setting up a remote query to obtain remote data. It is used by the planner in costing queries.
This parameter is used in query cost planning to estimate the cost involved in row shipping and obtaining remote data, based on the expected data size. Row shipping is expensive and adds latency, so this setting helps favor plans that minimize row shipping.
This parameter is used to get several sequence values at once from the GTM. This greatly speeds up COPY and INSERT SELECT operations where the target table uses sequences. Postgres-XL will not use this entire amount at once, but will increase the request size over time if many requests are made in a short time frame in the same session. After a short time without any sequence requests, the request size decreases back down to 1. Note that any setting here is overridden if the CACHE clause was used in CREATE SEQUENCE or ALTER SEQUENCE.
This is the maximum number of Coordinators that can be configured in the cluster. Specify the exact number if you do not plan to add more Coordinators while the cluster is running, or a greater number if you want to be able to resize the cluster dynamically. Each slot costs about 140 bytes of shared memory.
This is the maximum number of Datanodes that can be configured in the cluster. Specify the exact number if you do not plan to add more Datanodes while the cluster is running, or a greater number if you want to be able to resize the cluster dynamically. Each slot costs about 140 bytes of shared memory.
Enforce the use of two-phase commit in transactions involving ON COMMIT actions or temporary objects. Using autocommit instead of two-phase commit may break data consistency, so change this setting at your own risk.
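Similarly, a Coordinator's postgresql.conf might contain entries such as the following. Again this is only a sketch: parameter names like pooler_port, min_pool_size, max_pool_size, max_coordinators and max_datanodes correspond to the descriptions above, and the values are illustrative:
max_connections = 40               # connections accepted from applications
max_prepared_transactions = 10     # at least the number of Coordinators
pgxc_node_name = 'coord1'          # unique identifier of this Coordinator
port = 5432                        # the default PostgreSQL port is convenient here
gtm_host = 'localhost'             # host of the local GTM-Proxy
gtm_port = 6667                    # port of the local GTM-Proxy
pooler_port = 6668                 # must not clash with other ports on this host
min_pool_size = 1
max_pool_size = 200                # max_connections of the remote nodes
max_coordinators = 16              # allow room to add Coordinators later
max_datanodes = 16                 # allow room to add Datanodes later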
Now you can start the central components of Postgres-XL, the Datanodes and Coordinators. If you are familiar with starting a PostgreSQL database server, this step is very similar.
You can start a Datanode as follows:
$ postgres --datanode -D /usr/local/pgsql/data
--datanode specifies that postgres should run as a Datanode. You may need to specify the -i option so that postgres accepts TCP/IP connections, and to edit pg_hba.conf, if the cluster's nodes are spread among several servers.
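For example, if all the cluster's nodes sit on a private subnet such as 192.168.1.0/24 (an assumed address range for illustration), each node's pg_hba.conf might allow them with a line like the following, preferably with a stronger authentication method than trust:
host    all    all    192.168.1.0/24    trust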
You can start a Coordinator as follows:
$ postgres --coordinator -D /usr/local/pgsql/data_coord
--coordinator specifies that postgres should run as a Coordinator. You may need to specify the -i option so that postgres accepts TCP/IP connections, and to edit pg_hba.conf, if the cluster's nodes are spread among several servers.
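Alternatively, the nodes can be started through pg_ctl, which in Postgres-XL accepts a -Z option naming the node type. Assuming that option is available in your build and using the directories from the examples above, the commands look roughly like:
$ pg_ctl start -Z datanode -D /usr/local/pgsql/data -l datanode.log
$ pg_ctl start -Z coordinator -D /usr/local/pgsql/data_coord -l coordinator.log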
There are several common reasons the server might fail to start. Check the server's log file, or start it by hand (without redirecting standard output or standard error) and see what error messages appear. Below we explain some of the most common error messages in more detail.
LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
FATAL: could not create TCP/IP listen socket
This usually means just what it suggests: you tried to start another server on the same port where one is already running. However, if the kernel error message is not Address already in use or some variant of that, there might be a different problem. For example, trying to start a server on a reserved port number might draw something like:
$ postgres -p 666
LOG: could not bind IPv4 socket: Permission denied
HINT: Is another postmaster already running on port 666? If not, wait a few seconds and retry.
FATAL: could not create TCP/IP listen socket
A message like:
FATAL: could not create shared memory segment: Invalid argument
DETAIL: Failed system call was shmget(key=5440001, size=4011376640, 03600).
probably means your kernel's limit on the size of shared memory is smaller than the work area PostgreSQL is trying to create (4011376640 bytes in this example). Or it could mean that you do not have System-V-style shared memory support configured into your kernel at all. As a temporary workaround, you can try starting the server with a smaller-than-normal number of buffers (shared_buffers). You will eventually want to reconfigure your kernel to increase the allowed shared memory size. You might also see this message when trying to start multiple servers on the same machine, if their total space requested exceeds the kernel limit.
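On Linux, for example, the System V shared memory limits can be raised with sysctl; the values below are only illustrative and should be sized to your RAM and settings:
sysctl -w kernel.shmmax=4294967296
sysctl -w kernel.shmall=1048576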
An error like:
FATAL: could not create semaphores: No space left on device
DETAIL: Failed system call was semget(5440126, 17, 03600).
does not mean you've run out of disk space. It means your kernel's limit on the number of System V semaphores is smaller than the number PostgreSQL wants to create. As above, you might be able to work around the problem by starting the server with a reduced number of allowed connections (max_connections), but you'll eventually want to increase the kernel limit.
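On Linux, for example, the semaphore limits are controlled by the kernel.sem setting (SEMMSL, SEMMNS, SEMOPM and SEMMNI); a commonly used example, to be adapted to your workload, is:
sysctl -w kernel.sem="250 32000 100 128"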
If you get an "illegal system call" error, it is likely that shared memory or semaphores are not supported in your kernel at all. In that case your only option is to reconfigure the kernel to enable these features.
Details about configuring System V IPC facilities are given in Section 17.4.1.
Although the error conditions possible on the client side are quite varied and application-dependent, a few of them might be directly related to how the server was started. Conditions other than those shown below should be documented with the respective client application.
psql: could not connect to server: Connection refused
        Is the server running on host "server.joe.com" and accepting
        TCP/IP connections on port 5432?
This is the generic "I couldn't find a server to talk to" failure. It looks like the above when TCP/IP communication is attempted. A common mistake is to forget to configure the server to allow TCP/IP connections.
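If that is the case, check listen_addresses in postgresql.conf; by default the server listens only for local connections, so allowing remote TCP/IP connections requires something like:
listen_addresses = '*'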
Alternatively, you'll get this when attempting Unix-domain socket communication to a local server:
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
The last line is useful in verifying that the client is trying to connect to the right place. If there is in fact no server running there, the kernel error message will typically be either Connection refused or No such file or directory, as illustrated. (It is important to realize that Connection refused in this context does not mean that the server got your connection request and rejected it. That case will produce a different message, as shown in Section 19.4.) Other error messages such as Connection timed out might indicate more fundamental problems, like lack of network connectivity.