Question 6 :
MySQL - System/Compile Time and Startup Parameter Tuning
We start with the system level things since some of these decisions have to be made very early. In other cases a fast look at this part may suffice because it not that important for the big gains. However, it is always nice to have a feeling about how much one could gain by changing things at this level.
The default OS to use is really important! To get the most use of multiple CPU machines one should use Solaris (because the threads works really nice) or Linux (because the 2.2 kernel has really good SMP support). Also on 32-bit machines Linux has a 2G file size limit by default. Hopefully this will be fixed soon when new filesystems are released (XFS/Reiserfs). If you have a desperate need for files bigger than 2G on Linux-Intel 32 bit, you should get the LFS patch for the ext2 file system.
Because we have not run MySQL in production on that many platforms, we advice you to test your intended platform before choosing it, if possible.
If you have enough RAM, you could remove all swap devices. Some operating systems will use a swap device in some contexts even if you have free memory.
Use the --skip-locking MySQL option to avoid external locking. Note that this will not impact MySQL's functionality as long as you only run one server. Just remember to take down the server (or lock relevant parts) before you run myisamchk. On some system this switch is mandatory because the external locking does not work in any case. The --skip-locking option is on by default when compiling with MIT-pthreads, because flock() isn't fully supported by MIT-pthreads on all platforms. It's also on default for Linux as Linux file locking are not yet safe. The only case when you can't use --skip-locking is if you run multiple MySQL servers (not clients) on the same data, or run myisamchk on the table without first flushing and locking the mysqld server tables first. You can still use LOCK TABLES/UNLOCK TABLES even if you are using --skip-locking 12.2.1 How Compiling and Linking Affects the Speed of MySQL
Most of the following tests are done on Linux with the MySQL benchmarks, but they should give some indication for other operating systems and workloads.
You get the fastest executable when you link with -static.
On Linux, you will get the fastest code when compiling with pgcc and -O6. To compile `sql_yacc.cc' with these options, you need about 200M memory because gcc/pgcc needs a lot of memory to make all functions inline. You should also set CXX=gcc when configuring MySQL to avoid inclusion of the libstdc++ library (it is not needed). Note that with some versions of pgcc, the resulting code will only run on true Pentium processors, even if you use the compiler option that you want the resulting code to be working on all x586 type processors (like AMD).
By just using a better compiler and/or better compiler options you can get a 10-30 % speed increase in your application. This is particularly important if you compile the SQL server yourself!
We have tested both the Cygnus CodeFusion and Fujitsu compilers, but when we tested them, neither was sufficiently bug free to allow MySQL to be compiled with optimizations on.
When you compile MySQL you should only include support for the character sets that you are going to use. (Option --with-charset=xxx). The standard MySQL binary distributions are compiled with support for all character sets.
Here is a list of some measurements that we have done:
If you use pgcc and compile everything with -O6, the mysqld server is 1% faster than with gcc 2.95.2. If you link dynamically (without -static), the result is 13% slower on Linux. Note that you still can use a dynamic linked MySQL library. It is only the server that is critical for performance.
If you connect using TCP/IP rather than Unix sockets, the result is 7.5% slower on the same computer. (If you are connection to localhost, MySQL will, by default, use sockets).
If you compile with --with-debug=full, then you will loose 20 % for most queries, but some queries may take substantially longer (The MySQL benchmarks ran 35 % slower) If you use --with-debug, then you will only loose 15 %. On a Sun SPARCstation 20, SunPro C++ 4.2 is 5 % faster than gcc 2.95.2.
Compiling with gcc 2.95.2 for ultrasparc with the option -mcpu=v8 -Wa,-xarch=v8plusa gives 4 % more performance. On Solaris 2.5.1, MIT-pthreads is 8-12% slower than Solaris native threads on a single processor. With more load/CPUs the difference should get bigger.
Running with --log-bin makes [MySQL 1 % slower.
Compiling without frame pointers -fomit-frame-pointer with gcc makes MySQL 1 % faster.
The MySQL-Linux distribution provided by MySQL AB used to be compiled with pgcc, but we had to go back to regular gcc because of a bug in pgcc that would generate the code that does not run on AMD. We will continue using gcc until that bug is resolved. In the meantime, if you have a non-AMD machine, you can get a faster binary by compiling with pgcc. The standard MySqL Linux binary is linked statically to get it faster and more portable.
Question 7 :
MySQL - Disk Issues
As mentioned before, disks seeks are a big performance bottleneck. This problems gets more and more apparent when the data starts to grow so large that effective caching becomes impossible. For large databases, where you access data more or less randomly, you can be sure that you will need at least one disk seek to read and a couple of disk seeks to write things. To minimize this problem, use disks with low seek times.
Increase the number of available disk spindles (and thereby reduce the seek overhead) by either symlink files to different disks or striping the disks.
Using symbolic links
This means that you symlink the index and/or data file(s) from the normal data directory to another disk (that may also be striped). This makes both the seek and read times better (if the disks are not used for other things).
Striping means that you have many disks and put the first block on the first disk, the second block on the second disk, and the Nth on the (N mod number_of_disks) disk, and so on. This means if your normal data size is less than the stripe size (or perfectly aligned) you will get much better performance. Note that striping is very dependent on the OS and stripe-size. So benchmark your application with different stripe-sizes.
Note that the speed
difference for striping is very dependent on the parameters. Depending on how you set the striping parameters and number of disks you may get a difference in orders of magnitude. Note that you have to choose to optimize for random or sequential access.
For reliability you may want to use RAID 0+1 (striping + mirroring), but in this case you will need 2*N drives to hold N drives of data. This is probably the best option if you have the money for it! You may, however, also have to invest in some volume-management software to handle it efficiently.
A good option is to have semi-important data (that can be regenerated) on RAID 0 disk while storing really important data (like host information and logs) on a RAID 0+1 or RAID N disk. RAID N can be a problem if you have many writes because of the time to update the parity bits.
You may also set the parameters for the file system that the database uses. One easy change is to mount the file system with the noatime option. That makes it skip the updating of the last access time in the inode and by this will avoid some disk seeks.
On Linux, you can get much more performance (up to 100 % under load is not uncommon) by using hdpram to configure your disk's interface! The following should be quite good hdparm options for MySQL (and probably many other applications): hdparm -m 16 -d 1
Note that the performance/reliability when using the above depends on your hardware, so we strongly suggest that you test your system throughly after using hdparm! Please consult the hdparm man page for more information! If hdparm is not used wisely, filesystem corruption may result. Backup everything before experimenting!
On many operating systems you can mount the disks with the 'async' flag to set the file system to be updated asynchronously. If your computer is reasonable stable, this should give you more performance without sacrificing too much reliability. (This flag is on by default on Linux.)
If you don't need to know when a file was last accessed (which is not really useful on a database server), you can mount your file systems with the noatime flag.
Question 8 :
MySQL - Using Symbolic Links for Databases and Tables
You can move tables and databases from the database directory to other locations and replace them with symbolic links to the new locations. You might want to do this, for example, to move a database to a file system with more free space.
If MySQL notices that a table is symbolically linked, it will resolve the symlink and use the table it points to instead. This works on all systems that support the realpath() call (at least Linux and Solaris support realpath())! On systems that don't support realpath(), you should not access the table through the real path and through the symlink at the same time! If you do, the table will be inconsistent after any update.
MySQL doesn't that you link one directory to multiple databases. Replacing a database directory with a symbolic link will work fine as long as you don't make a symbolic link between databases. Suppose you have a database db1 under the MySQL data directory, and then make a symlink db2 that points to db1:
shell> cd /path/to/datadir
shell> ln -s db1 db2
Now, for any table tbl_a in db1, there also appears to be a table tbl_a in db2. If one thread updates db1.tbl_a and another thread updates db2.tbl_a, there will be problems.
If you really need this, you must change the following code in `mysys/mf_format.c':
if (flag & 32 || (!lstat(to,&stat_buff) && S_ISLNK(stat_buff.st_mode)))
On Windows you can use internal symbolic links to directories by compiling MySQL with -DUSE_SYMDIR. This allows you to put different databases on different disks.
Question 9 :
MySQL - Tuning Server Parameters
You can get the default buffer sizes used by the mysqld server with this command:
shell> mysqld --help
This command produces a list of all mysqld options and configurable variables. The output includes the default values and looks something like this:
Possible variables for option --set-variable (-O) are:
back_log current value: 5
bdb_cache_size current value: 1048540
binlog_cache_size current_value: 32768
connect_timeout current value: 5
delayed_insert_timeout current value: 300
delayed_insert_limit current value: 100
delayed_queue_size current value: 1000
flush_time current value: 0
interactive_timeout current value: 28800
join_buffer_size current value: 131072
key_buffer_size current value: 1048540
lower_case_table_names current value: 0
long_query_time current value: 10
max_allowed_packet current value: 1048576
max_binlog_cache_size current_value: 4294967295
max_connections current value: 100
max_connect_errors current value: 10
max_delayed_threads current value: 20
max_heap_table_size current value: 16777216
max_join_size current value: 4294967295
max_sort_length current value: 1024
max_tmp_tables current value: 32
max_write_lock_count current value: 4294967295
myisam_sort_buffer_size current value: 8388608
net_buffer_length current value: 16384
net_retry_count current value: 10
net_read_timeout current value: 30
net_write_timeout current value: 60
query_buffer_size current value: 0
record_buffer current value: 131072
slow_launch_time current value: 2
sort_buffer current value: 2097116
table_cache current value: 64
thread_concurrency current value: 10
tmp_table_size current value: 1048576
thread_stack current value: 131072
wait_timeout current value: 28800
If there is a mysqld server currently running, you can see what values it actually is using for the variables by executing this command:
shell> mysqladmin variables
You can find a full description for all variables in the SHOW VARIABLES section in this manual.
You can also see some statistics from a running server by issuing the command SHOW STATUS.
MySQL uses algorithms that are very scalable, so you can usually run with very little memory. If you, however, give MySQL more memory, you will normally also get better performance.
When tuning a MySQL server, the two most important variables to use are key_buffer_size and table_cache. You should first feel confident that you have these right before trying to change any of the other variables.
If you have much memory (>=256M) and many tables and want maximum performance with a moderate number of clients, you should use something like this:
shell> safe_mysqld -O key_buffer=64M -O table_cache=256 \
-O sort_buffer=4M -O record_buffer=1M &
If you have only 128M and only a few tables, but you still do a lot of sorting, you can use something like:
shell> safe_mysqld -O key_buffer=16M -O sort_buffer=1M
If you have little memory and lots of connections, use something like this:
shell> safe_mysqld -O key_buffer=512k -O sort_buffer=100k \
-O record_buffer=100k &
shell> safe_mysqld -O key_buffer=512k -O sort_buffer=16k \
-O table_cache=32 -O record_buffer=8k -O net_buffer=1K &
When you have installed MySQL, the `support-files' directory will contain some different my.cnf example files, `my-huge.cnf', `my-large.cnf', `my-medium.cnf', and `my-small.cnf', you can use as a base to optimize your system.
If there are very many connections, ``swapping problems'' may occur unless mysqld has been configured to use very little memory for each connection. mysqld performs better if you have enough memory for all connections, of course.
Note that if you change an option to mysqld, it remains in effect only for that instance of the server.
To see the effects of a parameter change, do something like this:
shell> mysqld -O key_buffer=32m --help
Make sure that the --help option is last; otherwise, the effect of any options listed after it on the command line will not be reflected in the output.
Question 10 :
How MySQL Opens and Closes Tables ?
table_cache, max_connections, and max_tmp_tables affect the maximum number of files the server keeps open. If you increase one or both of these values, you may run up against a limit imposed by your operating system on the per-process number of open file descriptors. However, you can increase the limit on many systems. Consult your OS documentation to find out how to do this, because the method for changing the limit varies widely from system to system.
table_cache is related to max_connections. For example, for 200 concurrent running connections, you should have a table cache of at least 200 * n, where n is the maximum number of tables in a join.
The cache of open tables can grow to a maximum of table_cache (default 64; this can be changed with the -O table_cache=# option to mysqld). A table is never closed, except when the cache is full and another thread tries to open a table or if you use mysqladmin refresh or mysqladmin flush-tables.
When the table cache fills up, the server uses the following procedure to locate a cache entry to use:
Tables that are not currently in use are released, in least-recently-used order.
If the cache is full and no tables can be released, but a new table needs to be opened, the cache is temporarily extended as necessary.
If the cache is in a temporarily-extended state and a table goes from in-use to not-in-use state, the table is closed and released from the cache.
A table is opened for each concurrent access. This means that if you have two threads accessing the same table or access the table twice in the same query (with AS) the table needs to be opened twice. The first open of any table takes two file descriptors; each additional use of the table takes only one file descriptor. The extra descriptor for the first open is used for the index file; this descriptor is shared among all threads.
You can check if your table cache is too small by checking the mysqld variable opened_tables. If this is quite big, even if you haven't done a lot of FLUSH TABLES, you should increase your table cache.