It is common to see, and hear about, a replication SQL thread that is almost always in the state “invalidating query cache entries (table)”.
The classic tip is “set global query_cache_size=0”.
Sometimes this works and sometimes it doesn't: the query cache is “disabled”, yet you still see the SQL thread in the state “invalidating query cache entries (table)”.
In short, in all MySQL versions before 5.5, the query cache mutex is acquired even if query_cache_size=0 and query_cache_type=OFF: always! That is, even if the query cache is not enabled, the mutex (slow by nature) that guards the (non-existent) query cache is acquired for every binlog event. The only way to avoid acquiring the query cache mutex in MySQL before 5.5 is to compile MySQL without the query cache. In MySQL 5.5, completely disabling the query cache (and thus never acquiring its mutex) requires setting query_cache_type=OFF at startup, not at runtime.

I'm running a 4-server master-master cluster of MySQL (2 servers on version 5.1 and 2 on version 5.5). Replication topology: 1 - 1

While checking the slave status, I see seconds_behind_master at 0, and half a second later I see it jump to 2000, and so forth. If anyone knows how to fix it, I would be glad to hear.

UPDATE: It seems that server 3 has its SBM at 0, while the other servers are jumping up and down. It looks like the server is busy doing something, and there is a huge delay between when the server gets the statement and when it executes it.

When creating a test table on server 4, checking the relay log on server 1 shows that the create statement was copied to the relay log on server 1 instantly, but the table is not created. Servers 1, 2 and 4 had "invalidating query cache entries (table)" stuck in their replication thread. After disabling the cache, server 4 is OK, but 1 and 2 still have this issue.

There is one flaw with MySQL's seconds_behind_master value: it only takes into account the position relative to one upstream hop away. This is easiest to demonstrate with a slightly simpler replication topology: server1 - server3

If server2 falls behind and is processing some long-running queries, the following will happen, assuming as a starting point:

- Everyone OK.
- server1 writes two 10-minute queries to the binlog; no replication delay anywhere.
- server2 starts processing query one. Replication delay for server2 starts growing; replication delay for server3 stays zero.
- server2 is done with query one and starts processing query two. Server3 will be done with query 3, replication delay jumps back to zero, and then back up to 10 as it processes the next query.
- server2 is done with query 2; replication delay zero again.
So, the jumpy behaviour is caused by not using a global timestamp for replication delay, but simply the delay behind the last "hop" in the replication chain.
We found this severely annoying and now use MySQL's event scheduler to update a timer table on each master every second, so we can see the actual delay from the global master (in a non-ring topology) or the delay from any peer in a ring.
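A minimal sketch of such a timer table, assuming statement-based replication and synchronized clocks across servers. The schema, table, and event names (`monitoring`, `heartbeat`, `heartbeat_tick_1`) and the literal server id `1` are hypothetical and not taken from the setup described above.

```sql
-- Hypothetical names throughout; adjust to your environment.
CREATE DATABASE IF NOT EXISTS monitoring;

CREATE TABLE IF NOT EXISTS monitoring.heartbeat (
  server_id INT UNSIGNED NOT NULL PRIMARY KEY,  -- id of the master writing the tick
  ts        DATETIME     NOT NULL               -- master's clock at the last tick
) ENGINE=InnoDB;

-- The event scheduler is OFF by default, so switch it on.
SET GLOBAL event_scheduler = ON;

-- One event per master. The server id is written as a literal (assumed: 1)
-- so that the replicated statement carries the originating master's id.
CREATE EVENT IF NOT EXISTS monitoring.heartbeat_tick_1
  ON SCHEDULE EVERY 1 SECOND
  DO
    REPLACE INTO monitoring.heartbeat (server_id, ts)
    VALUES (1, NOW());

-- Run on any downstream server: delay relative to each master's last tick.
SELECT server_id,
       TIMESTAMPDIFF(SECOND, ts, NOW()) AS seconds_behind
FROM monitoring.heartbeat;
```

Because the row is written on the originating master and only travels downstream through replication, the difference measures the end-to-end delay rather than the delay behind the last hop. Events replicated from a master are created in a disabled state on slaves, so in a ring each master would probably need its own event with its own name and server id.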
I believe the issue is not related to long-running queries.
First, because I don't see the servers processing anything, and second, because, as I mentioned in update 4, the server stops processing and gets stuck on invalidating the cache on the old non-Percona servers, which halted replication until the cache was invalidated (which took a lot of time).

We solved the issue by moving entirely to Percona MySQL server v5.5, which has the ability to disable the query cache completely.
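To check whether the query cache is really off on a given server, something like the following can be run. It is a sketch that assumes a 5.1 or 5.5 server where the `query_cache_*` variables exist. Since the source states that 5.5 needs query_cache_type=OFF at startup, the setting would typically go under `[mysqld]` in the server's option file rather than being issued with `SET GLOBAL`.

```sql
-- Both should report the cache as off: query_cache_type = OFF, query_cache_size = 0.
SHOW GLOBAL VARIABLES LIKE 'query_cache_type';
SHOW GLOBAL VARIABLES LIKE 'query_cache_size';

-- The replication SQL thread appears with User = 'system user';
-- check whether its State still reads
-- "invalidating query cache entries (table)".
SHOW FULL PROCESSLIST;
```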