Thread

  1. Inefficiencies in COPY command

    Wayne Piekarski <wayne@senet.com.au> — 1999-08-07T07:59:59Z

    Hi,
    
    With my system I get it to do a full pg_dump once per hour so that if
    anything goes wrong I can go back to the previous hours information if
    needed. One thing I did notice is that the whole process of the COPY
    command is highly CPU bound, the postgres will use up all available CPU
    and the disks are relatively idle.
    
    I ran some profiling on it with gprof, and I was shocked to find millions
    of calls to functions like memcpy, pq_putbytes, CopySendChar. I had a look
    at the code and most of these were just wrappers to various other
    functions, and did no work of their own. Having all these functions
    running a couple of million times and making heaps of their own calls
    meant that it was spending most of its time pushing and pulling things off
    the stack instead of doing real work :)
    
    So I looked into the problem, and CopyTo was getting the data, 
    and calling CopyAttributeOut to convert it to a string and send it to the
    client. The CopyTo function was getting called rarely so this was a good
    place to start.
    
    Turns out that the CopyAttributeOut function gets the supplied string, and
    escapes it out, and as it does it, it calls CopySendChar for each
    character to put it in the buffer for sending. This function does a lot of
    other function calls, including a memcpy or two.
    
    So I made some changes to CopyAttributeOut so that it escapes the string
    initially into a temporary buffer (allocated onto the stack) and then
    feeds the whole string to the CopySendData which is a lot more efficient
    because it can blast the whole string in one go, saving about 1/3 to 1/4
    the number of memcpy and so on.
    
    This modification caused a copy (with profiling and everything else) to go
    from 1m14sec to 45 sec, which I was very happy with considering how easy
    it was to fix :)
    
    
    I kept hacking at the code though, and found another slow point in the
    code which was int4out. It was a one line front end to ltoa, then this
    called sprintf with just "%d", and then numerous calls to other internal
    libc functions. This was very expensive and wasted lots of CPU time.
    
    So, I rewrote int4out (throwing away all the excess code) to do this
    conversion directly without calling any libc code which is more generic
    and slower, and this too gained a speed improvement. Combined with my
    previous patch, with profiling and all that turned off, and -O3, I managed
    to get another COPY (on a different table - sorry) down from 30 seconds to
    15 seconds, so I've managed to double its speed without any tradeoffs
    whatsoever!
    
    I have included a patch below for the changes I made, although I would
    only use this to look at, it is shocking code which was a real hack job
    using #defines and a few other tricks to make it work so I could test this
    idea out quickly. The int4out example is especially bad, and has lots of
    pointers and stuff flying around and may have some bugs.
    
    But, what I think could be interesting to improve our performance is to
    make simple mods to CopyAttributeOut (along the lines of my changes) - and
    to also make changes to remove all use of sprintf when numbers and floats
    are being converted into strings. I would imagine this would generally 
    speed up SELECT and COPY commands, so the performance increase can be
    gained in other places too.
    
    If someone wants, I can do cleaner versions of this patch for inclusion in
    the code, or one of the developers may want to do this instead in case
    theres something I'm missing. But I wanted to share this with everyone to
    help increase Postgres' speed as it is quite significant!
    
    [Patch included below for reference - note that this patch probably has
    bugs and use it at your own risk! I am not using this patch except for
    testing]
    
    Regards,
    Wayne
    
    ------------------------------------------------------------------------------
    Wayne Piekarski                               Tel:     (08) 8221 5221
    Research & Development Manager                Fax:     (08) 8221 5220
    SE Network Access Pty Ltd                     Mob:     0407 395 889
    222 Grote Street                              Email:   wayne@senet.com.au
    Adelaide SA 5000                              WWW:     http://www.senet.com.au
    
    
    
    diff -u -r NORMAL/postgresql-6.5/src/backend/commands/copy.c PROFILING/postgresql-6.5/src/backend/commands/copy.c
    --- NORMAL/postgresql-6.5/src/backend/commands/copy.c	Mon Jun 14 10:33:40 1999
    +++ PROFILING/postgresql-6.5/src/backend/commands/copy.c	Sat Aug  7 16:02:50 1999
    @@ -1278,12 +1278,23 @@
     #endif
     }
     
    +
    +
    +#warning Wayne put hacks here
    +
    +#define XCopySendChar(ch, fp) __buffer[__buffer_ofs++] = ch
    +
    +
    +
     static void
     CopyAttributeOut(FILE *fp, char *server_string, char *delim, int is_array)
     {
     	char	   *string;
     	char		c;
     
    +	char __buffer [(strlen (server_string) * 2) + 4]; /* Use +4 for safety */
    +	int __buffer_ofs = 0;
    +
     #ifdef MULTIBYTE
     	int			mblen;
     	int			encoding;
    @@ -1307,31 +1318,34 @@
     	{
     		if (c == delim[0] || c == '\n' ||
     			(c == '\\' && !is_array))
    -			CopySendChar('\\', fp);
    +			XCopySendChar('\\', fp);
     		else if (c == '\\' && is_array)
     		{
     			if (*(string + 1) == '\\')
     			{
     				/* translate \\ to \\\\ */
    -				CopySendChar('\\', fp);
    -				CopySendChar('\\', fp);
    -				CopySendChar('\\', fp);
    +				XCopySendChar('\\', fp);
    +				XCopySendChar('\\', fp);
    +				XCopySendChar('\\', fp);
     				string++;
     			}
     			else if (*(string + 1) == '"')
     			{
     				/* translate \" to \\\" */
    -				CopySendChar('\\', fp);
    -				CopySendChar('\\', fp);
    +				XCopySendChar('\\', fp);
    +				XCopySendChar('\\', fp);
     			}
     		}
     #ifdef MULTIBYTE
     		for (i = 0; i < mblen; i++)
    -			CopySendChar(*(string + i), fp);
    +			XCopySendChar(*(string + i), fp);
     #else
    -		CopySendChar(*string, fp);
    +		XCopySendChar(*string, fp);
     #endif
     	}
    +	
    +	/* Now send the whole output string in one shot */
    +	CopySendData (__buffer, __buffer_ofs, fp);
     }
     
     /*
    diff -u -r NORMAL/postgresql-6.5/src/backend/utils/adt/int.c PROFILING/postgresql-6.5/src/backend/utils/adt/int.c
    --- NORMAL/postgresql-6.5/src/backend/utils/adt/int.c	Sun Feb 14 09:49:20 1999
    +++ PROFILING/postgresql-6.5/src/backend/utils/adt/int.c	Sat Aug  7 15:53:36 1999
    @@ -210,9 +210,60 @@
     int4out(int32 l)
     {
     	char	   *result;
    +	char       *sptr, *tptr;
    +	int32 value = l;
    +	char       temp[12]; /* For storing string in reverse */
    +
    +#warning Wayne put hacks here
     
     	result = (char *) palloc(12);		/* assumes sign, 10 digits, '\0' */
    -	ltoa(l, result);
    +
    +
    +	/* ltoa(l, result);
    +	   maps to sprintf(result, "%d", l) later on in the code */
    +
    +	/* Remove minus sign firstly */
    +	if (l < 0)
    +	  value = -l;
    +	
    +	/* Point to our output string at the end */
    +	tptr = temp;
    +	
    +	/* Now loop until we have no value left */
    +	while (value > 0)
    +	  {
    +	    int current = value % 10;
    +	    value = value / 10;
    +
    +	    *tptr = current + '0';
    +	    tptr++;
    +	  }
    +	if (tptr == temp)
    +	  {
    +	    *tptr = '0';
    +	    tptr++;
    +	  }
    +	*tptr = '\0';
    +	tptr--;
    +
    +	/* Now, we need to prepare the result which is the reversal */
    +	sptr = result;
    +	if (l < 0)
    +	  {
    +	    *sptr = '-';
    +	    sptr++;
    +	  }
    +	
    +	while (tptr >= temp)
    +	  {
    +	    *sptr = *tptr;
    +	    sptr++;
    +	    tptr--;
    +	  }
    +
    +	/* Ok, we have copied everything - terminate now */
    +	*sptr = '\0';
    +	
     	return result;
     }