numactl --interleave=all ./testing_zgetrf -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.1  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
ndevices 3
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_zgetrf [options] [-h|--help]

ngpu 1
    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      1.30 (   0.00)     ---   
 1000  1000     ---   (  ---  )    137.14 (   0.02)     ---   
   10    10     ---   (  ---  )      0.26 (   0.00)     ---   
   20    20     ---   (  ---  )      0.75 (   0.00)     ---   
   30    30     ---   (  ---  )      1.27 (   0.00)     ---   
   40    40     ---   (  ---  )      3.09 (   0.00)     ---   
   50    50     ---   (  ---  )      2.22 (   0.00)     ---   
   60    60     ---   (  ---  )      3.92 (   0.00)     ---   
   70    70     ---   (  ---  )      1.11 (   0.00)     ---   
   80    80     ---   (  ---  )      1.59 (   0.00)     ---   
   90    90     ---   (  ---  )      2.07 (   0.00)     ---   
  100   100     ---   (  ---  )      2.71 (   0.00)     ---   
  200   200     ---   (  ---  )     10.69 (   0.00)     ---   
  300   300     ---   (  ---  )     23.03 (   0.00)     ---   
  400   400     ---   (  ---  )     36.86 (   0.00)     ---   
  500   500     ---   (  ---  )     52.48 (   0.01)     ---   
  600   600     ---   (  ---  )     68.63 (   0.01)     ---   
  700   700     ---   (  ---  )     87.08 (   0.01)     ---   
  800   800     ---   (  ---  )    105.82 (   0.01)     ---   
  900   900     ---   (  ---  )    123.14 (   0.02)     ---   
 1000  1000     ---   (  ---  )    142.71 (   0.02)     ---   
 2000  2000     ---   (  ---  )    339.72 (   0.06)     ---   
 3000  3000     ---   (  ---  )    518.56 (   0.14)     ---   
 4000  4000     ---   (  ---  )    627.35 (   0.27)     ---   
 5000  5000     ---   (  ---  )    684.01 (   0.49)     ---   
 6000  6000     ---   (  ---  )    772.08 (   0.75)     ---   
 7000  7000     ---   (  ---  )    829.56 (   1.10)     ---   
 8000  8000     ---   (  ---  )    882.56 (   1.55)     ---   
 9000  9000     ---   (  ---  )    906.12 (   2.15)     ---   
10000 10000     ---   (  ---  )    944.31 (   2.82)     ---   
12000 12000     ---   (  ---  )    994.15 (   4.63)     ---   
14000 14000     ---   (  ---  )   1027.60 (   7.12)     ---   
16000 16000     ---   (  ---  )   1053.91 (  10.36)     ---   
18000 18000     ---   (  ---  )   1063.89 (  14.62)     ---   
20000 20000     ---   (  ---  )   1071.63 (  19.91)     ---   

numactl --interleave=all ./testing_zgetrf_gpu -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.1  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
ndevices 3
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_zgetrf_gpu [options] [-h|--help]

    M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
=========================================================================
  100   100     ---   (  ---  )      1.01 (   0.00)     ---  
 1000  1000     ---   (  ---  )    154.37 (   0.02)     ---  
   10    10     ---   (  ---  )      0.06 (   0.00)     ---  
   20    20     ---   (  ---  )      0.36 (   0.00)     ---  
   30    30     ---   (  ---  )      0.69 (   0.00)     ---  
   40    40     ---   (  ---  )      1.38 (   0.00)     ---  
   50    50     ---   (  ---  )      1.19 (   0.00)     ---  
   60    60     ---   (  ---  )      2.64 (   0.00)     ---  
   70    70     ---   (  ---  )      0.67 (   0.00)     ---  
   80    80     ---   (  ---  )      1.02 (   0.00)     ---  
   90    90     ---   (  ---  )      1.35 (   0.00)     ---  
  100   100     ---   (  ---  )      1.79 (   0.00)     ---  
  200   200     ---   (  ---  )      7.88 (   0.00)     ---  
  300   300     ---   (  ---  )     18.82 (   0.00)     ---  
  400   400     ---   (  ---  )     32.03 (   0.01)     ---  
  500   500     ---   (  ---  )     50.66 (   0.01)     ---  
  600   600     ---   (  ---  )     68.52 (   0.01)     ---  
  700   700     ---   (  ---  )     89.21 (   0.01)     ---  
  800   800     ---   (  ---  )    110.91 (   0.01)     ---  
  900   900     ---   (  ---  )    133.32 (   0.01)     ---  
 1000  1000     ---   (  ---  )    161.19 (   0.02)     ---  
 2000  2000     ---   (  ---  )    405.92 (   0.05)     ---  
 3000  3000     ---   (  ---  )    630.44 (   0.11)     ---  
 4000  4000     ---   (  ---  )    753.23 (   0.23)     ---  
 5000  5000     ---   (  ---  )    725.11 (   0.46)     ---  
 6000  6000     ---   (  ---  )    884.64 (   0.65)     ---  
 7000  7000     ---   (  ---  )    945.01 (   0.97)     ---  
 8000  8000     ---   (  ---  )    996.60 (   1.37)     ---  
 9000  9000     ---   (  ---  )    986.27 (   1.97)     ---  
10000 10000     ---   (  ---  )   1021.82 (   2.61)     ---  
12000 12000     ---   (  ---  )   1076.85 (   4.28)     ---  
14000 14000     ---   (  ---  )   1110.41 (   6.59)     ---  
16000 16000     ---   (  ---  )   1120.76 (   9.75)     ---  
18000 18000     ---   (  ---  )   1133.20 (  13.72)     ---  
20000 20000     ---   (  ---  )   1120.56 (  19.04)     ---  
