In addition, the relationship between the best block size for matrix transposition and the size and associativity of the processor's cache is formulated. For the parallel optimization of SAR imaging processing, several programming models available on a NUMA system are compared: lightweight processes (sproc), POSIX threads, OpenMP, and MPI. Their speedup and coding complexity are analyzed with respect to the characteristics of the NUMA architecture and of SAR imaging processing.
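
To make the role of the block size concrete, a blocked (tiled) matrix transpose can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the function name `blocked_transpose`, the flat row-major layout, and the `block` parameter are assumptions introduced here, and the formula linking the best block size to cache size and associativity is not reproduced. Python is used only to show the loop structure; the cache benefit itself would appear in compiled code operating on contiguous arrays.

```python
def blocked_transpose(a, n_rows, n_cols, block):
    """Transpose a row-major n_rows x n_cols matrix stored in a flat list.

    The matrix is walked tile by tile (block x block), so that the source
    rows and destination rows touched by one tile stay close in memory.
    `block` is a hypothetical tuning parameter; its best value depends on
    the cache, as the text above states.
    """
    if block < 1:
        raise ValueError("block must be a positive integer")
    out = [0] * (n_rows * n_cols)
    # Outer loops step over tile origins.
    for i0 in range(0, n_rows, block):
        for j0 in range(0, n_cols, block):
            # Inner loops copy one tile, clipped at the matrix edges.
            for i in range(i0, min(i0 + block, n_rows)):
                for j in range(j0, min(j0 + block, n_cols)):
                    out[j * n_rows + i] = a[i * n_cols + j]
    return out
```

Any positive `block` yields the same result; only the memory access order changes, which is why the block size can be tuned for the cache without affecting correctness.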