We noticed that importing directly into the global (parent) table is very slow. Importing directly into a single partition is faster.
Do you have a rule or trigger on the main table to redirect to the partitions? You should expect that to take some extra time *per row*. Your best bet is to just import into the proper partition and make sure your application produces batch files that align with your partitions.
Either that, or write a program that reads the data, determines the partition for each row, and then inserts directly into that partition. That might be faster than going through the main table.
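For what it's worth, here is a rough sketch of the "determine the partition yourself" idea. It only does the routing step: it buckets rows by the child table they belong to, so each bucket can then be loaded straight into that partition. The monthly partitioning scheme, the `events` parent table name, the `events_YYYY_MM` child naming, and the function names are all assumptions made up for illustration; substitute whatever your schema actually uses.

```python
from collections import defaultdict
from datetime import date


def partition_for(row_date, parent="events"):
    """Return the child-table name for a row (hypothetical naming: one partition per month)."""
    return f"{parent}_{row_date.year:04d}_{row_date.month:02d}"


def group_by_partition(rows):
    """Bucket (date, payload) rows by the partition they belong to."""
    batches = defaultdict(list)
    for row in rows:
        batches[partition_for(row[0])].append(row)
    return dict(batches)


# Example input: the first element of each row is the partitioning key.
rows = [
    (date(2010, 1, 15), "a"),
    (date(2010, 2, 3), "b"),
    (date(2010, 1, 20), "c"),
]

batches = group_by_partition(rows)
for table, batch in sorted(batches.items()):
    # In a real loader, this is where you would write the batch file
    # or insert the batch directly into `table`.
    print(table, len(batch))
```

Each bucket can then be written out as its own batch file or inserted directly into the matching partition, which avoids paying the redirect cost per row on the main table.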
I wonder if this is a case of hurry up and wait. Take a script that loads, say, 10 records per run (assuming that takes much less than one second) and runs once per second, sleeping for 1000 ms minus its runtime. It would by now have loaded about a million records since the question was asked.
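To put numbers on that, here is a minimal sketch of such a throttled loop, plus the arithmetic behind the estimate. The function names and the `load_batch` callback are hypothetical; the callback stands in for whatever actually inserts a batch.

```python
import time


def run_throttled(batches, load_batch, interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Load one batch per interval, sleeping for whatever time is left over."""
    loaded = 0
    for batch in batches:
        started = clock()
        load_batch(batch)
        loaded += len(batch)
        remaining = interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)  # wait out the rest of the interval
    return loaded


def seconds_needed(total_records, batch_size=10, interval=1.0):
    """Wall-clock time to load total_records at one batch per interval."""
    batch_count = -(-total_records // batch_size)  # ceiling division
    return batch_count * interval


# Time to reach a million records at 10 records per second, in hours.
print(seconds_needed(1_000_000) / 3600)
```

At 10 records per second, a million records takes 100,000 seconds, or roughly 27.8 hours. So the slow and steady approach only reaches a million rows if more than a day has passed since the question was asked.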