If all results have the same size, you can calculate each result's position in the output and set it before you write to the output stream.
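For the fixed-size case, here is a minimal sketch. It assumes Delphi XE2 or later (for the `System.` unit names) and a hypothetical constant `ChunkSize`. Each result is written at offset `Index * ChunkSize`. Setting `Position` and calling `WriteBuffer` are two separate steps, so both happen under one `TMonitor` lock; otherwise another thread could move the position in between. `WriteChunkAt`, `BuildOutput`, `ChunkSize` and `ChunkCount` are illustrative names. To keep the demo short, it calls `WriteChunkAt` out of order from a single thread instead of from real worker threads.

```pascal
program FixedSizeResults;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  ChunkSize = 4;  // hypothetical: every result is exactly ChunkSize bytes
  ChunkCount = 3; // hypothetical: number of chunks in the job

// Writes the result for chunk Index at its precomputed offset.
// Position and WriteBuffer are separate calls, so both run under one lock;
// otherwise another thread could change Position in between.
procedure WriteChunkAt(Output: TStream; Index: Integer; const Data: TBytes);
begin
  if Length(Data) <> ChunkSize then
    raise EArgumentException.Create('Result has the wrong size');
  TMonitor.Enter(Output);
  try
    Output.Position := Int64(Index) * ChunkSize;
    Output.WriteBuffer(Data[0], ChunkSize);
  finally
    TMonitor.Exit(Output);
  end;
end;

function BuildOutput: string;
var
  Output: TMemoryStream;
  Bytes: TBytes;
begin
  Output := TMemoryStream.Create;
  try
    Output.Size := ChunkCount * ChunkSize; // reserve the whole output up front
    // Results may finish in any order; each one lands at its own offset.
    WriteChunkAt(Output, 2, TEncoding.ASCII.GetBytes('CCCC'));
    WriteChunkAt(Output, 0, TEncoding.ASCII.GetBytes('AAAA'));
    WriteChunkAt(Output, 1, TEncoding.ASCII.GetBytes('BBBB'));
    SetLength(Bytes, Output.Size);
    Output.Position := 0;
    Output.ReadBuffer(Bytes[0], Length(Bytes));
    Result := TEncoding.ASCII.GetString(Bytes);
  finally
    Output.Free;
  end;
end;

begin
  Writeln(BuildOutput); // AAAABBBBCCCC
end.
```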
If this is not the case, I would give the threads ascending numbers so that you know which result has to be written next. Then I would pass each result, together with its number, to an output class. That class writes a result as soon as it gets it if it is the next one due, and stores all other results until the missing earlier ones have arrived.
You didn't say which version of Delphi you are using. In newer versions (Delphi 2009 and later, which added generics) I would use a TDictionary to store the results until all the parts before them have been written.
So in pseudocode this would be:
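```pascal
uses
  System.Classes, System.Generics.Collections;

var
  // Results that arrived before their turn, keyed by their number. With
  // doOwnsValues the dictionary frees a stream when it is removed.
  Results: TObjectDictionary<Integer, TStream>;
  NextNumber: Integer = 0; // number of the next result to be written
  Output: TStream;         // the final output stream

procedure WriteResult(WorkResult: TStream);
begin
  WorkResult.Position := 0;
  Output.CopyFrom(WorkResult, WorkResult.Size);
end;

// Takes ownership of WorkResult.
procedure HandleResult(Number: Integer; WorkResult: TStream);
var
  CurrentResult: TStream;
begin
  TMonitor.Enter(Results);
  try
    if NextNumber = Number then
    begin
      // This is the result we have been waiting for: write it right away...
      WriteResult(WorkResult);
      WorkResult.Free;
      Inc(NextNumber);
      // ...followed by any later results that are already waiting.
      while Results.TryGetValue(NextNumber, CurrentResult) do
      begin
        WriteResult(CurrentResult);
        Results.Remove(NextNumber); // also frees CurrentResult
        Inc(NextNumber);
      end;
    end
    else
      Results.Add(Number, WorkResult); // too early: keep it for later
  finally
    TMonitor.Exit(Results);
  end;
end;

procedure ExecuteThread(Number: Integer; Data: TStream);
begin
  // WorkWithData stands for your own processing; it must return a new
  // stream holding the result.
  HandleResult(Number, WorkWithData(Data));
end;

// Before starting the threads:
//   Results := TObjectDictionary<Integer, TStream>.Create([doOwnsValues]);
```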
If this isn't detailed enough, I can of course explain it in more detail.
TMonitor is more lightweight than a full critical section. If you have a large number of chunks and threads, I would recommend separating the storing of the results from writing them to the output. The worker threads then only lock Results, add their result and unlock again. Another thread checks periodically for results (entering the lock as well, of course) and writes them to the output. This keeps the time any one thread spends inside the lock short, because the actual writing to the output takes place outside of it.
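Here is a minimal sketch of this separated variant. It assumes Delphi XE2 or later (`TThread.CreateAnonymousThread` and the `System.` unit names). Workers call the hypothetical `StoreResult`, which only adds to the dictionary while holding the lock. The writer, which here is simply the main thread, calls `TakeReadyResults` to remove every consecutive result starting at `NextNumber` and then writes those results after releasing the lock. To keep ownership simple, results are plain `TBytes` instead of streams. The `Sleep(1)` polling mirrors the "check periodically" idea; a `TEvent` (unit `System.SyncObjs`) signaled by `StoreResult` would avoid polling. `StoreResult`, `TakeReadyResults`, `StartWorker` and `RunDemo` are illustrative names.

```pascal
program OrderedWriter;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

var
  Results: TDictionary<Integer, TBytes>; // finished results not yet taken
  NextNumber: Integer = 0;               // number of the next result to take

// Called by the workers: only parks the result, so the lock is held briefly.
procedure StoreResult(Number: Integer; const Data: TBytes);
begin
  TMonitor.Enter(Results);
  try
    Results.Add(Number, Data);
  finally
    TMonitor.Exit(Results);
  end;
end;

// Called by the writer: removes and returns all consecutive results starting
// at NextNumber. The actual writing happens after the lock is released.
function TakeReadyResults: TArray<TBytes>;
var
  Data: TBytes;
begin
  Result := nil;
  TMonitor.Enter(Results);
  try
    while Results.TryGetValue(NextNumber, Data) do
    begin
      Results.Remove(NextNumber);
      SetLength(Result, Length(Result) + 1);
      Result[High(Result)] := Data;
      Inc(NextNumber);
    end;
  finally
    TMonitor.Exit(Results);
  end;
end;

// Starts a worker that "processes" chunk Number and stores its result.
function StartWorker(Number: Integer): TThread;
begin
  Result := TThread.CreateAnonymousThread(
    procedure
    begin
      Sleep((Number * 7) mod 5); // uneven work, so results finish out of order
      StoreResult(Number, TEncoding.ASCII.GetBytes(IntToStr(Number) + ';'));
    end);
  Result.FreeOnTerminate := False; // freed below, after WaitFor
  Result.Start;
end;

function RunDemo(Count: Integer): string;
var
  Workers: TArray<TThread>;
  Output: TBytesStream;
  Ready: TArray<TBytes>;
  Data: TBytes;
  I, Written: Integer;
begin
  Results := TDictionary<Integer, TBytes>.Create;
  NextNumber := 0;
  Output := TBytesStream.Create;
  try
    SetLength(Workers, Count);
    for I := 0 to Count - 1 do
      Workers[I] := StartWorker(I);
    // Writer loop (here on the main thread): take under the lock, write outside.
    Written := 0;
    while Written < Count do
    begin
      Ready := TakeReadyResults;
      for Data in Ready do
        Output.WriteBuffer(Data[0], Length(Data));
      Inc(Written, Length(Ready));
      if Length(Ready) = 0 then
        Sleep(1); // periodic check; an event would avoid busy polling
    end;
    for I := 0 to Count - 1 do
    begin
      Workers[I].WaitFor;
      Workers[I].Free;
    end;
    Result := TEncoding.ASCII.GetString(Output.Bytes, 0, Output.Size);
  finally
    Output.Free;
    FreeAndNil(Results);
  end;
end;

begin
  Writeln(RunDemo(5)); // 0;1;2;3;4;
end.
```

Even though the workers finish in a scrambled order, the output is always in sequence, because the writer only ever takes the result numbered `NextNumber` next.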