Entwickler-Ecke

IO, XML and Registry - Parsing a complex CSV


Holonet - Tue 31.01.12 12:02
Title: Parsing a complex CSV
Hi everyone,

I am trying to parse a CSV file. The data should then go into a database on SQL Server. The file has its pitfalls, though: it contains commas that do not mark a column break, and it has several metadata (header) rows, i.e. several tables.

Here is an excerpt. It is measurement data from a measuring machine. I had to remove some information that should not end up on the Internet, but I left the structure untouched:

Source code
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
"Technician Name:","Test Name:","SPALTE:","Descriptions:","Notes:","Test Type","Speed","Averaging","Distance Eng Units","Force Eng Units","Encoder Enabled"
"USERNAME, USERROLE","TESTDESCRIPTION","","TESTING","","TESTTYPE","N/A","100","MM","N","SPALTE",""
" WERTESPALTE","WERTESPALTE","WERTESPALTE","WERTESPALTE","WERTESPALTE","WERTESPALTE","WERTESPALTE","WERTESPALTE","WERTESPALTE","WERTESPALTE"
"0.079","0","0.0462","0","0","0.0000","0.0000","0.0000","0.0000","0.01563"
.187569273743017,"0",4.36291960165169E-02,0,0,"0.5636",.5,"0.0000","0.0000","0.18675"
.38138598,"0",.04171512,0,0,"1.2637",1,"0.0000","0.0000","0.27728"
.44818825,"0",.039,0,0,"1.7031",1.5,"0.0000","0.0000","0.34262"
.345362,"0",.0399108,0,0,"2.1469",2,"0.0000","0.0000","0.40606"
.26328688,"0",.043376205,0,0,"2.7465",2.5,"0.0000","0.0000","0.48366"
.241412128,"0",.039141912,0,0,"3.2001",3,"0.0000","0.0000","0.54661"
.2221,"0",.03580851,0,0,"3.8150",3.5,"0.0000","0.0000","0.62421"
.234712682857143,"0",3.84180857142857E-02,0,0,"4.2766",4,"0.0000","0.0000","0.68695"
.22758673,"0",.046107075,0,0,"4.7452",4.5,"0.0000","0.0000","0.74968"
.232897564444444,"0",.03728232,0,0,"5.2139",5,"0.0000","0.0000","0.81226"
.24243526,"0",.0389631,0,0,"5.6899",5.5,"0.0000","0.0000","0.87487"
.245615592727273,"0",4.34978072727273E-02,0,0,"6.1659",6,"0.0000","0.0000","0.93746"
.247525976666667,"0",.040821165,0,0,"6.8033",6.5,"0.0000","0.0000","1.03052"
.251911486153846,"0",4.26185261538462E-02,0,0,"7.2867",7,"0.0000","0.0000","1.09343"
.25942647,"0",.03636156,0,0,"7.7698",7.5,"0.0000","0.0000","1.15598"
.265972182666667,"0",5.04203826666667E-02,0,0,"8.2680",8,"0.0000","0.0000","1.21849"
.2873657825,"0",.0551845775,0,0,"8.7661",8.5,"0.0000","0.0000","1.29639"
.282033270588235,"0",5.69999529411765E-02,0,0,"9.2640",9,"0.0000","0.0000","1.35895"
.28726064,"0",.05687786,0,0,"9.7621",9.5,"0.0000","0.0000","1.42169"
.296783538947368,"0",5.49933347368421E-02,0,0,"10.2674",10,"0.0000","0.0000","1.48421"
.295752872,"0",.05321232,0,0,"10.7655",10.5,"0.0000","0.0000","1.56216"
"Results","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","
"
"Results",3.2067,28,0,0,1.39297846091045,0,0,0,0,0,1.67728609090909,28,0,0,.63537496711157,"
"
"SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","
"
"36","0","0","0","0","
"
"SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","
"
"450","100","100","100","100","
"
"SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","
"
"10000","10000","10000","10000","10000","10000","
"
"SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","
"
0,0,0,0,0,"","
"
"Profile Name","Date","Time","
"
"PROFILENAME","01-01-1900","00:00:00",


Do you have a tip on how I can parse this file? The number of test results (the rows with the numbers) is variable.
As I said, the things giving me trouble are the comma that does not indicate a new column (line 2), the different tables (metadata rows on lines 1, 3, 26, 30, 34, 38, 42 and 46), the quotation marks that are set on only some of the numbers, and the line break that sits inside quotation marks.


daeve - Tue 31.01.12 19:35

If the CSV contains no identifying "words" or anything similar, I would not know how to do it either.


Th69 - Tue 31.01.12 21:07

Hello Holonet,

do you mean the comma in "USERNAME, USERROLE"? Practically any CSV reader should handle this properly, since quotation marks take precedence.
Try A Fast CSV Reader [http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader].
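
To illustrate the point about quotation marks taking precedence, here is a minimal sketch. It uses Python's standard csv module as a stand-in, not the .NET library linked above, and the two sample lines are shortened copies of lines 2 and 6 of the excerpt. It is only meant to show the behaviour, not a finished importer.

```python
import csv
import io

# Shortened copies of two lines from the excerpt: a quoted field that
# contains a comma, and a data row where only some numbers are quoted.
sample = (
    '"USERNAME, USERROLE","TESTDESCRIPTION","","TESTING"\n'
    '.38138598,"0",.04171512,0,0,"1.2637",1\n'
)

rows = list(csv.reader(io.StringIO(sample)))

# The comma inside the quotes does not start a new column.
print(rows[0])

# Quoted and unquoted numbers both arrive as plain strings, so the
# inconsistent quoting disappears once they are converted.
values = [float(v) for v in rows[1]]
print(values)
```

The reader removes the quotation marks itself, so the partly quoted numbers need no special handling before conversion.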

And the different headers can either be handled only through unique terms (keywords) or, possibly in your case, through the conspicuous

Source code
...,"
"

(that is, a line break inside the quotation marks as the last column).
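
A rough sketch of how that marker could be used, again in Python with the standard csv module. Note that in the excerpt the value rows from line 26 onward carry the same trailing line-break field as their header rows, while the headers on lines 1 and 3 do not. The marker alone therefore does not tell headers from values. The sketch assumes that marked rows alternate between header and values, which is an assumption drawn from the excerpt only. The helper names `has_marker` and `split_marker_tables` are made up for this example.

```python
import csv
import io

# Lines 30 to 37 of the excerpt: every record ends with a quoted field
# that contains nothing but a line break.
sample = (
    '"SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","\n"\n'
    '"36","0","0","0","0","\n"\n'
    '"SPALTE","SPALTE","SPALTE","SPALTE","SPALTE","\n"\n'
    '"450","100","100","100","100","\n"\n'
)


def has_marker(row):
    """Return True if the last field consists only of a line break."""
    return bool(row) and row[-1] in ("\n", "\r\n")


def split_marker_tables(text):
    """Return (header, values) pairs built from the marked rows."""
    # Keep only rows that end with the marker and drop the marker field.
    marked = [row[:-1] for row in csv.reader(io.StringIO(text)) if has_marker(row)]
    # Assumption: marked rows alternate header, values, header, values, ...
    return list(zip(marked[0::2], marked[1::2]))


tables = split_marker_tables(sample)
for header, values in tables:
    print(header, values)
```

The first part of the file (lines 1 to 25) and the last row (line 48, which ends in a plain trailing comma) would still need separate handling, for example through keywords such as "Results" or "Profile Name".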