String (Operations and Management)

WRITEME

Unlike in C, strings are not terminated by a 0 byte; instead, every string stores its exact length.
Data + 0 points to the first character of the string.
Data - 4 points to the length of the string, i.e. the length is stored directly in front of the string data itself.
For compatibility with C environments, a null terminator is appended after the data, but it has no effect on the library's own routines.
All functions are written for dynamic strings, i.e. strings whose length can change.

Offset      | Content
-4          | Len (length of the string)
0 ... X     | string data
after X     | null terminator (= 0)
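The layout can be illustrated with a small C sketch. It is an illustration under assumptions, not the library's own allocator: the helper names `str_new`, `str_len` and `str_free` are hypothetical, and the length is taken to be a 32-bit unsigned value stored in the 4 bytes directly before the character data.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate [4-byte length][len data bytes][0]
   and return a pointer to the first character (Data + 0). */
char *str_new(const char *src, uint32_t len)
{
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);            /* length at Data - 4 (assumed 32-bit) */
    char *data = (char *)(block + sizeof(uint32_t));
    if (len > 0)
        memcpy(data, src, len);                 /* characters at Data + 0 .. len - 1 */
    data[len] = '\0';                           /* terminator, only for C compatibility */
    return data;
}

/* Read the length stored at Data - 4; no scan for a terminator is needed. */
uint32_t str_len(const char *data)
{
    uint32_t len;
    memcpy(&len, data - sizeof(uint32_t), sizeof len);
    return len;
}

/* Free the whole block, which starts 4 bytes before the data pointer. */
void str_free(char *data)
{
    if (data != NULL)
        free(data - sizeof(uint32_t));
}
```

Because the length is stored rather than found by scanning, a string may contain zero bytes: `str_new("a\0b", 3)` has length 3 even though C's `strlen` stops after one character, and reading the length costs the same regardless of the string's size.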

Strings use 1 byte per character (i.e. not Unicode as in VB, for example; Unicode support will follow later).

String Management

String Type

Type String
    Length   as Long   ' length of the string
    Data()   as Byte   ' pointer to the start address of the string in memory
    NullTerm as Byte   ' null terminator for C compatibility
End Type

String Operations

String Case

;=== LCase
;=== Convert all characters in the string to lowercase
MOV      ECX, [Data1 - 4]    ; ECX = Length
?+          ; checks: if Length < 1, exit
LEA      EBX, [Data1]        ; EBX = start address
 
?+
;=== UCase
;=== Convert all characters in the string to uppercase
?+
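LCase and UCase can be sketched in C on a length-counted buffer. The sketch assumes one byte per character and converts only the ASCII letters A to Z and a to z; all other bytes, including embedded zeros, are left unchanged. The names `str_lcase` and `str_ucase` are hypothetical.

```c
#include <stdint.h>

/* Convert ASCII 'A'..'Z' to 'a'..'z' in place. The loop runs exactly len
   times, like a counter in ECX, instead of stopping at a zero byte. */
void str_lcase(char *data, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        if (data[i] >= 'A' && data[i] <= 'Z')
            data[i] = (char)(data[i] + ('a' - 'A'));
    }
}

/* Convert ASCII 'a'..'z' to 'A'..'Z' in place. */
void str_ucase(char *data, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        if (data[i] >= 'a' && data[i] <= 'z')
            data[i] = (char)(data[i] - ('a' - 'A'));
    }
}
```

A length of 0 skips the loop entirely, which corresponds to the `Length < 1` check in the LCase skeleton.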

String Add Split

;=== StrAdd
;=== Concatenates two strings
?+
;=== StrSplit
;=== Splits a string into two strings
?+
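StrAdd and StrSplit are still open above; one possible C sketch on the [length][data][0] layout follows. The helpers `lstr_alloc`, `lstr_len` and `lstr_free` and the functions `str_add` and `str_split` are hypothetical, and splitting at a given position is an assumed interpretation of "splits a string into two strings".

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for the [length][data][0] layout; Data + 0 is returned. */
static char *lstr_alloc(uint32_t len)
{
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);            /* length at Data - 4 */
    char *data = (char *)(block + sizeof(uint32_t));
    data[len] = '\0';                           /* C-compatible terminator */
    return data;
}

static uint32_t lstr_len(const char *data)
{
    uint32_t len;
    memcpy(&len, data - sizeof(uint32_t), sizeof len);
    return len;
}

static void lstr_free(char *data)
{
    if (data != NULL)
        free(data - sizeof(uint32_t));
}

/* StrAdd: both lengths are known up front, so the result is allocated once
   and filled with two block copies; neither input is scanned for a zero. */
char *str_add(const char *a, const char *b)
{
    uint32_t la = lstr_len(a), lb = lstr_len(b);
    if (lb > UINT32_MAX - la)
        return NULL;                            /* total length would not fit */
    char *r = lstr_alloc(la + lb);
    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb);
    return r;
}

/* StrSplit (assumed semantics): s[0 .. pos-1] goes to *left and
   s[pos .. len-1] to *right; pos is clamped to the length. */
int str_split(const char *s, uint32_t pos, char **left, char **right)
{
    uint32_t len = lstr_len(s);
    if (pos > len)
        pos = len;
    *left = lstr_alloc(pos);
    *right = lstr_alloc(len - pos);
    if (*left == NULL || *right == NULL) {
        lstr_free(*left);
        lstr_free(*right);
        return -1;
    }
    memcpy(*left, s, pos);
    memcpy(*right, s + pos, len - pos);
    return 0;
}
```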

String Cut

;=== Right
;=== Keeps a number of characters on the right and cuts off the rest
?+
;=== RightCut
;=== Cuts a number of characters off the right
?+
;=== Left
;=== Keeps a number of characters on the left and cuts off the rest
?+
;=== LeftCut
;=== Cuts a number of characters off the left
?+
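The four cut operations mostly reduce to length arithmetic. In this hypothetical C sketch the length is passed separately as `len` instead of being read from Data - 4, the count `n` is clamped to the length, and the string is modified in place.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the new length and move the terminator. */
static void set_len(char *data, uint32_t *len, uint32_t newlen)
{
    *len = newlen;
    data[newlen] = '\0';
}

/* Left: keep the leftmost n characters. */
void str_left(char *data, uint32_t *len, uint32_t n)
{
    if (n < *len)
        set_len(data, len, n);
}

/* RightCut: remove n characters from the right. */
void str_rightcut(char *data, uint32_t *len, uint32_t n)
{
    set_len(data, len, n < *len ? *len - n : 0);
}

/* Right: keep the rightmost n characters (they move to the front). */
void str_right(char *data, uint32_t *len, uint32_t n)
{
    if (n < *len) {
        memmove(data, data + (*len - n), n);    /* regions may overlap */
        set_len(data, len, n);
    }
}

/* LeftCut: remove n characters from the left. */
void str_leftcut(char *data, uint32_t *len, uint32_t n)
{
    if (n > *len)
        n = *len;
    memmove(data, data + n, *len - n);          /* shift the rest to the front */
    set_len(data, len, *len - n);
}
```

Cutting on the right only changes the stored length and moves the terminator; keeping or cutting on the left also has to shift the remaining bytes to the front, with `memmove` because source and destination overlap.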

String Compare

;=== StrCmp
;=== Compares two strings
;=== StrCmp(Str1, Str2) --> -1 if Str1 < Str2 ¦ 0 if Str1 = Str2 ¦ 1 if Str1 > Str2
?+
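A sketch of StrCmp in C with the -1/0/1 result described above. Comparing unsigned byte values, and treating the shorter string as smaller when one is a prefix of the other, are assumptions; the page only specifies the return values. `str_cmp` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Compare a (length la) with b (length lb); returns -1, 0 or 1. */
int str_cmp(const char *a, uint32_t la, const char *b, uint32_t lb)
{
    uint32_t n = la < lb ? la : lb;
    int c = memcmp(a, b, n);       /* memcmp compares bytes as unsigned char */
    if (c < 0)
        return -1;
    if (c > 0)
        return 1;
    if (la < lb)
        return -1;                 /* equal prefix: the shorter string sorts first */
    if (la > lb)
        return 1;
    return 0;
}
```

Because the lengths are used instead of terminators, `"a\0b"` and `"a\0c"` compare as different, whereas C's `strcmp` would stop at the embedded zero and report them equal.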

String Miscellaneous

;=== StrLen
;=== Returns the length of the string (number of characters)
?+

String Search

;=== InStr
;=== Searches for a string within another string
?+
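A naive InStr sketch in C. It assumes VB-style results: the 1-based position of the first match, or 0 if the search string is not found; an empty search string is treated as found at position 1. `str_instr` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Return the 1-based position of needle in hay, or 0 if it does not occur.
   Both strings are length-counted, so embedded zeros are searched normally. */
uint32_t str_instr(const char *hay, uint32_t hlen,
                   const char *needle, uint32_t nlen)
{
    if (nlen == 0)
        return 1;                  /* assumption: empty needle matches at the start */
    if (nlen > hlen)
        return 0;
    for (uint32_t i = 0; i <= hlen - nlen; i++) {
        if (memcmp(hay + i, needle, nlen) == 0)
            return i + 1;          /* convert 0-based offset to 1-based position */
    }
    return 0;
}
```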

String Conversion

WRITEME

?+
 
 
 
 
 
    CLD              ; clear the direction flag so copying runs forward
    MOV ESI, source  ; put address into the source index
    MOV EDI, dest    ; put address into the destination index
 
    MOV ECX, ln      ; put the number of bytes to copy in ECX
 
 
  ; --------------------------------------------------
  ; repeat copying bytes from ESI to EDI until ECX = 0
  ; --------------------------------------------------
    rep MOVSB

In this example, each MOVSB copies one byte from [ESI] to [EDI] and advances both ESI and EDI (forward, because the direction flag is clear). The REP prefix decrements ECX after each copy and repeats the instruction until ECX reaches zero.
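What CLD and REP MOVSB do together can be written out in C as a plain loop. This models the instruction's behaviour for illustration only; `rep_movsb` is a hypothetical name, and in C you would normally just call `memcpy`.

```c
#include <stddef.h>

/* Model of CLD + REP MOVSB: ecx is the count, esi the source, edi the
   destination. No byte is inspected, so zero bytes are copied like any other. */
void rep_movsb(unsigned char *edi, const unsigned char *esi, size_t ecx)
{
    while (ecx != 0) {       /* REP: stop once ECX has reached zero */
        *edi++ = *esi++;     /* MOVSB: copy one byte, advance both pointers (DF = 0) */
        ecx--;               /* REP decrements ECX after each copy */
    }
}
```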

When you copy a zero-terminated string, you can write a loop that copies bytes until it has copied the ASCII zero.

    CLD              ; clear direction flag to read forward
    MOV ESI, source  ; put address into the source index
    MOV EDI, dest    ; put address into the destination index
 
  label:
    LODSB            ; load byte from [ESI] into AL and increment ESI
    STOSB            ; write AL to [EDI] and increment EDI
    CMP AL, 0        ; see if it's an ASCII zero
    JNE label        ; read the next byte if it's not

A trick that makes this algorithm run faster is to move each byte directly from the source address into AL and then from AL to the destination address. On the Pentium and later processors, MOV/INC is faster than LODSB or STOSB. This is done by dereferencing both ESI and EDI (writing [ESI] and [EDI]) so that they act as memory addresses.

Note that the direction flag (set with CLD) does not affect this method, and you can use any 32-bit registers when you are not using the string instructions.

    MOV ESI, source  ; put address into the source index
    MOV EDI, dest    ; put address into the destination index
 
  label:
    MOV AL, [ESI]    ; copy byte at address in ESI to AL
    INC ESI          ; increment address in ESI
    MOV [EDI], AL    ; copy byte in AL to address in EDI
 
    INC EDI          ; increment address in EDI
    CMP AL, 0        ; see if it's an ASCII zero
    JNE label        ; jump back and read the next byte if it's not
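Both zero-terminated copy loops above correspond to the same C loop: each byte is stored first and then tested, so the terminating zero is copied too. `copy_zstr` is a hypothetical name; like the assembly, it does no bounds checking, so `dest` must be large enough.

```c
/* Copy bytes from source to dest up to and including the terminating zero. */
void copy_zstr(char *dest, const char *source)
{
    char al;
    do {
        al = *source++;      /* MOV AL, [ESI] / INC ESI */
        *dest++ = al;        /* MOV [EDI], AL / INC EDI */
    } while (al != 0);       /* CMP AL, 0 / JNE label */
}
```

Which instructions this compiles to is up to the compiler, so the pairing considerations discussed here apply to hand-written assembly.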

This code is longer, but it runs faster on later processors with dual pipelines because of what is called pairing: when instructions can go through the two pipelines in pairs, the code runs nominally twice as fast. The instructions chosen for this simple algorithm are small ones of the kind that pair properly, so it runs faster than the shorter version using the older string instructions, which do not pair.

Anonymous labels
MASM has a notation that removes the need to invent a unique name for every label. This is useful when code uses many labels in close sequence whose purpose is clear from their position. These are called anonymous labels and have the form

@@:

and they are referenced with @F and @B.

JMP @F means to jump FORWARD to the next occurrence of @@:
JMP @B means to jump BACK to the last occurrence of @@:

Expanding the use of the algorithm
The previous algorithm is a good basis for string manipulation that modifies the bytes in a string: filters, upper- and lowercase conversion, and other kinds of byte replacement.

If you want to filter a string so that it contains only digits, test each byte against the range of ASCII digits and skip all others.

ASCII "0" is 48
ASCII "9" is 57

NumOnly proc source :DWORD, dest:DWORD

    MOV ECX, source     ; put address into ECX
 
    MOV EDX, dest       ; put address into EDX
 
  @@:
    MOV AL, [ECX]       ; copy byte at address in ECX to AL
    INC ECX             ; increment address in ECX
  ; -----------------------------------------------------
  ; perform byte modification, replacement  or omissions
  ; -----------------------------------------------------
    CMP AL, 0           ; test for zero first
    JE  @F              ; exit the loop on an ASCII zero
 
    CMP AL, "0"         ; character literal, ASCII 48
    JB @B               ; below "0" (48): skip it, read the next byte
    CMP AL, "9"         ; character literal, ASCII 57
    JA @B               ; above "9" (57): skip it, read the next byte
  ; ------------------------------------------------------
    MOV [EDX], AL       ; copy byte in AL to address in EDX
    INC EDX             ; increment address in EDX
    JMP @B
 
  @@:
    MOV [EDX], AL       ; copy the ASCII zero to the address in EDX
 
    RET
 
NumOnly endp
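NumOnly translated to C for comparison. `num_only` is a hypothetical name; the control flow follows the assembly, with `continue` standing in for the `JB @B`/`JA @B` jumps back to the loop start.

```c
/* Copy only the bytes '0'..'9' from source to dest, then write the
   terminating zero. Like NumOnly, it stops at the first zero byte. */
void num_only(const char *source, char *dest)
{
    for (;;) {
        char al = *source++;          /* MOV AL, [ECX] / INC ECX */
        if (al == 0)                  /* CMP AL, 0 / JE @F */
            break;
        if (al < '0' || al > '9')     /* JB @B / JA @B: skip non-digits */
            continue;
        *dest++ = al;                 /* MOV [EDX], AL / INC EDX */
    }
    *dest = 0;                        /* copy the terminating zero */
}
```

Since the write pointer never gets ahead of the read pointer, `source` and `dest` may even point to the same buffer.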

Runtime/String.txt · Last modified: 2009/05/08 19:43 (external edit)