Sunday, September 13, 2015

[C] The string-splitting function strtok


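Today I wrote a strtok example, "how to split a network MAC address"; the code is below. You will probably wonder why the first call to strtok takes the string to be split as its first argument, while the calls inside the while loop pass NULL instead. Let's walk through the source of strtok.c to find out.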
/*
*
* Author      : appleboy
* Date        : 2010.04.01
* Filename    : strtok.c
*
*/
 
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
 
int main(void)
{
  char str[]="00:22:33:4B:55:5A";
  char *delim = ":";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok(str,delim);
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, delim);
  }     
  system("pause");  /* Windows only: keeps the console window open */
  return 0;
}
The result of running it is shown in the figure (image: strtok). In FreeBSD 7.1 Release, strtok.c lives at /usr/src/lib/libc/string/strtok.c, where you can see the following function, __strtok_r:
char *
__strtok_r(char *s, const char *delim, char **last)
{
    char *spanp, *tok;
    int c, sc;
 
    if (s == NULL && (s = *last) == NULL)
        return (NULL);
 
    /*
     * Skip (span) leading delimiters (s += strspn(s, delim), sort of).
     */
cont:
    c = *s++;
    for (spanp = (char *)delim; (sc = *spanp++) != 0;) {
        if (c == sc)
            goto cont;
    }
 
    if (c == 0) {       /* no non-delimiter characters */
        *last = NULL;
        return (NULL);
    }
    tok = s - 1;
 
    /*
     * Scan token (scan for delimiters: s += strcspn(s, delim), sort of).
     * Note that delim must have one NUL; we stop if we see that, too.
     */
    for (;;) {
        c = *s++;
        spanp = (char *)delim;
        do {
            if ((sc = *spanp++) == c) {
                if (c == 0)
                    s = NULL;
                else
                    s[-1] = '\0';
                *last = s;
                return (tok);
            }
        } while (sc != 0);
    }
    /* NOTREACHED */
}
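As you can see, on the first call strtok examines the string s one character at a time. c = *s++; sets c to *s and then advances the pointer s by one (in the example above, from the first '0' to the second '0'). Note that if this character matches one of the delimiters in delim, goto cont; runs and the character is skipped. Otherwise tok is pointed at that first non-delimiter character (s - 1), and the for loop scans for the next delimiter, overwrites it with '\0' to terminate the token, saves the address just past the delimiter in *last, and returns tok.

After that, every call to strtok(NULL, delim) resumes from the position saved in *last. Once *last has been set to NULL, further calls simply return NULL and the while loop ends. I think this is quite easy to follow. Microsoft's version is written differently: https://research.microsoft.com/en-us/um/redmond/projects/invisible/src/crt/strtok.c.htm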
/* Copyright (c) Microsoft Corporation. All rights reserved. */
 
#include <string.h>
 
/* ISO/IEC 9899 7.11.5.8 strtok. DEPRECATED.
 * Split string into tokens, and return one at a time while retaining state
 * internally.
 *
 * WARNING: Only one set of state is held and this means that the
 * WARNING: function is not thread-safe nor safe for multiple uses within
 * WARNING: one thread.
 *
 * NOTE: No library may call this function.
 */
 
char * __cdecl strtok(char *s1, const char *delimit)
{
    static char *lastToken = NULL; /* UNSAFE SHARED STATE! */
    char *tmp;
 
    /* Skip leading delimiters if new string. */
    if ( s1 == NULL ) {
        s1 = lastToken;
        if (s1 == NULL)         /* End of story? */
            return NULL;
    } else {
        s1 += strspn(s1, delimit);
    }
 
    /* Find end of segment */
    tmp = strpbrk(s1, delimit);
    if (tmp) {
        /* Found another delimiter, split string and save state. */
        *tmp = '\0';
        lastToken = tmp + 1;
    } else {
        /* Last segment, remember that. */
        lastToken = NULL;
    }
 
    return s1;
}
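Microsoft uses strspn and strpbrk in place of the hand-written character-comparison loops, but the overall flow is much the same, so it is worth a look. One difference visible in the listing: it skips leading delimiters only when a new string is passed in, so two delimiters in a row produce an empty token on a later call. Reading code really does teach you things.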
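The warning about "UNSAFE SHARED STATE" in the Microsoft listing is the reason POSIX also provides strtok_r, where the caller owns the saved position, just like the last parameter of __strtok_r above. Below is a minimal sketch, assuming a POSIX system; parse_mac is a hypothetical helper name made up for this post, not a library function.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX system, exposes strtok_r */
#include <stdlib.h>
#include <string.h>

/*
 * parse_mac (hypothetical helper): split a MAC address such as
 * "00:22:33:4B:55:5A" into 6 bytes. The input buffer is modified,
 * exactly as with strtok. Returns 0 on success, -1 on malformed input.
 */
int parse_mac(char *str, unsigned char out[6])
{
    char *save = NULL;   /* caller-owned state instead of a hidden static */
    char *tok;
    int count = 0;

    for (tok = strtok_r(str, ":", &save);
         tok != NULL;
         tok = strtok_r(NULL, ":", &save)) {
        char *end;
        unsigned long value = strtoul(tok, &end, 16);

        /* reject too many fields, non-hex text, or values above 0xFF */
        if (count == 6 || end == tok || *end != '\0' || value > 0xFF)
            return -1;
        out[count++] = (unsigned char)value;
    }
    return count == 6 ? 0 : -1;
}
```

Called with a writable copy of "00:22:33:4B:55:5A", it fills out with the six byte values. Because the position lives in save, two such loops can run at the same time without disturbing each other.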


Source: http://blog.wu-boy.com/2010/04/cc-%E5%88%87%E5%89%B2%E5%AD%97%E4%B8%B2%E5%87%BD%E6%95%B8%EF%BC%9Astrtok-network-mac-address-%E5%88%86%E5%89%B2/

Friday, September 11, 2015

[C] Pointers on C - note 2

/ **************************************************************************** /

1.

P. 243


The NUL terminates the string but is not considered a part of it, so the length of a string does not include the NUL. 


The header file string.h contains the prototypes and declarations needed to use the string functions. Although its use is not required, it is a good idea to include this header file because with the prototypes it contains the compiler can do a better job error checking your program. 


/ **************************************************************************** /

2.

p.246


char  message[] = "meg";
strcpy( message, "s" );
for( int i = 0; i < 4; i++ )
    printf( "%c", message[i] );

// real :  ['s'] [0] ['g'] [0]


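char  message[] = "meg";
strcpy( message, "send" );   // writes 5 bytes into a 4-byte array: undefined behavior
for( int i = 0; i < 4; i++ )
    printf( "%c", message[i] );

// Abort trap: 6
// Abort trap: 6 means the process was killed by SIGABRT. Here it is most likely raised by the C library's buffer overflow check rather than by a failed assertion.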
 
 
/ ****************************************************************************/
 
3.
 
P. 247
 
  
char *strcpy( char *dst, char const *src ); 


To append (concatenate) one string to the end of another, strcat is used. Its prototype is:
 char *strcat( char *dst, char const *src );




strcmp returns a value less than zero if s1 is less than s2; a value greater than zero if s1 is greater than s2; and zero if the two strings are equal. 


int strcmp( char const *s1, char const *s2 ); 
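Since strcmp returns zero for equal strings, writing if( strcmp( a, b ) ) tests for "different", which is easy to get backwards. The sketch below compares against 0 explicitly and also shows strcpy and strcat used together with a size check. Both helper names (same_string, join_words) are made up for these notes.

```c
#include <string.h>

/* Hypothetical helper: 1 if the two strings are equal, 0 otherwise. */
int same_string(char const *s1, char const *s2)
{
    /* strcmp returns 0 for equality, so compare against 0 explicitly */
    return strcmp(s1, s2) == 0;
}

/*
 * Hypothetical helper: build "<first> <second>" in dst using strcpy and
 * strcat. Returns dst, or NULL if dst_size is too small for the result.
 */
char *join_words(char *dst, size_t dst_size,
                 char const *first, char const *second)
{
    /* +2: one space and the terminating NUL */
    if (strlen(first) + strlen(second) + 2 > dst_size)
        return NULL;
    strcpy(dst, first);
    strcat(dst, " ");
    strcat(dst, second);
    return dst;
}
```

Neither strcpy nor strcat knows how large dst is, which is why the length check comes before any copying.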


 
/ **************************************************************************** /
  

4.

P.249


char  *strncpy( char *dst, char const *src, size_t len );
char  *strncat( char *dst, char const *src, size_t len );
int   strncmp( char const *s1, char const *s2, size_t len );
  
  
Like strcpy, strncpy copies characters from the source string to the destination array. However, it always writes exactly len characters to dst. If strlen( src ) is less than len, then dst is padded to a length of len with additional NUL characters. If strlen( src ) is greater than or equal to len, then only len characters will be written to dst, and the result will not be NUL-terminated!



char  buffer[BSIZE];
      ...
      strncpy( buffer, name, BSIZE );
      buffer[BSIZE - 1] = '\0';
If the contents of name fit into buffer, the assignment has no effect. If name is too long, though, the assignment ensures that the string in buffer is properly terminated. Subsequent calls to strlen or other unrestricted string functions on this array will work properly.
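The strncpy-then-terminate pattern above can be wrapped in a small function. This is only a sketch; copy_string is a hypothetical name, and the return value (whether the string fit) is my own addition, not something from the book.

```c
#include <string.h>

/*
 * Hypothetical helper: copy src into dst (capacity dst_size bytes),
 * truncating if necessary, and always leave dst NUL-terminated.
 * Returns 1 if the whole string fit, 0 if it was truncated.
 */
int copy_string(char *dst, size_t dst_size, char const *src)
{
    if (dst_size == 0)
        return 0;
    strncpy(dst, src, dst_size);
    dst[dst_size - 1] = '\0';          /* no effect when src already fit */
    return strlen(src) < dst_size;
}
```

With a 4-byte buffer, copying "s" leaves 's' followed by three NULs (the padding), while copying "send" leaves "sen" and reports truncation instead of overflowing as in note 2.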

 

5.

P.250


The easiest way to locate a specific character in a string is with the strchr and strrchr functions, whose prototypes are: 

            char  *strchr( char const *str, int ch );
            char  *strrchr( char const *str, int ch );
 
Note that the second argument is an integer. It contains a character value, however. strchr searches the string str to find the first occurrence of the character ch. Then a pointer to this position is returned. If the character does not appear at all in the string, a NULL pointer is returned. strrchr works exactly the same except that it returns a pointer to the last (rightmost) occurrence of the character. 

Here is an example:
            char  string[20] = "Hello there, honey.";
            char  *ans;
            ans = strchr( string, 'h' );
ans will get the value string + 7 because the first 'h' appears in this position. Note that case is significant.
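To compare strchr and strrchr side by side, here is a small sketch that turns the returned pointers into offsets. The names first_index and last_index are hypothetical helpers, not library functions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: offset of the first occurrence of ch in str,
 * or -1 if ch does not appear.
 */
ptrdiff_t first_index(char const *str, int ch)
{
    char const *p = strchr(str, ch);
    return p != NULL ? p - str : -1;
}

/* Same, but for the last (rightmost) occurrence. */
ptrdiff_t last_index(char const *str, int ch)
{
    char const *p = strrchr(str, ch);
    return p != NULL ? p - str : -1;
}
```

For "Hello there, honey." and 'h', first_index gives 7 (the 'h' in "there") and last_index gives 13 (the 'h' in "honey"). The capital 'H' at offset 0 does not count, because case is significant.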


6.

P.251


char *strpbrk( char const *str, char const *group ); 

This function returns a pointer to the first character in str that matches any of the characters in group, or NULL if none matched.
In the following code fragment,
            char  string[20] = "Hello there, honey.";
            char  *ans;
            ans = strpbrk( string, "aeiou" );
ans will get the value string + 1 because this position is the first that contains any of the characters in the second argument. Once again, case is significant.
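Because strpbrk returns a pointer to the match, it can be called again starting one character later to find the next one. A minimal sketch (count_in_group is a hypothetical helper name):

```c
#include <string.h>

/*
 * Hypothetical helper: count how many characters of str belong to group
 * by calling strpbrk repeatedly, resuming one past each match.
 */
size_t count_in_group(char const *str, char const *group)
{
    size_t count = 0;
    char const *p = strpbrk(str, group);

    while (p != NULL) {
        count++;
        p = strpbrk(p + 1, group);   /* p + 1 is valid: *p is not the NUL */
    }
    return count;
}
```

For "Hello there, honey." and the group "aeiou" this counts 6 vowels (e, o, e, e, o, e).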


7.

P.252


To locate a substring, strstr is used. Its prototype is:
            char  *strstr( char const *s1, char const *s2 );


This function finds the first place in s1 where the entire string s2 begins and returns a pointer to this location. If s2 does not appear in its entirety anywhere in s1, then NULL is returned. If the second argument is an empty string, then s1 is returned.


The standard library includes neither a strrstr nor a strrpbrk function, but they are easy to implement if you need them.

Here is an example implementation of strrstr:





#include <string.h>
char  *
my_strrstr( char const *s1, char const *s2 )
{
      register char     *last;
      register char     *current;
      /*
      ** Initialize pointer for the last match we've found.
      */
      last = NULL;
      /*
      ** Search only if the second string is not empty.  If s2 is
      ** empty, return NULL.
      */
      if( *s2 != '\0' ){
            /*
            ** Find the first place where s2 appears in s1.
            */
            current = strstr( s1, s2 );
            /*
            ** Each time we find the string, save the pointer to
            ** where it begins.  Then look after the string for
            ** another occurrence.
            */
            while( current != NULL ){
                  last = current;
                  current = strstr( last + 1, s2 );
            }
      }
      /*
      ** Return pointer to the last occurrence we found.
      */
      return last;
}
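The other missing function mentioned above, strrpbrk, can be built the same way on top of strpbrk. This is my own sketch following the pattern of my_strrstr, not code from the book; my_strrpbrk is a hypothetical name.

```c
#include <string.h>

/*
 * Hypothetical my_strrpbrk: pointer to the LAST character in str that
 * matches any character in group, or NULL if none matched.
 */
char *my_strrpbrk(char const *str, char const *group)
{
    char *last = NULL;
    char *current = strpbrk(str, group);

    /* Remember each match, then look for another one after it. */
    while (current != NULL) {
        last = current;
        current = strpbrk(last + 1, group);
    }
    return last;
}
```

With string set to "Hello there, honey.", my_strrpbrk( string, "aeiou" ) returns string + 16, the 'e' in "honey".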

  
8.

P.253

size_t strspn( char const *str, char const *group ); // counts the characters at the start of str that belong to group, i.e. how far you get before a character NOT in group appears.
size_t strcspn( char const *str, char const *group ); // counts the characters at the start of str that do not belong to group, i.e. how far you get before a character in group appears.
            int   len1, len2, len3;
            char  buffer[] = "25,142,330,Smith,J,239-4123";
            len1 = strspn( buffer, "0123456789" );
            len2 = strspn( buffer, ",0123456789" );

            len3 = strcspn( buffer, ",330" );

          // len1 = 2, len2 = 11, len3 = 2

     char *ptr = buffer + strcspn( buffer, "4" );
     printf( "%td\n", ptr - buffer );
     // ans : 4
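A common everyday use of strcspn is cutting the line ending off a line of input, since the value it returns is exactly the index where the first '\r' or '\n' sits. The sketch below shows that idiom next to a strspn one-liner; both helper names are hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: strip a trailing line ending, if any.
 * strcspn returns the length of the prefix that contains none of
 * "\r\n", which is the index where the line ending starts (or the
 * index of the terminating NUL when there is none).
 */
void strip_newline(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}

/*
 * Hypothetical helper: length of the leading run of decimal digits,
 * e.g. 2 for "25,142,330,Smith,J,239-4123".
 */
size_t leading_digits(char const *str)
{
    return strspn(str, "0123456789");
}
```

When the line has no line ending, strip_newline just overwrites the existing NUL with another NUL, so it is safe to call unconditionally.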

9.

P.253

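'\r' is carriage return and '\n' is line feed: the former moves the cursor to the start of the line, the latter moves it down one line. On Windows, the line ending you normally get from pressing Enter is the two combined ("\r\n").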


10.

P.254

strtok(), declared in string.h, takes two string arguments and splits the first string using the characters of the second string as separators.


char  *strtok( char *str, char const *sep );
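As a quick illustration of the prototype above, here is a sketch that counts tokens. It assumes the caller passes a writable buffer, since strtok modifies its first argument; count_tokens is a hypothetical helper name.

```c
#include <string.h>

/*
 * Hypothetical helper: count the tokens in line, where tokens are
 * separated by any run of the characters in sep. The buffer is modified:
 * strtok overwrites the separator after each token with NUL.
 */
int count_tokens(char *line, char const *sep)
{
    int count = 0;
    char *token;

    /* First call passes the string; later calls pass NULL to continue. */
    for (token = strtok(line, sep);
         token != NULL;
         token = strtok(NULL, sep))
        count++;
    return count;
}
```

Passing " \t\n" as sep treats any mix of spaces, tabs and newlines as one separator, so "  one two\t\tthree\n" yields 3 tokens. Do not pass a string literal as line, because literals may not be writable.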



11.

P.256


whitespace character : space ' ', form feed '\f', newline '\n', carriage return '\r', tab '\t', or vertical tab '\v'.






12.

P. 256
  1. The library includes two groups of functions that operate on individual characters, prototyped in the include file ctype.h. The first group is used in classifying characters, and the second group transforms them. 

    Example :




    isspace -  ' ', '\f' , '\n' , '\r' , '\t' , '\v'
    isdigit - a decimal digit 0 ~ 9 

    islower - a~z
    isupper - A~Z
    isalpha - a~z or A~Z

    ---------------------------------


     int   tolower( int ch );
     int   toupper( int ch );
    
    toupper returns the uppercase equivalent of its argument, and tolower returns the lowercase equivalent of its argument. If the argument to either function is not a character of the appropriate case, then it is returned unchanged.
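A short sketch of both groups in use, one transformation and one classification. The helper names upcase and count_digits are made up for these notes. The cast to unsigned char is a standard precaution: the ctype.h functions take an int whose value must be representable as unsigned char (or be EOF).

```c
#include <ctype.h>

/*
 * Hypothetical helper: convert str to uppercase in place.
 * Characters that are not lowercase letters are left unchanged,
 * because toupper returns them as they are.
 */
void upcase(char *str)
{
    for (; *str != '\0'; str++)
        *str = (char)toupper((unsigned char)*str);
}

/* Hypothetical helper: count the decimal digits in str. */
int count_digits(char const *str)
{
    int count = 0;

    for (; *str != '\0'; str++)
        if (isdigit((unsigned char)*str))
            count++;
    return count;
}
```

Applied to "Hello there, 42!", upcase produces "HELLO THERE, 42!", and count_digits( "239-4123" ) is 7.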