Regular expressions to match C grammar
This page discusses Perl regular expressions for matching parts of C source code: comments, preprocessor instructions, strings and operators.
The following Perl regular expression matches traditional-style C comments, such as /* this */, whether they fit on one line or run over several lines:
our $trad_comment_re = qr!
    /\*
    (?:
        # Match "not an asterisk"
        [^*]
    |
        # Match multiple asterisks followed
        # by anything except an asterisk or a
        # slash.
        \*+[^*/]
    )*
    # Match multiple asterisks followed by a
    # slash.
    \*+/
!x;
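As a rough check of the comment regex, the following sketch collects every traditional comment from a small piece of C. The helper find_trad_comments is a hypothetical name used only for illustration. Because the regex has no capturing groups, a /g match in list context returns the full text of each comment.

```perl
use strict;
use warnings;

our $trad_comment_re = qr!
    /\*
    (?:
        # Match "not an asterisk"
        [^*]
    |
        # Match multiple asterisks followed
        # by anything except an asterisk or a
        # slash.
        \*+[^*/]
    )*
    # Match multiple asterisks followed by a
    # slash.
    \*+/
!x;

# Hypothetical helper: return every traditional comment in a string.
# The regex has no capturing groups, so a list-context match with /g
# returns the full text of each comment.
sub find_trad_comments {
    my ($source) = @_;
    return $source =~ /$trad_comment_re/g;
}

my $c_source = <<'EOF';
int x; /* one line */ int y; /* another */
/**
 * A comment over
 * several lines. **/
EOF

print "$_\n" for find_trad_comments($c_source);
```

Neither alternative inside the loop can match the characters */, so a match always stops at the first */ and two comments on one line come out as two separate matches.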
Matching the C++-style comments is easier:
our $cxx_comment_re = qr!//.*\n!;
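A minimal sketch of removing C++-style comments with this regex. Since . does not match a newline, each match runs to the end of its line and includes the \n, so the substitution puts a newline back to keep the line count unchanged. Two limitations follow from the pattern itself: a // comment on the last line of a file with no final newline is not matched, and, used on its own, the regex knows nothing about string literals, so "http://example.com" inside a string would also be treated as a comment.

```perl
use strict;
use warnings;

our $cxx_comment_re = qr!//.*\n!;

my $code = "int a; // the counter\nint b;\n";

# Replace each comment, together with the newline which ends it,
# by a single newline so that the line count does not change.
(my $stripped = $code) =~ s/$cxx_comment_re/\n/g;
print $stripped;
```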
The following regular expression matches a C preprocessor instruction:
our $cpp_re = qr/^\h* \# (?: $trad_comment_re | [^\\\n] | \\[^\n] | \\\n )+\n /mx;
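The sketch below pulls the preprocessor instructions out of a short piece of C, including one continued with a backslash and one containing a comment which spans two lines. The /m flag lets ^ match at the start of every line, and \h (horizontal whitespace) needs Perl 5.10 or later. The comment regex is written here on one line without /x; it is the same pattern as above.

```perl
use strict;
use warnings;

our $trad_comment_re = qr!/\*(?:[^*]|\*+[^*/])*\*+/!;

our $cpp_re = qr/
    ^\h*                  # optional leading horizontal whitespace
    \#                    # the hash which starts the instruction
    (?:
        $trad_comment_re  # a comment, which may span several lines
    |
        [^\\\n]           # anything except a backslash or a newline
    |
        \\[^\n]           # a backslash not at the end of a line
    |
        \\\n              # a backslash-newline line continuation
    )+
    \n                    # the newline which ends the instruction
/mx;

my $source = <<'EOF';
#include <stdio.h>
int main (void);
  #define MAX(a, b) \
      ((a) > (b) ? (a) : (b))
# define GREETING "hi" /* a comment
                          over two lines */
EOF

# Each element is one complete instruction, continuation lines included.
my @directives = $source =~ /$cpp_re/g;
print "[$_]\n" for @directives;
```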
The following regular expressions match a single C string, like "this", and compound C strings, like "this" "one":
our $single_string_re = qr/ (?: " (?:[^\\"]+|\\[^"]|\\")* " ) /x;
our $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;
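As an illustration, this sketch finds the string literals in a line of C, then splits the compound one back into its parts with $single_string_re and joins their contents, roughly as a C compiler concatenates adjacent string literals. Escape sequences are left exactly as written; decoding them is outside the scope of the sketch.

```perl
use strict;
use warnings;

our $single_string_re = qr/ (?: " (?:[^\\"]+|\\[^"]|\\")* " ) /x;
our $string_re = qr/$single_string_re(?:\s*$single_string_re)*/;

# The C text is in single quotes, so Perl leaves \" and \n alone.
my $code = 'printf ("Say \"hi\"\n" "and goodbye"); puts ("x");';

# Each element is one string or one compound string.
my @strings = $code =~ /$string_re/g;
print "$_\n" for @strings;

# Split the compound string into its parts, strip the quotes, and
# join the contents. Escape sequences are left as written.
my $joined = join '', map { substr($_, 1, -1) } $strings[0] =~ /$single_string_re/g;
print "$joined\n";    # prints: Say \"hi\"\nand goodbye
```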
The first of the following regular expressions matches one-character C operators, such as + and ?, and the second matches these as well as multi-character operators, such as -> and &&.
our $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;
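our $operator_re = qr/
    (?:
        # Operators with three characters. These must come
        # first, since Perl takes the first alternative which
        # matches, and "<<" would otherwise match the start
        # of "<<=".
        <<= | >>=
    |
        # Operators with two characters
        \|\| | && | << | >> | -- | \+\+ | -> | ==
    |
        # Operators with one character followed by an
        # equals sign, such as "+=", "<=" and "!=".
        (?: \+ | - | \* | \/ | % | & | \| | \^ | < | > | ! ) =
    |
        $one_char_op_re
    )
/x;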
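One way to check the operator regex is to pull every operator out of a line of C. Perl tries the alternatives of a regex from left to right and accepts the first one which leads to a match, so longer operators such as <<= have to be listed before their prefixes. This sketch uses a version of $operator_re which lists <<= and >>= before the two-character operators and accepts <=, >= and != as single tokens; the helper operators_in is a hypothetical name.

```perl
use strict;
use warnings;

our $one_char_op_re = qr/(?:\%|\&|\+|\-|\=|\/|\||\.|\*|\:|>|<|\!|\?|~|\^)/;

our $operator_re = qr/
    (?:
        # Three-character operators, tried first
        <<= | >>=
    |
        # Operators with two characters
        \|\| | && | << | >> | -- | \+\+ | -> | ==
    |
        # One character followed by an equals sign
        (?: \+ | - | \* | \/ | % | & | \| | \^ | < | > | ! ) =
    |
        $one_char_op_re
    )
/x;

# Hypothetical helper: return the operators in a piece of C, in order.
# Text which is not an operator is skipped by the regex search itself.
sub operators_in {
    my ($c) = @_;
    return $c =~ /$operator_re/g;
}

print join(' ', operators_in('x <<= 2; if (p->n != y && !z) q++;')), "\n";
# prints: <<= -> != && ! ++
```

Because the search simply skips anything which is not an operator, a . inside a number such as 1.5 would also be reported as an operator; a real tokenizer would match numbers first.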
All of these regular expressions are supplied in the Perl CPAN module C::Tokenize.
Copyright © Ben Bullock 2009-2024. All rights reserved.
For comments, questions, and corrections, please email Ben Bullock (benkasminbullock@gmail.com).