Regular Expressions


What is a regexp?





Regular expressions, also known as RegEx or RegExp, are combinations of literal and symbolic characters whose purpose is to find, replace, count or test a defined pattern.


Regex does not belong to any one language: each language decides which regex methods and syntax it supports, and the language's engine provides the regex engine. JavaScript supports part of the standard, but there are many features it doesn't support. For a more detailed list of feature support per language, you can check out this page.


This training focuses on the use of RegExp inside JavaScript, so don't expect to see every feature of the regular expression standard.



Why learn it


The main reason to learn regex is to remove that face you make when you see one. Seriously, it's not as hard as it seems, I promise.

You don't want to always be asking "what does this regex mean?", do you? A regex sometimes looks like a spaghetti of chars, but once you understand every single char, you'll become a kind of superhero, finding exactly what you want and modifying long texts with a simple sequence of chars.


Your job as a JavaScript and front-end developer demands that you validate form fields, sometimes with complex patterns. So you need to know not just how to create a validation pattern: you should create it in the best way possible and be able to update an existing regex written by anyone.


Methods in JavaScript



JavaScript is a beautiful language to start writing regex in, because patterns are first-class citizens: there is an object of type regex. In other languages you need to wrap your regex pattern in a string. This causes some confusion, because in regex almost every matchmaker (a special pattern character) needs a backslash, and a backslash inside a string has to be escaped with another backslash, so the mess ends up messier.
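To make the backslash problem concrete, here is a small sketch comparing a regex object with a pattern wrapped in a string. It borrows the \d matchmaker (a digit, listed under Basic Patterns) and the two creation styles described in the next section; the variable names are only illustrative.

```javascript
// Written as a regex object, each backslash appears once.
const literal = /\d\d/;

// Wrapped in a string, every backslash must be escaped with another one.
const fromString = new RegExp("\\d\\d");

// Forgetting the escape silently turns "\d" into a plain "d".
const broken = new RegExp("\d\d");

console.log(literal.source);     // \d\d
console.log(fromString.source);  // \d\d
console.log(broken.source);      // dd
console.log(literal.test("42")); // true
console.log(broken.test("42"));  // false
```

The broken version does not throw any error, which is what makes this kind of mistake hard to spot.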

How to create a regex:




You have two ways to create a new regex: with the JavaScript constructor, or simply by writing the regex between two slashes.


The regex constructor is RegExp(pattern, [flags]). We'll talk about the flags and matchmakers later; for now, we'll only work with literal chars. For example:


 var myRegex = new RegExp("hola");


This code stores a new regular expression in the variable myRegex, which will match the literal word hola.




The other way to create a new regex is to write it between two slashes and put the flags after them: /pattern/flags, like this:


var myRegex = /hola/; 


Both methods are valid, and there's no "best" one, just two options: the literal is cleaner and the constructor is more flexible, so choose whichever you need.
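As a rough illustration of that trade-off, the sketch below assumes a word that is only known at run time (the variable word is hypothetical). A literal cannot contain a variable, so the constructor is the natural choice there, while the literal stays shorter for a fixed pattern.

```javascript
// Hypothetical value that is only known at run time, e.g. typed by a user.
const word = "hola";

// The constructor accepts any string, so the pattern can be built dynamically.
const dynamicRegex = new RegExp(word);

// When the pattern is fixed, the literal form is shorter and easier to read.
const fixedRegex = /hola/;

console.log(dynamicRegex.source === fixedRegex.source); // true
console.log(dynamicRegex.test("hola mundo"));           // true
```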


With both methods we get a RegExp object that contains some read-only properties and some methods. For example, if you want to check the source, you can try:


myRegex.source; // "hola" 
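Besides source, a few other properties can be inspected in the same way. The sketch below assumes a regex created with the g and i flags (explained in the Flags section), only so that the flag-related properties have something to show.

```javascript
// A regex with the global (g) and ignore case (i) flags set.
const myRegex = /hola/gi;

console.log(myRegex.source);     // hola
console.log(myRegex.global);     // true
console.log(myRegex.ignoreCase); // true
console.log(myRegex.multiline);  // false
```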


It also has two methods, test and exec. Both require a string as a parameter; this string will be matched against the regex.


The test method returns a boolean value: true if the pattern matches the given string, false if not.


The exec method returns an array with the results, or null if no match was found.



myRegex.test("hola"); // true
myRegex.test("halo"); // false
myRegex.exec("hola"); // ["hola"]
myRegex.exec("halo"); // null
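The array returned by exec carries a bit more information than the matched text. A minimal sketch, assuming the same literal pattern hola and two made-up input strings:

```javascript
const myRegex = /hola/;

const found = myRegex.exec("di hola");
console.log(found[0]);    // hola     (the matched text)
console.log(found.index); // 3        (position where the match starts)
console.log(found.input); // di hola  (the string that was searched)

const missing = myRegex.exec("adios");
console.log(missing);     // null

// Because exec can return null, check the result before reading from it.
if (missing !== null) {
  console.log(missing[0]);
}
```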


Note that, by default, a literal pattern matches all or nothing: the whole sequence of chars must appear in the string, or there is no match at all. You can change this behavior with flags or matchmakers, but for now we'll leave it like that.

Apart from those methods of the RegExp object, strings also have some methods that can be used to match against a regex: match, search, split and replace. These methods accept a string or a regex as a parameter.
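A quick sketch of what this default means for a literal pattern; the input strings are made up for illustration.

```javascript
const myRegex = /hola/;

console.log(myRegex.test("hola"));       // true
console.log(myRegex.test("hola mundo")); // true, the whole pattern appears inside the string
console.log(myRegex.test("hol"));        // false, one char is missing
console.log(myRegex.test("Hola"));       // false, the case is different
console.log(myRegex.test("h o l a"));    // false, the chars are not next to each other
```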

  • "casa".match(/casa/); // ["casa"] -> returns an array with the matched elements
  • "casa casita".search(/casita/); // 5 -> returns the index of the first match
  • "casacasitacasa".split(/casita/); // ["casa", "casa"] -> returns the string split by the pattern
  • "casacasitacasa".replace(/casita/, " "); // "casa casa" -> returns the string with the second parameter in the place where it finds the pattern.
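It is also worth knowing what these string methods return when nothing matches, and that replace without flags only touches the first match. A small sketch, using a made-up text:

```javascript
const text = "casa casita casa";

console.log(text.match(/casita/)[0]);      // casita
console.log(text.match(/perro/));          // null, no match found
console.log(text.search(/casita/));        // 5
console.log(text.search(/perro/));         // -1, no match found
console.log(text.split(/ casita /));       // [ 'casa', 'casa' ]
console.log(text.replace(/casa/, "home")); // home casita casa (only the first match changes)
```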



Where to test



Let's accept it: you're never going to write code without a single error. Well, maybe sometimes, but not always, and even less so when we're talking about regex.


You can always open a blank page (about:blank), open the developer tools and write some regex to start testing. Test as many scenarios as you can imagine to avoid false positives.

The developer tools are your best friend, but sometimes you'll want something more graphical, so you can use RegExr, a site made in Flash. This is where I learned the basics of regex. It's really cool because you can look up the definitions of some matchmakers, but be careful: since it's made in Flash, you can find things that don't exist in JavaScript.


There is also this amazing site, refiddle, which is like a social network for regex. It's very useful if you need to share a regex, or simply bookmark it to use later.
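One possible way to organize those scenarios in the console is a small list of inputs with the result you expect for each. This is only a sketch: the pattern and the cases are hypothetical, and any other structure works just as well.

```javascript
// Hypothetical pattern under test.
const pattern = /hola/;

// Each case pairs an input with the result expected from test().
const cases = [
  { input: "hola", expected: true },
  { input: "hola mundo", expected: true },
  { input: "Hola", expected: false },
  { input: "", expected: false }
];

// Keep only the cases where the regex disagrees with the expectation.
const failures = cases.filter(function (c) {
  return pattern.test(c.input) !== c.expected;
});

console.log(failures.length); // 0
```

An empty failures array means every scenario behaved as expected; otherwise it lists the inputs to investigate.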



Basic Patterns

  • . - matches any char except a new line.
  • \w - matches the alphanumeric characters (letters from a to z in either case, digits from 0 to 9 and underscore).
  • \W - matches the non-alphanumeric characters (the opposite of \w).
  • \d - matches the digits from 0 to 9.
  • \D - matches the non-digits (the opposite of \d).
  • \s - matches the white space characters (space, tab, form feed, line feed, etc).
  • \S - matches the non-white space characters (the opposite of \s).
  • \t - matches the tab character.
  • \n - matches the new line (line feed) character.
  • \r - matches the carriage return character.
  • \b - matches a word boundary/edge, without selecting any char.
  • \B - matches a non-word boundary/edge, without selecting any char.
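A few of the patterns above in action. The inputs are made up, and each expression can be pasted into the developer tools console as it is.

```javascript
console.log(/./.test("\n"));                      // false, the dot does not match a new line
console.log(/\w\w\w/.test("a_1"));                // true, letter, underscore and digit are all \w
console.log(/\W/.test("a_1"));                    // false, there is no non-alphanumeric char
console.log(/\d\d/.test("casa 42"));              // true, two digits in a row
console.log(/\D/.test("42"));                     // false, every char is a digit
console.log(/casa\scasita/.test("casa\tcasita")); // true, a tab counts as white space
console.log(/\bcasa\b/.test("la casa"));          // true, casa is a whole word
console.log(/\bcasa\b/.test("casas"));            // false, casa is followed by another letter
console.log(/\Bcasa/.test("lacasa"));             // true, casa starts in the middle of a word
```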



Modifiers

  • * - matches the preceding pattern 0 or more times.
  • + - matches the preceding pattern 1 or more times.
  • ? - matches the preceding pattern 0 or 1 time.
  • {x} - matches the preceding pattern exactly x times.
  • {x,y} - matches the preceding pattern between x and y times.
  • [x-y] - matches one char in the range between the char x and the char y.
  • [xyz] - matches any one of the given chars: x, y or z.
  • [^xyz] - matches any one char except x, y or z.
  • (x) - matches x and remembers the match.
  • (?:x) - matches x but does not remember the match.
  • x(?=y) - matches x only if it's followed by y.
  • x(?!y) - matches x only if it's not followed by y.
  • x|y - matches x or y.
  • ^ - matches the start of the string.
  • $ - matches the end of the string.
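The modifiers are easier to grasp with a few short checks. This is a sketch with made-up inputs; most patterns are wrapped in ^ and $ so that the whole string has to fit the pattern.

```javascript
console.log(/^ca*sa$/.test("csa"));          // true, * accepts zero "a"
console.log(/^ca+sa$/.test("csa"));          // false, + needs at least one "a"
console.log(/^casas?$/.test("casa"));        // true, ? makes the last "s" optional
console.log(/^\d{4}$/.test("2024"));         // true, exactly four digits
console.log(/^\d{2,3}$/.test("2024"));       // false, four digits is more than three
console.log(/^[a-c]+$/.test("abcabc"));      // true, every char is in the range a to c
console.log(/^[^xyz]+$/.test("casa"));       // true, no char is x, y or z
console.log(/casa(?=s)/.test("casas"));      // true, casa is followed by s
console.log(/casa(?!s)/.test("casas"));      // false, casa is followed by s
console.log(/^(casa|perro)$/.test("perro")); // true, one of the two options

// (x) remembers its match, (?:x) does not: only "cas" is stored.
const groups = "casita".match(/(cas)(?:it)a/);
console.log(groups[0]);     // casita
console.log(groups[1]);     // cas
console.log(groups.length); // 2
```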



Flags




  • g - global: by default the regex finds the first match and then stops. This flag makes the search global, which means it will not stop at the first match.
  • i - ignore case: by default a regex is case sensitive. This flag allows us to forget about the case.
  • m - multiline: by default the start and the end are marked at the string level. This flag sets the start and the end at the line level.
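The effect of each flag can be seen by running the same patterns over one text with and without it. A minimal sketch, assuming a made-up two-line text:

```javascript
// Two lines: "casa Casa" and "casa".
const text = "casa Casa\ncasa";

// Without g, match stops at the first match.
console.log(text.match(/casa/).length);   // 1
// With g, it collects every lowercase "casa".
console.log(text.match(/casa/g).length);  // 2
// Adding i also accepts "Casa".
console.log(text.match(/casa/gi).length); // 3

// Without m, ^ and $ refer to the whole string.
console.log(/^casa$/.test(text));         // false
// With m, they refer to each line, and the second line is exactly "casa".
console.log(/^casa$/m.test(text));        // true
```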

RegExp Training

By Diego Barahona

A training about regular expression from scratch.