As I was saying, jEdit is a nice cross-platform programmer's text editor. I use it for quick regular expression processing of text files. I use it for other things as well, but regular expressions are something few Windows programs support.
I'll write an introduction to regular expressions at some point. If you are curious, you can search the web for “regular expressions”, or for “regex” as they are sometimes called. But let me give you a short example of something you can do with them.
I had a web sitemap list with URLs followed by information such as date and time last modified. I wanted to create a list of just the pages without the complete domain name. That meant deleting the “http://www.somedomain.com” at the beginning of each line (not difficult with most editors) and deleting everything after the end of the page name - all the date/time info to the end of the line.
Prepare yourself. Looking at regex if you aren't used to it can make your eyes bleed. Or at least give you a headache. Here's a search that will match the entire line but save only the bit I want.
^.*\.com(/\w*).*$
Here are some example lines I'm trying to match:
http://www.example.com/about.html 2009-07-11T21:28:11Z weekly 0.5
http://www.example.com/contact.html 2009-07-11T21:28:11Z weekly 0.5
This is what I want the results to look like.
about.html
contact.html
Here's what the Search/Replace dialog box looks like in jEdit to get that result.
What does it mean? The replacement expression, “$1”, uses whatever text was matched inside the first pair of parentheses (the only pair in this example) and inserts it into the replacement string. Using () to get matching text is referred to as “capturing” the match. If you look at the Notepad++ entry you will see that Notepad++ uses a different notation for inserting captured text.
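If you'd rather try the same search and replace outside an editor, here is a minimal sketch using Python's re module (my choice for illustration, not something jEdit requires). Python writes the captured text as \1 where jEdit uses $1. One thing to watch: \w does not match a period, so the pattern as written captures “/about” rather than “about.html”. The second pattern, `with_extension`, is my own variation (an assumption, not the expression from the dialog box) that allows periods in the page name and leaves the slash outside the parentheses.

```python
import re

# The two sample sitemap lines from above.
lines = [
    "http://www.example.com/about.html 2009-07-11T21:28:11Z weekly 0.5",
    "http://www.example.com/contact.html 2009-07-11T21:28:11Z weekly 0.5",
]

# The search expression exactly as shown in the post.
as_written = re.compile(r"^.*\.com(/\w*).*$")

# Hypothetical variation: slash moved outside the group, and the
# character class widened so the ".html" part is captured as well.
with_extension = re.compile(r"^.*\.com/([\w.]*).*$")

# r"\1" is Python's spelling of jEdit's "$1" replacement.
results_as_written = [as_written.sub(r"\1", line) for line in lines]
results_with_extension = [with_extension.sub(r"\1", line) for line in lines]

print(results_as_written)      # ['/about', '/contact']
print(results_with_extension)  # ['about.html', 'contact.html']
```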
I'll repeat the search term below and then explain the meaning of the symbols:
^.*\.com(/\w*).*$
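One way to read the expression piece by piece is to spread it over several lines with a comment on each symbol. The sketch below does that with Python's re.VERBOSE flag, which ignores whitespace and comments inside the pattern. Python is only my choice for illustration; the symbols mean the same thing in jEdit's search box.

```python
import re

pattern = re.compile(
    r"""
    ^        # anchor: start of the line
    .*       # any character, zero or more times, as many as possible
    \.com    # a literal ".com"; the backslash escapes the period
    (        # open capture group 1
      /      # a literal slash
      \w*    # zero or more word characters: letters, digits, underscore
    )        # close capture group 1
    .*       # whatever is left on the line
    $        # anchor: end of the line
    """,
    re.VERBOSE,
)

sample = "http://www.example.com/about.html 2009-07-11T21:28:11Z weekly 0.5"
match = pattern.match(sample)

whole_match = match.group(0)  # the entire line, because of ^ ... $
captured = match.group(1)     # only the text inside the parentheses

print(captured)  # /about
```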
One note that could confuse you (if you're not already confused): on more complex text, the regex shown could cause problems. By default, regex quantifiers are “greedy”, which means they match the biggest chunk of text that fits the expression. If “.com” appeared more than once on a line I was searching, the “^.*\.com” would match everything up to the last occurrence of “.com”. To make the expression “non-greedy” we would change it to “^.*?\.com”, which stops at the first occurrence.
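To see the difference between greedy and non-greedy matching, here is a small sketch in Python. The sample line is made up for this illustration (it is not from my sitemap) so that “.com” appears twice.

```python
import re

# Hypothetical line in which ".com" occurs twice.
line = "http://www.example.com/redirect?to=http://other.com/page 2009-07-11"

# Greedy: .* grabs as much as it can, so the match runs to the LAST ".com".
greedy = re.match(r"^.*\.com", line).group(0)

# Non-greedy: .*? grabs as little as it can, so the match ends at the FIRST ".com".
lazy = re.match(r"^.*?\.com", line).group(0)

# The same difference changes what the parentheses capture.
greedy_capture = re.match(r"^.*\.com(/\w*).*$", line).group(1)
lazy_capture = re.match(r"^.*?\.com(/\w*).*$", line).group(1)

print(greedy)          # http://www.example.com/redirect?to=http://other.com
print(lazy)            # http://www.example.com
print(greedy_capture)  # /page
print(lazy_capture)    # /redirect
```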
If you are just learning regex, you can test your expressions with jEdit. The next image is a screen shot where I've searched for
\.com/\w*
This matches the first .com (the \. escapes the period, which is a special character in regex), the slash after the .com, and then the next contiguous run of word characters (letters, digits and underscores, which is what \w matches).
jEdit highlights the first match. If I hit RETURN while in the search box, the next match would be highlighted.
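Stepping from one match to the next can be imitated in a script as well. This is a rough Python sketch, assuming the two sample lines are the whole file: re.search returns the first match, and re.finditer yields each match in turn, much like pressing RETURN again in the search box.

```python
import re

# The two sample lines, treated as the contents of the file.
text = (
    "http://www.example.com/about.html 2009-07-11T21:28:11Z weekly 0.5\n"
    "http://www.example.com/contact.html 2009-07-11T21:28:11Z weekly 0.5\n"
)

pattern = re.compile(r"\.com/\w*")

# The first match, the one the editor would highlight first.
first = pattern.search(text).group(0)

# Every match in order, as if stepping through with "find next".
all_matches = [m.group(0) for m in pattern.finditer(text)]

print(first)        # .com/about
print(all_matches)  # ['.com/about', '.com/contact']
```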