Thursday, April 19, 2012

inputString/string.gsub Help Required

I'm trying to re-work the Pirate Speak mod to use internally in my guild - we're rolling up a small alt-guild made up of Orcs, and I'm trying to re-write your mod so that we all talk like Orks from Games Workshop's Warhammer 40,000.

Anyhow, most of the code is straightforward, which is good as my Lua knowledge is hardly in-depth. The only real problem I'm having is getting the mod to change words that share certain sequences of letters. For example, "his" and "this" tend to get mixed up, as do "the" and "their".

I notice the code uses various symbols like ^, [a], and $ to prevent this kind of thing, but I can't quite get them to work, and I was hoping there was a way around this. Can anyone help me out, assuming it's straightforward, or maybe point me towards a resource online I could use? I've tried going through the Lua 5.1 manual online, but as I don't quite know how to correctly phrase what I'm looking for, I'm not getting far.|||The Lua manual itself isn't so helpful for the beginner - but the online book "Programming in Lua" is VERY useful, with good examples (though you'll need to concentrate):

http://www.lua.org/pil/

You need Part III, chapter 20, "The String Library".

All four sections on Pattern matching, captures, and patterns are very helpful.



Below is some code that might help you :


Code:


local myString = "Novice coders may confuse the word 'his' with the letters in this, or histogram. His solution is included in this example."

local newString = string.gsub(myString, "his", "her");
print(newString);
-- OUTPUT : "Novice coders may confuse the word 'her' with the letters in ther, or hertogram. His solution is included in ther example."

-- PROBLEMS :
-- 1.) ALL "his" letters are changed including those in the words "this" and "histogram"
-- 2.) The word "His" has not been changed

-- SOLUTIONS :

-- PROBLEM 1.) There are lots of ways you could actually do this - this is the simplest example of how to NOT change words like "this" and "histogram"
-- NOTE that %A matches any single non-letter character, such as a space, comma, or full stop
newString = string.gsub(myString, "%Ahis%A", "her");
print(newString);
-- OUTPUT : "Novice coders may confuse the word her with the letters in this, or histogram. His solution is included in this example."
-- "this" and "histogram" are now NOT changed
-- HOWEVER, there is a new problem in that 'his' (in single quotes) has been replaced by her (without quotes)...the answer is to use captures in the pattern and refer to them in the replacement
-- SO...
newString = string.gsub(myString, "(%A)his(%A)", "%1her%2");
print(newString);
--OUTPUT : "Novice coders may confuse the word 'her' with the letters in this, or histogram. His solution is included in this example."
-- The single non-letter characters are "captured" and labelled in order %1, %2, %3, etc. and can be used in the replacement...


-- PROBLEM 2.) "His" does not equal "his"
-- It is probably not safe to run string.lower on the entire string and then carry out the substitution, as you would have no way to get back to the capitalised version...
-- The easiest solution is to just use a second string.gsub call for the special case of a capitalised word
newString = string.gsub(newString, "(%A)His(%A)", "%1Her%2");
print(newString);
-- OUTPUT : "Novice coders may confuse the word 'her' with the letters in this, or histogram. Her solution is included in this example."
-- HOWEVER, if the first letter is shared by both the original and replacement words, then it would be possible to do both at the same time....
newString = string.gsub(myString, "(%A[Hh])is(%A)", "%1er%2");
print(newString);
-- OUTPUT : "Novice coders may confuse the word 'her' with the letters in this, or histogram. Her solution is included in this example."
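One gap in the %A approach above: because the pattern needs a non-letter on both sides, it misses a "his" at the very start or end of the string, and the second of two neighbouring matches, since the separator between them is consumed by the first match. Here is a minimal sketch of one way round this, assuming the frontier pattern %f is available (it is documented from Lua 5.2 onwards and, as far as I know, present but undocumented in Lua 5.1). replaceWord is a hypothetical helper name, and it assumes the word contains only plain letters.

```lua
-- Hypothetical helper: replace whole-word occurrences of `word` only.
-- Assumes `word` is made of plain letters (no pattern magic characters)
-- and `replacement` contains no "%" (which is special in gsub replacements).
local function replaceWord(text, word, replacement)
  -- %f[%a] matches the empty spot where a run of letters begins,
  -- %f[%A] matches the empty spot where a run of letters ends.
  -- Neither consumes a character, so no captures are needed.
  local pattern = "%f[%a]" .. word .. "%f[%A]"
  -- Parentheses drop gsub's second return value (the match count).
  return (text:gsub(pattern, replacement))
end

print(replaceWord("his hat, this histogram, his", "his", "her"))
-- her hat, this histogram, her
```

The start and end of the string count as boundaries here, which is why the first and last "his" are both changed.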



NOTE : a limitation of %A is that some characters used as letters in other languages may not be recognised as letters by Lua's pattern matching, so there could be localisation issues.

Instead of %A, you could specify a list of common possible "characters" that could surround your word

e.g.

newString = string.gsub(myString, "([%s%.%'\"][Hh])is([%s%.%'\"])", "%1er%2");|||Telic - I owe you an apology - the information you've given me has been super helpful, and I should have replied a long time ago to say thanks. For the record - thank you very much for this.
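For a mod like the one in the original question, where many words need swapping, another option is to match every whole word once and look it up in a table, rather than writing one gsub per word. This is only a sketch under assumptions: the replacements table, its Ork-style entries, and the translate function are all made up for illustration, and the capitalisation rule only restores a leading capital.

```lua
-- Hypothetical word list; keys are the lower-case original words.
local replacements = { his = "her", the = "da", their = "dere" }

local function translate(text)
  -- "%a+" grabs each maximal run of letters, so "this" is looked up
  -- as "this" and is never mistaken for "his".
  return (text:gsub("%a+", function(word)
    local swap = replacements[word:lower()]
    if not swap then
      return nil -- returning nil makes gsub keep the original word
    end
    -- Restore a leading capital if the original word had one.
    if word:sub(1, 1) == word:sub(1, 1):upper() then
      swap = swap:sub(1, 1):upper() .. swap:sub(2)
    end
    return swap
  end))
end

print(translate("His hat, this histogram, the end of their road."))
-- Her hat, this histogram, da end of dere road.
```

Because each word is looked up whole, "the" and "their" cannot get mixed up, and punctuation around the words is left untouched.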
