regex - Breaking up PascalCase in R


I have a series of character strings using PascalCase.

"BobDylan" "MikhailGorbachev" "HelpfulStackOverflowPeople"

I want a function in R to put spaces between each word. I have achieved this with a Perl regular expression and the gsub() function. Essentially, it puts a space before every capital letter that is not the first letter of the string.

gsub("(?!^)(?=[A-Z])", " ", "BobDylan", perl = TRUE)
[1] "Bob Dylan"

However, some of the words in my list may have capitalized abbreviations in them that I do not want separated by spaces.

"BobDylanUSA" "MikhailGorbachevUSSR" "HelpfulStackOverflowPeople"

Applying the same syntax as before will create spaces between every capital letter.

gsub("(?!^)(?=[A-Z])", " ", "MikhailGorbachevUSSR", perl = TRUE)
[1] "Mikhail Gorbachev U S S R"

However, I would like abbreviations to stay the same. The desired output would be the following.

[1] "Bob Dylan USA"
[1] "Mikhail Gorbachev USSR"
[1] "Helpful Stack Overflow People"

What else do I need in my gsub() expression? Alternatively, is there a better way to approach this problem entirely?

x <- c("BobDylanUSA",
       "MikhailGorbachevUSSR",
       "HelpfulStackOverflowPeople")

# Match a lowercase letter, drop it from the match with \K, then require an uppercase letter next
gsub('[a-z]\\K(?=[A-Z])', ' ', x, perl = TRUE)
# [1] "Bob Dylan USA"                 "Mikhail Gorbachev USSR"
# [3] "Helpful Stack Overflow People"

or

# Same idea with a lookbehind: the position after a lowercase letter and before an uppercase one
gsub('(?<=[a-z])(?=[A-Z])', ' ', x, perl = TRUE)
# [1] "Bob Dylan USA"                 "Mikhail Gorbachev USSR"
# [3] "Helpful Stack Overflow People"

or, for this guy, to also split single-letter words such as I or A:

x <- c("BobDylanUSA",
       "MikhailGorbachevUSSR",
       "HelpfulStackOverflowPeople",
       "IAmATallDrinkOfWater")

# Second alternative: also split between two capitals when the second one starts a word
gsub('(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])', ' ', x, perl = TRUE)
# [1] "Bob Dylan USA"                 "Mikhail Gorbachev USSR"
# [3] "Helpful Stack Overflow People" "I Am A Tall Drink Of Water"
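
Since the question asks for a function, the last pattern could be wrapped up roughly as follows. This is only a sketch: the name split_pascal is made up for illustration and is not part of the answer above, and it assumes the input contains plain ASCII letters only (digits and accented characters are not considered).

```r
# Hypothetical helper; the name split_pascal is an assumption, not from the answer.
split_pascal <- function(x) {
  # Alternative 1: a position preceded by a lowercase letter and followed by
  #                an uppercase one (the "b|D" boundary in "BobDylan").
  # Alternative 2: a position preceded by an uppercase letter and followed by
  #                an uppercase letter plus a lowercase one, i.e. the start of
  #                a new capitalized word (the "I|Am" boundary in "IAm").
  pattern <- "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"
  gsub(pattern, " ", x, perl = TRUE)
}

split_pascal(c("BobDylanUSA", "IAmATallDrinkOfWater"))
# [1] "Bob Dylan USA"              "I Am A Tall Drink Of Water"
```

Runs of capitals such as USA stay together because neither alternative matches inside them: no capital in the run is preceded by a lowercase letter or followed by a capital-plus-lowercase pair. Note that perl = TRUE is required, since lookarounds are not supported by R's default regex engine.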
