Re: split string by special characters
От | Andreas Wenk |
---|---|
Тема | Re: split string by special characters |
Дата | |
Msg-id | 4A6A262F.3050506@netzmeister-st-pauli.de обсуждение исходный текст |
Ответ на | split string by special characters (Jan-Erik <jan-erik.larka@os2world.com>) |
Список | pgsql-general |
Jan-Erik wrote: > I wonder if you could please help me out to extract a character string > to an array or better yet, a table. > > I'd like to split strings of text up into words and delimiters (but > not delete the delimiters). The delimiters are defined as comma, > space, dot, singe/double quotation mark, question mark etc.¹ in a > separate table (delimiters) depending on what rules apply for the > input. > > regexp_split_to_array/table seem quite suitable but I have > difficulties to form the right expression with it, apart from that it > remove the delimiters as well. > > Example: > This is just a text that contain special characters such as , (comma), > "(", ")" (left and right parenthesis) as well as "?" question mark. > How do I split it up with PostgreSQL? > > Expected result: > {This, " ", is, " ", just, " ", a, ..., PostgreSQL, "?" } > __________________ > ¹) Also later on tags such as <html> and at other times something > else depending on the circumstances. > > //Jan-Erik > Hi, I was thinking about that and in my opinion the approach to let the database do that is the wrong direction. Sure you can do a lot with regexp_split_to_table or regexp_split_to_array but they are kind of limited compared to a programming language using regular expressions. If I had to try to get your jobdone, I would try regexp_matches() like: SELECT regexp_matches('This is just a text, that contain special characters such as, (comma),"(", ")" (left and right parenthesis) as well as "?" question, mark.How do I split it up with PostgreSQL?', E'(\\w*.)\\s+','g'); regexp_matches ---------------- {This} {is} {just} {a} {"text,"} {that} {contain} {special} {characters} {such} {"as,"} {","} {"\""} {left} {and} {right} {parenthesis)} {as} {well} {as} {"\""} {"question,"} {How} {do} {I} {split} {it} {up} {with} (29 rows) So, you have the ability to catch the seperators like ','. But for now, teh example just catches the comma. But you want to catch a lot of other seperators as well. I suggest you do that within the logic of your coding language because I don't think this will be an easy way to walk ;-). This is no database job in my opinion. Cheers Andy
В списке pgsql-general по дате отправления: