Convert Rich Text to Long Text and remove HTML tags

I have a rich text field called Notes.

When a user exports the data table to CSV, I’d prefer the plain text version (e.g. remove all HTML tags) of the Notes field as the output.

To try and solve this, I created a form record rule that sets a separate Notes (Long Text) field to the form value of Notes.

However, I’m finding this approach still retains the HTML tags.

Is there a better way to strip out the HTML tags in a rich text field so a plain text version is available in the CSV export?
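
For anyone who needs a stopgap outside of Tadabase (for example, cleaning the CSV after export), here is a minimal sketch of tag stripping using only Python's standard-library `html.parser`. The names `TextExtractor`, `html_to_text`, and `BLOCK_TAGS` are made up for this illustration, and this is not how any Tadabase pipe works internally. It assumes that putting each block-level element on its own line is good enough for a plain-text export.

```python
from html.parser import HTMLParser

# Assumption: these tags should start and end a line in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "table", "hr",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects text nodes and adds line breaks around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes entities such as &amp;
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the text content of an HTML fragment, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(html_to_text("<p><strong>bold</strong> and <u>underline</u></p>"))
```

Inline tags such as `strong`, `u`, `s`, and `span` are simply dropped while their text is kept, so inline styling should not affect the result in this sketch.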

Great idea!

We’ll add this in the coming days to the Text Utilities pipe.

Awesome, thanks Moe! Looking forward to using it.

@ScottG

I see the team has added 3 new methods in the “Tadabase Text Utilities” pipe where you can convert:

  • HTML to Text
  • HTML to Markdown
  • Markdown to HTML

Let me know if you have any questions.

Thanks for the update, @moe!

I just tested it out, and the HTML to Text method works well with text containing bold or italics.

It does not appear to work when the value contains any of the following in the rich text editor:

  • Text style (Paragraph, Heading 1, etc.)
  • Underline
  • Strikethrough
  • Custom text color
  • Custom background color
  • Text align (left, center, right)
  • Bullet list
  • Numbered list
  • Horizontal line
  • Table
  • Insert link

Might this be a bug in how the pipe functions?

Thanks for the feedback. I’ll get this checked out and update you again very soon.

@moe @ScottG
I think underline and strikethrough can’t be expressed in valid standard Markdown.
Officially, styling (CSS) is not supported in Markdown.
Markdown only supports tags like:

  • Heading tags ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']
  • Blockquote tag ['blockquote']
  • Paragraph and block tags ['p', 'div']
  • Table tags ['table', 'tr', 'th', 'td', 'thead', 'tbody', 'tfoot', 'colgroup', 'col', 'caption']
  • Image tag ['img']
  • Link tag ['a']
  • List tags ['ol', 'ul', 'li']
  • Emphasis tags ['em', 'i', 'strong', 'b']
  • Horizontal rule tag ['hr']
  • Hard break tag ['br']
  • Code tag ['code']
  • Preformatted tag ['pre']

I have tested this with the Tadabase pipe, and it generates nice output.
Input:


	<!-- Paragraph and Heading tags -->
	<h1 style="text-align: center;">This is a Heading 1</h1>
	<p style="text-align: justify;">This is a paragraph. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut a massa quis risus viverra egestas. Integer bibendum, nulla sed varius volutpat, quam justo sollicitudin turpis, nec eleifend mauris purus vel nisl. Nullam id orci ac augue congue iaculis nec vel nulla. Praesent vestibulum augue eget urna ultrices, et tincidunt ipsum luctus. Duis laoreet tincidunt diam, sit amet bibendum libero pharetra vel.</p>
	
	<!-- Underline and Strikethrough -->
	<p><u>This text is underlined.</u></p>
	<p><s>This text has a strikethrough.</s></p>
	<p><b>This text has a bold.</b></p>
	<p><i>This text has a italic.</i></p>
	
	<!-- Custom text and background colors -->
	<p>This text has a custom color and background color.</p>
	
	<!-- Text alignment -->
	<p style="text-align: center;">This text is centered.</p>
	<p style="text-align: right;">This text is right-aligned.</p>
	
	<!-- Bullet and Numbered lists -->
	<ul>
		<li>Item 1</li>
		<li>Item 2</li>
		<li>Item 3</li>
	</ul>
	
	<ol>
		<li>Item 1</li>
		<li>Item 2</li>
		<li>Item 3</li>
	</ol>
	
	<!-- Horizontal line -->
	<hr>
	
	<!-- Table -->
	<table style="width:100%">
	  <tr>
	    <th>Header 1</th>
	    <th>Header 2</th>
	    <th>Header 3</th>
	  </tr>
	  <tr>
	    <td>Row 1, Column 1</td>
	    <td>Row 1, Column 2</td>
	    <td>Row 1, Column 3</td>
	  </tr>
	  <tr>
	    <td>Row 2, Column 1</td>
	    <td>Row 2, Column 2</td>
	    <td>Row 2, Column 3</td>
	  </tr>
	</table>
	
	<!-- Link -->
	<p>Visit <a href="https://www.example.com/">Example.com</a> for more information.</p>


Output:

Hi @christopher93, thanks for your reply. I still seem to be running into issues.

Below is my test of the HTML to Text method in the Tadabase Text Utilities pipe.

The source code I’m using as input is as follows:

<p>these are some test notes</p>
<p><strong>bold</strong> and <span style="text-decoration: underline;">underline</span></p>
<p>with a few:</p>
<ul>
<li>bullet points</li>
</ul>

cc @moe

I see the issue here, and we’ll hopefully have it fixed soon. It comes down to the double quotes in your HTML, which are breaking the JSON payload.

We’ll need to find a better solution for this.

If you test with this, it will work:

<p>these are some test notes</p> <p><strong>bold</strong> and <span style='text-decoration: underline;'>underline</span></p> <p>with a few:</p> <ul> <li>bullet points</li> </ul>
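
To illustrate why the double quotes matter, here is a small Python sketch. It assumes the HTML ends up inside a JSON string value; the `"html"` key and the helper `is_valid_json` are hypothetical and are not taken from the pipe's actual request format. Pasting the HTML into a JSON template by hand ends the string early at the first `"`, while a JSON serializer escapes it as `\"`.

```python
import json

html = ('<p><strong>bold</strong> and '
        '<span style="text-decoration: underline;">underline</span></p>')


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Naive interpolation: the quotes around the style attribute close the
# JSON string early, so the payload no longer parses.
naive_payload = '{"html": "' + html + '"}'

# Workaround from this thread: single quotes need no escaping in JSON.
single_quoted_payload = '{"html": "' + html.replace('"', "'") + '"}'

# A JSON serializer escapes the double quotes, so the original HTML
# survives the round trip unchanged.
safe_payload = json.dumps({"html": html})

if __name__ == "__main__":
    print(is_valid_json(naive_payload))          # False
    print(is_valid_json(single_quoted_payload))  # True
    print(is_valid_json(safe_payload))           # True
```

Swapping to single quotes works for this sample, but it changes the HTML and would break on text that itself contains a double quote, which is why escaping is the more general fix.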