
Cleaning up HTML from Microsoft Word

I often see HTML pasted in from Microsoft Word. It's full of non-standard and irrelevant markup, which sometimes breaks our systems and makes the text hard to edit afterwards.

An easy way to clean that up is to paste it into Windows Live Writer using Edit > Paste Special > Thinned HTML, which removes most of the Microsoft Word extras while keeping the basic formatting. You can then copy and paste the result into the blog/wiki editor. Alternatively, use View > Source to get the HTML source code and paste that into the HTML mode of the blog/wiki editor.
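
Live Writer's Thinned HTML option does this interactively. If you'd rather script it, here's a rough sketch of the same idea using only Python's standard `html.parser` module: keep a short whitelist of basic formatting tags, drop attributes (except link targets), drop comments (including Word's conditional comments), and discard the contents of `head`/`style` blocks. The tag whitelist and the `thin_word_html` name are my own assumptions for illustration, not what Live Writer actually does internally.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist of basic formatting tags to keep; everything else
# (div, span, font, o:p, ...) is unwrapped, leaving only its text.
KEEP_TAGS = {"p", "br", "b", "strong", "i", "em", "u", "a", "ul", "ol", "li",
             "h1", "h2", "h3", "h4", "h5", "h6", "blockquote"}
# Elements whose contents are dropped entirely (Word puts styles and XML here).
DROP_CONTENT = {"head", "style", "script", "title", "xml"}
VOID_TAGS = {"br"}


class WordHtmlThinner(HTMLParser):
    """Hypothetical 'thinner': whitelisted tags only, no attributes except <a href>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            href = dict(attrs).get("href") if tag == "a" else None
            if href:
                self.out.append(f'<a href="{escape(href)}">')
            else:
                self.out.append(f"<{tag}>")  # class/style attributes are dropped

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> or <o:p/>: only keep void whitelisted ones.
        if self.skip_depth == 0 and tag in VOID_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in KEEP_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))  # re-escape &, <, >

    def handle_comment(self, data):
        pass  # drops comments, including <!--[if gte mso 9]>...<![endif]-->


def thin_word_html(source: str) -> str:
    parser = WordHtmlThinner()
    parser.feed(source)
    parser.close()
    return "".join(parser.out).strip()


word_html = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office"><head>'
    '<style>p.MsoNormal {margin:0in;}</style></head>'
    '<body lang=EN-US style="tab-interval:.5in">'
    '<!--[if gte mso 9]><xml><o:OfficeDocumentSettings/></xml><![endif]-->'
    '<p class=MsoNormal><span style="font-family:Calibri">Hello <b>world</b>'
    '</span><o:p></o:p></p></body></html>'
)
print(thin_word_html(word_html))  # <p>Hello <b>world</b></p>
```

The output can go straight into the HTML mode of the blog/wiki editor. A whitelist like this is deliberately blunt; a dedicated cleaner such as HTML Tidy handles many more of Word's quirks.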

Hope that helps!

Short URL: http://sachachua.com/blog/p/8185
  • http://www.trajano.net/ Archimedes Trajano

    I usually do Paste Special -> Text only or, worst case, copy and paste into Notepad and then copy from there. However, your tip is good if the target supports HTML.

  • Ken Krause

    I like to use CleanHaven (http://www.holymackerelsoftware.com/MoreSoftware/CleanHaven/CleanHaven.html) – it’s cross-platform and lets you adjust a lot more than just formatting…

  • http://info-architecture.blogspot.com Samuel Driessen

    Good tips, Sacha! I use them as well. I recently learned you can press Ctrl+C and then Ctrl+Shift+V in Chrome to paste without the extras. Works great!

  • http://rcarlos.com/wp ramon

    DUDE! (sorry – I can’t help but call you that… – perhaps I should have said ‘dudette’)

    I was so thinking you were gonna put up a sed script, or at least a tidy config, or even: tidy --word-2000 yes -m msword.html

    I shudder at the thought of using more Microsoft software than I absolutely have to.

    Ramon

  • http://sachachua.com Sacha Chua

    Heh. It’s a simple thing to do with tidy, but in this case, I do have to help a number of people who are on Microsoft Windows and who may boggle at command lines. If I can help them strip out the HTML using whichever tool they feel comfortable using, then I have to deal with fewer broken posts. =)

    If it weren’t already built into tidy, I’d probably reach for Emacs Lisp rather than sed… ;)
