markdown.rs 15KB

Allow manual specification of header IDs (#685)

Justification for this feature is added in the docs.

Precedent for the precise syntax: Hugo. Hugo puts this syntax behind a preference named headerIds, and automatic header ID generation behind a preference named autoHeaderIds, with both enabled by default. I have not implemented a switch to disable this. My suggestion for a workaround for the improbable case of desiring a literal “{#…}” at the end of a header is to replace `}` with `&#125;`.

The algorithm I have used is not identical to [that which Hugo uses][0], because Hugo’s looks to work at the source level, whereas here we work at the pulldown-cmark event level, which is generally more sane, but potentially limiting for extremely esoteric IDs.

Practical differences in implementation from Hugo (based purely on reading [blackfriday’s implementation][0], not actually trying it):

- I believe Hugo would treat `# Foo {#*bar*}` as a heading with text “Foo” and ID `*bar*`, since it is working at the source level; whereas this code turns it into a heading with HTML `Foo {#<em>bar</em>}`, as it works at the pulldown-cmark event level and doesn’t go out of its way to make that work (I’m not familiar with pulldown-cmark, but I get the impression that you could make it work Hugo’s way on this point). The difference should be negligible: only *very* esoteric hashes would include magic Markdown characters.
- Hugo will automatically generate an ID for `{#}`, whereas what I’ve coded here will yield a blank ID instead (which feels more correct to me: `None` versus `Some("")`, and all that). In practice the results should be identical.

Fixes #433.

[0]: https://github.com/russross/blackfriday/blob/a477dd1646916742841ed20379f941cfa6c5bb6f/block.go#L218-L234
use lazy_static::lazy_static;
use pulldown_cmark as cmark;
use regex::Regex;
use syntect::easy::HighlightLines;
use syntect::html::{
    start_highlighted_html_snippet, styled_line_to_highlighted_html, IncludeBackground,
};

use crate::context::RenderContext;
use crate::table_of_contents::{make_table_of_contents, Heading};
use config::highlighting::{get_highlighter, SYNTAX_SET, THEME_SET};
use errors::{Error, Result};
use front_matter::InsertAnchor;
use utils::site::resolve_internal_link;
use utils::slugs::slugify_anchors;
use utils::vec::InsertMany;

use self::cmark::{Event, LinkType, Options, Parser, Tag};
use pulldown_cmark::CodeBlockKind;

const CONTINUE_READING: &str = "<span id=\"continue-reading\"></span>";
const ANCHOR_LINK_TEMPLATE: &str = "anchor-link.html";

#[derive(Debug)]
pub struct Rendered {
    pub body: String,
    pub summary_len: Option<usize>,
    pub toc: Vec<Heading>,
    pub internal_links_with_anchors: Vec<(String, String)>,
    pub external_links: Vec<String>,
}

// tracks a heading in a slice of pulldown-cmark events
#[derive(Debug)]
struct HeadingRef {
    start_idx: usize,
    end_idx: usize,
    level: u32,
    id: Option<String>,
}

impl HeadingRef {
    fn new(start: usize, level: u32) -> HeadingRef {
        HeadingRef { start_idx: start, end_idx: 0, level, id: None }
    }
}

// We might have cases where the slug is already present in our list of anchors,
// for example an article could have several titles named Example.
// We add a counter after the slug if the slug is already present, which
// means we will have example, example-1, example-2 etc
fn find_anchor(anchors: &[String], name: String, level: u8) -> String {
    if level == 0 && !anchors.contains(&name) {
        return name;
    }

    let new_anchor = format!("{}-{}", name, level + 1);
    if !anchors.contains(&new_anchor) {
        return new_anchor;
    }

    find_anchor(anchors, name, level + 1)
}

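// Illustrative aside (not part of markdown.rs): a minimal sketch of the same de-duplication
// idea as `find_anchor`, written as a loop instead of recursion. The name `unique_anchor` and
// the `usize` counter are assumptions made for this example only; it is meant to show the
// intended `example`, `example-1`, `example-2` sequence, not to replace the function above.

```rust
/// Hypothetical iterative variant of `find_anchor` (the name is an assumption).
/// Returns `name` if it is unused, otherwise the first free `name-N` with N >= 1.
fn unique_anchor(anchors: &[String], name: &str) -> String {
    // The bare slug wins if nobody has claimed it yet.
    if !anchors.iter().any(|a| a == name) {
        return name.to_string();
    }

    // Otherwise try name-1, name-2, ... until a free candidate is found.
    let mut n: usize = 1;
    loop {
        let candidate = format!("{}-{}", name, n);
        if !anchors.contains(&candidate) {
            return candidate;
        }
        n += 1;
    }
}
```

// Calling it repeatedly with "example", pushing each result into the list before the next
// call, should walk through the sequence described in the comment on `find_anchor`.
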
// Returns whether the given string starts with a scheme.
//
// Although there exists [a list of registered URI schemes][uri-schemes], a link may use arbitrary,
// private schemes. This function checks if the given string starts with something that just looks
// like a scheme, i.e., a case-insensitive identifier followed by a colon.
//
// [uri-schemes]: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
fn starts_with_schema(s: &str) -> bool {
    lazy_static! {
        static ref PATTERN: Regex = Regex::new(r"^[0-9A-Za-z\-]+:").unwrap();
    }

    PATTERN.is_match(s)
}

// Colocated asset links refer to files in the same directory,
// so the link should be a filename only
fn is_colocated_asset_link(link: &str) -> bool {
    !link.contains('/')  // http://, ftp://, ../ etc
        && !starts_with_schema(link)
}

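// Illustrative aside (not part of markdown.rs): a minimal sketch of the two checks above
// using only the standard library. It swaps the regex for a manual scan of the text before
// the first colon, which should accept the same inputs as `^[0-9A-Za-z\-]+:`. The names
// `looks_like_scheme` and `is_colocated` are assumptions made for this example.

```rust
/// Hypothetical std-only stand-in for `starts_with_schema`: true when the text
/// before the first ':' is non-empty and made only of ASCII letters, digits or '-'.
fn looks_like_scheme(s: &str) -> bool {
    match s.find(':') {
        // No colon at all, or a colon with nothing in front of it.
        None | Some(0) => false,
        Some(idx) => s[..idx].bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-'),
    }
}

/// Hypothetical stand-in for `is_colocated_asset_link`: a bare filename,
/// i.e. no path separator and nothing that looks like a scheme.
fn is_colocated(link: &str) -> bool {
    !link.contains('/') && !looks_like_scheme(link)
}
```

// Under these assumptions, a bare `image.png` counts as colocated, while `../image.png`
// and `mailto:user@example.com` do not.
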
// Returns whether a link starts with an HTTP(s) scheme.
fn is_external_link(link: &str) -> bool {
    link.starts_with("http:") || link.starts_with("https:")
}

fn fix_link(
    link_type: LinkType,
    link: &str,
    context: &RenderContext,
    internal_links_with_anchors: &mut Vec<(String, String)>,
    external_links: &mut Vec<String>,
) -> Result<String> {
    if link_type == LinkType::Email {
        return Ok(link.to_string());
    }

    // TODO: remove me in a few versions when people have upgraded
    if link.starts_with("./") && link.contains(".md") {
        println!("It looks like the link `{}` is using the previous syntax for internal links: start with @/ instead", link);
    }

    // A few situations here:
    // - it could be a relative link (starting with `@/`)
    // - it could be a link to a co-located asset
    // - it could be a normal link
    let result = if link.starts_with("@/") {
        match resolve_internal_link(&link, context.permalinks) {
            Ok(resolved) => {
                if resolved.anchor.is_some() {
                    internal_links_with_anchors
                        .push((resolved.md_path.unwrap(), resolved.anchor.unwrap()));
                }
                resolved.permalink
            }
            Err(_) => {
                return Err(format!("Relative link {} not found.", link).into());
            }
        }
    } else if is_colocated_asset_link(&link) {
        format!("{}{}", context.current_page_permalink, link)
    } else {
        if is_external_link(link) {
            external_links.push(link.to_owned());
        }
        link.to_string()
    };
    Ok(result)
}

/// Get only the text in a slice of events
fn get_text(parser_slice: &[Event]) -> String {
    let mut title = String::new();

    for event in parser_slice.iter() {
        match event {
            Event::Text(text) | Event::Code(text) => title += text,
            _ => continue,
        }
    }

    title
}

fn get_heading_refs(events: &[Event]) -> Vec<HeadingRef> {
    let mut heading_refs = vec![];

    for (i, event) in events.iter().enumerate() {
        match event {
            Event::Start(Tag::Heading(level)) => {
                heading_refs.push(HeadingRef::new(i, *level));
            }
            Event::End(Tag::Heading(_)) => {
                let msg = "Heading end before start?";
                heading_refs.last_mut().expect(msg).end_idx = i;
            }
            _ => (),
        }
    }

    heading_refs
}

pub fn markdown_to_html(content: &str, context: &RenderContext) -> Result<Rendered> {
    // the rendered html
    let mut html = String::with_capacity(content.len());
    // Set while parsing
    let mut error = None;

    let mut background = IncludeBackground::Yes;
    let mut highlighter: Option<(HighlightLines, bool)> = None;

    let mut inserted_anchors: Vec<String> = vec![];
    let mut headings: Vec<Heading> = vec![];
    let mut internal_links_with_anchors = Vec::new();
    let mut external_links = Vec::new();

    let mut opts = Options::empty();
    let mut has_summary = false;
    opts.insert(Options::ENABLE_TABLES);
    opts.insert(Options::ENABLE_FOOTNOTES);
    opts.insert(Options::ENABLE_STRIKETHROUGH);

    {
        let mut events = Parser::new_ext(content, opts)
            .map(|event| {
                match event {
                    Event::Text(text) => {
                        // if we are in the middle of a code block
                        if let Some((ref mut highlighter, in_extra)) = highlighter {
                            let highlighted = if in_extra {
                                if let Some(ref extra) = context.config.extra_syntax_set {
                                    highlighter.highlight(&text, &extra)
                                } else {
                                    unreachable!(
                                        "Got a highlighter from extra syntaxes but no extra?"
                                    );
                                }
                            } else {
                                highlighter.highlight(&text, &SYNTAX_SET)
                            };
                            //let highlighted = &highlighter.highlight(&text, ss);
                            let html = styled_line_to_highlighted_html(&highlighted, background).unwrap();
                            return Event::Html(html.into());
                        }

                        // Business as usual
                        Event::Text(text)
                    }
                    Event::Start(Tag::CodeBlock(ref kind)) => {
                        if !context.config.highlight_code {
                            return Event::Html("<pre><code>".into());
                        }

                        let theme = &THEME_SET.themes[&context.config.highlight_theme];
                        match kind {
                            CodeBlockKind::Indented => (),
                            CodeBlockKind::Fenced(info) => {
                                highlighter = Some(get_highlighter(info, &context.config));
                            }
                        };
                        // This selects the background color the same way that start_coloured_html_snippet does
                        let color = theme
                            .settings
                            .background
                            .unwrap_or(::syntect::highlighting::Color::WHITE);
                        background = IncludeBackground::IfDifferent(color);
                        let snippet = start_highlighted_html_snippet(theme);
                        Event::Html(snippet.0.into())
                    }
                    Event::End(Tag::CodeBlock(_)) => {
                        if !context.config.highlight_code {
                            return Event::Html("</code></pre>\n".into());
                        }
                        // reset highlight and close the code block
                        highlighter = None;
                        Event::Html("</pre>".into())
                    }
                    Event::Start(Tag::Image(link_type, src, title)) => {
                        if is_colocated_asset_link(&src) {
                            let link = format!("{}{}", context.current_page_permalink, &*src);
                            return Event::Start(Tag::Image(link_type, link.into(), title));
                        }

                        Event::Start(Tag::Image(link_type, src, title))
                    }
                    Event::Start(Tag::Link(link_type, link, title)) if link.is_empty() => {
                        error = Some(Error::msg("There is a link that is missing a URL"));
                        Event::Start(Tag::Link(link_type, "#".into(), title))
                    }
                    Event::Start(Tag::Link(link_type, link, title)) => {
                        let fixed_link = match fix_link(
                            link_type,
                            &link,
                            context,
                            &mut internal_links_with_anchors,
                            &mut external_links,
                        ) {
                            Ok(fixed_link) => fixed_link,
                            Err(err) => {
                                error = Some(err);
                                return Event::Html("".into());
                            }
                        };

                        Event::Start(Tag::Link(link_type, fixed_link.into(), title))
                    }
                    Event::Html(ref markup) if markup.contains("<!-- more -->") => {
                        has_summary = true;
                        Event::Html(CONTINUE_READING.into())
                    }
                    _ => event,
                }
            })
            .collect::<Vec<_>>(); // We need to collect the events to make a second pass

        let mut heading_refs = get_heading_refs(&events);

        let mut anchors_to_insert = vec![];

        // First heading pass: look for manually specified IDs, e.g. `# Heading text {#hash}`
        // (This is a separate first pass so that auto IDs can avoid collisions with manual IDs.)
        for heading_ref in heading_refs.iter_mut() {
            let end_idx = heading_ref.end_idx;
            if let Event::Text(ref mut text) = events[end_idx - 1] {
                if text.as_bytes().last() == Some(&b'}') {
                    if let Some(mut i) = text.find("{#") {
                        let id = text[i + 2..text.len() - 1].to_owned();
                        inserted_anchors.push(id.clone());
                        // Drop the spaces between the heading text and the `{#...}` suffix
                        while i > 0 && text.as_bytes()[i - 1] == b' ' {
                            i -= 1;
                        }
                        heading_ref.id = Some(id);
                        *text = text[..i].to_owned().into();
                    }
                }
            }
        }

        // Second heading pass: auto-generate remaining IDs, and emit HTML
        for heading_ref in heading_refs {
            let start_idx = heading_ref.start_idx;
            let end_idx = heading_ref.end_idx;
            let title = get_text(&events[start_idx + 1..end_idx]);
            let id = heading_ref.id.unwrap_or_else(|| {
                find_anchor(
                    &inserted_anchors,
                    slugify_anchors(&title, context.config.slugify.anchors),
                    0,
                )
            });
            inserted_anchors.push(id.clone());

            // insert `id` to the tag
            let html = format!("<h{lvl} id=\"{id}\">", lvl = heading_ref.level, id = id);
            events[start_idx] = Event::Html(html.into());

            // generate anchors and places to insert them
            if context.insert_anchor != InsertAnchor::None {
                let anchor_idx = match context.insert_anchor {
                    InsertAnchor::Left => start_idx + 1,
                    InsertAnchor::Right => end_idx,
                    InsertAnchor::None => 0, // Not important
                };
                let mut c = tera::Context::new();
                c.insert("id", &id);

                let anchor_link = utils::templates::render_template(
                    &ANCHOR_LINK_TEMPLATE,
                    context.tera,
                    c,
                    &None,
                )
                .map_err(|e| Error::chain("Failed to render anchor link template", e))?;
                anchors_to_insert.push((anchor_idx, Event::Html(anchor_link.into())));
            }

            // record heading to make table of contents
            let permalink = format!("{}#{}", context.current_page_permalink, id);
            let h =
                Heading { level: heading_ref.level, id, permalink, title, children: Vec::new() };
            headings.push(h);
        }

        if context.insert_anchor != InsertAnchor::None {
            events.insert_many(anchors_to_insert);
        }

        cmark::html::push_html(&mut html, events.into_iter());
    }

    if let Some(e) = error {
        Err(e)
    } else {
        Ok(Rendered {
            summary_len: if has_summary { html.find(CONTINUE_READING) } else { None },
            body: html,
            toc: make_table_of_contents(headings),
            internal_links_with_anchors,
            external_links,
        })
    }
}

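// Illustrative aside (not part of markdown.rs): a minimal sketch of the manual-ID rule used
// in the first heading pass, applied to a plain string rather than to pulldown-cmark events.
// The function name `split_manual_id` and its tuple return type are assumptions made for
// this example; the real code mutates the last `Event::Text` of the heading in place.

```rust
/// Hypothetical helper: splits `Heading text {#hash}` into the visible title
/// and the manually specified ID. Returns `None` for the ID when the text does
/// not end with `}` or contains no `{#` marker.
fn split_manual_id(text: &str) -> (&str, Option<&str>) {
    if text.ends_with('}') {
        if let Some(i) = text.find("{#") {
            // Everything between `{#` and the closing `}` is the ID (possibly empty).
            let id = &text[i + 2..text.len() - 1];
            // Only plain spaces before the marker are trimmed, as in the pass above.
            let title = text[..i].trim_end_matches(' ');
            return (title, Some(id));
        }
    }
    (text, None)
}
```

// Note that `{#}` yields an empty ID rather than no ID, which matches the `Some("")` versus
// `None` distinction described in the commit message for this feature.
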
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_starts_with_schema() {
        // registered
        assert!(starts_with_schema("https://example.com/"));
        assert!(starts_with_schema("ftp://example.com/"));
        assert!(starts_with_schema("mailto:user@example.com"));
        assert!(starts_with_schema("xmpp:node@example.com"));
        assert!(starts_with_schema("tel:18008675309"));
        assert!(starts_with_schema("sms:18008675309"));
        assert!(starts_with_schema("h323:user@example.com"));

        // arbitrary
        assert!(starts_with_schema("zola:post?content=hi"));

        // case-insensitive
        assert!(starts_with_schema("MailTo:user@example.com"));
        assert!(starts_with_schema("MAILTO:user@example.com"));
    }

    #[test]
    fn test_is_external_link() {
        assert!(is_external_link("http://example.com/"));
        assert!(is_external_link("https://example.com/"));
        assert!(is_external_link("https://example.com/index.html#introduction"));

        assert!(!is_external_link("mailto:user@example.com"));
        assert!(!is_external_link("tel:18008675309"));

        assert!(!is_external_link("#introduction"));

        assert!(!is_external_link("http.jpg"));
    }
}