How can I parse a YouTube URL?

How can I extract only

http://www.youtube.com/watch?v=qdRaf3-OEh4

from a URL such as

http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main

I am only interested in the "v" parameter.


Runium · Accepted Answer · 18 December 2012 at 23:58

Update:

Better would be:

sed 's/^.\+\(\/\|\&\|\?\)v=\([^\&]*\).*/\2/'
awk 'match($0,/((\/|&|\?)v=)([^&]*)/,x){print x[3]}'
grep -Po '(?<=(\/|&|\?)v=)[^&]*'
# That is: match /, & or ?, then v=

RFC 3986 states:

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   query         = *( pchar / "/" / "?" )
   fragment      = *( pchar / "/" / "?" )

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="
   …

So, to be safe, put

 | sed 's/#.*//' |

in front, to remove the #fragment part.

That is:

| sed 's/#.*//' | grep -Po '(?<=(\/|&|\?)v=)[^&]*'
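
As a rough sketch of why stripping the fragment first matters, the pipeline can be wrapped in a small function and fed a URL whose fragment happens to contain another "v=". The function name youtube_id and the sample fragment are made up for illustration; GNU grep with -P support is assumed.

```shell
# youtube_id is a hypothetical helper name, not part of any tool.
youtube_id() {
    # Drop the #fragment first, then print only the value that follows "v=".
    printf '%s\n' "$1" | sed 's/#.*//' | grep -Po '(?<=(\/|&|\?)v=)[^&]*'
}

# The fragment "#t=30&v=zzz" is invented test input; it is removed before matching.
youtube_id 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1#t=30&v=zzz'
# prints: qdRaf3-OEh4
```

Without the sed step, grep -o would print a second line, zzz, for the "v=" inside the fragment.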

SED (2):

echo 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main' \
| sed 's/^.\+\Wv=\([^\&]*\).*/\1/'

Explanation:


's       
/…/…/    /THIS/WITH THIS/

'substitute/MATCH 0 or MORE THINGS and GROUP them in ()/WITH THIS/

+-------------------------- s    _s_ubstitute
|+------------------------- /    START MATCH
||                    +---- /    END MATCH
||                    | +-- \1   REPLACE WITH - \1 == group 1, i.e. the first \( \)
||                    | | +- /   End of SUBSTITUTE
s/^.\+\Wv=\([^\&]*\).*/\1/
  +++-+-+-+-+-----+-+------- ^        Match from beginning of line
   ++-+-+-+-+-----+-+------- .        Match any character
    +-+-+-+-+-----+-+------- \+       One or more times, greedy (\+ is a GNU sed extension)
      +-+-+-+-----+-+------- \W       Non-word character
        +-+-+-----+-+------- v=       Literally match "v="
          +-+-----+-+------- \(       Start MATCH GROUP
            +-----+-+------- [^\&]*   Match any character BUT & - as many as possible
                  +-+------- \)       End MATCH GROUP
                    +------- .*       Match anything, as many times as possible,
                                      i.e. to the end of the line

         [abc]  would match a OR b OR c
         [abc]* would match a AND/OR b AND/OR c - as many times as possible
         [^abc] would match anything BUT a,b or c

/\1/     Replace the ENTIRE match with MATCH GROUP number 1.
         That is everything between \( and \): anything but "&" after the
         literal string "v=", which in turn has a non-word character in
         front of it.

         It also means that no match means no substitution, which ultimately
         results in no change.

Result: qdRaf3-OEh4

Note: if there is no match, the entire line is returned unchanged.
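
If getting the whole line back on a non-match is not wanted, one possible variation is to combine sed's -n option with the p flag of the s command, so that only lines where the substitution happened are printed. This is a sketch assuming GNU sed (for \+ and \W); extract_v is a hypothetical helper name and the second URL is made-up test input.

```shell
# extract_v is a hypothetical helper name used only for this sketch.
extract_v() {
    # -n suppresses automatic printing; the trailing "p" prints only lines
    # where the substitution actually took place.
    printf '%s\n' "$1" | sed -n 's/^.\+\Wv=\([^\&]*\).*/\1/p'
}

extract_v 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1'   # prints: qdRaf3-OEh4
extract_v 'http://www.youtube.com/feed/subscriptions'               # prints nothing
```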

(G) AWK:

echo 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main' \
| awk 'match($0,/(\Wv=)([^&]*)/,v){print v[2]}'

Result: qdRaf3-OEh4

Explanation:

In awk, match(string, regexp) is a function that searches for the longest, leftmost match of regexp in string. Here I have used an extension that comes with Gawk (see Awk, Gawk, MAwk etc.), which places the individual matches, that is, whatever is between the parentheses, into an array of matches.

The pattern is fairly similar to the Perl/grep one below.


  +-------------------------------------- Built-in function
  |    +--------------------------------- Entire input line ($1 would have been field 1,
  |    |                                  etc., using the default whitespace delimiters)
  |    |
  |    |
  |    |  (....)(....) ------------------ Places \Wv= in group 1 and [^&]* in group 2.
match($0, /(\Wv=)([^&]*)/, v){print v[2]}
                           |   |    | |
                           |   |    +-+---- Use "v" from /, v; v is a user-defined name
                           |   |      +---- 2 specifies the index in v, i.e. the group
                           |   |            between the second pair of ()'s in /…/
                           |   |
                           |   +----------- print is a built-in statement.
                           +--------------- Array name that one can use in print.
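
Where the three-argument match() is not available (it is a Gawk extension), a similar result can be sketched with the two-argument form, which sets the built-in variables RSTART and RLENGTH. The function name v_param_awk is hypothetical, and the pattern here deliberately accepts only "?" or "&" before "v=", which is narrower than \W.

```shell
# v_param_awk is a hypothetical helper name used only for this sketch.
v_param_awk() {
    # match() sets RSTART (1-based start) and RLENGTH (length) of the match.
    # The match includes the 3-character prefix "?v=" or "&v=", so skip it.
    printf '%s\n' "$1" |
        awk 'match($0, /[?&]v=[^&#]*/) { print substr($0, RSTART + 3, RLENGTH - 3) }'
}

v_param_awk 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6'
# prints: qdRaf3-OEh4
```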

GREP (using Perl-compatible regular expressions):

echo 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main' | \
grep -Po '(?<=\Wv=)[^&]*'

Result: qdRaf3-OEh4

Explanation:


-P  Use Perl-compatible regular expressions.
-o  Only print the match of the expression.
    - That means: of our pattern, only print/return what it matches.
    If nothing matches, return nothing.

          +------- ^    Negate match, i.e. do not match (ONLY when it is FIRST between [])
          |+------ &    A literal "&" character
          || 
(?<=\Wv=)[^&]*
|   | |  |  ||
|   | |  |  |+---- *     Greedy; as many times as possible.
|   | |  +--+----- []    Any one of the characters inside [], in any order
|   | +----------- v=    Literal v=
|   +------------- \W    Non-word character
+----------------- (?<=  What follows should be (immediately) preceded by.
                    Mnemonic: ? = huh, < = left, = = equals to

So: match a literal "v=" where the "v" is preceded by a non-word character. Then match
anything, as many times as possible, until we reach the end of the line or meet an "&".

As an unencoded "&" in a query string conventionally only appears between key/value pairs, this should be OK.
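
A possible alternative to the lookbehind is PCRE's \K, which discards everything matched so far from the reported match. The sketch below assumes GNU grep built with -P support; v_param_grep is a hypothetical helper name, and the character class also stops at "#" so that a trailing fragment is not included in the value.

```shell
# v_param_grep is a hypothetical helper name used only for this sketch.
v_param_grep() {
    # "[?&]v=" must match, but \K drops it from the output; the value then
    # runs up to the next "&" or "#".
    printf '%s\n' "$1" | grep -Po '[?&]v=\K[^&#]*'
}

v_param_grep 'http://www.youtube.com/watch?list=PL4367CEDBC117AEC6&v=qdRaf3-OEh4#t=10'
# prints: qdRaf3-OEh4
```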

evilsoup · Answer 2 · 18 December 2012 at 23:58

echo 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main' | sed -e 's/&.*//' -e 's/.*watch?//'

will get you v=qdRaf3-OEh4.
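
If only the bare ID is wanted, a third expression could strip the leading "v=". This is a sketch that, like the command above, assumes "v" is the first parameter after "watch?"; v_first_param is a hypothetical helper name.

```shell
# v_first_param is a hypothetical helper name used only for this sketch.
v_first_param() {
    # 1) cut everything from the first "&", 2) cut everything up to "watch?",
    # 3) drop the leading "v=" so only the ID remains.
    printf '%s\n' "$1" | sed -e 's/&.*//' -e 's/.*watch?//' -e 's/^v=//'
}

v_first_param 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main'
# prints: qdRaf3-OEh4
```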
